<div class="alert alert-info"><strong> Note </strong>
Select the conda_pytorch_p39 kernel when prompted to choose a kernel for this notebook.
</div>


# Notebook 2 - Financial document summarization

## Use case

Financial news and reports can contain a vast amount of information that is challenging to analyze and understand. AI summarization can help investors make more informed decisions by giving them a quick, concise summary of relevant news and reports. It can help analysts identify trends and patterns in the data faster and make more accurate market predictions, and it can help financial institutions identify potential risks early and respond proactively. Common use cases for summarizing financial documents (news, reports, etc.) include:

1. Investment decisions
2. Market analysis
3. Competitive intelligence
4. Risk management
5. Regulatory compliance

## What will you learn?

In this notebook, you will learn how to serve an NLP model with your own Python inference script, using NVIDIA Triton's Python backend. For demonstration purposes, we will use a pre-trained T5 model to summarize financial text. T5 is a multitask encoder-decoder Transformer model that is frequently used for text generation tasks. One detail of text generation is that the model's decoder, the component that predicts the next token in the sequence, is autoregressive: it must be called many times to create a single output sequence, and on each call the most recently generated token is appended to the input.
1. [Install packages](#installs-and-set-up)
 
2. [Creating a deployable Triton Python Model](#t5-transformer-model)<br>
    a. [Python model repository](#create-python-model-repo)<br>
    b. [Model artifacts, dependencies and script](#create-python-model)<br>
    c. [Python model configuration](#create-python-model-config)<br>
    
3. [Export model artifacts to S3](#export-to-s3)

4. [Inference using deployed T5 model](#t5-inference)
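
The autoregressive decoding loop described in this section can be sketched with a toy greedy decoder. This is not the T5 implementation: `toy_next_token` is a hypothetical stand-in for a single decoder call, and the token strings are invented for illustration. Only the control flow (one call per output token, with the generated tokens fed back in) reflects the real behavior.

```python
from typing import Callable, List

EOS = "</s>"  # end-of-sequence marker (T5 uses the same string)


def toy_next_token(source: List[str], generated: List[str]) -> str:
    """Hypothetical stand-in for one decoder call: copy every other source token."""
    position = 2 * len(generated)
    return source[position] if position < len(source) else EOS


def greedy_decode(
    source: List[str],
    next_token: Callable[[List[str], List[str]], str],
    max_new_tokens: int = 16,
) -> List[str]:
    """Call the decoder once per output token, feeding back what was generated so far."""
    generated: List[str] = []
    for _ in range(max_new_tokens):
        token = next_token(source, generated)
        if token == EOS:
            break
        generated.append(token)  # the new token becomes part of the next call's input
    return generated


summary = greedy_decode("regulators probe the sudden bank collapse".split(), toy_next_token)
print(summary)
```

Because every output token costs one decoder call, generation latency grows with the length of the summary, which is why `max_new_tokens` style limits matter when serving such models.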


## Install packages <a class="anchor" id="installs-and-set-up"></a>

Install the dependencies required to package the model and run inference using Triton Server. This also updates the SageMaker SDK, boto3, and the AWS CLI.

In [None]:
!pip install -qU pip awscli boto3 sagemaker transformers ipywidgets
!pip install nvidia-pyindex
!pip install tritonclient[http]

#### Imports and variables

In [None]:
import boto3
import sagemaker
from sagemaker import get_execution_role
import numpy as np
import tritonclient.http as httpclient

# AWS session, execution role, and clients
role = get_execution_role()
sess = boto3.Session()
sagemaker_session = sagemaker.Session(boto_session=sess)
sm_client = sess.client("sagemaker")
runtime_sm_client = sess.client("sagemaker-runtime")
bucket = sagemaker_session.default_bucket()

# Model archive name and S3 key prefix
model_name = "t5"
python_model_file_name = f"{model_name}_py_v0.tar.gz"
prefix = "financial-usecase-mme"

%store -r

## Update the cell below with the endpoint_name from lab3 notebook 1

In [None]:
endpoint_name = '<UPDATE ENDPOINT NAME>'

## Creating a deployable Triton Python Model <a class="anchor" id="t5-transformer-model"></a>

This section presents an overview of the steps to prepare the pre-trained T5 model for deployment on a SageMaker multi-model endpoint (MME) using Triton Inference Server model configurations.


<div class="alert alert-info"><strong> Note </strong>
We are demonstrating deployment with a single Python backend model. However, you can deploy hundreds of models using SageMaker MME support for GPUs. The models may or may not share the same backend/framework.
</div>

### Python model repository <a class="anchor" id="create-python-model-repo"></a>

The model repository will contain the model and tokenizer artifacts, a packaged conda environment (with the dependencies needed for inference), the Triton config file, and the Python script used for inference. The inference script is mandatory when you use the Python backend, and by default Triton expects it to be named `model.py`.

```
t5-summarization
├── 1
│   └── model.py
├── model
│     └── <model artifacts>
├── tokenizer
│     └── <tokenizer artifacts>
├── config.pbtxt
│
└── mme_env.tar.gz
```
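
Before packaging, it can help to confirm that the directory matches the layout above. The following is a minimal sketch, assuming the model directory is `triton-serve-py/t5-summarization` as used in the cells of this notebook; `missing_entries` is a hypothetical helper, not part of Triton or SageMaker.

```python
from pathlib import Path
from typing import List

# Entries expected by the layout above, relative to the model directory.
EXPECTED_ENTRIES = ["1/model.py", "model", "tokenizer", "config.pbtxt", "mme_env.tar.gz"]


def missing_entries(model_dir: str) -> List[str]:
    """Return the expected entries that do not exist under model_dir."""
    root = Path(model_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


# An empty list means the repository is complete; otherwise the gaps are listed.
print(missing_entries("triton-serve-py/t5-summarization"))
```

Running this check once all artifacts have been written catches a missing `config.pbtxt` or conda archive before the tarball is uploaded.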

### Model artifacts, dependencies and script  <a class="anchor" id="create-python-model"></a>

We will take the pre-trained T5-small model from [HuggingFace](https://huggingface.co/transformers/model_doc/t5.html) and save it to disk. This will exemplify how you can bring your own model parameters and load them in the Python inference script itself.

In [None]:
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "t5-small"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)
tokenizer.save_pretrained('triton-serve-py/t5-summarization/tokenizer')
model.save_pretrained('triton-serve-py/t5-summarization/model')

The Python backend doesn't include other libraries by default; we need PyTorch and the Transformers library to run inference. In order to package the inference environment we use `conda pack`, which is the dependency management method recommended in the [Triton documentation](https://github.com/triton-inference-server/python_backend#creating-custom-execution-environments).

<div class="alert alert-info"><strong> Note </strong>
We have pre-packaged the conda environment for this lab because installing and packaging the dependencies with conda-pack takes about 15 minutes. In the interest of time, the next cell downloads the packaged tar file into the triton-serve-py/t5-summarization directory.
</div>

In [None]:
!aws s3 cp s3://ee-assets-prod-us-east-1/modules/05fa7598d4d44836a42fde79b26568b2/v3/mme_env.tar.gz triton-serve-py/t5-summarization/

Finally, we write the Python inference script (refer to the workshop README for more details on the required structure of the script). Notice that the script loads the model and tokenizer from the model repository directory. It receives and returns text, and it supports variable batch sizes.

In [None]:
!pygmentize triton-serve-py/t5-summarization/1/model.py

### Python Model configuration <a class="anchor" id="create-python-model-config"></a>

The model configuration file `config.pbtxt` must specify the name of the model (`t5-summarization`), the backend (`python`), `max_batch_size` (16), and the input and output shapes along with their data type (`TYPE_STRING`). Additionally, you can specify the `instance_group` and `dynamic_batching` properties to achieve high-performance inference.

In [None]:
%%writefile triton-serve-py/t5-summarization/config.pbtxt
name: "t5-summarization"
backend: "python"
max_batch_size: 16

input [
  {
    name: "INPUT0"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "SUMMARY"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]

instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]

parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "$$TRITON_MODEL_DIRECTORY/mme_env.tar.gz"}
}
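
Because `max_batch_size` is greater than zero, Triton treats the first dimension of every request tensor as the batch dimension, so `dims: [ 1 ]` corresponds to a request shape of `[batch, 1]`. The sketch below illustrates that rule for this configuration; `check_request_shape` is a hypothetical helper written for this explanation and is not a Triton API.

```python
from typing import Sequence

MAX_BATCH_SIZE = 16   # max_batch_size in config.pbtxt
PER_ITEM_DIMS = (1,)  # dims of INPUT0 in config.pbtxt


def check_request_shape(shape: Sequence[int]) -> bool:
    """Return True if shape is [batch, *dims] with 1 <= batch <= MAX_BATCH_SIZE."""
    if len(shape) != 1 + len(PER_ITEM_DIMS):
        return False
    batch, *item_dims = shape
    return 1 <= batch <= MAX_BATCH_SIZE and tuple(item_dims) == PER_ITEM_DIMS


print(check_request_shape([2, 1]))   # two documents in one request
print(check_request_shape([17, 1]))  # more rows than max_batch_size allows
print(check_request_shape([2]))      # missing the per-item dimension
```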

### Packaging model files and uploading to S3 <a class="anchor" id="export-to-s3"></a>

In [None]:
!tar -C triton-serve-py/ -czf $python_model_file_name t5-summarization
model_uri_py = sagemaker_session.upload_data(path=python_model_file_name, key_prefix=prefix)

In [None]:
print(f"PyTorch Model S3 location: {model_uri_py}")
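
The `tar -C` invocation above makes the model directory the top-level entry of the archive. A rough Python equivalent using the standard `tarfile` module is sketched below; `package_model` is a hypothetical helper shown only to make the archive layout explicit, and the notebook itself does not depend on it.

```python
import tarfile
from pathlib import Path
from typing import List


def package_model(parent_dir: str, model_dir_name: str, archive_path: str) -> List[str]:
    """Create a gzip tarball whose top-level entry is model_dir_name; return member names."""
    source = Path(parent_dir) / model_dir_name
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname drops the parent directory, like `tar -C parent_dir`.
        archive.add(str(source), arcname=model_dir_name)
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()
```

For this notebook the call would be `package_model("triton-serve-py", "t5-summarization", python_model_file_name)`, and every returned member name should start with `t5-summarization`.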

## Inference using deployed T5 model <a class="anchor" id="t5-inference"></a>

Note that this T5 Python model is uploaded to the same S3 location that the MME's `model_data_url` points to. You can therefore invoke the MME endpoint created in the previous notebook with `TargetModel` set to the archive name (`t5_py_v0.tar.gz`) to run the summarization model.
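
For a multi-model endpoint, `TargetModel` is the path of the archive relative to the S3 prefix the endpoint was configured with. The sketch below shows that relationship; `target_model_name` is a hypothetical helper and the bucket name is made up, while the key prefix mirrors the `prefix` variable defined earlier.

```python
def target_model_name(model_uri: str, model_data_url: str) -> str:
    """Return the archive path relative to the MME prefix, as TargetModel expects."""
    prefix = model_data_url if model_data_url.endswith("/") else model_data_url + "/"
    if not model_uri.startswith(prefix):
        raise ValueError(f"{model_uri} is not under {prefix}")
    return model_uri[len(prefix):]


# Hypothetical bucket name, used only for illustration.
print(target_model_name(
    "s3://example-bucket/financial-usecase-mme/t5_py_v0.tar.gz",
    "s3://example-bucket/financial-usecase-mme",
))
```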

In [None]:
text_1 ="""
March 14 (Reuters) - U.S. prosecutors are investigating the collapse of Silicon Valley Bank, according to a source familiar with the matter, as scrutiny mounts over the firm's sudden collapse and regulators scramble to contain the fallout.
The U.S. Justice Department is probing the sudden demise of the bank, which was shuttered on Friday following a bank run, the source said, declining to be named as the inquiry is not public. The Securities and Exchange Commission has launched a parallel investigation, according to the Wall Street Journal, which first reported the probes.
The investigation is in early stages and may not result in allegations of wrongdoing or charges being filed, the source said. Officials are also examining stock sales by officers of SVB Financial Group (SIVB.O), which owned the bank, the WSJ reported, citing people familiar with the matter.
SEC Chair Gary Gensler on Sunday said in a statement the agency is particularly focused on monitoring for market stability and identifying and prosecuting any form of misconduct that might threaten investors during periods of volatility.
The rapid demise of Silicon Valley Bank and the fall of Signature Bank have left regulators racing to contain risks to the rest of the sector. On Tuesday, ratings agency Moody's cut its outlook on the U.S. banking system to "negative" from "stable."
"""

text_2 ="""
STOCKHOLM, March 15 (Reuters) - For years, Sweden has been warned that its dysfunctional housing market, plagued by under-supply and kept aloft by low rates and generous tax benefits, was a risk to the wider economy.
Now those risks are becoming reality. Households with big mortgages are reining in spending as interest rates rise, and house-builders are pulling the plug on investment, tipping Sweden into recession.
The country is set to be the only EU economy experiencing outright recession this year. The crown is trading at around its weakest level against the euro since the global financial crisis, partly due to housing market worries, making the central bank's job of curbing inflation more difficult.
After years of ultra-low borrowing costs, the pandemic and the Ukraine war have served up a toxic cocktail of high inflation and rapidly rising interest rates to many countries.
But in Sweden, the structural problems rooted in its housing market are magnifying the effects.
"""

# Join the lines with a space so sentences on adjacent lines do not run together
preprocess_text_1 = text_1.strip().replace("\n", " ")
prompt_text_1 = "summarize: " + preprocess_text_1

preprocess_text_2 = text_2.strip().replace("\n", " ")
prompt_text_2 = "summarize: " + preprocess_text_2

text_inputs = [prompt_text_1, prompt_text_2] 
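
The same preprocessing can be wrapped in a small helper so that it is applied consistently to every document. This is a sketch rather than part of the workshop code: `build_prompt` is a hypothetical name, and it collapses every run of whitespace (including newlines) into a single space before adding the `summarize: ` task prefix that T5 expects.

```python
def build_prompt(text: str, task_prefix: str = "summarize: ") -> str:
    """Collapse whitespace runs to single spaces and prepend the T5 task prefix."""
    return task_prefix + " ".join(text.split())


example = """
First sentence.
Second   sentence.
"""
print(build_prompt(example))
```

With this helper, the input list could be built as `[build_prompt(t) for t in (text_1, text_2)]`.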

In [None]:
# One BYTES tensor of shape [batch, 1], matching INPUT0 in config.pbtxt
inputs = []
inputs.append(httpclient.InferInput("INPUT0", [len(text_inputs), 1], "BYTES"))

# Wrap each prompt in its own row so the array has shape [batch, 1]
batch_request = [[text] for text in text_inputs]

input0_real = np.array(batch_request, dtype=np.object_)

inputs[0].set_data_from_numpy(input0_real, binary_data=False)

len(input0_real)
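
The request above carries two prompts, well under the `max_batch_size` of 16 set in `config.pbtxt`. If you had more documents than that, one option is to split them into several requests. The generator below is a minimal sketch of that idea; `batches` is a hypothetical helper and the prompts are placeholders.

```python
from typing import Iterator, List


def batches(items: List[str], max_batch_size: int = 16) -> Iterator[List[List[str]]]:
    """Yield [batch, 1]-shaped nested lists with at most max_batch_size rows each."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(items), max_batch_size):
        yield [[item] for item in items[start:start + max_batch_size]]


prompts = [f"summarize: document {i}" for i in range(20)]
print([len(batch) for batch in batches(prompts)])
```

Each yielded batch can be passed to `np.array(batch, dtype=np.object_)` in the same way as `batch_request` above.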

In [None]:
outputs = []

outputs.append(httpclient.InferRequestedOutput("SUMMARY"))

In [None]:
request_body, header_length = httpclient.InferenceServerClient.generate_request_body(
    inputs, outputs=outputs
)

print(request_body)

In [None]:
response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/octet-stream',
    Body=request_body,
    TargetModel=python_model_file_name
)

In [None]:
header_length_prefix = "application/vnd.sagemaker-triton.binary+json;json-header-size="
header_length_str = response["ContentType"][len(header_length_prefix) :]

# Read response body
result = httpclient.InferenceServerClient.parse_response_body(
    response["Body"].read(), header_length=int(header_length_str)
)

outputs_data = result.as_numpy("SUMMARY")

for idx, output in enumerate(outputs_data):
    print(f'Original:\n{text_inputs[idx]}\n')
    print(f'Summary:\n{output[0].decode()}\n')
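
The header-length parsing above assumes that the response `ContentType` starts with the SageMaker Triton prefix. A slightly more defensive version is sketched below; `json_header_size` is a hypothetical helper that raises a clear error when the prefix is missing instead of failing inside `int()`.

```python
TRITON_PREFIX = "application/vnd.sagemaker-triton.binary+json;json-header-size="


def json_header_size(content_type: str) -> int:
    """Extract the JSON header length from a SageMaker Triton response ContentType."""
    if not content_type.startswith(TRITON_PREFIX):
        raise ValueError(f"Unexpected ContentType: {content_type!r}")
    return int(content_type[len(TRITON_PREFIX):])


print(json_header_size(TRITON_PREFIX + "128"))
```

In the cell above, `int(header_length_str)` could then be replaced by `json_header_size(response["ContentType"])`.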

## Terminate endpoint and clean up artifacts

In [None]:
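# sm_model_name and endpoint_config_name are not defined in this notebook;
# they are expected to have been restored by `%store -r` from the previous notebook.
sm_client.delete_model(ModelName=sm_model_name)
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sm_client.delete_endpoint(EndpointName=endpoint_name)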