## Deploy a Flan T5 Hugging Face model

Welcome to EZSMdeploy! You can use EZSMdeploy to deploy many machine learning models on AWS.

In this demo notebook, we demonstrate how to use EZSMdeploy to deploy foundation models as endpoints and use them for various NLP tasks. These foundation models perform text-to-text generation: each takes a text prompt as input and returns the text the model generates in response to that prompt.

Here, we show how to deploy the state-of-the-art pre-trained FLAN-T5 models from Hugging Face and apply them to the following text-to-text generation tasks. You can use FLAN-T5 directly for many NLP tasks without fine-tuning the model.

- Text summarization

- Common sense reasoning / natural language inference

- Question and answering

- Sentence / sentiment classification

- Translation

- Pronoun resolution

In [2]:
%pip uninstall -y ezsmdeploy --quiet
# !pip install --upgrade pip
%pip install ezsmdeploy==2.0.0

Note: you may need to restart the kernel to use updated packages.


### Note: restart the kernel after installing so that the updated packages are loaded.

## Test the foundation model

In [1]:
import sagemaker
sagemaker.__version__

'2.171.0'

In [1]:
!pip show ezsmdeploy

Name: ezsmdeploy
Version: 2.0.dev2
Summary: SageMaker custom deployments made easy
Home-page: https://pypi.python.org/pypi/ezsmdeploy
Author: Shreyas Subramanian
Author-email: subshrey@amazon.com
License: MIT
Location: /home/ec2-user/SageMaker/ezsm-ray-FM
Requires: boto3, sagemaker, sagemaker-studio-image-build, shortuuid, yaspin
Required-by: 
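The `pip show` output above reports version `2.0.dev2` installed from a local directory (`/home/ec2-user/SageMaker/ezsm-ray-FM`), not the pinned `2.0.0`, so it can be worth confirming which version the kernel actually imports. Below is a minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of EZSMdeploy.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Hypothetical usage: warn if the installed ezsmdeploy differs from the pinned release.
pinned = "2.0.0"
found = installed_version("ezsmdeploy")
if found != pinned:
    print(f"Expected ezsmdeploy {pinned}, found {found}; reinstall and restart the kernel.")
```

A mismatch such as `2.0.dev2` usually means a local checkout is shadowing the PyPI release on the import path.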


In [1]:
import ezsmdeploy

In [None]:
ezonsm = ezsmdeploy.Deploy(
    model="tiiuae/falcon-40b",
    huggingface_model=True,
    instance_type="ml.g5.2xlarge",
)

0:00:00.659191 | created model(s). Now deploying on ml.g5.2xlarge

In [7]:
ezonsm.predictor.predict({"inputs":"Paris is the capital of "})

[{'generated_text': 'Paris is the capital of France.'}]

In [5]:
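# Delete the endpoint when you are done to stop incurring charges.
# The query examples below need a live endpoint, so run this cell last.
ezonsm.predictor.delete_endpoint()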
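Because an endpoint is billed until it is deleted, one option is to wrap your queries so that `delete_endpoint()` runs even if a cell raises an error. The sketch below is a hypothetical helper, not part of EZSMdeploy or the SageMaker SDK; it only assumes the predictor object exposes a `delete_endpoint()` method, as `ezonsm.predictor` does above.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol


class DeletablePredictor(Protocol):
    """Anything with a delete_endpoint() method, e.g. a SageMaker Predictor."""

    def delete_endpoint(self) -> None: ...


@contextmanager
def endpoint_session(predictor: DeletablePredictor) -> Iterator[DeletablePredictor]:
    """Yield the predictor and always delete its endpoint on exit."""
    try:
        yield predictor
    finally:
        # Runs on normal exit and when an exception propagates out of the block.
        predictor.delete_endpoint()


# Hypothetical usage with the predictor created above:
# with endpoint_session(ezonsm.predictor) as predictor:
#     print(predictor.predict({"inputs": "Paris is the capital of "}))
```

The trade-off is that the endpoint disappears as soon as the `with` block ends, so this pattern suits one-off scripts better than interactive exploration.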

## Query the endpoint and parse the response
The endpoint accepts a UTF-8 encoded request body: either raw text (`ContentType="application/x-text"`) or a JSON document (`ContentType="application/json"`).
It returns JSON containing the generated text.
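The exact request and response shapes depend on the serving container. The earlier `predict` call sent `{"inputs": ...}` and received a list of `{"generated_text": ...}` dicts, which matches the Hugging Face inference toolkit convention, whereas the cells below use JumpStart-style `text_inputs` / `generated_texts` keys. Below is a minimal offline sketch of encoding a request body and decoding a response, assuming the `{"inputs": ..., "parameters": {...}}` shape; verify this against your container. The helpers `encode_request` and `decode_response` are hypothetical.

```python
import io
import json
from typing import Any, Dict, List, Optional


def encode_request(prompt: str, parameters: Optional[Dict[str, Any]] = None) -> bytes:
    """Serialize a prompt (and optional generation parameters) to a UTF-8 JSON body."""
    body: Dict[str, Any] = {"inputs": prompt}
    if parameters:
        body["parameters"] = parameters
    return json.dumps(body).encode("utf-8")


def decode_response(stream: Any) -> List[str]:
    """Read a JSON response body and return the generated texts.

    `stream` is anything with a .read() method returning bytes, such as
    response["Body"] from invoke_endpoint. Assumes the body is either a list
    of {"generated_text": ...} dicts or a single such dict.
    """
    predictions = json.loads(stream.read().decode("utf-8"))
    if isinstance(predictions, dict):
        predictions = [predictions]
    return [p["generated_text"] for p in predictions]


# Offline check with a fake body shaped like the earlier predict() output.
fake_body = io.BytesIO(b'[{"generated_text": "Paris is the capital of France."}]')
print(decode_response(fake_body))
```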

In [None]:
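import json

import boto3

# Name of the endpoint created by ezsmdeploy above.
endpoint_name = ezonsm.predictor.endpoint_name

# ANSI escape codes used to bold the printed output.
newline, bold, unbold = "\n", "\033[1m", "\033[0m"


def query_endpoint(encoded_text, endpoint_name):
    """Send UTF-8 encoded raw text to the endpoint and return the raw response."""
    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/x-text", Body=encoded_text
    )
    return response


def parse_response(query_response):
    """Decode the JSON response body and return the generated text."""
    model_predictions = json.loads(query_response["Body"].read())
    generated_text = model_predictions["generated_text"]
    return generated_text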

In [None]:
# The request body must be JSON.
payload = {
    "text_inputs": "Tell me the steps to make a pizza",
    "max_length": 50,
    "max_time": 50,
    "num_return_sequences": 3,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


def query_endpoint_with_json_payload(encoded_json, endpoint_name):
    """Send a UTF-8 encoded JSON payload to the endpoint and return the raw response."""
    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=encoded_json
    )
    return response


def parse_response_multiple_texts(query_response):
    """Decode the JSON response body and return the list of generated texts."""
    model_predictions = json.loads(query_response["Body"].read())
    generated_text = model_predictions["generated_texts"]
    return generated_text


query_response = query_endpoint_with_json_payload(
    json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
)
generated_texts = parse_response_multiple_texts(query_response)
print(generated_texts)
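The payload enables sampling (`do_sample`) restricted by `top_k` and `top_p`. As a rough illustration of what those two parameters mean, here is a toy filter that applies a top-k cut followed by a nucleus (top-p) cut over a hand-written next-token distribution. It is a conceptual sketch, not the model server's implementation, and the tokens and probabilities are made up.

```python
from typing import Dict


def top_k_top_p_filter(probs: Dict[str, float], top_k: int, top_p: float) -> Dict[str, float]:
    """Keep the top_k most likely tokens, then the smallest prefix whose
    cumulative probability reaches top_p, and renormalize what remains.

    Assumes top_k >= 1 and 0 < top_p <= 1.
    """
    # Sort tokens from most to least likely and apply the top-k cut.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(tok, p / total) for tok, p in ranked]

    # Nucleus (top-p) cut: stop once the cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


# Toy next-token distribution (made-up numbers).
next_token = {"dough": 0.5, "sauce": 0.3, "cheese": 0.15, "pineapple": 0.05}
print(top_k_top_p_filter(next_token, top_k=3, top_p=0.8))
```

With `top_k=3` the unlikely "pineapple" is dropped first; with `top_p=0.8` only "dough" and "sauce" survive, because together they already cover at least 80% of the remaining mass. Sampling then draws from that reduced set, which is why repeated calls with `do_sample=True` return different texts.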

## Advanced features: use prompt engineering to solve different tasks
Below, we demonstrate solving five key tasks with the Flan T5 model: text summarization, common sense reasoning / question answering, sentence classification, translation, and pronoun resolution.

Note: The following sections are designed specifically for the Flan T5 models (small, base, large, xl). Other models, such as T5-one-line-summary, are designed specifically for text summarization and cannot perform all of the tasks below.
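Prompt engineering here amounts to wrapping the input in task-specific instructions, so one way to organize it is a table from task to template. In the minimal sketch below, the summarization template is one of the prompts used in the next cell; the other templates are illustrative placeholders (hypothetical, not tuned and not taken from the Flan template collection).

```python
from typing import Dict

# Hypothetical templates; only "summarization" mirrors a prompt used later in this notebook.
TASK_TEMPLATES: Dict[str, str] = {
    "summarization": "Summarize this article:\n\n{text}",
    "question_answering": "{context}\n\nAnswer this question: {question}",
    "sentiment": "Review: {text}\nIs this review positive or negative?",
    "translation": "Translate to German: {text}",
}


def render_prompt(task: str, **fields: str) -> str:
    """Fill a task template with the given fields.

    Raises KeyError for an unknown task or a missing field.
    """
    return TASK_TEMPLATES[task].format(**fields)


print(render_prompt("translation", text="My name is Arthur"))
```

The rendered string is what goes into the request body (for example as `text_inputs`); the model itself is the same for every task.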

### Summarization
Define the text you want to summarize.

In [None]:
text = """Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases. 
You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition. 
All of the Amazon Comprehend features accept UTF-8 text documents as the input. In addition, custom classification and custom entity recognition accept image files, PDF files, and Word files as input. 
Amazon Comprehend can examine and analyze documents in a variety of languages, depending on the specific feature. For more information, see Languages supported in Amazon Comprehend. Amazon Comprehend's Dominant language capability can examine documents and determine the dominant language for a far wider selection of languages."""

In [None]:
prompts = [
    "Briefly summarize this sentence: {text}",
    "Write a short summary for this text: {text}",
    "Generate a short summary this sentence:\n{text}",
    "{text}\n\nWrite a brief summary in a sentence or less",
    "{text}\nSummarize the aforementioned text in a single phrase.",
    "{text}\nCan you generate a short summary of the above paragraph?",
    "Write a sentence based on this summary: {text}",
    "Write a sentence based on '{text}'",
    "Summarize this article:\n\n{text}",
]

num_return_sequences = 3
parameters = {
    "max_length": 50,
    "max_time": 50,
    "num_return_sequences": num_return_sequences,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}

print(f"{bold}Number of return sequences is set to {num_return_sequences}{unbold}{newline}")
for each_prompt in prompts:
    # Substitute the article into the template and add the generation parameters.
    payload = {"text_inputs": each_prompt.replace("{text}", text), **parameters}
    query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=endpoint_name
    )
    generated_texts = parse_response_multiple_texts(query_response)
    print(f"{bold}For prompt: '{each_prompt}'{unbold}{newline}")
    print(f"{bold}The {num_return_sequences} summarized results are{unbold}:{newline}")
    for idx, each_generated_text in enumerate(generated_texts):
        print(f"{bold}Result {idx}{unbold}: {each_generated_text}{newline}")
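Each endpoint call costs time and money, so it can help to build and inspect the request bodies offline before sending them. The sketch below reproduces the payload construction from the loop above as a standalone function; the name `build_payloads` is hypothetical, and the `text_inputs` key simply follows the JumpStart-style payload this notebook uses.

```python
import json
from typing import Any, Dict, List


def build_payloads(templates: List[str], text: str, parameters: Dict[str, Any]) -> List[bytes]:
    """Return one UTF-8 JSON request body per prompt template.

    Uses str.replace rather than str.format, as the loop above does, so a
    template may contain other literal braces without escaping them.
    """
    bodies = []
    for template in templates:
        payload = {"text_inputs": template.replace("{text}", text), **parameters}
        bodies.append(json.dumps(payload).encode("utf-8"))
    return bodies


# Offline check with a short text and a template that contains literal braces.
demo = build_payloads(["Summarize {text} in {one} sentence"], "the report", {"max_length": 50})
print(json.loads(demo[0]))
```

With `str.format`, the stray `{one}` in the demo template would raise a `KeyError`; `str.replace` only touches the exact `{text}` placeholder.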