{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Deploy Flan T5 HuggingFace model \n", "\n", "Welcome to EZSMdeploy! You can use EZSMdeploy to deploy many Machine Learning models on AWS. \n", "\n", "In this demo notebook, we demonstrate how to use the EZSMdeploy for deploying Foundation Models as an endpoint and use them for various NLP tasks. The Foundation models perform Text2Text Generation. It takes a prompting text as an input, and returns the text generated by the model according to the prompt.\n", "\n", "Here, we show how to deploy the state-of-the-art pre-trained FLAN T5 models from Hugging Face for Text2Text Generation in the following tasks. You can directly use FLAN-T5 model for many NLP tasks, without fine-tuning the model.\n", "\n", "- Text summarization\n", "\n", "- Common sense reasoning / natural language inference\n", "\n", "- Question and answering\n", "\n", "- Sentence / sentiment classification\n", "\n", "- Translation\n", "\n", "- Pronoun resolution" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Note: you may need to restart the kernel to use updated packages.\n", "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "%pip uninstall -y ezsmdeploy --quiet\n", "# !pip install --upgrade pip\n", "%pip install ezsmdeploy==2.0.0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Note: you may need to restart the kernel to use updated packages." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Testing foundation model " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'2.171.0'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import sagemaker\n", "sagemaker.__version__" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Name: ezsmdeploy\n", "Version: 2.0.dev2\n", "Summary: SageMaker custom deployments made easy\n", "Home-page: https://pypi.python.org/pypi/ezsmdeploy\n", "Author: Shreyas Subramanian\n", "Author-email: subshrey@amazon.com\n", "License: MIT\n", "Location: /home/ec2-user/SageMaker/ezsm-ray-FM\n", "Requires: boto3, sagemaker, sagemaker-studio-image-build, shortuuid, yaspin\n", "Required-by: \n" ] } ], "source": [ "!pip show ezsmdeploy" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [], "source": [ "import ezsmdeploy" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[K0:00:00.659191 | created model(s). 
Now deploying on ml.g5.2xlarge\n", "\u001b[32m●∙∙\u001b[0m \u001b[K----------------" ] } ], "source": [ "ezonsm = ezsmdeploy.Deploy(model = \"tiiuae/falcon-40b\",\n", " huggingface_model=True,\n", " instance_type='ml.g5.2xlarge'\n", " )" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[{'generated_text': 'Paris is the capital of France.'}]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ezonsm.predictor.predict({\"inputs\":\"Paris is the capital of \"})" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [] }, "outputs": [], "source": [ "ezonsm.predictor.delete_endpoint()" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Query endpoint and parse response\n", "The endpoint accepts either raw UTF-8 text (content type `application/x-text`) or a UTF-8-encoded JSON payload (`application/json`). \n", "The output of the endpoint is JSON containing the generated text." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import json\n", "\n", "import boto3\n", "\n", "# Name of the endpoint deployed above\n", "endpoint_name = ezonsm.predictor.endpoint_name\n", "\n", "newline, bold, unbold = \"\\n\", \"\\033[1m\", \"\\033[0m\"\n", "\n", "\n", "def query_endpoint(encoded_text, endpoint_name):\n", " client = boto3.client(\"runtime.sagemaker\")\n", " response = client.invoke_endpoint(\n", " EndpointName=endpoint_name, ContentType=\"application/x-text\", Body=encoded_text\n", " )\n", " return response\n", "\n", "\n", "def parse_response(query_response):\n", " model_predictions = json.loads(query_response[\"Body\"].read())\n", " generated_text = model_predictions[\"generated_text\"]\n", " return generated_text" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The input must be a JSON payload\n", "payload = {\n", " \"text_inputs\": \"Tell me the steps to make a pizza\",\n", " \"max_length\": 50,\n", " \"max_time\": 50,\n", " \"num_return_sequences\": 3,\n", " \"top_k\": 50,\n", " \"top_p\": 0.95,\n", " \"do_sample\": 
True,\n", "}\n", "\n", "\n", "def query_endpoint_with_json_payload(encoded_json, endpoint_name):\n", " client = boto3.client(\"runtime.sagemaker\")\n", " response = client.invoke_endpoint(\n", " EndpointName=endpoint_name, ContentType=\"application/json\", Body=encoded_json\n", " )\n", " return response\n", "\n", "\n", "query_response = query_endpoint_with_json_payload(\n", " json.dumps(payload).encode(\"utf-8\"), endpoint_name=endpoint_name\n", ")\n", "\n", "\n", "def parse_response_multiple_texts(query_response):\n", " model_predictions = json.loads(query_response[\"Body\"].read())\n", " generated_text = model_predictions[\"generated_texts\"]\n", " return generated_text\n", "\n", "\n", "generated_texts = parse_response_multiple_texts(query_response)\n", "print(generated_texts)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "6. Advanced features: How to use prompts engineering to solve different tasks\n", "Below we demonstrate solving 5 key tasks with Flan T5 model. The tasks are: text summarization, common sense reasoning / question answering, sentence classification, translation, pronoun resolution.\n", "\n", "Note . The notebook in the following sections are particularly designed for Flan T5 models (small, base, large, xl). There are other models like T5-one-line-summary which are designed for text summarization in particular. In that case, such models cannot perform all the following tasks.\n", "\n", "Summarization\n", "Define the text article you want to summarize." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "text = \"\"\"Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. 
For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases. \n", "You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition. \n", "All of the Amazon Comprehend features accept UTF-8 text documents as the input. In addition, custom classification and custom entity recognition accept image files, PDF files, and Word files as input. \n", "Amazon Comprehend can examine and analyze documents in a variety of languages, depending on the specific feature. For more information, see Languages supported in Amazon Comprehend. Amazon Comprehend's Dominant language capability can examine documents and determine the dominant language for a far wider selection of languages.\"\"\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "prompts = [\n", " \"Briefly summarize this sentence: {text}\",\n", " \"Write a short summary for this text: {text}\",\n", " \"Generate a short summary of this sentence:\\n{text}\",\n", " \"{text}\\n\\nWrite a brief summary in a sentence or less\",\n", " \"{text}\\nSummarize the aforementioned text in a single phrase.\",\n", " \"{text}\\nCan you generate a short summary of the above paragraph?\",\n", " \"Write a sentence based on this summary: {text}\",\n", " \"Write a sentence based on '{text}'\",\n", " \"Summarize this article:\\n\\n{text}\",\n", "]\n", "\n", "num_return_sequences = 3\n", "parameters = {\n", " \"max_length\": 50,\n", " \"max_time\": 50,\n", " \"num_return_sequences\": num_return_sequences,\n", " \"top_k\": 50,\n", " \"top_p\": 0.95,\n", " 
\"do_sample\": True,\n", "}\n", "\n", "print(f\"{bold}Number of return sequences are set as {num_return_sequences}{unbold}{newline}\")\n", "for each_prompt in prompts:\n", " payload = {\"text_inputs\": each_prompt.replace(\"{text}\", text), **parameters}\n", " query_response = query_endpoint_with_json_payload(\n", " json.dumps(payload).encode(\"utf-8\"), endpoint_name=endpoint_name\n", " )\n", " generated_texts = parse_response_multiple_texts(query_response)\n", " print(f\"{bold} For prompt: '{each_prompt}'{unbold}{newline}\")\n", " print(f\"{bold} The {num_return_sequences} summarized results are{unbold}:{newline}\")\n", " for idx, each_generated_text in enumerate(generated_texts):\n", " print(f\"{bold}Result {idx}{unbold}: {each_generated_text}{newline}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.10" } }, "nbformat": 4, "nbformat_minor": 4 }