{ "cells": [ { "cell_type": "markdown", "id": "60eb8afe-2b66-4c0a-869b-4ed86ba688d0", "metadata": {}, "source": [ "# Using a Large Language Model to Summarize Customer Calls" ] }, { "cell_type": "markdown", "id": "53b91072-1344-4697-9598-c64d5fc5cceb", "metadata": {}, "source": [ "Welcome to Sagemaker! \n", "\n", "SageMaker offers access to built-in algorithms and pre-built solution templates to help customers get started with ML quickly. You can access these models and algorithms programmatically using SageMaker Python SDK or through the JumpStart UI in SageMaker Studio. In this notebook, we demonstrate how to apply a third party Algorithm to AWS Chime SDK Call Analytics output and arrive at a summary of a person to person communication using a Large Language Model (LLM). We use the SageMaker Python SDK and Cohere's Sagemaker library. \n", "\n", "In this notebook, we demonstrate a simple workflow that(1) prepares the environment for our model, (2) imports the LLM package, (3) converts a call transcript to a prompt, (4) sends a sample prompt to an LLM, (5) saves the model result in the AWS Chime SDK Data Lake format, and (6) releases the LLM package.\n", "\n", "1. [Prepare Environment](#1.-Prepare-Environment)\n", "2. [LLM Package Handling](#2.-LLM-Package-Handling)\n", "3. [Prepare Prompt](#3.-Prepare-Prompt)\n", "4. [Load Call and Send to LLM](#4.-Load-Call-and-Send-to-LLM)\n", "5. [Save Result](#5.-Save-Result)\n", "6. [Release Models](#6.-Release-Models)" ] }, { "cell_type": "markdown", "id": "15e4a057-4b84-448c-8502-664588d776e5", "metadata": {}, "source": [ "## 1. Prepare Environment" ] }, { "cell_type": "code", "execution_count": null, "id": "0682be22-5a4b-4b7e-8497-c5333d026f05", "metadata": { "tags": [] }, "outputs": [], "source": [ "!pip install sagemaker boto3 --upgrade --quiet\n", "!pip show sagemaker | egrep \"Name|Version\"\n", "!pip show boto3 | egrep \"Name|Version\"\n", "!python --version\n", "\n", "!pip install cohere-sagemaker" ] }, { "cell_type": "code", "execution_count": null, "id": "8a0326c4-2a4a-4ee6-a918-ede9c46a364d", "metadata": { "tags": [] }, "outputs": [], "source": [ "from cohere_sagemaker import Client, CohereError\n", "\n", "from sagemaker import ModelPackage, get_execution_role\n", "import boto3\n", "from sagemaker import Session\n", "import json, string, os\n", "from datetime import datetime" ] }, { "cell_type": "markdown", "id": "e282f9fe-b341-40a7-b3b3-c6f42fbefb99", "metadata": {}, "source": [ "## 2. LLM Package Handling" ] }, { "cell_type": "markdown", "id": "8ade8e96-1c4b-4e5d-8bf1-f0352b0d16b8", "metadata": {}, "source": [ "Several options for Large Language Models are available to users. We used Cohere in this notebook because it is available as a [SageMaker Model Package](https://aws.amazon.com/blogs/machine-learning/cohere-brings-language-ai-to-amazon-sagemaker/). To gain access to the Large Language Model used here, please Subscribe to [Foundation Models](https://aws.amazon.com/sagemaker/jumpstart/getting-started/); in the AWS console, go to Sagemaker and choose Jumpstart > Foundation Models on the left toolbar. You will either have access and see the available models, or you will need to Request Access and wait a day. Once you have access to Foundation Models, subscribe to [cohere-gpt-medium](https://aws.amazon.com/marketplace/pp/prodview-6dmzzso5vu5my); this provides access to the LLM used here.\n", "\n", "In this section, we create the utils for the model and make it available for generating useful results. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "d06c1e33-7317-4b97-8ed7-68eddea18086", "metadata": { "tags": [] }, "outputs": [], "source": [ "cohere_package = \"cohere-gpt-medium-v1-4-825b877abfd53d7ca\"\n", "\n", "cohere_package_map = {\n", " \"us-east-1\": f\"arn:aws:sagemaker:us-east-1:865070037744:model-package/{cohere_package}\",\n", " \"us-east-2\": f\"arn:aws:sagemaker:us-east-2:057799348421:model-package/{cohere_package}\",\n", " \"us-west-1\": f\"arn:aws:sagemaker:us-west-1:382657785993:model-package/{cohere_package}\",\n", " \"us-west-2\": f\"arn:aws:sagemaker:us-west-2:594846645681:model-package/{cohere_package}\",\n", " \"ca-central-1\": f\"arn:aws:sagemaker:ca-central-1:470592106596:model-package/{cohere_package}\",\n", " \"eu-central-1\": f\"arn:aws:sagemaker:eu-central-1:446921602837:model-package/{cohere_package}\",\n", " \"eu-west-1\": f\"arn:aws:sagemaker:eu-west-1:985815980388:model-package/{cohere_package}\",\n", " \"eu-west-2\": f\"arn:aws:sagemaker:eu-west-2:856760150666:model-package/{cohere_package}\",\n", " \"eu-west-3\": f\"arn:aws:sagemaker:eu-west-3:843114510376:model-package/{cohere_package}\",\n", " \"eu-north-1\": f\"arn:aws:sagemaker:eu-north-1:136758871317:model-package/{cohere_package}\",\n", " \"ap-southeast-1\": f\"arn:aws:sagemaker:ap-southeast-1:192199979996:model-package/{cohere_package}\",\n", " \"ap-southeast-2\": f\"arn:aws:sagemaker:ap-southeast-2:666831318237:model-package/{cohere_package}\",\n", " \"ap-northeast-2\": f\"arn:aws:sagemaker:ap-northeast-2:745090734665:model-package/{cohere_package}\",\n", " \"ap-northeast-1\": f\"arn:aws:sagemaker:ap-northeast-1:977537786026:model-package/{cohere_package}\",\n", " \"ap-south-1\": f\"arn:aws:sagemaker:ap-south-1:077584701553:model-package/{cohere_package}\",\n", " \"sa-east-1\": f\"arn:aws:sagemaker:sa-east-1:270155090741:model-package/{cohere_package}\",\n", "}\n", "\n", "region = boto3.Session().region_name\n", "if region not in cohere_package_map.keys():\n", " raise Exception(\"UNSUPPORTED REGION\")\n", "\n", "# Start Cohere Client\n", "cohere_name = \"cohere-gpt-medium\"\n", "co = Client(endpoint_name=cohere_name)\n", "\n", "# Set Variables\n", "sagemaker_session = Session()\n", "role = get_execution_role()\n", "\n", "MODEL = \"Cohere\"\n", "MODEL_PACKAGE_ARN = cohere_package_map[region]\n", "cohere_instance_type = \"ml.g5.xlarge\"\n", "\n" ] }, { "cell_type": "markdown", "id": "cb3d9f8b-d852-47ff-ba74-28d6bb6559d6", "metadata": {}, "source": [ "This step takes about 10 minutes, and we encourage you to keep models running while you're using them. The last cell in this notebook will delete the model endpoints and configs. The [estimated cost](https://aws.amazon.com/marketplace/pp/prodview-6dmzzso5vu5my) for keeping the the `ml.g5.xlarge` alive is $1.41 / hour.\n", "\n", "If this cell returns an error, you may already have the model running and available. You can either continue, and Sagemaker will pull the model based on the `cohere_name = cohere-gpt-medium` in the above cell, or release the endpoint at the bottom of this notebook and re-run this cell." ] }, { "cell_type": "code", "execution_count": null, "id": "af3d2bb5-9a17-4a31-88a7-8a18bdccc0c0", "metadata": { "tags": [] }, "outputs": [], "source": [ "if \"MODEL_ENDPOINT_SET\" not in locals() or MODEL_ENDPOINT_SET is False: #avoid over-writing your model\n", " print(\"Loading the model takes ~10 minutes.\")\n", " load = input(\"Do you want to load the model? 
(y/n)\")\n", " if load.lower() in {'y', 'yes'}:\n", " # create a deployable model from the model package.\n", "\n", " cohere_model = ModelPackage(role=role, model_package_arn=MODEL_PACKAGE_ARN, sagemaker_session=sagemaker_session)\n", "\n", " # Deploy the model\n", " predictor = cohere_model.deploy(\n", " initial_instance_count=1,\n", " instance_type=cohere_instance_type,\n", " endpoint_name=cohere_name)\n", "\n", " MODEL_ENDPOINT_SET = True" ] }, { "cell_type": "markdown", "id": "8fe45a55-d96b-4046-9a3d-15511747d349", "metadata": {}, "source": [ "Let's test the model out with a simple prompt." ] }, { "cell_type": "code", "execution_count": null, "id": "c95c42d2-553f-44dc-a49c-cdd5b5b07e24", "metadata": { "tags": [] }, "outputs": [], "source": [ "prompt = \"Today is a nice day,\"\n", "response = co.generate(prompt=prompt, max_tokens=50, temperature=0.9, return_likelihoods='GENERATION')\n", "print(prompt, end='')\n", "print(response.generations[0].text)" ] }, { "cell_type": "markdown", "id": "f376344f-464f-4d7a-aa44-1472bc655cdf", "metadata": {}, "source": [ "## 3. Prepare Prompt" ] }, { "cell_type": "markdown", "id": "ca50f04f-e27f-4c0f-b9a4-581f4592629b", "metadata": {}, "source": [ "An important aspect of Large Language models is [Prompt Engineering](https://en.wikipedia.org/wiki/Prompt_engineering), where the input to the model is constructed to return a useful result. Our prompt needs to make sure the LLM \"listens\" to the call and responds to the appropriate question. The more powerful of the model, the less specific of questions are required. Cohere is capable enough with only asking the default question, \"What is the customer calling about and what are the next steps?\". \n", "\n", "We provide an adapted [boto3 implementation](https://docs.aws.amazon.com/code-library/latest/ug/python_3_transcribe_code_examples.html#scenarios) of transcribing audio and getting the data in `send_to_transcribe.py`. We suggest creating your own Transcribe pipeline to avoid the `sleep(10)` call while waiting for the Transcribe job to be completed; be sure to include the flags `{\"ShowSpeakerLabels\": True, \"MaxSpeakerLabels\":6}` in the job arguments. To reduce complexity in the repository, we start from an already transcribed output. \n", "\n", "**The first step** in preparing the prompt is converting the AWS Transcribe Output into a dailogue and then breaking it into several partitions." ] }, { "cell_type": "code", "execution_count": null, "id": "53879d99-158b-44a4-ab6a-d48153e9c3d3", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Break Transcription into dialogue chunks. \n", "def chunk_transcription(transcript):\n", " \"\"\"\n", " Read the transcription JSON, break into chunks of dialogue spoken by individual.\n", " This function returns a list of chunks.\n", " Each chunk is a dictionary that has 2 (key, value) pairs.\n", " \"speaker_label\": string, current speaker. 
{ "cell_type": "code", "execution_count": null, "id": "53879d99-158b-44a4-ab6a-d48153e9c3d3", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Break the transcription into dialogue chunks.\n", "def chunk_transcription(transcript):\n", "    \"\"\"\n", "    Read the transcription JSON and break it into chunks of dialogue spoken by an individual.\n", "    This function returns a list of chunks.\n", "    Each chunk is a dictionary with 2 (key, value) pairs:\n", "    \"speaker_label\": string, the current speaker. This will probably be (\"spk_0\", \"spk_1\", etc.)\n", "    \"words\": string, the words spoken in this chunk of dialogue.\n", "    \"\"\"\n", "    words = transcript['results']['items']\n", "    punctuation = set(string.punctuation)\n", "    punctuation.add('')\n", "\n", "    part_template = {\n", "        \"speaker_label\": -1,\n", "        \"words\": ''\n", "    }\n", "    part, parts = part_template.copy(), []\n", "    for word in words:\n", "        if word['speaker_label'] != part['speaker_label']:\n", "            if part['speaker_label'] != -1:\n", "                parts.append(part)\n", "                part = part_template.copy()\n", "\n", "        part['speaker_label'] = word['speaker_label']\n", "        w = word['alternatives'][0]['content']\n", "        if len(part['words']) > 0 and w not in punctuation:\n", "            part['words'] += ' '\n", "        part['words'] += w\n", "\n", "    parts.append(part)\n", "    return parts\n", "\n", "# Make speaker labels more human readable.\n", "def rename_speakers(chunks):\n", "    \"\"\"\n", "    Replaces spk_0, spk_1, ... with a more human-readable version.\n", "    \"\"\"\n", "    speaker_mapping = {}  # if you have a proper speaker mapping, replace it here\n", "    for i in range(20):\n", "        speaker_mapping[\"spk_%i\" % i] = \"Speaker %i\" % i\n", "\n", "    for c in chunks:\n", "        c['speaker_label'] = speaker_mapping[c['speaker_label']]\n", "    return chunks\n", "\n", "# Convert chunks to dialogue lines.\n", "def build_lines(chunks):\n", "    lines = []\n", "    for c in chunks:\n", "        lines.append(\"%s: %s\" % (c['speaker_label'], c['words']))\n", "\n", "    call_part = ''\n", "    for line in lines:\n", "        call_part += line + '\\n'\n", "\n", "    return call_part\n", "\n", "\n", "# Break the call into partitions that fit within the LLM's input limit.\n", "def partition_call(chunks, max_word_count=1500, overlap_percentage=0.2):\n", "    \"\"\"\n", "    Inputs:\n", "    chunks- transcription broken into dicts representing a single speaker's chunk\n", "    max_word_count- LLM-defined input limit.\n", "        We use word count here vs token count, assuming 1 word ~ 1 token.\n", "    overlap_percentage- LLMs perform better if they have some context of what was spoken.\n", "        This number controls the amount of context.\n", "\n", "    This breaks the call into partitions that are manageable by the LLM.\n", "    It returns the individual sections that can be attached to a prompt and sent to the LLM.\n", "    \"\"\"\n", "\n", "    # Count words in each chunk\n", "    counts = [len(d['words'].split()) for d in chunks]\n", "\n", "    part_count = 0  # number of words in the current partition\n", "    i, j = 0, 0  # pointers to the start and end of the current partition\n", "    partition_ends = []  # start, end of each partition\n", "    while j < len(counts):\n", "        part_count += counts[j]\n", "        if part_count >= max_word_count:\n", "            partition_ends.append([i, j])\n", "            while part_count > (max_word_count * overlap_percentage):\n", "                i += 1\n", "                part_count -= counts[i]\n", "        j += 1\n", "    partition_ends.append([i, j])\n", "\n", "    # With the list of partition ends, build the partitions\n", "    partitions = []\n", "    for pe in partition_ends:\n", "        part = chunks[pe[0]:pe[1]]\n", "        partition = build_lines(part)\n", "        partitions.append(partition)\n", "\n", "    return partitions" ] },
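{ "cell_type": "markdown", "id": "5a8d3f17-2c9e-4b6a-a7d4-8e1f0c3b5d29", "metadata": {}, "source": [ "To see what these helpers produce, the cell below runs them on a tiny hand-built transcript fragment. The words and speaker labels are made up for illustration; real Amazon Transcribe output has the same item structure." ] },
{ "cell_type": "code", "execution_count": null, "id": "2d6b9e43-7f1a-4c8d-9b2e-6a4f0d8c1e75", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Quick illustration of the chunking helpers on a hand-built transcript fragment (made-up data).\n", "_items = [(\"spk_0\", \"Hello\"), (\"spk_0\", \",\"), (\"spk_0\", \"thanks\"), (\"spk_0\", \"for\"), (\"spk_0\", \"calling\"),\n", "          (\"spk_1\", \"Hi\"), (\"spk_1\", \"I\"), (\"spk_1\", \"need\"), (\"spk_1\", \"help\")]\n", "_sample_transcript = {\"results\": {\"items\": [\n", "    {\"speaker_label\": s, \"alternatives\": [{\"content\": w}]} for s, w in _items]}}\n", "\n", "_chunks = rename_speakers(chunk_transcription(_sample_transcript))\n", "print(build_lines(_chunks))" ] },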
" ] }, { "cell_type": "code", "execution_count": null, "id": "92a175d3-e513-46c8-a2c8-514d2c802517", "metadata": { "tags": [] }, "outputs": [], "source": [ "DEFAULT_QUESTION = \"What is the customer calling about and what are the next steps?\"\n", "\n", "def get_call_prompt(lines, question=DEFAULT_QUESTION):\n", " prompt = \"\"\"Call: \n", "%s\n", "\n", "%s\"\"\" %(lines, question)\n", " return prompt\n", "\n", "def get_call_prompts(partitions, question=DEFAULT_QUESTION):\n", " prompts = []\n", " for partition in partitions:\n", " prompt = get_call_prompt(partition, question)\n", " prompts.append(prompt)\n", " \n", " return prompts\n", "\n", "def get_response(prompt):\n", " cohere_response = co.generate(prompt=prompt, max_tokens=200, temperature=0, return_likelihoods='GENERATION')\n", " cohere_text = cohere_response.generations[0].text\n", " cohere_text = '.'.join(cohere_text.split('.')[:-1]) + '.'\n", " \n", " return cohere_text\n", "\n", "def get_responses(prompts):\n", " cohere_texts = []\n", " for prompt in prompts:\n", " cohere_texts.append(get_response(prompt))\n", " \n", " return cohere_texts\n", "\n", "def summarize_summaries(summaries, question=DEFAULT_QUESTION):\n", " \n", " if len(summaries) == 1:\n", " return summaries[0], None\n", " \n", " prompt = \"\"\"Summaries:\"\"\"\n", " for t in summaries:\n", " prompt += \"\"\"\n", "\n", "%s\"\"\" %t\n", " \n", " prompt += \"\"\"\n", "\n", "Combine the summaries and answer this question: %s\"\"\" %question\n", " \n", " full_summary = get_response(prompt)\n", " \n", " return full_summary, prompt\n" ] }, { "cell_type": "markdown", "id": "40ed7b02-f9dd-4cc8-b377-52cc8f9f211c", "metadata": {}, "source": [ "**The final functions** load the transcript and follows the transcript through each of the above scripts." ] }, { "cell_type": "code", "execution_count": null, "id": "e153c089-6034-4c2c-840a-6f77be9eee0d", "metadata": { "tags": [] }, "outputs": [], "source": [ "\n", "def load_transcript(file_path):\n", " with open(file_path, 'r') as fid:\n", " return json.load(fid)\n", "\n", "def run_call(transcript, question=DEFAULT_QUESTION, verbose=False):\n", "\n", " # break call into dialogue lines\n", " chunks = chunk_transcription(transcript)\n", " chunks = rename_speakers(chunks)\n", " \n", " # break dialogue lines into partitions\n", " partitions = partition_call(chunks, 1000, 0.3)\n", " \n", " prompts = get_call_prompts(partitions, question)\n", " \n", " # Print Option\n", " if verbose:\n", " print('Prompt for Partition 1:')\n", " print(prompts[0])\n", " \n", " # Partition Summary\n", " summaries = get_responses(prompts)\n", " \n", " # Combined Summary\n", " summary, summary_prompt = summarize_summaries(summaries)\n", " \n", " # Print Option\n", " if verbose:\n", " print('Full Summary:')\n", " print(summary)\n", " \n", " summary_dict = {\n", " 'list_prompt': prompts,\n", " 'summary_prompt': summary_prompt,\n", " 'final_summary': summary,\n", " 'question': question,\n", " 'model': MODEL,\n", " 'model_arn': MODEL_PACKAGE_ARN,\n", " }\n", " \n", " return summary_dict\n", " " ] }, { "cell_type": "markdown", "id": "ebe33af3-bf3e-4e30-805b-428df2348df6", "metadata": {}, "source": [ "## 4. Load Call and Send to LLM" ] }, { "cell_type": "markdown", "id": "a577f230-8e08-48a4-8723-4a6cdf758d8c", "metadata": {}, "source": [ "We are now ready to send the prompt to the LLM and get the summary answer." 
] }, { "cell_type": "code", "execution_count": null, "id": "abd37ca9-fda0-46e4-a532-6272d6c07294", "metadata": { "tags": [] }, "outputs": [], "source": [ "# List call transcripts\n", "transcript_file = \"./Data/Retail41.json\"\n", "print(transcript_file)\n", "\n", "QUESTION = DEFAULT_QUESTION\n", "# QUESTION = \"How did the agent help the customer?\"\n", "\n", "transcript = load_transcript(transcript_file)\n", " \n", "summary_dict = run_call(transcript, verbose=True, question=QUESTION)\n", "\n", "print('=='*20)" ] }, { "cell_type": "markdown", "id": "c06bbb0f-6ac0-48aa-bb4b-dbfaab7008a5", "metadata": {}, "source": [ "## 5. Save Result" ] }, { "cell_type": "code", "execution_count": null, "id": "5e606bb1-7bca-46ab-9927-136993d1bed6", "metadata": {}, "outputs": [], "source": [ "def prepare_summary(summary_dict, call_metadata=None):\n", " summary_event = { \n", " 'time': datetime.now().strftime(\"%Y-%m-%dT%H:%M:%S\"),\n", " \"service-type\": \"MediaInsights\",\n", " \"detail-type\": \"LargeLanguageModelSummary\",\n", " \"summaryEvent\": summary_dict\n", " }\n", "\n", " if call_metadata is not None:\n", " result['metadata'] = call_metadata\n", " \n", " return summary_event\n", "\n", "def put_event(event, filename):\n", " file_ = './' + filename\n", " with open(file_, 'w') as fid:\n", " json.dump(event, fid)\n", "\n", " return file_\n", "\n", "summary_event = prepare_summary(summary_dict)\n", "summary_file = put_event(summary_event, 'Data/output.json')" ] }, { "cell_type": "markdown", "id": "c89ba24b-ca72-480f-a118-89b2bd43f507", "metadata": {}, "source": [ "## 6. Release Models" ] }, { "cell_type": "markdown", "id": "ec492e7d-3f36-4a9b-a2cd-54308e62cf6a", "metadata": {}, "source": [ "This cell will delete the model endpoints and configs. Reloading the model takes about 10 minutes.\n", "\n", "The [estimated cost](https://aws.amazon.com/marketplace/pp/prodview-6dmzzso5vu5my) for keeping the `ml.g5.xlarge` instance alive is $1.41 / hour." ] }, { "cell_type": "code", "execution_count": null, "id": "f169db08-c854-4ec6-a09d-0938002f9a58", "metadata": { "tags": [] }, "outputs": [], "source": [ "print(\"Releasing the models deletes the endpoint. Reloading the model takes ~10 minutes.\")\n", "delete = input(\"Do you want to delete the model? (y/n)\")\n", "if delete.lower() in {'y', 'yes'}: # Ensure we are not deleting the model unless prompted. \n", " \n", " sagemaker_client = boto3.client('sagemaker', region_name=region)\n", " \n", " endpoint_name = 'cohere-gpt-medium'\n", " response = sagemaker_client.describe_endpoint_config(EndpointConfigName=endpoint_name)\n", " \n", " endpoint_config_name = response['EndpointConfigName']\n", " model_name = response['ProductionVariants'][0]['ModelName']\n", " \n", " sagemaker_client.delete_model(ModelName=model_name)\n", " sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name) \n", " sagemaker_client.delete_endpoint(EndpointName=endpoint_name)\n", " \n", " MODEL_ENDPOINT_SET = False" ] } ], "metadata": { "kernelspec": { "display_name": "conda_amazonei_pytorch_latest_p37", "language": "python", "name": "conda_amazonei_pytorch_latest_p37" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 5 }