# Using a Large Language Model to Summarize Customer Calls

Welcome to SageMaker!

SageMaker offers access to built-in algorithms and pre-built solution templates to help customers get started with ML quickly. You can access these models and algorithms programmatically using the SageMaker Python SDK or through the JumpStart UI in SageMaker Studio. In this notebook, we demonstrate how to apply a third-party algorithm to Amazon Chime SDK call analytics output and arrive at a summary of a person-to-person conversation using a Large Language Model (LLM). We use the SageMaker Python SDK and Cohere's SageMaker library.

The workflow is simple: it (1) prepares the environment for our model, (2) imports the LLM package, (3) converts a call transcript to a prompt, (4) sends a sample prompt to an LLM, (5) saves the model result in the Amazon Chime SDK data lake format, and (6) releases the LLM package.

1. [Prepare Environment](#1.-Prepare-Environment)
2. [LLM Package Handling](#2.-LLM-Package-Handling)
3. [Prepare Prompt](#3.-Prepare-Prompt)
4. [Load Call and Send to LLM](#4.-Load-Call-and-Send-to-LLM)
5. [Save Result](#5.-Save-Result)
6. [Release Models](#6.-Release-Models)

## 1. Prepare Environment

In [None]:
!pip install sagemaker boto3 --upgrade --quiet
!pip show sagemaker | egrep "Name|Version"
!pip show boto3 | egrep "Name|Version"
!python --version

!pip install cohere-sagemaker

In [None]:
from cohere_sagemaker import Client, CohereError

from sagemaker import ModelPackage, get_execution_role
import boto3
from sagemaker import Session
import json, string, os
from datetime import datetime

## 2. LLM Package Handling

Several Large Language Models are available to users. We used Cohere in this notebook because it is available as a [SageMaker Model Package](https://aws.amazon.com/blogs/machine-learning/cohere-brings-language-ai-to-amazon-sagemaker/). To gain access to the Large Language Model used here, subscribe to [Foundation Models](https://aws.amazon.com/sagemaker/jumpstart/getting-started/): in the AWS console, go to SageMaker and choose JumpStart > Foundation Models in the left toolbar. You will either already have access and see the available models, or you will need to request access and wait a day. Once you have access to Foundation Models, subscribe to [cohere-gpt-medium](https://aws.amazon.com/marketplace/pp/prodview-6dmzzso5vu5my); this provides access to the LLM used here.

In this section, we set up the utilities for the model and deploy it so that it can generate results.

In [None]:
cohere_package = "cohere-gpt-medium-v1-4-825b877abfd53d7ca"

cohere_package_map = {
 "us-east-1": f"arn:aws:sagemaker:us-east-1:865070037744:model-package/{cohere_package}",
 "us-east-2": f"arn:aws:sagemaker:us-east-2:057799348421:model-package/{cohere_package}",
 "us-west-1": f"arn:aws:sagemaker:us-west-1:382657785993:model-package/{cohere_package}",
 "us-west-2": f"arn:aws:sagemaker:us-west-2:594846645681:model-package/{cohere_package}",
 "ca-central-1": f"arn:aws:sagemaker:ca-central-1:470592106596:model-package/{cohere_package}",
 "eu-central-1": f"arn:aws:sagemaker:eu-central-1:446921602837:model-package/{cohere_package}",
 "eu-west-1": f"arn:aws:sagemaker:eu-west-1:985815980388:model-package/{cohere_package}",
 "eu-west-2": f"arn:aws:sagemaker:eu-west-2:856760150666:model-package/{cohere_package}",
 "eu-west-3": f"arn:aws:sagemaker:eu-west-3:843114510376:model-package/{cohere_package}",
 "eu-north-1": f"arn:aws:sagemaker:eu-north-1:136758871317:model-package/{cohere_package}",
 "ap-southeast-1": f"arn:aws:sagemaker:ap-southeast-1:192199979996:model-package/{cohere_package}",
 "ap-southeast-2": f"arn:aws:sagemaker:ap-southeast-2:666831318237:model-package/{cohere_package}",
 "ap-northeast-2": f"arn:aws:sagemaker:ap-northeast-2:745090734665:model-package/{cohere_package}",
 "ap-northeast-1": f"arn:aws:sagemaker:ap-northeast-1:977537786026:model-package/{cohere_package}",
 "ap-south-1": f"arn:aws:sagemaker:ap-south-1:077584701553:model-package/{cohere_package}",
 "sa-east-1": f"arn:aws:sagemaker:sa-east-1:270155090741:model-package/{cohere_package}",
}

region = boto3.Session().region_name
if region not in cohere_package_map:
    raise Exception("UNSUPPORTED REGION")

# Start Cohere Client
cohere_name = "cohere-gpt-medium"
co = Client(endpoint_name=cohere_name)

# Set Variables
sagemaker_session = Session()
role = get_execution_role()

MODEL = "Cohere"
MODEL_PACKAGE_ARN = cohere_package_map[region]
cohere_instance_type = "ml.g5.xlarge"



The next cell deploys the model. This step takes about 10 minutes, and we encourage you to keep the model running while you're using it. The last cell in this notebook deletes the model endpoint and its configuration. The [estimated cost](https://aws.amazon.com/marketplace/pp/prodview-6dmzzso5vu5my) for keeping the `ml.g5.xlarge` instance alive is $1.41 / hour.

If this cell returns an error, you may already have the model running and available. You can either continue, and SageMaker will use the existing endpoint named by `cohere_name = "cohere-gpt-medium"` in the cell above, or release the endpoint at the bottom of this notebook and re-run this cell.

In [None]:
if "MODEL_ENDPOINT_SET" not in locals() or MODEL_ENDPOINT_SET is False:  # avoid overwriting your model
    print("Loading the model takes ~10 minutes.")
    load = input("Do you want to load the model? (y/n)")
    if load.lower() in {'y', 'yes'}:
        # Create a deployable model from the model package.
        cohere_model = ModelPackage(role=role, model_package_arn=MODEL_PACKAGE_ARN, sagemaker_session=sagemaker_session)

        # Deploy the model
        predictor = cohere_model.deploy(
            initial_instance_count=1,
            instance_type=cohere_instance_type,
            endpoint_name=cohere_name)

        MODEL_ENDPOINT_SET = True

Let's test the model out with a simple prompt.

In [None]:
prompt = "Today is a nice day,"
response = co.generate(prompt=prompt, max_tokens=50, temperature=0.9, return_likelihoods='GENERATION')
print(prompt, end='')
print(response.generations[0].text)

## 3. Prepare Prompt

An important aspect of Large Language Models is [Prompt Engineering](https://en.wikipedia.org/wiki/Prompt_engineering), where the input to the model is constructed to return a useful result. Our prompt needs to make sure the LLM "listens" to the call and responds to the appropriate question. The more powerful the model, the less specific the question needs to be. Cohere is capable enough when asked only the default question, "What is the customer calling about and what are the next steps?".

We provide an adapted [boto3 implementation](https://docs.aws.amazon.com/code-library/latest/ug/python_3_transcribe_code_examples.html#scenarios) of transcribing audio and getting the data in `send_to_transcribe.py`. We suggest creating your own Transcribe pipeline to avoid the `sleep(10)` call while waiting for the Transcribe job to be completed; be sure to include the flags `{"ShowSpeakerLabels": True, "MaxSpeakerLabels":6}` in the job arguments. To reduce complexity in the repository, we start from an already transcribed output. 
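As a rough sketch of where those speaker-label flags might go, the snippet below assembles keyword arguments for boto3's `start_transcription_job`, placing the flags under `Settings`. The job name, S3 URI, and the helper `build_transcribe_job_args` are hypothetical and are not taken from `send_to_transcribe.py`; the call itself is left commented out because it needs AWS credentials.

```python
def build_transcribe_job_args(job_name, media_uri, language_code="en-US", max_speakers=6):
    """Assemble keyword arguments for a transcription job with speaker labels enabled."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "LanguageCode": language_code,
        "Settings": {
            "ShowSpeakerLabels": True,
            "MaxSpeakerLabels": max_speakers,
        },
    }


job_args = build_transcribe_job_args(
    "example-call-job",                       # hypothetical job name
    "s3://example-bucket/calls/example.wav",  # hypothetical S3 URI
)
print(job_args["Settings"])

# With AWS credentials configured, the job could then be started with:
# import boto3
# boto3.client("transcribe").start_transcription_job(**job_args)
```

Keeping the argument construction separate from the API call makes it easy to inspect the request before any job is started.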

**The first step** in preparing the prompt is converting the Amazon Transcribe output into a dialogue and then breaking it into several partitions.
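For orientation, here is a minimal sketch of the input shape this step relies on: a list of items under `results.items`, each carrying a `speaker_label` and an `alternatives` list whose first entry holds the token text. The sample data is hand-written, real transcripts carry more fields, and `to_dialogue` is an illustrative helper that uses `itertools.groupby` rather than the notebook's own loop.

```python
from itertools import groupby

# Hand-written stand-in for a transcript; real output carries more fields.
sample_transcript = {
    "results": {
        "items": [
            {"speaker_label": "spk_0", "alternatives": [{"content": "Hello"}]},
            {"speaker_label": "spk_0", "alternatives": [{"content": ","}]},
            {"speaker_label": "spk_0", "alternatives": [{"content": "thanks"}]},
            {"speaker_label": "spk_1", "alternatives": [{"content": "Hi"}]},
            {"speaker_label": "spk_1", "alternatives": [{"content": "."}]},
        ]
    }
}


def to_dialogue(transcript):
    """Group consecutive items by speaker and join their tokens into one line each."""
    lines = []
    items = transcript["results"]["items"]
    for speaker, group in groupby(items, key=lambda item: item["speaker_label"]):
        text = ""
        for item in group:
            token = item["alternatives"][0]["content"]
            # Punctuation attaches to the previous word; words get a leading space.
            if text and token not in {",", ".", "?", "!"}:
                text += " "
            text += token
        lines.append(f"{speaker}: {text}")
    return lines


dialogue = to_dialogue(sample_transcript)
print("\n".join(dialogue))
```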

In [None]:
# Break transcription into dialogue chunks.
def chunk_transcription(transcript):
    """
    Read the transcription JSON, break into chunks of dialogue spoken by individual.
    This function returns a list of chunks.
    Each chunk is a dictionary that has 2 (key, value) pairs.
    "speaker_label": string, current speaker. This will probably be ("spk_0", "spk_1", etc)
    "words": string, the words spoken in this chunk of dialogue.
    """
    words = transcript['results']['items']
    punctuation = set(string.punctuation)
    punctuation.add('')

    part_template = {
        "speaker_label": -1,
        "words": ''
    }
    part, parts = part_template.copy(), []
    for word in words:
        # A change of speaker closes the current chunk and starts a new one
        if word['speaker_label'] != part['speaker_label']:
            if part['speaker_label'] != -1:
                parts.append(part)
            part = part_template.copy()

        part['speaker_label'] = word['speaker_label']
        w = word['alternatives'][0]['content']
        # Separate words with a space, but attach punctuation directly
        if len(part['words']) > 0 and w not in punctuation:
            part['words'] += ' '
        part['words'] += w

    parts.append(part)
    return parts

# This makes speakers more human readable
def rename_speakers(chunks):
    """
    Replaces the spk_0, spk_1 with a more human-readable version.
    """
    speaker_mapping = {}  # if you have a proper speaker_mapping, then replace here
    for i in range(20):
        speaker_mapping["spk_%i" % i] = "Speaker %i" % i

    for c in chunks:
        c['speaker_label'] = speaker_mapping[c['speaker_label']]
    return chunks

# Convert chunks to lines
def build_lines(chunks):
    lines = []
    for c in chunks:
        lines.append("%s: %s" % (c['speaker_label'], c['words']))

    call_part = ''
    for line in lines:
        call_part += line + '\n'

    return call_part


# Break the call into partitions that fit within the LLM's input limit
def partition_call(chunks, max_word_count=1500, overlap_percentage=0.2):
    """
    Inputs:
        chunks- transcription broken into dicts representing a single speaker's chunk
        max_word_count- LLM-defined limit on input size.
            We use word count here vs token count, assuming 1 word~1 token.
        overlap_percentage- LLMs perform better if they have some context of what was spoken.
            This fraction of max_word_count is carried over as context.

    This breaks the call into partitions that are manageable by the LLM.
    It returns the individual sections that can be attached to a prompt and sent to the LLM.
    """

    # Count words in each chunk
    counts = [len(d['words'].split()) for d in chunks]

    part_count = 0  # number of words in current partition
    i, j = 0, 0  # pointers to the first and current chunk
    partition_ends = []  # [start, end) chunk indices of each partition
    while j < len(counts):
        part_count += counts[j]
        if part_count >= max_word_count and j > i:
            partition_ends.append([i, j])
            # Drop chunks from the front until only the overlap budget remains.
            # Subtract the chunk being dropped before advancing the pointer,
            # and never advance past the current chunk.
            while part_count > (max_word_count * overlap_percentage) and i < j:
                part_count -= counts[i]
                i += 1
        j += 1
    partition_ends.append([i, j])

    # With the list of partition_ends, build partitions
    partitions = []
    for pe in partition_ends:
        part = chunks[pe[0]:pe[1]]
        partition = build_lines(part)
        partitions.append(partition)

    return partitions
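To make the overlap behaviour concrete, the following standalone sketch applies the same sliding-window idea to a plain list of per-chunk word counts and returns half-open index pairs. The function name `partition_indices` and the toy numbers are illustrative assumptions, not part of the notebook's code.

```python
def partition_indices(counts, max_words, overlap_fraction):
    """Return (start, end) index pairs, end exclusive, over `counts`.

    A window is closed once its running word count reaches `max_words`;
    the next window restarts far enough back to keep some trailing chunks as context.
    """
    bounds = []
    start, running = 0, 0
    for end, count in enumerate(counts):
        running += count
        if running >= max_words and end > start:
            bounds.append((start, end))  # close the window just before chunk `end`
            # Drop chunks from the front until only the overlap budget remains.
            while running > max_words * overlap_fraction and start < end:
                running -= counts[start]
                start += 1
    bounds.append((start, len(counts)))
    return bounds


# Twelve chunks of 10 words each, a 50-word limit, and a 40% overlap budget.
windows = partition_indices([10] * 12, max_words=50, overlap_fraction=0.4)
print(windows)
```

In this toy run each window ends where the limit is reached, and the following window starts one chunk earlier, so neighbouring partitions share a chunk of context.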

**The next step** is to iterate through the partitions, attach the relevant question to each, and send the prompt to the LLM. If the call is long enough to require multiple prompts, then they are combined with a final prompt/response call to the LLM. 

In [None]:
DEFAULT_QUESTION = "What is the customer calling about and what are the next steps?"

def get_call_prompt(lines, question=DEFAULT_QUESTION):
    prompt = """Call: 
%s

%s""" %(lines, question)
    return prompt

def get_call_prompts(partitions, question=DEFAULT_QUESTION):
    prompts = []
    for partition in partitions:
        prompt = get_call_prompt(partition, question)
        prompts.append(prompt)

    return prompts

def get_response(prompt):
    cohere_response = co.generate(prompt=prompt, max_tokens=200, temperature=0, return_likelihoods='GENERATION')
    cohere_text = cohere_response.generations[0].text
    # Drop a trailing partial sentence; leave the text alone if it has no period
    if '.' in cohere_text:
        cohere_text = '.'.join(cohere_text.split('.')[:-1]) + '.'

    return cohere_text

def get_responses(prompts):
    cohere_texts = []
    for prompt in prompts:
        cohere_texts.append(get_response(prompt))

    return cohere_texts

def summarize_summaries(summaries, question=DEFAULT_QUESTION):
    # A single partition needs no combining step
    if len(summaries) == 1:
        return summaries[0], None

    prompt = """Summaries:"""
    for t in summaries:
        prompt += """

%s""" %t

    prompt += """

Combine the summaries and answer this question: %s""" %question

    full_summary = get_response(prompt)

    return full_summary, prompt
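The two-stage flow (summarize each partition, then combine the partial summaries) can be tried without a deployed endpoint. The sketch below swaps the model call for a hypothetical recording stub, so `make_recording_llm` and `summarize_call` are illustrative names and the returned "summaries" are placeholders, not model output.

```python
QUESTION = "What is the customer calling about and what are the next steps?"


def make_recording_llm():
    """Return a hypothetical stand-in for the model call plus the list of prompts it received."""
    received = []

    def llm(prompt):
        received.append(prompt)
        return f"summary #{len(received)}"

    return llm, received


def summarize_call(partitions, llm, question=QUESTION):
    """Summarize each partition, then combine the partial summaries if there are several."""
    partials = [llm(f"Call:\n{part}\n\n{question}") for part in partitions]
    if len(partials) == 1:
        return partials[0]
    combine_prompt = "Summaries:\n\n" + "\n\n".join(partials)
    combine_prompt += f"\n\nCombine the summaries and answer this question: {question}"
    return llm(combine_prompt)


llm, received = make_recording_llm()
partitions = ["Speaker 0: My order is late.\n", "Speaker 1: I will send a replacement.\n"]
final_summary = summarize_call(partitions, llm)
print(final_summary)   # the stub's answer to the combine prompt
print(len(received))   # two partition prompts plus one combine prompt
```

Passing the model call in as a parameter keeps the prompt-assembly logic testable on its own; a call with a single partition skips the combine step entirely.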


**The final functions** load the transcript and run it through each of the steps above.

In [None]:

def load_transcript(file_path):
    with open(file_path, 'r') as fid:
        return json.load(fid)

def run_call(transcript, question=DEFAULT_QUESTION, verbose=False):

    # Break call into dialogue lines
    chunks = chunk_transcription(transcript)
    chunks = rename_speakers(chunks)

    # Break dialogue lines into partitions
    partitions = partition_call(chunks, 1000, 0.3)

    prompts = get_call_prompts(partitions, question)

    # Print option
    if verbose:
        print('Prompt for Partition 1:')
        print(prompts[0])

    # Partition summary
    summaries = get_responses(prompts)

    # Combined summary; pass the question through so a custom question is honored
    summary, summary_prompt = summarize_summaries(summaries, question)

    # Print option
    if verbose:
        print('Full Summary:')
        print(summary)

    summary_dict = {
        'list_prompt': prompts,
        'summary_prompt': summary_prompt,
        'final_summary': summary,
        'question': question,
        'model': MODEL,
        'model_arn': MODEL_PACKAGE_ARN,
    }

    return summary_dict
 

## 4. Load Call and Send to LLM

We are now ready to send the prompt to the LLM and get the summary answer.

In [None]:
# List call transcripts
transcript_file = "./Data/Retail41.json"
print(transcript_file)

QUESTION = DEFAULT_QUESTION
# QUESTION = "How did the agent help the customer?"

transcript = load_transcript(transcript_file)
 
summary_dict = run_call(transcript, verbose=True, question=QUESTION)

print('=='*20)

## 5. Save Result

In [None]:
def prepare_summary(summary_dict, call_metadata=None):
    summary_event = {
        'time': datetime.now().strftime("%Y-%m-%dT%H:%M:%S"),
        "service-type": "MediaInsights",
        "detail-type": "LargeLanguageModelSummary",
        "summaryEvent": summary_dict
    }

    # Attach optional call metadata to the event
    if call_metadata is not None:
        summary_event['metadata'] = call_metadata

    return summary_event

def put_event(event, filename):
    file_ = './' + filename
    with open(file_, 'w') as fid:
        json.dump(event, fid)

    return file_

summary_event = prepare_summary(summary_dict)
summary_file = put_event(summary_event, 'Data/output.json')
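As a quick sanity check on the saved file, the sketch below builds an event with the same envelope keys used in this notebook, writes it to a temporary directory, and reads it back. The summary text, the `callId` metadata key, and the helper name `build_summary_event` are made up for illustration, and the timestamp is taken in UTC here as an assumption rather than local time.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def build_summary_event(summary, call_metadata=None):
    """Wrap a summary dict in the event envelope used by this notebook."""
    event = {
        "time": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S"),  # UTC is an assumption
        "service-type": "MediaInsights",
        "detail-type": "LargeLanguageModelSummary",
        "summaryEvent": summary,
    }
    if call_metadata is not None:
        event["metadata"] = call_metadata
    return event


# Made-up summary and metadata, for illustration only
example_summary = {"final_summary": "Customer asked about a refund.", "question": "example question"}
event = build_summary_event(example_summary, call_metadata={"callId": "example-123"})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output.json")
    with open(path, "w") as fid:
        json.dump(event, fid)
    with open(path) as fid:
        reloaded = json.load(fid)

print(sorted(reloaded))
```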

## 6. Release Models

This cell will delete the model endpoints and configs. Reloading the model takes about 10 minutes.

The [estimated cost](https://aws.amazon.com/marketplace/pp/prodview-6dmzzso5vu5my) for keeping the `ml.g5.xlarge` instance alive is $1.41 / hour.

In [None]:
print("Releasing the models deletes the endpoint. Reloading the model takes ~10 minutes.")
delete = input("Do you want to delete the model? (y/n)")
if delete.lower() in {'y', 'yes'}:  # Ensure we are not deleting the model unless prompted.

    sagemaker_client = boto3.client('sagemaker', region_name=region)

    endpoint_name = 'cohere-gpt-medium'
    # This lookup assumes the endpoint config shares the endpoint's name
    response = sagemaker_client.describe_endpoint_config(EndpointConfigName=endpoint_name)

    endpoint_config_name = response['EndpointConfigName']
    model_name = response['ProductionVariants'][0]['ModelName']

    sagemaker_client.delete_model(ModelName=model_name)
    sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)

    MODEL_ENDPOINT_SET = False