# FAQ Bot - Q&A model, trained using pairs of questions and answers

Fine-tune a large language model with a list of questions and answers. This approach is called Closed Book Q&A because the model doesn't require context and can answer variations of the questions you provide in your dataset.
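
To make the "no context" idea concrete, here is a minimal sketch of how question/answer pairs can be framed as text-to-text examples. It is only an illustration: the actual training script (`src/question_answering.py`) is not shown in this notebook, and the function name and the `input_text`/`target_text` keys are hypothetical.

```python
from typing import Dict, List, Tuple


def to_seq2seq_examples(pairs: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (question, answer) pairs into input/target text records.

    Closed book: the input holds only the question, with no supporting passage.
    """
    examples = []
    for question, answer in pairs:
        question, answer = question.strip(), answer.strip()
        if not question or not answer:
            continue  # skip incomplete pairs
        examples.append({"input_text": question, "target_text": answer})
    return examples


# The answer below is a placeholder, not real FAQ text.
print(to_seq2seq_examples([("What is Amazon EC2 Auto Scaling?", "<answer text>")]))
```

An open-book setup would instead concatenate a retrieved passage to the question in the input; here the model has to produce the answer from its fine-tuned weights alone.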

This is an evolution of classic chatbots because LLMs like T5 can disambiguate and generalize better than the older technologies behind those chatbot services.

For that purpose you'll use a **[T5 SMALL SSM ~80MParams](https://huggingface.co/google/t5-small-ssm)** model, accelerated by a trn1 instance ([AWS Trainium](https://aws.amazon.com/machine-learning/trainium/)), running on [Amazon SageMaker](https://aws.amazon.com/sagemaker/).

You can set the hyperparameter **--model_name** to change the model size. This solution works well with:
 - t5-small-ssm
 - t5-large-ssm
 
If you need to fine-tune **t5-3b-ssm, t5-11b-ssm or t5-xxl-ssm**, you need **FSDP** (Fully Sharded Data Parallel), which is outside the scope of this tutorial.
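
As a small guard before launching a job, the sketch below checks a model name against the sizes listed above. The function and constant names are hypothetical; the grouping simply mirrors the text and is not an exhaustive list of what could work.

```python
# Model sizes named in this tutorial; the grouping mirrors the text above.
SINGLE_JOB_MODELS = {"t5-small-ssm", "t5-large-ssm"}
NEEDS_FSDP_MODELS = {"t5-3b-ssm", "t5-11b-ssm", "t5-xxl-ssm"}


def check_model_name(model_name: str) -> str:
    """Return the name if it is listed as working, otherwise raise ValueError."""
    if model_name in SINGLE_JOB_MODELS:
        return model_name
    if model_name in NEEDS_FSDP_MODELS:
        raise ValueError(f"{model_name} needs FSDP, which this tutorial does not cover")
    raise ValueError(f"{model_name} was not tested with this tutorial")


print(check_model_name("t5-small-ssm"))
```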

You can see the results of the predictions at the end of this notebook. You'll notice the questions sent to the model are not in the training dataset. They are just variations of the questions used to fine tune the model.

The dataset is the content of all **AWS FAQ** pages, downloaded from: https://aws.amazon.com/faqs/

This notebook was tested with **Python 3.8+**.

>**If you have never run a SageMaker training job on Trn1 before, you'll need to request a service quota increase. This can take a few hours, so it's best to make the request early to avoid waiting.**

You can edit this URL to go directly to the page to request the increase:

`https://<region>.console.aws.amazon.com/servicequotas/home/services/sagemaker/quotas/L-79A1FE57`
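
If you prefer to generate the link, this sketch fills the `<region>` placeholder in the URL above. The helper name is hypothetical, and `us-east-1` is only an example region; use the region where you plan to train.

```python
def quota_request_url(region: str, quota_code: str = "L-79A1FE57") -> str:
    """Build the Service Quotas console URL shown above for a given region."""
    return (
        f"https://{region}.console.aws.amazon.com"
        f"/servicequotas/home/services/sagemaker/quotas/{quota_code}"
    )


print(quota_request_url("us-east-1"))
```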

## 1) Install some dependencies
You need a recent version of the **sagemaker** Python library. After installing it, you'll need to restart the kernel.

In [None]:
# add --force-reinstall if it fails to resolve dependencies
%pip install -U sagemaker

In [None]:
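import sagemaker
from packaging import version

print(sagemaker.__version__)
# Compare parsed versions; plain string comparison would rank "2.99.0" above "2.146.0"
if version.parse(sagemaker.__version__) < version.parse("2.146.0"):
    print("You need to upgrade or restart the kernel if you already upgraded")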

sess = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sess.default_bucket()
region = sess.boto_region_name

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {bucket}")
print(f"sagemaker session region: {region}")

## 2) Visualize and upload the dataset
Take note of the S3 URI printed below. If you get interrupted, you can reuse it later instead of uploading the dataset again.

In [5]:
import pandas as pd
df = pd.read_csv('train.csv.gz', compression='gzip', sep=';')
df.head()

| Unnamed: 0 | service | question | answers |
|---|---|---|---|
| 0 | /ec2/autoscaling/faqs/ | What is Amazon EC2 Auto Scaling? | Amazon EC2 Auto Scaling is a fully managed ser... |
| 1 | /ec2/autoscaling/faqs/ | When should I use Amazon EC2 Auto Scaling vs. ... | You should use AWS Auto Scaling to manage scal... |
| 2 | /ec2/autoscaling/faqs/ | How is Predictive Scaling Policy different fro... | Predictive Scaling Policy brings the similar p... |
| 3 | /ec2/autoscaling/faqs/ | What are the benefits of using Amazon EC2 Auto... | Amazon EC2 Auto Scaling helps to maintain your... |
| 4 | /ec2/autoscaling/faqs/ | What is fleet management and how is it differe... | If your application runs on Amazon EC2 instanc... |
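
To get a feel for how the questions are distributed across services, the sketch below counts rows per `service` value. It uses only the standard library and a tiny in-memory stand-in for `train.csv.gz`, so it runs without the real file; the sample rows, the `/sagemaker/faqs/` path, and the placeholder answers are assumptions for illustration only.

```python
import csv
import gzip
import io
from collections import Counter


def count_questions_per_service(gz_bytes: bytes) -> Counter:
    """Count rows per 'service' value in a gzip-compressed, ';'-separated CSV."""
    text = gzip.decompress(gz_bytes).decode("utf-8")
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return Counter(row["service"] for row in reader)


# Tiny stand-in for train.csv.gz: same columns, placeholder content.
sample_csv = (
    ";service;question;answers\n"
    "0;/ec2/autoscaling/faqs/;What is Amazon EC2 Auto Scaling?;<answer text>\n"
    "1;/ec2/autoscaling/faqs/;<another question>;<answer text>\n"
    "2;/sagemaker/faqs/;<another question>;<answer text>\n"
)
sample_gz = gzip.compress(sample_csv.encode("utf-8"))

counts = count_questions_per_service(sample_gz)
print(counts.most_common())
```

With the real dataset you would pass `open('train.csv.gz', 'rb').read()` instead of `sample_gz`.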


In [None]:
s3_uri = sess.upload_data(path='train.csv.gz', key_prefix='datasets/aws-faq/train')
print(s3_uri)
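
If you did not write the URI down, it can usually be rebuilt without uploading again. The sketch below assumes that `upload_data` places a single file at `s3://<bucket>/<key_prefix>/<filename>`; the helper name and the bucket name in the example are hypothetical, so compare the result with the URI printed above before relying on it.

```python
import posixpath


def uploaded_s3_uri(bucket: str, key_prefix: str, local_path: str) -> str:
    """Rebuild the S3 URI of a single uploaded file from its parts."""
    filename = posixpath.basename(local_path)
    return f"s3://{bucket}/{key_prefix}/{filename}"


# "my-bucket" is a placeholder; in the notebook you would pass `bucket`.
print(uploaded_s3_uri("my-bucket", "datasets/aws-faq/train", "train.csv.gz"))
```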

## 3) Prepare the train/inference script

In [None]:
import os
os.makedirs('src', exist_ok=True)

In [None]:
## requirements.txt will be used by SageMaker to install
## additional Python packages

In [None]:
%%writefile src/requirements.txt
torchvision
transformers==4.27.4

In [None]:
!pygmentize src/question_answering.py

## 4) Kick-off our fine tuning job on Amazon SageMaker
We need to create a SageMaker Estimator first and then invoke **.fit**. 

Note that we're passing the parameter **checkpoint_s3_uri**. This is important because the Neuron SDK spends some time compiling the model before fine-tuning it. The compiler saves the compiled model to cache files and, with this parameter, those files are uploaded to **S3**. The next time we run a job, the Neuron SDK can load the cache files and start training immediately.

When training for the first time, the training job takes ~9 hours to process all 60 epochs on a **trn1.32xlarge**.

If you had to wait for a quota increase, like I did, then when you come back run cell 2 to set up the SageMaker session, S3 URIs, etc. Then run the cell below to start the process.

In [None]:
from sagemaker.pytorch import PyTorch

# https://github.com/aws/deep-learning-containers/blob/master/available_images.md#neuron-containers
image_name="pytorch-training-neuronx"
# We need SDK2.9+ to deal with T5s
image_tag="1.13.0-neuronx-py38-sdk2.9.1-ubuntu20.04"

estimator = PyTorch(
 entry_point="question_answering.py", # Specify your train script
 source_dir="src",
 role=role,
 sagemaker_session=sess,
 instance_count=1,
 instance_type='ml.trn1.32xlarge', 
 disable_profiler=True,
 output_path=f"s3://{bucket}/output",
 image_uri=f"763104351884.dkr.ecr.{region}.amazonaws.com/{image_name}:{image_tag}",
 
 # Parameters required to enable checkpointing
 # This is necessary to cache XLA HLO files and reduce training time on later runs
 checkpoint_s3_uri=f"s3://{bucket}/checkpoints",
 volume_size = 512,
 distribution={
 "torch_distributed": {
 "enabled": True
 }
 },
 hyperparameters={
 "model-name": "t5-small-ssm",
 "lr": 5e-5,
 "num-epochs": 60
 },
 metric_definitions=[
 {'Name': 'train:loss', 'Regex': r'loss:(\S+);'}
 ]
)
estimator.framework_version = '1.13.1' # workaround when using image_uri

In [None]:
estimator.fit({"train": s3_uri})
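
The `train:loss` metric is extracted from the job logs with the regex given in `metric_definitions`. The sketch below applies the same pattern to a made-up log snippet so you can see what it captures. The log format here is an assumption, since the training script is not shown; what matters is that each loss value is written as `loss:<value>;`.

```python
import re
from typing import List

# Same pattern as in metric_definitions above.
LOSS_REGEX = re.compile(r"loss:(\S+);")


def extract_losses(log_text: str) -> List[float]:
    """Return every loss value that the metric regex would capture."""
    return [float(value) for value in LOSS_REGEX.findall(log_text)]


# Hypothetical log lines, for illustration only.
sample_log = "epoch:1; loss:2.3125;\nepoch:2; loss:1.875;\n"
print(extract_losses(sample_log))
```

Because `\S+` is greedy, the value should be followed by `;` and then whitespace or a line break; otherwise the capture can run on into the next field.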

## 5) Deploy our model to a SageMaker endpoint
Here, we're using a pre-defined HuggingFace model class+container to just load our fine tuned model on a CPU based instance: c6i.4xlarge (an Intel Xeon based machine).

>If you're picking this up later uncomment line 4, fill in the path to your model artifacts, comment line 9 out, and uncomment line 10.

In [None]:
# uncomment and modify this if you're picking this back up later and your training was successful.
# you'll need to get the model s3 URI from sagemaker -> Training -> Training Jobs -> <Your job name> -> Output -> S3 model artifact

# pre_trained_model = YOUR_S3_PATH
from sagemaker.huggingface.model import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
 model_data=estimator.model_data, # path to your model and script
 # model_data=pre_trained_model, # path to your model and script
 role=role, # iam role with permissions to create an Endpoint
 transformers_version="4.26.0", # transformers version used
 pytorch_version="1.13.1", # pytorch version used
 py_version='py39', # python version used
 sagemaker_session=sess,
 
 # for production it is important to define vpc_config and use a vpc_endpoint
 #vpc_config={
 # 'Subnets': ['subnet-A-REPLACE', 'subnet-B-REPLACE'],
 # 'SecurityGroupIds': ['sg-A-REPLACE', 'sg-B-REPLACE']
 #} 
)
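
When picking this up later, the model artifact path can often be guessed from the training job name. The sketch below assumes the `output_path` used in the estimator above (`s3://<bucket>/output`) and the usual SageMaker layout of `<output_path>/<job name>/output/model.tar.gz`. The helper name, bucket, and job name are hypothetical; confirm the real path in the console as described in the comments above.

```python
def model_artifact_uri(bucket: str, training_job_name: str) -> str:
    """Guess the model artifact location for a finished training job."""
    return f"s3://{bucket}/output/{training_job_name}/output/model.tar.gz"


# Placeholder values; replace them with your bucket and job name.
print(model_artifact_uri("my-bucket", "my-training-job"))
```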

In [None]:
predictor = huggingface_model.deploy(
 initial_instance_count=1,
 instance_type="ml.c6i.4xlarge",
)

## 6) Run a quick test

In [17]:
%%time
questions = [
 "What is SageMaker?",
 "What is EC2 AutoScaling?",
 "What are the benefits of autoscaling?"
]
resp = predictor.predict({'inputs': questions})
for q,a in zip(questions, resp['answers']):
 print(f"Q: {q}\nA: {a}\n")

Q: What is SageMaker?
A: SageMaker is a new ML (ML) service that makes it easy to build, train, and deploy notebook data inference, and deploy and tune models of data. SageMaker helps you build, train, and manage your ML models, and deploy model data to build your models up and down

Q: What is EC2 AutoScaling?
A: Amazon-based EC2 instancess let you reduce your applications on multiple factors by allowing you to scale your application requirements and costs across multiple instances. Amazoning EC2 instances as a result of optimization in your applications, reducing the number of compute EC and the number of available instances to optimize your

Q: What are the benefits of autoscaling?
A: You can use autoscaling to help you optimize the capacity of your applications by allowing you to take advantage of your application across multiple applications. Autoscaling allows you to easily scale the number of your applications across multiple devices, and optimize your fleet up or down to 40%. Y

## 7) Clean up
This will delete the model and the endpoint you created.

In [None]:
predictor.delete_model()
predictor.delete_endpoint()