# Fine-tuning Foundation Models - HuggingFace Text2Text - FLAN

In this demo notebook, we use the SageMaker Python SDK to **fine-tune a Text2Text model**. Such a model takes prompting text as input and generates text as output. The prompt can include a task description in natural language. Accordingly, the model can be used for a variety of NLP tasks (e.g., text summarization, question answering, etc.).

We will fine-tune a pre-trained **FLAN T5 model** from [Hugging Face](https://huggingface.co/docs/transformers/model_doc/flan-t5). While pre-trained FLAN T5 models can be used "as is" for many tasks, fine-tuning can improve model performance on a particular task or language domain. As an example, we will fine-tune the model for a task that was not used for pre-training. After fine-tuning, we will deploy two inference endpoints, one with the pre-trained model and one with the fine-tuned model. We will then run the same inference query against both endpoints and compare the results.

#### In this notebook:
1. [Setting up](#1.-Setting-up)
1. [Fine-tuning a model](#2.-Fine-tuning-a-model)
1. [Deploying inference endpoints](#3.-Deploying-inference-endpoints)
1. [Running inference queries](#4.-Running-inference-queries)
1. [Cleaning up resources](#5.-Cleaning-up-resources)

### 1. Setting up

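We begin by installing and upgrading the necessary packages. The install commands in the cell below are commented out; uncomment them if the pinned versions are not already installed, and restart the kernel after executing the cell.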

In [3]:
#!pip install nest-asyncio==1.5.5 --quiet
#!pip install ipywidgets==8.0.4 --quiet
#!pip install sagemaker==2.148.0 --quiet

We will use the following variables throughout the notebook. In particular, we select the FLAN T5 model size and the training and inference instance types. We also obtain the execution role associated with the current notebook instance.

In [4]:
import boto3
import sagemaker

# Get current region, role, and default bucket
aws_region = boto3.Session().region_name
aws_role = sagemaker.session.Session().get_caller_identity_arn()
output_bucket = sagemaker.Session().default_bucket()

# This will be useful for printing
newline, bold, unbold = "\n", "\033[1m", "\033[0m"

print(f"{bold}aws_region:{unbold} {aws_region}")
print(f"{bold}aws_role:{unbold} {aws_role}")
print(f"{bold}output_bucket:{unbold} {output_bucket}")

aws_region: us-east-1
aws_role: arn:aws:iam::509957658284:role/service-role/AmazonSageMaker-ExecutionRole-20211126T131684
output_bucket: sagemaker-us-east-1-509957658284


#### Select FLAN T5 model

In [5]:
import IPython
from ipywidgets import Dropdown
from sagemaker.jumpstart.filters import And
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# Default model choice
model_id = "huggingface-text2text-flan-t5-small"
model_version = "*"

In [6]:
from sagemaker.instance_types import retrieve_default

# Default training instance type for this model
training_instance_type = retrieve_default(
    model_id=model_id, model_version=model_version, scope="training"
)

# Override the retrieved default with explicitly chosen instance types
training_instance_type = "ml.p3.2xlarge"
inference_instance_type = "ml.g5.2xlarge"
print(f"{bold}model_id:{unbold} {model_id}")
print(f"{bold}training_instance_type:{unbold} {training_instance_type}")
print(f"{bold}inference_instance_type:{unbold} {inference_instance_type}")

model_id: huggingface-text2text-flan-t5-small
training_instance_type: ml.p3.2xlarge
inference_instance_type: ml.g5.2xlarge


### 2. Fine-tuning a model

FLAN T5 models were pre-trained on a variety of tasks. In this demo, we fine-tune a model for a new task: given a piece of text, the model is asked to generate questions that are relevant to the text but cannot be answered based on the provided information. Examples are given in the inference section of this notebook.

#### 2.1. Preparing training data
We will use a subset of SQuAD2.0 for supervised fine-tuning. This dataset contains questions posed by human annotators on a set of Wikipedia articles. In addition to questions with answers, SQuAD2.0 contains about 50k unanswerable questions. Such questions are plausible, but cannot be directly answered from the articles' content. We only use unanswerable questions for our task.

*Citation: @article{rajpurkar2018know, title={Know what you don't know: Unanswerable questions for SQuAD},
author={Rajpurkar, Pranav and Jia, Robin and Liang, Percy}, journal={arXiv preprint arXiv:1806.03822}, year={2018} }*

License: [Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0)](https://creativecommons.org/licenses/by-sa/4.0/legalcode)
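
In [7]:
from sagemaker.s3 import S3Downloader

# Train split of SQuAD2.0
original_data_file = "train-v2.0.json"

# Download the file from the sample-data bucket
# (skip these two lines if a local copy of the file already exists)
original_data_location = f"s3://sagemaker-sample-files/datasets/text/squad2.0/{original_data_file}"
S3Downloader.download(original_data_location, ".")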

The Text2Text generation model can be fine-tuned on any text data, provided that the data is in the expected format. The data must include a training part and, optionally, a validation part. The best model is selected according to the validation loss, calculated at the end of each epoch. If a validation set is not given, an (adjustable) percentage of the training data is automatically split off and used for validation.
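To make the automatic split concrete, here is a minimal sketch of holding out a fraction of the training samples for validation. The function name `split_train_validation` is hypothetical, and the shuffle-then-slice approach is an assumption made for illustration; the JumpStart training script may split the data differently. The default values mirror the `validation_split_ratio` and `train_data_split_seed` hyperparameters printed later in this notebook.

```python
import random


def split_train_validation(samples, validation_split_ratio=0.05, seed=0):
    """Hold out a fraction of the samples for validation (illustrative only)."""
    shuffled = list(samples)
    # A seeded generator makes the split reproducible across runs
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_split_ratio)
    # The first n_validation shuffled samples form the validation set
    return shuffled[n_validation:], shuffled[:n_validation]


train_samples, validation_samples = split_train_validation(range(100))
print(len(train_samples), len(validation_samples))
```

Fixing the seed means that repeated runs evaluate on the same held-out samples, which keeps validation losses comparable across experiments.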

The training data must be in JSON lines (`.jsonl`) format, where each line is a dictionary representing a single data sample. All training data must be in a single folder, although it can be split across multiple `.jsonl` files. The `.jsonl` file extension is mandatory. The training folder can also contain a `template.json` file describing the input and output formats.

If no template file is given, the following default template will be used:
```json
{
    "prompt": "{prompt}",
    "completion": "{completion}"
}
```
In this case, the data in the JSON lines entries must include `prompt` and `completion` fields.
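As an illustration of how a template and a JSON lines entry combine into one training example, the sketch below fills the default template using Python's `str.format`. This is an assumption about the mechanics, not the training script's actual code; `render_example` and the sample record are hypothetical.

```python
import json

default_template = {"prompt": "{prompt}", "completion": "{completion}"}


def render_example(template, record):
    """Fill each template field with values from one JSON lines record."""
    return {key: value.format(**record) for key, value in template.items()}


# Hypothetical JSON lines entry, made up for illustration
line = '{"prompt": "Translate to German: Hello", "completion": "Hallo"}'
record = json.loads(line)
rendered = render_example(default_template, record)
print(rendered)
```

With a custom template, the placeholders simply name whichever fields the data entries provide.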

In this demo, we are going to use a custom template (see below).

In [8]:
import json

local_data_file = "task-data.jsonl"  # any name with .jsonl extension

with open(original_data_file) as f:
    data = json.load(f)

with open(local_data_file, "w") as f:
    for article in data["data"]:
        for paragraph in article["paragraphs"]:
            # iterate over questions for a given paragraph
            for qas in paragraph["qas"]:
                if qas["is_impossible"]:
                    # the question is relevant, but cannot be answered
                    example = {"context": paragraph["context"], "question": qas["question"]}
                    json.dump(example, f)
                    f.write("\n")

template = {
    "prompt": "Ask a question which is related to the following text, but cannot be answered based on the text. Text: {context}",
    "completion": "{question}",
}
with open("template.json", "w") as f:
    json.dump(template, f)
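
Before uploading, it can be worth checking that every line in the data file supplies the fields the template refers to. The following is a minimal sketch under the assumption that placeholders use Python-style `{name}` syntax; `template_fields` and `find_invalid_lines` are hypothetical helper names, and the demo file written here is a stand-in, not the SQuAD-derived data.

```python
import json
import os
import string
import tempfile


def template_fields(template):
    """Collect the placeholder names used anywhere in the template."""
    formatter = string.Formatter()
    return {
        name
        for value in template.values()
        for _, name, _, _ in formatter.parse(value)
        if name
    }


def find_invalid_lines(jsonl_path, required_fields):
    """Return 1-based numbers of lines that fail to parse or lack a field."""
    invalid = []
    with open(jsonl_path) as f:
        for line_number, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                invalid.append(line_number)
                continue
            # Each line must be a dictionary containing every required field
            if not isinstance(record, dict) or not required_fields.issubset(record):
                invalid.append(line_number)
    return invalid


demo_template = {"prompt": "Text: {context}", "completion": "{question}"}
fields = template_fields(demo_template)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.jsonl")
    with open(demo_path, "w") as f:
        f.write(json.dumps({"context": "c", "question": "q"}) + "\n")
        f.write(json.dumps({"context": "missing question"}) + "\n")
    invalid_lines = find_invalid_lines(demo_path, fields)

print(sorted(fields), invalid_lines)
```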

#### Upload to S3

In [9]:
from sagemaker.s3 import S3Uploader

train_data_location = f"s3://{output_bucket}/train_data"
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("template.json", train_data_location)
print(f"{bold}training data:{unbold} {train_data_location}")

training data: s3://sagemaker-us-east-1-509957658284/train_data


#### 2.2. Start training

We are now ready to launch a training job.

In [10]:
from sagemaker import image_uris, model_uris, script_uris

# Training instance will use this image
train_image_uri = image_uris.retrieve(
    region=aws_region,
    framework=None,  # automatically inferred from model_id
    model_id=model_id,
    model_version=model_version,
    image_scope="training",
    instance_type=training_instance_type,
)

# Pre-trained model
train_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="training"
)

# Script to execute on the training instance
train_script_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="training"
)

output_location = f"s3://{output_bucket}/demo-fine-tune-flan-t5/"

print(f"{bold}image uri:{unbold} {train_image_uri}")
print(f"{bold}model uri:{unbold} {train_model_uri}")
print(f"{bold}script uri:{unbold} {train_script_uri}")
print(f"{bold}output location:{unbold} {output_location}")

image uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04
model uri: s3://jumpstart-cache-prod-us-east-1/huggingface-training/train-huggingface-text2text-flan-t5-small.tar.gz
script uri: s3://jumpstart-cache-prod-us-east-1/source-directory-tarballs/huggingface/transfer_learning/text2text/prepack/v1.0.3/sourcedir.tar.gz
output location: s3://sagemaker-us-east-1-509957658284/demo-fine-tune-flan-t5/


In [11]:
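from sagemaker import hyperparameters as jumpstart_hyperparameters

# Retrieve the default hyper-parameters for fine-tuning the model.
# The module is imported under an alias so that the `hyperparameters`
# variable does not shadow it when this cell is re-run.
hyperparameters = jumpstart_hyperparameters.retrieve_default(
    model_id=model_id, model_version=model_version
)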

# We will override some default hyperparameters with custom values
hyperparameters["epochs"] = "2"
print(hyperparameters)

{'epochs': '2', 'max_steps': '-1', 'seed': '42', 'batch_size': '64', 'learning_rate': '0.0001', 'lr_scheduler_type': 'constant_with_warmup', 'warmup_ratio': '0.0', 'warmup_steps': '0', 'validation_split_ratio': '0.05', 'train_data_split_seed': '0', 'max_train_samples': '-1', 'max_eval_samples': '-1', 'max_input_length': '-1', 'max_output_length': '128', 'pad_to_max_length': 'True', 'gradient_accumulation_steps': '1', 'weight_decay': '0.0', 'adam_beta1': '0.9', 'adam_beta2': '0.999', 'adam_epsilon': '1e-08', 'max_grad_norm': '1.0', 'load_best_model_at_end': 'True', 'early_stopping_patience': '3', 'early_stopping_threshold': '0.0', 'label_smoothing_factor': '0', 'logging_strategy': 'steps', 'logging_first_step': 'False', 'logging_steps': '500', 'logging_nan_inf_filter': 'True', 'save_strategy': 'epoch', 'save_steps': '500', 'save_total_limit': '2', 'dataloader_drop_last': 'False', 'dataloader_num_workers': '0', 'evalaution_strategy': 'epoch', 'eval_steps': '500', 'eval_accumulation_steps

We are now ready to start the training job. This can take a while to complete, from 20 minutes to several hours, depending on the model size, amount of data, and so on (e.g., it can take a few hours for the xl model, 40k examples and 3 epochs).

In [12]:
from sagemaker.estimator import Estimator
from sagemaker.utils import name_from_base

model_name = "-".join(model_id.split("-")[2:])  # get the most informative part of ID
training_job_name = name_from_base(f"js-demo-{model_name}-{hyperparameters['epochs']}")
print(f"{bold}job name:{unbold} {training_job_name}")

training_metric_definitions = [
    {"Name": "val_loss", "Regex": "'eval_loss': ([0-9\\.]+)"},
    {"Name": "train_loss", "Regex": "'loss': ([0-9\\.]+)"},
    {"Name": "epoch", "Regex": "'epoch': ([0-9\\.]+)"},
]

# Create SageMaker Estimator instance
sm_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    model_uri=train_model_uri,
    source_dir=train_script_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    volume_size=300,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=output_location,
    metric_definitions=training_metric_definitions,
)

# Launch a SageMaker training job over data located in the given S3 path.
# Training jobs can take hours, so we set wait=False
# and monitor the job status through the SageMaker console.
sm_estimator.fit({"training": train_data_location}, job_name=training_job_name, wait=False)

job name: js-demo-flan-t5-small-2-2023-05-29-03-24-42-284


INFO:sagemaker:Creating training-job with name: js-demo-flan-t5-small-2-2023-05-29-03-24-42-284
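
The `metric_definitions` passed to the estimator pair a metric name with a regular expression; SageMaker applies these patterns to the training logs to extract metric values. The sketch below applies the same patterns locally with Python's `re` module. The log line is a hypothetical example of what a training script might print, not captured output from this job.

```python
import re

training_metric_definitions = [
    {"Name": "val_loss", "Regex": "'eval_loss': ([0-9\\.]+)"},
    {"Name": "train_loss", "Regex": "'loss': ([0-9\\.]+)"},
    {"Name": "epoch", "Regex": "'epoch': ([0-9\\.]+)"},
]

# Hypothetical log line, made up for illustration
log_line = "{'eval_loss': 2.4759, 'eval_runtime': 12.3, 'epoch': 1.0}"

extracted = {}
for definition in training_metric_definitions:
    match = re.search(definition["Regex"], log_line)
    if match:
        # The first capture group holds the numeric value
        extracted[definition["Name"]] = float(match.group(1))

print(extracted)
```

The `train_loss` pattern does not fire on this line: `'loss'` with its leading quote does not occur inside `'eval_loss'`, so the two metrics are not confused with each other.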


Performance metrics such as training and validation loss can be accessed through CloudWatch during training. We can also fetch the most recent snapshot of metrics as follows.

In [14]:
from sagemaker import TrainingJobAnalytics

# This can be called while the job is still running
df = TrainingJobAnalytics(training_job_name=training_job_name).dataframe()
df.head(10)



|   | timestamp | metric_name | value |
|---|-----------|-------------|----------|
| 0 | 0.0 | val_loss | 2.475903 |
| 1 | 0.0 | epoch | 1.666667 |


### 3. Deploying inference endpoints

The remainder of the notebook should be executed once the training job has completed successfully. Recall that the variable `training_job_name` contains the job name and `output_location` points to the S3 location of the fine-tuned model artifact.
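
One way to confirm that the job has finished is to poll its status. The sketch below is a minimal polling loop under stated assumptions: `wait_for_training_job` is a hypothetical helper, and it takes the describe call as an argument so that the loop can be exercised here with a stand-in function instead of a live AWS account.

```python
import time

# Statuses after which a training job no longer changes
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(describe_fn, job_name, poll_seconds=60, max_polls=600):
    """Poll describe_fn until the job reaches a terminal status, then return it."""
    for _ in range(max_polls):
        status = describe_fn(TrainingJobName=job_name)["TrainingJobStatus"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"{job_name} did not finish after {max_polls} polls")


# Stand-in for a real describe call (illustration only)
_fake_statuses = iter(["InProgress", "InProgress", "Completed"])


def fake_describe(TrainingJobName):
    return {"TrainingJobStatus": next(_fake_statuses)}


final_status = wait_for_training_job(fake_describe, "demo-job", poll_seconds=0)
print(final_status)
```

In the notebook, the boto3 SageMaker client's `describe_training_job` method could be passed as `describe_fn` together with `training_job_name`; deployment should proceed only if the returned status is `Completed`.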

We will create two inference endpoints, one for the original pre-trained model, and one for the fine-tuned model. We will then run the same request against the two endpoints and compare the results.

Note that each endpoint deployment can take a few minutes.

In [15]:
from sagemaker import image_uris

# Retrieve the inference docker image URI. This is the base HuggingFace container image
deploy_image_uri = image_uris.retrieve(
    region=aws_region,
    framework=None,  # automatically inferred from model_id
    model_id=model_id,
    model_version=model_version,
    image_scope="inference",
    instance_type=inference_instance_type,
)

In [16]:
from sagemaker import model_uris, script_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

# Retrieve the URI of the pre-trained model
pre_trained_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

pre_trained_name = name_from_base(f"jumpstart-demo-pre-trained-{model_id}")

# Create the SageMaker model instance of the pre-trained model
if ("small" in model_id) or ("base" in model_id):
    deploy_source_uri = script_uris.retrieve(
        model_id=model_id, model_version=model_version, script_scope="inference"
    )
    pre_trained_model = Model(
        image_uri=deploy_image_uri,
        source_dir=deploy_source_uri,
        entry_point="inference.py",
        model_data=pre_trained_model_uri,
        role=aws_role,
        predictor_cls=Predictor,
        name=pre_trained_name,
    )
else:
    # For the larger models, the inference script is already packaged with the
    # model artifacts, so the `source_dir` argument to Model is not required.
    pre_trained_model = Model(
        image_uri=deploy_image_uri,
        model_data=pre_trained_model_uri,
        role=aws_role,
        predictor_cls=Predictor,
        name=pre_trained_name,
    )

print(f"{bold}image URI:{unbold}{newline} {deploy_image_uri}")
print(f"{bold}model URI:{unbold}{newline} {pre_trained_model_uri}")
print("Deploying an endpoint ...")

# Deploy the pre-trained model. When deploying through the Model class, we need to
# pass the Predictor class in order to run inference through the SageMaker API
pre_trained_predictor = pre_trained_model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=pre_trained_name,
)
print(f"{newline}Deployed an endpoint {pre_trained_name}")

image URI:
 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04
model URI:
 s3://jumpstart-cache-prod-us-east-1/huggingface-infer/prepack/v1.0.1/infer-prepack-huggingface-text2text-flan-t5-small.tar.gz
Deploying an endpoint ...


INFO:sagemaker:Creating model with name: jumpstart-demo-pre-trained-huggingface--2023-05-29-03-37-21-524
INFO:sagemaker:Creating endpoint-config with name jumpstart-demo-pre-trained-huggingface--2023-05-29-03-37-21-524
INFO:sagemaker:Creating endpoint with name jumpstart-demo-pre-trained-huggingface--2023-05-29-03-37-21-524


-------!
Deployed an endpoint jumpstart-demo-pre-trained-huggingface--2023-05-29-03-37-21-524


In [17]:
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

fine_tuned_name = name_from_base(f"jumpstart-demo-fine-tuned-{model_id}")
fine_tuned_model_uri = f"{output_location}{training_job_name}/output/model.tar.gz"

# Create the SageMaker model instance of the fine-tuned model
fine_tuned_model = Model(
    image_uri=deploy_image_uri,
    model_data=fine_tuned_model_uri,
    role=aws_role,
    predictor_cls=Predictor,
    name=fine_tuned_name,
)

print(f"{bold}image URI:{unbold}{newline} {deploy_image_uri}")
print(f"{bold}model URI:{unbold}{newline} {fine_tuned_model_uri}")
print("Deploying an endpoint ...")

# Deploy the fine-tuned model.
fine_tuned_predictor = fine_tuned_model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=fine_tuned_name,
)
print(f"{newline}Deployed an endpoint {fine_tuned_name}")

INFO:sagemaker:Creating model with name: jumpstart-demo-fine-tuned-huggingface-t-2023-05-29-03-42-28-773


image URI:
 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04
model URI:
 s3://sagemaker-us-east-1-509957658284/demo-fine-tune-flan-t5/js-demo-flan-t5-small-2-2023-05-29-03-24-42-284/output/model.tar.gz
Deploying an endpoint ...


INFO:sagemaker:Creating endpoint-config with name jumpstart-demo-fine-tuned-huggingface-t-2023-05-29-03-42-28-773
INFO:sagemaker:Creating endpoint with name jumpstart-demo-fine-tuned-huggingface-t-2023-05-29-03-42-28-773


-------!
Deployed an endpoint jumpstart-demo-fine-tuned-huggingface-t-2023-05-29-03-42-28-773


### 4. Running inference queries

As the name suggests, a Text2Text model such as FLAN T5 receives a piece of text as input and generates text as output. The input text contains the description of the task. In this demo, our task is to generate questions given a piece of text. The questions must be relevant to the text, but the text should contain no answer. Such a task could arise when automating the gathering of additional information or when identifying gaps in technical documentation.

In [18]:
prompt = "Ask a question which is related to the following text, but cannot be answered based on the text. Text: {context}"

# Sources: Wikipedia, AWS Documentation
test_paragraphs = [
    """
Adelaide is the capital city of South Australia, the state's largest city and the fifth-most populous city in Australia. "Adelaide" may refer to either Greater Adelaide (including the Adelaide Hills) or the Adelaide city centre. The demonym Adelaidean is used to denote the city and the residents of Adelaide. The Traditional Owners of the Adelaide region are the Kaurna people. The area of the city centre and surrounding parklands is called Tarndanya in the Kaurna language.
Adelaide is situated on the Adelaide Plains north of the Fleurieu Peninsula, between the Gulf St Vincent in the west and the Mount Lofty Ranges in the east. Its metropolitan area extends 20 km (12 mi) from the coast to the foothills of the Mount Lofty Ranges, and stretches 96 km (60 mi) from Gawler in the north to Sellicks Beach in the south.
""",
    """
Amazon Elastic Block Store (Amazon EBS) provides block level storage volumes for use with EC2 instances. EBS volumes behave like raw, unformatted block devices. You can mount these volumes as devices on your instances. EBS volumes that are attached to an instance are exposed as storage volumes that persist independently from the life of the instance. You can create a file system on top of these volumes, or use them in any way you would use a block device (such as a hard drive). You can dynamically change the configuration of a volume attached to an instance.
We recommend Amazon EBS for data that must be quickly accessible and requires long-term persistence. EBS volumes are particularly well-suited for use as the primary storage for file systems, databases, or for any applications that require fine granular updates and access to raw, unformatted, block-level storage. Amazon EBS is well suited to both database-style applications that rely on random reads and writes, and to throughput-intensive applications that perform long, continuous reads and writes.
""",
    """
Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases. 
You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition. 
All of the Amazon Comprehend features accept UTF-8 text documents as the input. In addition, custom classification and custom entity recognition accept image files, PDF files, and Word files as input. 
Amazon Comprehend can examine and analyze documents in a variety of languages, depending on the specific feature. For more information, see Languages supported in Amazon Comprehend. Amazon Comprehend's Dominant language capability can examine documents and determine the dominant language for a far wider selection of languages.
""",
]

In [19]:
import boto3
import json

# Parameters of (output) text generation. A great introduction to generation
# parameters can be found at https://huggingface.co/blog/how-to-generate
parameters = {
    "max_length": 40,  # restrict the length of the generated text
    "num_return_sequences": 5,  # we will inspect several model outputs
    "num_beams": 10,  # use beam search
}


# Helper functions for running inference queries
def query_endpoint_with_json_payload(payload, endpoint_name):
    encoded_json = json.dumps(payload).encode("utf-8")
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=encoded_json
    )
    return response


def parse_response_multiple_texts(query_response):
    model_predictions = json.loads(query_response["Body"].read())
    generated_text = model_predictions["generated_texts"]
    return generated_text


def generate_questions(endpoint_name, text):
    expanded_prompt = prompt.replace("{context}", text)
    payload = {"text_inputs": expanded_prompt, **parameters}
    query_response = query_endpoint_with_json_payload(payload, endpoint_name=endpoint_name)
    generated_texts = parse_response_multiple_texts(query_response)
    for i, generated_text in enumerate(generated_texts):
        print(f"Response {i}: {generated_text}{newline}")
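
To make the request and response shapes used by these helpers explicit, the sketch below builds a payload and parses a response without calling an endpoint. The response body is a hypothetical stand-in constructed to match the `generated_texts` field that `parse_response_multiple_texts` reads; the input sentence and the sample questions are made up.

```python
import io
import json

prompt = (
    "Ask a question which is related to the following text, "
    "but cannot be answered based on the text. Text: {context}"
)
parameters = {"max_length": 40, "num_return_sequences": 5, "num_beams": 10}


def build_payload(text):
    """Combine the expanded prompt with the generation parameters."""
    return {"text_inputs": prompt.replace("{context}", text), **parameters}


payload = build_payload("Amazon EBS provides block level storage volumes.")
encoded_json = json.dumps(payload).encode("utf-8")

# Hypothetical stand-in for the dictionary returned by invoke_endpoint
fake_body = json.dumps({"generated_texts": ["Question A?", "Question B?"]})
fake_response = {"Body": io.BytesIO(fake_body.encode("utf-8"))}

generated_texts = json.loads(fake_response["Body"].read())["generated_texts"]
print(sorted(payload))
print(generated_texts)
```

Because the generation parameters are merged into the top level of the payload, changing `parameters` is enough to alter decoding behaviour for every query.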

In [20]:
print(f"{bold}Prompt:{unbold} {repr(prompt)}")
for paragraph in test_paragraphs:
    print("-" * 80)
    print(paragraph)
    print("-" * 80)
    print(f"{bold}pre-trained{unbold}")
    generate_questions(pre_trained_name, paragraph)
    print(f"{bold}fine-tuned{unbold}")
    generate_questions(fine_tuned_name, paragraph)

Prompt: 'Ask a question which is related to the following text, but cannot be answered based on the text. Text: {context}'
--------------------------------------------------------------------------------

Adelaide is the capital city of South Australia, the state's largest city and the fifth-most populous city in Australia. "Adelaide" may refer to either Greater Adelaide (including the Adelaide Hills) or the Adelaide city centre. The demonym Adelaidean is used to denote the city and the residents of Adelaide. The Traditional Owners of the Adelaide region are the Kaurna people. The area of the city centre and surrounding parklands is called Tarndanya in the Kaurna language.
Adelaide is situated on the Adelaide Plains north of the Fleurieu Peninsula, between the Gulf St Vincent in the west and the Mount Lofty Ranges in the east. Its metropolitan area extends 20 km (12 mi) from the coast to the foothills of the Mount Lofty Ranges, and stretches 96 km (60 mi) from Gawler in the nor

The pre-trained model was not specifically trained to generate unanswerable questions. Despite the input prompt, it tends to generate questions that can be answered from the text. The fine-tuned model is generally better at this task, and the improvement is more prominent for larger models (e.g., xl rather than base).

### 5. Cleaning up resources

In [21]:
# Delete resources
pre_trained_predictor.delete_model()
pre_trained_predictor.delete_endpoint()
fine_tuned_predictor.delete_model()
fine_tuned_predictor.delete_endpoint()

INFO:sagemaker:Deleting model with name: jumpstart-demo-pre-trained-huggingface--2023-05-29-03-37-21-524
INFO:sagemaker:Deleting endpoint configuration with name: jumpstart-demo-pre-trained-huggingface--2023-05-29-03-37-21-524
INFO:sagemaker:Deleting endpoint with name: jumpstart-demo-pre-trained-huggingface--2023-05-29-03-37-21-524
INFO:sagemaker:Deleting model with name: jumpstart-demo-fine-tuned-huggingface-t-2023-05-29-03-42-28-773
INFO:sagemaker:Deleting endpoint configuration with name: jumpstart-demo-fine-tuned-huggingface-t-2023-05-29-03-42-28-773
INFO:sagemaker:Deleting endpoint with name: jumpstart-demo-fine-tuned-huggingface-t-2023-05-29-03-42-28-773
