## Fine-tune FLAN T5 for dialogue summarization
In this notebook we will explore how to use SageMaker to fine-tune and deploy a large language model for dialogue summarization. We will use a number of cutting-edge open-source libraries, including 🤗 Transformers, 🤗 Accelerate, 🤗 PEFT, and DeepSpeed, to fine-tune the roughly 800M-parameter [FLAN-T5-large](https://huggingface.co/docs/transformers/model_doc/flan-t5) language model.

## Initial Setup
In this section we'll import the required libraries and set up the objects and variables used to configure our training job.

In [None]:
import sagemaker                                        # SageMaker Python SDK
from sagemaker.pytorch import PyTorch                   # PyTorch Estimator for running PyTorch training jobs
from sagemaker.debugger import TensorBoardOutputConfig  # Debugger TensorBoard config to log training metrics to TensorBoard
import boto3                                            # AWS SDK for Python
import os
import tarfile
import pandas as pd
from pathlib import Path

In [None]:
role = sagemaker.get_execution_role()  # IAM execution role used by the training job and the endpoint
sess = sagemaker.session.Session()     # SageMaker session for interacting with different AWS APIs
bucket = sess.default_bucket()         # bucket to house artifacts
model_bucket = sess.default_bucket()   # bucket to house model artifacts (the same default bucket here)
s3_key_prefix = "flan-t5-finetune-for-dialogue"  # folder within the bucket where artifacts will go

region = sess.boto_region_name  # region name of the current SageMaker Studio environment
account_id = sess.account_id()  # account ID of the current SageMaker Studio environment

s3_client = boto3.client("s3", region_name=region)                  # client to interact with the S3 API
sm_client = boto3.client("sagemaker", region_name=region)           # client to interact with SageMaker
smr_client = boto3.client("sagemaker-runtime", region_name=region)  # client to interact with SageMaker Endpoints

## Use case and Data Exploration
For this lab we'll use the [DialogSum Dataset](https://github.com/cylnlp/dialogsum), which consists of over 13K dialogues with human-written summaries. Our goal is to fine-tune a model that, given a dialogue, automatically generates a summary capturing all of the salient points of the conversation.

In [None]:
# create the local data directory first; wget -O does not create missing directories
!mkdir -p data
!wget https://raw.githubusercontent.com/cylnlp/dialogsum/main/DialogSum_Data/dialogsum.train.jsonl -O data/dialogsum.train.jsonl
!wget https://raw.githubusercontent.com/cylnlp/dialogsum/main/DialogSum_Data/dialogsum.test.jsonl -O data/dialogsum.test.jsonl

In [None]:
# Load the train and test data into a pandas dataframe
train_data = pd.read_json("data/dialogsum.train.jsonl", lines=True)
test_data = pd.read_json("data/dialogsum.test.jsonl", lines=True)
print("Train data shape:", train_data.shape)
print("Test data shape:", test_data.shape)

Let's take a look at a few examples:

In [None]:
train_data.head()

In [None]:
print("#####DIALOGUE###### \n", train_data["dialogue"][0])
print("\n#####SUMMARY###### \n", train_data["summary"][0])

In [None]:
# Upload the data to S3
# We will only use the training data which we will split into train and validation sets inside the training script
# We will use the test data to evaluate the model after we deploy it
s3_data_path = sess.upload_data("data/dialogsum.train.jsonl", bucket=bucket, key_prefix=f"{s3_key_prefix}/data")
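
The comments above say the training script splits the uploaded training data into train and validation sets, and the estimator below passes a `subsample` hyperparameter described as the percent of data to use. `train.py` is not reproduced in this notebook, so the following is only a minimal sketch of how that could work with pandas. It assumes a 10% validation fraction and a fixed seed, and `split_train_validation` is a hypothetical helper name.

```python
import pandas as pd


def split_train_validation(df: pd.DataFrame, subsample_pct: float = 100.0,
                           val_frac: float = 0.1, seed: int = 42):
    """Hypothetical helper: keep `subsample_pct` percent of the rows, then hold out
    `val_frac` of the remaining rows as a validation set."""
    if not 0 < subsample_pct <= 100:
        raise ValueError("subsample_pct must be in (0, 100]")
    # Randomly keep the requested percentage of the data (reproducible via the seed)
    subset = df.sample(frac=subsample_pct / 100.0, random_state=seed)
    # Hold out a random validation slice from that subset
    val_df = subset.sample(frac=val_frac, random_state=seed)
    train_df = subset.drop(val_df.index)
    return train_df.reset_index(drop=True), val_df.reset_index(drop=True)


# Toy frame with the same dialogue/summary columns as DialogSum
toy = pd.DataFrame({
    "dialogue": [f"#Person1#: hello {i}" for i in range(100)],
    "summary": [f"summary {i}" for i in range(100)],
})
train_df, val_df = split_train_validation(toy, subsample_pct=50)
print(len(train_df), len(val_df))  # 45 5
```

The validation rows are drawn from the subsampled data, so a `subsample` of 50 halves both splits.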

## Configure and Launch SageMaker Training Job
With the data copied to S3, we're now ready to configure our training job.
A distributed training script cannot be launched like a normal Python script. In a typical SageMaker training job, the training script is launched as a plain Python process, for example `python train.py --lr 0.1 ...`. In a distributed training job, however, the training script may run across multiple GPUs on a single machine, or even across multiple GPUs on multiple machines. In this example, we will use the launcher provided by the 🤗 [Accelerate](https://huggingface.co/docs/accelerate/index) library, which launches your code like so: `accelerate launch --config_file config.yml train.py --lr 0.1`.
SageMaker provides a number of [built-in launchers](https://docs.aws.amazon.com/sagemaker/latest/dg/distributed-training.html) for distributed training jobs, which are configured through the `distribution` parameter in the SageMaker SDK. Accelerate, however, is not one of the supported launchers. To work around this, we'll create a custom launcher script that starts our training script with `accelerate launch`, and we'll configure the SageMaker training job to use this custom launcher as its entry point.
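
The contents of `acc_launcher.py` are not shown in this notebook. The following is a minimal sketch of what such a launcher might look like. It relies on the fact that SageMaker script mode forwards each hyperparameter to the entry point as a `--name value` command-line pair. It also assumes that `training_script` and `config_file` are the only arguments the launcher consumes, with everything else passed through to the training script. The function names are hypothetical.

```python
import argparse
import subprocess
import sys


def build_accelerate_command(argv):
    """Hypothetical helper: turn the entry point's CLI args into an `accelerate launch` command."""
    # allow_abbrev=False stops argparse from treating e.g. --config as --config_file
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--training_script", required=True)
    parser.add_argument("--config_file", required=True)
    # Every other hyperparameter (e.g. --lr 0.003) is kept, in order, for the training script
    known, passthrough = parser.parse_known_args(argv)
    return [
        "accelerate", "launch",
        "--config_file", known.config_file,
        known.training_script,
        *passthrough,
    ]


def main():
    cmd = build_accelerate_command(sys.argv[1:])
    print("Launching:", " ".join(cmd))
    # check=True raises on a non-zero exit code, so SageMaker marks the job as failed
    subprocess.run(cmd, check=True)
```

Saved as `acc_launcher.py`, the file would end with the usual `if __name__ == "__main__": main()` guard. With the hyperparameters configured below, the launcher would run roughly `accelerate launch --config_file ds_zero3.yaml train.py --lr 0.003 --batch_size 2 ...`.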


The `accelerate launch` command has two key parts: the `config.yml` file and the `train.py` script. The `config.yml` file configures the distributed training job, and `train.py` is the training script that the launcher starts. In this example, we'll use the [ds_zero3.yaml](src/train/ds_zero3.yaml) configuration file. This config file enables [DeepSpeed ZeRO Stage 3](https://www.deepspeed.ai/tutorials/zero/) along with a number of other optimizations for training large-scale models. It was generated by running `accelerate config --config_file ds_zero3.yaml` and following the on-screen prompts.
The [train.py](src/train/train.py) script uses a number of key libraries to enable training of large models with minimal code changes:
- 🤗 [Accelerate](https://huggingface.co/docs/accelerate/index) - Configures the distributed training environment and adapts training objects (data loaders, models, optimizers) to it
- 🤗 [Transformers](https://huggingface.co/docs/transformers/index) - Provides pre-trained models and utilities for training and evaluating them
- 🤗 [PEFT](https://github.com/huggingface/peft) - Provides methods for Parameter-Efficient Fine-Tuning (PEFT) of large language models. The [LoRA](https://arxiv.org/pdf/2106.09685.pdf) method will be used to fine-tune the model
- [DeepSpeed](https://github.com/microsoft/DeepSpeed) - Provides a number of optimizations for training large models. In this example, we'll use DeepSpeed ZeRO Stage 3, which partitions parameters, gradients, and optimizer states across GPUs so that models with over 1B parameters can be trained
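
LoRA freezes a pretrained weight matrix $W$ of shape $d \times k$ and learns a low-rank update $\Delta W = \frac{\alpha}{r} B A$, with $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$. $B$ is initialized to zero, so training starts exactly from the base model. The following pure-Python sketch illustrates the forward pass and shows why the update can later be merged back into $W$. It does not use PEFT, and its toy sizes and values are chosen only for illustration.

```python
import random


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def merge_lora(W, A, B, alpha, r):
    """Fold the low-rank update into one dense matrix: W + (alpha / r) * B A."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * dw for w, dw in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


random.seed(0)
d, k, r, alpha = 4, 3, 2, 16  # toy sizes, not the values used by train.py
W = [[random.uniform(-1, 1) for _ in range(k)] for _ in range(d)]
A = [[random.gauss(0, 0.02) for _ in range(k)] for _ in range(r)]  # random init
B = [[0.0] * r for _ in range(d)]                                  # zero init
x = [1.0, -2.0, 0.5]

# With B = 0 the adapted layer reproduces the frozen base layer exactly
assert lora_forward(W, A, B, x, alpha, r) == matvec(W, x)

# After "training" B, the merged matrix matches base + adapter up to rounding
B = [[random.uniform(-1, 1) for _ in range(r)] for _ in range(d)]
merged = merge_lora(W, A, B, alpha, r)
max_diff = max(abs(a - b) for a, b in zip(lora_forward(W, A, B, x, alpha, r), matvec(merged, x)))
print(f"max |adapter - merged| = {max_diff:.2e}")
```

Because the two forms are equivalent, a LoRA-trained model can be served either as base weights plus a small adapter or as a single merged checkpoint.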

In [None]:
# configure TensorBoard logs to be uploaded to S3
tensorboard_output_config = TensorBoardOutputConfig(
    s3_output_path=f"s3://{bucket}/{s3_key_prefix}/tensorboard"
)

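estimator = PyTorch(
    source_dir="src/train",          # directory containing the launcher, training script, and config
    entry_point="acc_launcher.py",   # custom launcher that calls `accelerate launch`
    role=role,
    instance_count=1,
    instance_type="ml.g5.12xlarge",  # 4 x NVIDIA A10G GPUs
    framework_version="1.13",
    py_version="py39",
    disable_profiler=True,
    tensorboard_output_config=tensorboard_output_config,
    hyperparameters={
        "training_script": "train.py",   # script started by acc_launcher.py
        "config_file": "ds_zero3.yaml",  # Accelerate/DeepSpeed config file
        "lr": 3e-3,
        "batch_size": 2,
        "subsample": 50,  # percent of data to use
        "num_epochs": 2,
        "pretrained_model_name_or_path": "google/flan-t5-large",
    },
    keep_alive_period_in_seconds=3600,  # keep the instance in a warm pool for 1 hour after the job
)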

In [None]:
estimator.fit({"train": s3_data_path}, wait=False)
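
Because `fit` is called with `wait=False`, the cell returns as soon as the job is submitted. One way to check on the job from the notebook is to poll the SageMaker `DescribeTrainingJob` API through the `sm_client` created earlier. The sketch below takes the describe function as a parameter so that it can be exercised without AWS credentials. `wait_for_training_job` and the 60-second poll interval are illustrative choices, not part of the SageMaker SDK.

```python
import time

TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(describe_fn, job_name, poll_seconds=60, sleep=time.sleep):
    """Poll until the training job reaches a terminal state and return that state.

    describe_fn is expected to behave like boto3's
    sm_client.describe_training_job(TrainingJobName=...).
    """
    while True:
        desc = describe_fn(TrainingJobName=job_name)
        status = desc["TrainingJobStatus"]  # InProgress | Completed | Failed | Stopping | Stopped
        print(f"{job_name}: {status} ({desc.get('SecondaryStatus', '')})")
        if status in TERMINAL_STATES:
            if status == "Failed":
                print("Failure reason:", desc.get("FailureReason", "unknown"))
            return status
        sleep(poll_seconds)


# In this notebook it could be called as:
# wait_for_training_job(sm_client.describe_training_job, estimator.latest_training_job.name)
```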

### Optional Section
#### Monitor the training with TensorBoard
**Note: You have to wait a few minutes for the job to launch before seeing any logs**

We can use [TensorBoard](https://www.tensorflow.org/tensorboard), a visualization toolkit for analyzing deep learning models, to monitor the progress of the training. General instructions for using TensorBoard with SageMaker Studio can be found [here](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-tensorboard.html). The steps for this notebook are:
1. Open a new terminal in SageMaker Studio by navigating to <em>File->New->Terminal</em> <br> ![](./image/OpenTerminal.JPG)
2. Run the following command in the terminal: `pip install tensorboard boto3 tensorflow_io`
3. Run the notebook cell below to generate a command to launch TensorBoard
4. Copy the command, paste it into the terminal, and press Enter
5. Return to the notebook and click the link provided in the cell below

In [None]:
from IPython.display import HTML
cur_dir = os.getcwd().replace(os.environ["HOME"],"")
HTML(f'''1. Paste the following command into the Studio Terminal <code style="background-color:gray;">tensorboard --logdir {tensorboard_output_config.s3_output_path}</code><br>
2. Click <a href='/jupyter/default/proxy/6006/'>here</a> to open TensorBoard''')

## Model Deployment
In this section we'll deploy our model to a SageMaker Endpoint. We'll then use the endpoint to generate summaries for randomly selected examples from the test set.

In [None]:
# We have to wait for the job to finish before we can deploy the model 
estimator.latest_training_job.wait(logs="None")

Once the training job has completed, we can deploy the model to a SageMaker Endpoint.
For deployment, we will use a [Deep Learning Container for large model inference](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-large-model-dlc.html). This container is designed for serving large models, including models with more than 100B parameters.

In [None]:
# We'll need a few additional imports for model deployment
from sagemaker.model import Model
from sagemaker import serializers, deserializers

In [None]:
# This is the docker image that will be used for inference
inference_image_uri = (
    f"763104351884.dkr.ecr.{region}.amazonaws.com/djl-inference:0.21.0-deepspeed0.8.0-cu117"
)
print(f"Inference image: {inference_image_uri}")

The next step is to create a model deployment package, which we'll use to deploy our model to a SageMaker Endpoint. The model deployment package is a tarball that contains the model artifacts, the [inference code](src/inference/model.py), and any [additional dependencies](src/inference/requirements.txt) required to run the inference code. We'll go through the following steps to create the model deployment package:
1. Download the trained model artifact from S3 to the local filesystem
2. Create a `serving.properties` file that will configure our hosting environment
3. Combine the trained model, the inference code, and the `serving.properties` file into a tarball with the following structure:
```
|-- model.py            # inference code
|-- requirements.txt    # additional dependencies
|-- serving.properties  # configuration file
|-- <model_id>/         # model artifacts
    |-- config.json
    |-- pytorch_model.bin
    |-- special_tokens_map.json
    |-- tokenizer_config.json
    |-- tokenizer.json
    |-- vocab.json
```

In [None]:
!aws s3 cp {estimator.model_data} .

Note that the model was trained with Low-Rank Adaptation (LoRA), so the model artifact is small (~10 MB), which allows us to repackage it together with our inference code. At deployment time, the base model will be downloaded from the Hugging Face Hub and the LoRA weights will be applied to it. When deploying larger models with LoRA weights, we recommend storing the base model weights in your own S3 bucket.
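
A rough calculation explains why the LoRA artifact is so small. A rank-$r$ adapter on a $d_{in} \times d_{out}$ weight adds $r(d_{in} + d_{out})$ trainable parameters. The sketch below assumes rank 8 adapters on the query and value projections (`q`, `v`) of every attention block, which is a common setup and PEFT's default target for T5 models. The actual rank and target modules are set inside `train.py` and may differ. FLAN-T5-large has 24 encoder and 24 decoder layers with a hidden size of 1024, for a total of 72 attention blocks: encoder self-attention, decoder self-attention, and decoder cross-attention.

```python
def lora_params(d_in, d_out, r):
    """Trainable parameters for one LoRA adapter: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


d_model = 1024           # FLAN-T5-large hidden size (q/v projections are 1024 x 1024)
num_layers = 24          # encoder layers; the decoder also has 24
attention_blocks = num_layers + 2 * num_layers  # encoder self-attn + decoder self-attn + cross-attn
rank = 8                 # assumption: the rank used in train.py is not shown
targets_per_block = 2    # assumption: adapters on the q and v projections

total = attention_blocks * targets_per_block * lora_params(d_model, d_model, rank)
size_mb = total * 4 / 1e6  # fp32 = 4 bytes per parameter
print(f"{total:,} LoRA parameters ~ {size_mb:.1f} MB in fp32")
# 2,359,296 LoRA parameters ~ 9.4 MB in fp32
```

Under these assumptions, the adapter amounts to roughly 0.3% of the base model's parameters, which is consistent with the ~10 MB artifact.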

In [None]:
# extract the model artifacts into the inference code directory 
with tarfile.open("model.tar.gz", "r:gz") as tar:
    contents = tar.getnames()
    model_id = os.path.dirname(contents[-1]) # model id is the name of the folder containing the model files as generated by the training job
    tar.extractall("src/inference/")

In [None]:
# generate the serving.properties file
# We'll use the Python engine for inference and specify the model_id for the base model we want to use
with open("src/inference/serving.properties", "w") as f:
    f.write(f"engine=Python\noption.model_id={model_id}\n")

Now we have everything needed to create the model package. We'll combine the contents of the `src/inference` directory with the model artifact and create a tarball. We'll then upload the tarball to S3 and use the S3 URI to deploy the model.

In [None]:
%%bash
cd src/
tar czvf model.tar.gz inference/
mv model.tar.gz ../
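
The `%%bash` cell above packages the `inference/` directory with `tar`. If you prefer to stay in Python, the standard-library `tarfile` module can build an equivalent archive. This sketch mirrors `tar czvf model.tar.gz inference/`, including the top-level `inference/` folder. It then lists the archive members so you can confirm that `serving.properties`, `model.py`, and the model folder were included. `package_model_dir` is a hypothetical helper name.

```python
import tarfile
from pathlib import Path


def package_model_dir(src_dir, out_path="model.tar.gz"):
    """Create a gzip-compressed tarball of src_dir, keeping its folder name as the top-level entry.

    Returns the sorted member names so the layout can be checked before upload.
    """
    src = Path(src_dir)
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src, arcname=src.name)  # adds the directory recursively
    with tarfile.open(out_path, "r:gz") as tar:
        return sorted(tar.getnames())


# Example usage in this notebook:
# members = package_model_dir("src/inference")
# print("\n".join(members))
```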

In [None]:
hf_s3_code_artifact = sess.upload_data("model.tar.gz", bucket, f"{s3_key_prefix}/model")
print(f"Model package uploaded to: {hf_s3_code_artifact}")

In [None]:
def deploy_model(image_uri, model_data, role, endpoint_name, instance_type, sagemaker_session):
    """Helper function to create the SageMaker Endpoint resources and return a predictor"""
    model = Model(image_uri=image_uri, model_data=model_data, role=role, sagemaker_session=sagemaker_session)

    model.deploy(initial_instance_count=1, instance_type=instance_type, endpoint_name=endpoint_name)

    # our requests and responses will be in json format so we specify the serializer and the deserializer
    predictor = sagemaker.Predictor(
        endpoint_name=endpoint_name,
        sagemaker_session=sagemaker_session,
        serializer=serializers.JSONSerializer(),        # will convert python dict to json
        deserializer=deserializers.JSONDeserializer(),  # will convert json to python dict
    )

    return predictor

In [None]:
# create a unique endpoint name
hf_endpoint_name = sagemaker.utils.name_from_base("t5-summarization")
print(f"Our endpoint will be called {hf_endpoint_name}")

In [None]:
# deployment will take 5 to 10 minutes
hf_predictor = deploy_model(
    image_uri=inference_image_uri,
    model_data=hf_s3_code_artifact,
    role=role,
    endpoint_name=hf_endpoint_name,
    instance_type="ml.g4dn.xlarge",
    sagemaker_session=sess,
)

With the endpoint deployed, we can generate summaries for dialogues from the test dataset. We'll randomly select an example and generate its summary. You can also provide your own dialogue; just be sure to use the same format as the examples in the training dataset.
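
DialogSum dialogues label each turn with a speaker tag such as `#Person1#:` and separate turns with newlines, as in the example printed earlier. A small helper like the hypothetical `format_dialogue` below can convert your own conversation into that layout. It assumes speakers are numbered in the order in which they first appear.

```python
def format_dialogue(turns):
    """Format (speaker, utterance) pairs in DialogSum style: one '#PersonN#: ...' line per turn.

    Speakers are numbered in the order they first appear.
    """
    speaker_ids = {}
    lines = []
    for speaker, text in turns:
        if speaker not in speaker_ids:
            speaker_ids[speaker] = len(speaker_ids) + 1
        lines.append(f"#Person{speaker_ids[speaker]}#: {text.strip()}")
    return "\n".join(lines)


my_dialogue = format_dialogue([
    ("Alice", "Did you book the meeting room for Friday?"),
    ("Bob", "Not yet, I'll do it after lunch."),
    ("Alice", "Great, please invite the design team too."),
])
print(my_dialogue)
# #Person1#: Did you book the meeting room for Friday?
# #Person2#: Not yet, I'll do it after lunch.
# #Person1#: Great, please invite the design team too.
```

`my_dialogue` can be sent in the same request payload that the prediction cell uses for `random_dialogue`.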

In [None]:
from random import randint

In [None]:
random_dialogue_idx = randint(0, test_data.shape[0] - 1)  # randint includes both endpoints
random_dialogue = test_data["dialogue"][random_dialogue_idx]

output = hf_predictor.predict({"inputs": [random_dialogue], "parameters":{"max_length": 100}})
output_summary = output["outputs"][0]["summary_text"]

print("#####DIALOGUE######\n", random_dialogue)
print("\n#####GENERATED SUMMARY######\n", output_summary)
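
The `Predictor` wraps the SageMaker Runtime `InvokeEndpoint` API. You can send the same request with the `smr_client` created during setup, which is useful from code that does not depend on the SageMaker Python SDK. This sketch assumes the JSON request and response shape used above, with `inputs`/`parameters` in the request and `outputs[0]["summary_text"]` in the response. That shape is defined by the custom `model.py` rather than by the container. `summarize` is a hypothetical helper.

```python
import json


def summarize(runtime_client, endpoint_name, dialogue, max_length=100):
    """Send one dialogue to the endpoint via InvokeEndpoint and return the generated summary."""
    payload = {"inputs": [dialogue], "parameters": {"max_length": max_length}}
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload),
    )
    # Body is a streaming object; read it and decode the JSON response
    result = json.loads(response["Body"].read())
    return result["outputs"][0]["summary_text"]


# In this notebook: summarize(smr_client, hf_endpoint_name, random_dialogue)
```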

In [None]:
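# delete the model and the endpoint when finished experimenting
# (delete the model first: it is looked up through the endpoint config, which delete_endpoint removes)
hf_predictor.delete_model()
hf_predictor.delete_endpoint()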