{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Fine-tune FLAN T5 for dialogue summarization\n", "In this notebook we will explore how we can utilize SageMaker to finetune and deploy a Large Language Model on dialogue summarization. We will utilize a number of cutting edge open-source libraries including 🤗 Transformers, 🤗 Accelerate, 🤗 PEFT, and DeepSpeed to fine-tune an 800M pararmater [FLAN-T5-large](https://huggingface.co/docs/transformers/model_doc/flan-t5) language model" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Initial Setup\n", "In this section we'll import the requisite libraries and instantiate a number of objects and variables to configure our training job" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sagemaker # SageMaker Python SDK\n", "from sagemaker.pytorch import PyTorch # PyTorch Estimator for running pytorch training jobs\n", "from sagemaker.debugger import TensorBoardOutputConfig # Debugger TensorBoard config to log training metrics to TensorBoard\n", "import boto3 # AWS SDK for Python\n", "import os\n", "import tarfile\n", "import pandas as pd\n", "from pathlib import Path" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "role = sagemaker.get_execution_role() # execution role for the endpoint\n", "sess = sagemaker.session.Session() # sagemaker session for interacting with different AWS APIs\n", "bucket = sess.default_bucket() # bucket to house artifacts\n", "model_bucket = sess.default_bucket() # bucket to house artifacts\n", "s3_key_prefix = \"flan-t5-finetune-for-dialogue\" # folder within bucket where code artifact will go\n", "\n", "region = sess._region_name # region name of the current SageMaker Studio environment\n", "account_id = sess.account_id() # account_id of the current SageMaker Studio environment\n", "\n", "s3_client = boto3.client(\"s3\") # client to intreract with S3 API\n", "sm_client = boto3.client(\"sagemaker\") # client to intreract with SageMaker\n", "smr_client = boto3.client(\"sagemaker-runtime\") # client to intreract with SageMaker Endpoints" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Use case and Data Exploration\n", "For this lab we'll utilize the [DialogSum Dataset](https://github.com/cylnlp/dialogsum) which is comprised of over 13K dialogues along with human provided summaries. 
"Our goal is to fine-tune a model that, given a dialogue, will automatically generate a summary capturing all of the salient points of the conversation." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# download the dataset into a local data directory\n", "!mkdir -p data\n", "!wget https://raw.githubusercontent.com/cylnlp/dialogsum/main/DialogSum_Data/dialogsum.train.jsonl -O data/dialogsum.train.jsonl\n", "!wget https://raw.githubusercontent.com/cylnlp/dialogsum/main/DialogSum_Data/dialogsum.test.jsonl -O data/dialogsum.test.jsonl" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load the train and test data into a pandas dataframe\n", "train_data = pd.read_json(\"data/dialogsum.train.jsonl\", lines=True)\n", "test_data = pd.read_json(\"data/dialogsum.test.jsonl\", lines=True)\n", "print(\"Train data shape: \", train_data.shape)\n", "print(\"Test data shape: \", test_data.shape)" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at a few examples." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_data.head()" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"#####DIALOGUE###### \\n\", train_data[\"dialogue\"][0])\n", "print(\"\\n#####SUMMARY###### \\n\", train_data[\"summary\"][0])" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Upload the data to S3\n", "# We will only use the training data, which we will split into train and validation sets inside the training script\n", "# We will use the test data to evaluate the model after we deploy it\n", "s3_data_path = sess.upload_data(\"data/dialogsum.train.jsonl\", bucket=bucket, key_prefix=f\"{s3_key_prefix}/data\")" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Configure and Launch SageMaker Training Job\n", "With the data copied to S3, we're now ready to configure our training job.\n", "A distributed training script cannot be launched as a normal Python script. In a typical SageMaker training job, the training script is launched as an ordinary Python script, e.g. `python train.py --lr 0.1 ...`. However, in a distributed training job, the training script may run across multiple GPUs on a single machine or even multiple GPUs across multiple machines. In this example, we will use the launcher provided by the 🤗 [Accelerate](https://huggingface.co/docs/accelerate/index) library, which launches your code like so: `accelerate launch --config_file config.yml train.py --lr 0.1`. \n", "SageMaker provides a number of [built-in launchers](https://docs.aws.amazon.com/sagemaker/latest/dg/distributed-training.html) for distributed training jobs which are configured by specifying a `distribution` parameter in the SageMaker SDK. Accelerate, however, is not one of the supported launchers. To work around this, we'll create a custom launcher that will launch our training script using the Accelerate launcher. We'll then configure our SageMaker training job to use this custom launcher.\n" ] },
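{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Below is a minimal sketch of what such a custom launcher could look like. It is illustrative only, and the actual [acc_launcher.py](src/train/acc_launcher.py) may differ: SageMaker passes the estimator's hyperparameters to the entry point as command-line arguments, so the launcher only needs to peel off its own arguments (`--training_script` and `--config_file`) and forward everything else to `accelerate launch`.\n",
"```python\n",
"# illustrative sketch of a custom Accelerate launcher (hypothetical; see src/train/acc_launcher.py)\n",
"import subprocess\n",
"import sys\n",
"\n",
"\n",
"def main():\n",
"    # SageMaker passes hyperparameters as '--name value' pairs\n",
"    args = dict(zip(sys.argv[1::2], sys.argv[2::2]))\n",
"    training_script = args.pop(\"--training_script\")\n",
"    config_file = args.pop(\"--config_file\")\n",
"\n",
"    # build the accelerate launch command and forward the remaining hyperparameters to the training script\n",
"    cmd = [\"accelerate\", \"launch\", \"--config_file\", config_file, training_script]\n",
"    for name, value in args.items():\n",
"        cmd.extend([name, value])\n",
"\n",
"    # fail the SageMaker job if the training process exits with an error\n",
"    subprocess.run(cmd, check=True)\n",
"\n",
"\n",
"if __name__ == \"__main__\":\n",
"    main()\n",
"```" ] },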
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The `accelerate launch` command has two key parts: the `config.yml` file and the `train.py` script. The `config.yml` file is used to configure the distributed training job. The `train.py` script is the training script that will be launched by the launcher.\n", "In this example, we'll use the [ds_zero3.yaml](src/train/ds_zero3.yaml) configuration file. The config file enables [DeepSpeed ZeRO Stage 3](https://www.deepspeed.ai/tutorials/zero/) and a number of other optimizations to enable training of large-scale models. This file was generated by running `accelerate config --config_file ds_zero3.yaml` and then following the on-screen prompts. \n", "The [train.py](src/train/train.py) script makes use of a number of key libraries to enable training of large models with minimal code changes:\n", "- 🤗 [Accelerate](https://huggingface.co/docs/accelerate/index) - Configures the distributed training environment and adapts training objects (data loaders, models, optimizers) to the distributed environment\n", "- 🤗 [Transformers](https://huggingface.co/docs/transformers/index) - Provides a number of pre-trained models and utilities for training and evaluating models\n", "- 🤗 [PEFT](https://github.com/huggingface/peft) - Provides a number of methods for Parameter-Efficient Fine-Tuning (PEFT) of large language models. The [LoRA](https://arxiv.org/pdf/2106.09685.pdf) method will be used to fine-tune the model; a sketch of what this looks like in code is shown after the training job is launched below\n", "- [DeepSpeed](https://github.com/microsoft/DeepSpeed) - Provides a number of optimizations to enable training of large models. In this example, we'll use DeepSpeed ZeRO Stage 3 to enable training of models with billions of parameters" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# configure the TensorBoard output to be written to S3\n", "tensorboard_output_config = TensorBoardOutputConfig(\n", "    s3_output_path=f\"s3://{bucket}/{s3_key_prefix}/tensorboard\"\n", ")\n", "\n", "estimator = PyTorch(\n", "    source_dir=\"src/train\",\n", "    entry_point=\"acc_launcher.py\",\n", "    role=role,\n", "    instance_count=1,\n", "    instance_type=\"ml.g5.12xlarge\",\n", "    framework_version=\"1.13\",\n", "    py_version=\"py39\",\n", "    disable_profiler=True,\n", "    tensorboard_output_config=tensorboard_output_config,\n", "    hyperparameters={\n", "        \"training_script\": \"train.py\",\n", "        \"config_file\": \"ds_zero3.yaml\",\n", "        \"lr\": 3e-3,\n", "        \"batch_size\": 2,\n", "        \"subsample\": 50,  # percent of data to use\n", "        \"num_epochs\": 2,\n", "        \"pretrained_model_name_or_path\": \"google/flan-t5-large\",\n", "    },\n", "    keep_alive_period_in_seconds=3600,\n", ")" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "estimator.fit({\"train\": s3_data_path}, wait=False)" ] },
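{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "While the job spins up, it's worth looking at how LoRA is applied inside the training script. The sketch below is illustrative only; the exact LoRA hyperparameters and model handling in [train.py](src/train/train.py) may differ. PEFT wraps the base model so that only a small set of low-rank adapter weights is trained, which is what keeps the saved model artifact small.\n",
"```python\n",
"# illustrative sketch of wrapping FLAN-T5 with LoRA via PEFT (hypothetical values; see src/train/train.py)\n",
"from peft import LoraConfig, TaskType, get_peft_model\n",
"from transformers import AutoModelForSeq2SeqLM\n",
"\n",
"model = AutoModelForSeq2SeqLM.from_pretrained(\"google/flan-t5-large\")\n",
"\n",
"lora_config = LoraConfig(\n",
"    task_type=TaskType.SEQ_2_SEQ_LM,  # sequence-to-sequence language modeling\n",
"    r=8,  # rank of the low-rank update matrices\n",
"    lora_alpha=32,\n",
"    lora_dropout=0.1,\n",
"    target_modules=[\"q\", \"v\"],  # apply LoRA to the T5 attention query/value projections\n",
")\n",
"\n",
"model = get_peft_model(model, lora_config)\n",
"model.print_trainable_parameters()  # only a small fraction of the weights are trainable\n",
"```" ] },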
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Optional Section\n", "#### Monitor the training with TensorBoard\n", "**Note: You have to wait a few minutes for the job to launch before seeing any logs**\n", "\n", "We can use [TensorBoard](https://www.tensorflow.org/tensorboard), a visualization toolkit for analyzing deep learning models, to monitor the progress of the training. General instructions for using TensorBoard with SageMaker Studio can be found [here](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-tensorboard.html). The steps for accessing TensorBoard in SageMaker Studio are provided below:\n", "1. Open a new terminal in SageMaker Studio by navigating to File->New->Terminal\n", "![](./image/OpenTerminal.JPG)\n", "2. Run the following command in the terminal: `pip install tensorboard boto3 tensorflow_io`\n", "3. Run the notebook cell below to generate a command to launch TensorBoard\n", "4. Copy the command, paste it into the terminal, and hit Enter\n", "5. Return to the notebook and click the link provided in the cell below" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.display import HTML\n", "cur_dir = os.getcwd().replace(os.environ[\"HOME\"], \"\")\n", "HTML(f'''1. Paste the following command into the Studio Terminal: <code>tensorboard --logdir {tensorboard_output_config.s3_output_path}</code><br>\n",
\n", "2. Click here to open TensorBoard''')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Model Deployment\n", "In this section we'll deploy our model to a SageMaker Endpoint. We'll then use the endpoint to generate summaries for random examples from the test set" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We have to wait for the job to finish before we can deploy the model \n", "estimator.latest_training_job.wait(logs=\"None\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Once the training job has completed, we can deploy the model to a SageMaker Endpoint.\n", "We will use a [Deep learning container for large model inference](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-large-model-dlc.html) for deployment which is optimized for serving large models in excess of 100B parameters" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We'll need a few additional imports for model deployment\n", "from sagemaker.model import Model\n", "from sagemaker import serializers, deserializers" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This is the docker image that will be used for inference\n", "inference_image_uri = (\n", " f\"763104351884.dkr.ecr.{region}.amazonaws.com/djl-inference:0.21.0-deepspeed0.8.0-cu117\"\n", ")\n", "print(f\"Image going to be used is ---- > {inference_image_uri}\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The next step is to create a model deploymnent packages which will be used to deploy our model to a SageMaker Endpoint. The model deployment package is a tarball that contains the model artifacts, [inference code](src/inference/model.py), and any [additional dependencies](src/inference/requirements.txt) required to run the inference code. We'll go through the following steps to create the model deployment package:\n", "1. Download the trained model artifact from S3 to the local filesystem\n", "2. Cretae a `serving.properties` file that will configure our hosting environment\n", "3. Combine the trained model, the inference code, and the `serving.properties` file into a tarball with the following structure:\n", "```\n", "|-- model.py # inference code\n", "|-- requirements.txt # additional dependencies\n", "|-- serving.properties # configuration file\n", "|-- \\ # model artifacts\n", " |-- config.json\n", " |-- pytorch_model.bin\n", " |-- special_tokens_map.json\n", " |-- tokenizer_config.json\n", " |-- tokenizer.json\n", " |-- vocab.json\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!aws s3 cp {estimator.model_data} ." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Note that the model was trained using Low Rank Adaptation (LoRA), and as a result the model artifact is small (~10Mb) allowing us to repackage it along with our inference code. At deployment time, the base model will be downloaded from Hugging Face Hub and the LoRA weights will be applied to the base model. For deployment of larger models with LoRA weights, it is recommended to store the based model weights in your own S3 bucket." 
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# extract the model artifacts into the inference code directory\n", "with tarfile.open(\"model.tar.gz\", \"r:gz\") as tar:\n", "    contents = tar.getnames()\n", "    model_id = os.path.dirname(contents[-1])  # model id is the name of the folder containing the model files as generated by the training job\n", "    tar.extractall(\"src/inference/\")" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# generate the serving.properties file\n", "# We'll use the Python engine for inference and point option.model_id at the folder containing the fine-tuned LoRA weights\n", "with open(\"src/inference/serving.properties\", \"w\") as f:\n", "    f.write(\n", "f\"\"\"engine=Python\n", "option.model_id={model_id}\n", "\"\"\"\n", "    )" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Now we have everything needed to create the model package. We'll combine the contents of the `src/inference` directory with the model artifact and create a tarball. We'll then upload the tarball to S3 and use the S3 URI to deploy the model." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "cd src/\n", "tar czvf model.tar.gz inference/\n", "mv model.tar.gz ../" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hf_s3_code_artifact = sess.upload_data(\"model.tar.gz\", bucket, f\"{s3_key_prefix}/model\")\n", "print(f\"S3 code/model tarball uploaded to ----> {hf_s3_code_artifact}\")" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def deploy_model(image_uri, model_data, role, endpoint_name, instance_type, sagemaker_session):\n", "    \"\"\"Helper function to create the SageMaker Endpoint resources and return a predictor\"\"\"\n", "    model = Model(image_uri=image_uri, model_data=model_data, role=role)\n", "\n", "    model.deploy(initial_instance_count=1, instance_type=instance_type, endpoint_name=endpoint_name)\n", "\n", "    # our requests and responses will be in JSON format so we specify the serializer and the deserializer\n", "    predictor = sagemaker.Predictor(\n", "        endpoint_name=endpoint_name,\n", "        sagemaker_session=sagemaker_session,\n", "        serializer=serializers.JSONSerializer(),  # will convert a python dict to JSON\n", "        deserializer=deserializers.JSONDeserializer(),  # will convert JSON to a python dict\n", "    )\n", "\n", "    return predictor" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create a unique endpoint name\n", "hf_endpoint_name = sagemaker.utils.name_from_base(\"t5-summarization\")\n", "print(f\"Our endpoint will be called {hf_endpoint_name}\")" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# deployment will take 5 to 10 minutes\n", "hf_predictor = deploy_model(\n", "    image_uri=inference_image_uri,\n", "    model_data=hf_s3_code_artifact,\n", "    role=role,\n", "    endpoint_name=hf_endpoint_name,\n", "    instance_type=\"ml.g4dn.xlarge\",\n", "    sagemaker_session=sess,\n", ")" ] },
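{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The `deploy_model` helper above wraps the endpoint in a `sagemaker.Predictor` configured for JSON requests and responses. Equivalently, the endpoint can be invoked with the boto3 SageMaker Runtime client (`smr_client`) that we created during setup. The cell below is an optional sketch of that approach; it assumes the same JSON request and response format used with the predictor in the next section." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# optional: invoke the endpoint directly with the boto3 SageMaker Runtime client\n", "import json\n", "\n", "payload = {\"inputs\": [test_data[\"dialogue\"][0]], \"parameters\": {\"max_length\": 100}}\n", "response = smr_client.invoke_endpoint(\n", "    EndpointName=hf_endpoint_name,\n", "    ContentType=\"application/json\",\n", "    Body=json.dumps(payload),\n", ")\n", "result = json.loads(response[\"Body\"].read())\n", "print(result[\"outputs\"][0][\"summary_text\"])" ] },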
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "With the endpoint deployed, we can generate summaries on dialogues from the test dataset. We'll randomly select an example and generate a summary.\n", "You can also provide your own dialogue to summarize; just be sure to use the same format as the examples in the train dataset." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from random import randint" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "random_dialogue_idx = randint(0, test_data.shape[0] - 1)  # randint is inclusive of both endpoints\n", "random_dialogue = test_data[\"dialogue\"][random_dialogue_idx]\n", "\n", "output = hf_predictor.predict({\"inputs\": [random_dialogue], \"parameters\": {\"max_length\": 100}})\n", "output_summary = output[\"outputs\"][0][\"summary_text\"]\n", "\n", "print(\"#####DIALOGUE######\\n\", random_dialogue)\n", "print(\"\\n#####GENERATED SUMMARY######\\n\", output_summary)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# delete the endpoint when finished experimenting\n", "hf_predictor.delete_endpoint()" ] }
], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }