{ "cells": [ { "cell_type": "markdown", "id": "f6cb03f7", "metadata": {}, "source": [ "# FAQ Bot - Q&A model, trained using pairs of questions and answers\n", "\n", "Fine tune a large language model with a list of question and answers. This approach os called Closed Book Q&A because the model doesn't require context and is capable of answering variations of the questions you provide in your dataset.\n", "\n", "This is an evolution of classic ChatBots because LLMs like T5 can disambiguate and generalize better than the old technologies we find in these ChatBots services.\n", "\n", "For that purpose you'll use a **[T5 SMALL SSM ~80MParams](https://huggingface.co/google/t5-small-ssm)** model, accelerated by a trn1 instance ([AWS Trainium](https://aws.amazon.com/machine-learning/trainium/)), running on [Amazon SageMaker](https://aws.amazon.com/sagemaker/).\n", "\n", "You can set the hyperperameter **--model_name** to change the model size. This solution works well with: \n", " - t5-small-ssm\n", " - t5-large-ssm\n", " \n", "If you need to fine tune **t5-3b-ssm, t5-11b-ssm or t5-xxl-ssm**, you need **FSDP**, which is out of the scope of this tutorial.\n", "\n", "You can see the results of the predictions at the end of this notebook. You'll notice the questions sent to the model are not in the training dataset. They are just variations of the questions used to fine tune the model.\n", "\n", "The dataset is the content of all **AWS FAQ** pages, downloaded from: https://aws.amazon.com/faqs/\n", "\n", "This notebook was tested with **Python3.8+**\n", "\n", ">**If you have never before done a SageMaker training job with Trn1, you'll need to do a service level request. This can take a few hours, best to make the request early so you don't have to wait.**\n", "\n", "You can edit this URL to go directly to the page to request the increase:\n", "\n", "`https://<region>.console.aws.amazon.com/servicequotas/home/services/sagemaker/quotas/L-79A1FE57`" ] }, { "cell_type": "markdown", "id": "5cb78690", "metadata": {}, "source": [ "## 1) Install some dependencies\n", "You need a more recent version of **sagemaker** Python Library. After this install you'll need to restart the kernel." ] }, { "cell_type": "code", "execution_count": null, "id": "103ae05a", "metadata": { "scrolled": true }, "outputs": [], "source": [ "# add --force-reinstall if it fails to resolve dependencies\n", "%pip install -U sagemaker" ] }, { "cell_type": "code", "execution_count": null, "id": "d4a951ec", "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "print(sagemaker.__version__)\n", "if not sagemaker.__version__ >= \"2.146.0\": print(\"You need to upgrade or restart the kernel if you already upgraded\")\n", "\n", "sess = sagemaker.Session()\n", "role = sagemaker.get_execution_role()\n", "bucket = sess.default_bucket()\n", "region = sess.boto_region_name\n", "\n", "print(f\"sagemaker role arn: {role}\")\n", "print(f\"sagemaker bucket: {bucket}\")\n", "print(f\"sagemaker session region: {region}\")" ] }, { "cell_type": "markdown", "id": "ea60dba6", "metadata": {}, "source": [ "## 2) Visualize and upload the dataset\n", "Take note of the S3 URI here if you get interrupted, no need to reupload later." 
] }, { "cell_type": "code", "execution_count": 5, "id": "855c1822", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>service</th>\n", " <th>question</th>\n", " <th>answers</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>/ec2/autoscaling/faqs/</td>\n", " <td>What is Amazon EC2 Auto Scaling?</td>\n", " <td>Amazon EC2 Auto Scaling is a fully managed ser...</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>/ec2/autoscaling/faqs/</td>\n", " <td>When should I use Amazon EC2 Auto Scaling vs. ...</td>\n", " <td>You should use AWS Auto Scaling to manage scal...</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>/ec2/autoscaling/faqs/</td>\n", " <td>How is Predictive Scaling Policy different fro...</td>\n", " <td>Predictive Scaling Policy brings the similar p...</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>/ec2/autoscaling/faqs/</td>\n", " <td>What are the benefits of using Amazon EC2 Auto...</td>\n", " <td>Amazon EC2 Auto Scaling helps to maintain your...</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>/ec2/autoscaling/faqs/</td>\n", " <td>What is fleet management and how is it differe...</td>\n", " <td>If your application runs on Amazon EC2 instanc...</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " service question \\\n", "0 /ec2/autoscaling/faqs/ What is Amazon EC2 Auto Scaling? \n", "1 /ec2/autoscaling/faqs/ When should I use Amazon EC2 Auto Scaling vs. ... \n", "2 /ec2/autoscaling/faqs/ How is Predictive Scaling Policy different fro... \n", "3 /ec2/autoscaling/faqs/ What are the benefits of using Amazon EC2 Auto... \n", "4 /ec2/autoscaling/faqs/ What is fleet management and how is it differe... \n", "\n", " answers \n", "0 Amazon EC2 Auto Scaling is a fully managed ser... \n", "1 You should use AWS Auto Scaling to manage scal... \n", "2 Predictive Scaling Policy brings the similar p... \n", "3 Amazon EC2 Auto Scaling helps to maintain your... \n", "4 If your application runs on Amazon EC2 instanc... 
" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv('train.csv.gz', compression='gzip', sep=';')\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "df8e1b39", "metadata": {}, "outputs": [], "source": [ "s3_uri = sess.upload_data(path='train.csv.gz', key_prefix='datasets/aws-faq/train')\n", "print(s3_uri)" ] }, { "cell_type": "markdown", "id": "92dd77a3", "metadata": {}, "source": [ "## 3) Prepare the train/inference script" ] }, { "cell_type": "code", "execution_count": null, "id": "b28590cf", "metadata": {}, "outputs": [], "source": [ "import os\n", "if not os.path.isdir('src'): os.mkdir('src')" ] }, { "cell_type": "code", "execution_count": null, "id": "8c266667", "metadata": {}, "outputs": [], "source": [ "## requirements.txt will be used by SageMaker to install\n", "## additional Python packages" ] }, { "cell_type": "code", "execution_count": null, "id": "d3b08830", "metadata": {}, "outputs": [], "source": [ "%%writefile src/requirements.txt\n", "torchvision\n", "transformers==4.27.4" ] }, { "cell_type": "code", "execution_count": null, "id": "27761ac8", "metadata": {}, "outputs": [], "source": [ "!pygmentize src/question_answering.py" ] }, { "cell_type": "markdown", "id": "81e8b7c0", "metadata": {}, "source": [ "## 4) Kick-off our fine tuning job on Amazon SageMaker\n", "We need to create a SageMaker Estimator first and then invoke **.fit**. \n", "\n", "Please, notice we're passing the parameter **checkpoint_s3_uri**. This is important because NeuronSDK will spend some time compiling the model before fine tuning it. The compiler saves the model to cache files and, with this param, the files will be uploaded to **S3**. So, next time we run a job, NeuronSDK can just load back the cache files and start training immediately.\n", "\n", "When training for the first time, the training job takes ~9 hours to process all 60 Epochs on an **trn1.32xlarge**.\n", "\n", "If you need to wait for a quota increase like I did. When you come back, run cell 2 to setup the sagemaker session and S3 uris, etc. Then run the below to get the process started." 
] }, { "cell_type": "code", "execution_count": null, "id": "3d7f8c86", "metadata": {}, "outputs": [], "source": [ "from sagemaker.pytorch import PyTorch\n", "\n", "# https://github.com/aws/deep-learning-containers/blob/master/available_images.md#neuron-containers\n", "image_name=\"pytorch-training-neuronx\"\n", "# We need SDK2.9+ to deal with T5s\n", "image_tag=\"1.13.0-neuronx-py38-sdk2.9.1-ubuntu20.04\"\n", "\n", "estimator = PyTorch(\n", " entry_point=\"question_answering.py\", # Specify your train script\n", " source_dir=\"src\",\n", " role=role,\n", " sagemaker_session=sess,\n", " instance_count=1,\n", " instance_type='ml.trn1.32xlarge', \n", " disable_profiler=True,\n", " output_path=f\"s3://{bucket}/output\",\n", " image_uri=f\"763104351884.dkr.ecr.{region}.amazonaws.com/{image_name}:{image_tag}\",\n", " \n", " # Parameters required to enable checkpointing\n", " # This is necessary for caching XLA HLO files and reduce training time next time \n", " checkpoint_s3_uri=f\"s3://{bucket}/checkpoints\",\n", " volume_size = 512,\n", " distribution={\n", " \"torch_distributed\": {\n", " \"enabled\": True\n", " }\n", " },\n", " hyperparameters={\n", " \"model-name\": \"t5-small-ssm\",\n", " \"lr\": 5e-5,\n", " \"num-epochs\": 60\n", " },\n", " metric_definitions=[\n", " {'Name': 'train:loss', 'Regex': 'loss:(\\S+);'}\n", " ]\n", ")\n", "estimator.framework_version = '1.13.1' # workround when using image_uri" ] }, { "cell_type": "code", "execution_count": null, "id": "1ab4e4ab", "metadata": { "scrolled": true }, "outputs": [], "source": [ "estimator.fit({\"train\": s3_uri})" ] }, { "cell_type": "markdown", "id": "e4f082b4", "metadata": {}, "source": [ "## 5) Deploy our model to a SageMaker endpoint\n", "Here, we're using a pre-defined HuggingFace model class+container to just load our fine tuned model on a CPU based instance: c6i.4xlarge (an Intel Xeon based machine).\n", "\n", ">If you're picking this up later uncomment line 4, fill in the path to your model artifacts, comment line 9 out, and uncomment line 10." 
] }, { "cell_type": "code", "execution_count": null, "id": "d90af272", "metadata": {}, "outputs": [], "source": [ "# uncomment and modify this if you're picking this back up later and your training was sucessful.\n", "# you'll need to get the model s3 URI from sagemaker -> Training -> Training Jobs -> <Your job name> -> Output -> S3 model artifact\n", "\n", "# pre_trained_model = YOUR_S3_PATH\n", "from sagemaker.huggingface.model import HuggingFaceModel\n", "\n", "# create Hugging Face Model Class\n", "huggingface_model = HuggingFaceModel(\n", " model_data=estimator.model_data, # path to your model and script\n", " # model_data=pre_trained_model, # path to your model and script\n", " role=role, # iam role with permissions to create an Endpoint\n", " transformers_version=\"4.26.0\", # transformers version used\n", " pytorch_version=\"1.13.1\", # pytorch version used\n", " py_version='py39', # python version used\n", " sagemaker_session=sess,\n", " \n", " # for production it is important to define vpc_config and use a vpc_endpoint\n", " #vpc_config={\n", " # 'Subnets': ['subnet-A-REPLACE', 'subnet-B-REPLACE'],\n", " # 'SecurityGroupIds': ['sg-A-REPLACE', 'sg-B-REPLACE']\n", " #} \n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "23f9c74f", "metadata": {}, "outputs": [], "source": [ "predictor = huggingface_model.deploy(\n", " initial_instance_count=1,\n", " instance_type=\"ml.c6i.4xlarge\",\n", ")" ] }, { "cell_type": "markdown", "id": "a6f12446", "metadata": {}, "source": [ "## 6) Run a quick test" ] }, { "cell_type": "code", "execution_count": 17, "id": "8e393b0f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Q: What is SageMaker?\n", "A: SageMaker is a new ML (ML) service that makes it easy to build, train, and deploy notebook data inference, and deploy and tune models of data. SageMaker helps you build, train, and manage your ML models, and deploy model data to build your models up and down\n", "\n", "Q: What is EC2 AutoScaling?\n", "A: Amazon-based EC2 instancess let you reduce your applications on multiple factors by allowing you to scale your application requirements and costs across multiple instances. Amazoning EC2 instances as a result of optimization in your applications, reducing the number of compute EC and the number of available instances to optimize your\n", "\n", "Q: What are the benefits of autoscaling?\n", "A: You can use autoscaling to help you optimize the capacity of your applications by allowing you to take advantage of your application across multiple applications. Autoscaling allows you to easily scale the number of your applications across multiple devices, and optimize your fleet up or down to 40%. 
You can also use auto\n", "\n", "CPU times: user 5.16 ms, sys: 0 ns, total: 5.16 ms\n", "Wall time: 1.66 s\n" ] } ], "source": [ "%%time\n", "questions = [\n", "    \"What is SageMaker?\",\n", "    \"What is EC2 AutoScaling?\",\n", "    \"What are the benefits of autoscaling?\"\n", "]\n", "resp = predictor.predict({'inputs': questions})\n", "for q,a in zip(questions, resp['answers']):\n", "    print(f\"Q: {q}\\nA: {a}\\n\")" ] }, { "cell_type": "markdown", "id": "10cd7c8a", "metadata": {}, "source": [ "## 7) Clean up\n", "This deletes the model and the endpoint you created." ] }, { "cell_type": "code", "execution_count": null, "id": "b3f1afb2", "metadata": {}, "outputs": [], "source": [ "predictor.delete_model()\n", "predictor.delete_endpoint()" ] } ], "metadata": { "kernelspec": { "display_name": "conda_pytorch_p39", "language": "python", "name": "conda_pytorch_p39" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.15" } }, "nbformat": 4, "nbformat_minor": 5 }