{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Fine tune a PyTorch BERT model and deploy it with Elastic Inference on Amazon SageMaker\n", "\n", "Text classification is a technique for putting text into different categories and has a wide range of applications: email providers use text classification to detect to spam emails, marketing agencies use it for sentiment analysis of customer reviews, and moderators of discussion forums use it to detect inappropriate comments.\n", "\n", "In the past, data scientists used methods such as [tf-idf](https://en.wikipedia.org/wiki/Tf%E2%80%93idf), [word2vec](https://en.wikipedia.org/wiki/Word2vec), or [bag-of-words (BOW)](https://en.wikipedia.org/wiki/Bag-of-words_model) to generate features for training classification models. While these techniques have been very successful in many NLP tasks, they don't always capture the meanings of words accurately when they appear in different contexts. Recently, we see increasing interest in using Bidirectional Encoder Representations from Transformers (BERT) to achieve better results in text classification tasks, due to its ability more accurately encode the meaning of words in different contexts.\n", "\n", "BERT was trained on BookCorpus and English Wikipedia data, which contain 800 million words and 2,500 million words, respectively. Training BERT from scratch would be prohibitively expensive. By taking advantage of transfer learning, one can quickly fine tune BERT for another use case with a relatively small amount of training data to achieve state-of-the-art results for common NLP tasks, such as text classification and question answering. \n", "\n", "Amazon SageMaker is a fully managed service that provides developers and data scientists with the ability to build, train, and deploy machine learning (ML) models quickly. Amazon SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high-quality models. The SageMaker Python SDK provides open source APIs and containers that make it easy to train and deploy models in Amazon SageMaker with several different machine learning and deep learning frameworks.\n", "\n", "Our customers often ask for quick fine-tuning and easy deployment of their NLP models. Furthermore, customers prefer low inference latency and low model inference cost. [Amazon Elastic Inference](https://aws.amazon.com/machine-learning/elastic-inference) enables attaching GPU-powered inference acceleration to endpoints, reducing the cost of deep learning inference without sacrificing performance.\n", "\n", "This blog post demonstrates how to use Amazon SageMaker to fine tune a PyTorch BERT model and deploy it with Elastic Inference. This work is inspired by a post by [Chris McCormick and Nick Ryan](https://mccormickml.com/2019/07/22/BERT-fine-tuning).\n", "\n", "In this example, we walk through our dataset, the training process, and finally model deployment. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Setup\n", "\n", "To start, we import some Python libraries and initialize a SageMaker session, S3 bucket and prefix, and IAM role." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# need torch 1.3.1 for elastic inference\n", "!pip install torch==1.3.1\n", "!pip install transformers" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import numpy as np\n", "import pandas as pd\n", "import sagemaker\n", "\n", "sagemaker_session = sagemaker.Session()\n", "\n", "bucket = sagemaker_session.default_bucket()\n", "prefix = \"sagemaker/DEMO-pytorch-bert\"\n", "\n", "role = sagemaker.get_execution_role()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Prepare training data\n", "\n", "We use Corpus of Linguistic Acceptability (CoLA) (https://nyu-mll.github.io/CoLA/), a dataset of 10,657 English sentences labeled as grammatical or ungrammatical from published linguistics literature. We download and unzip the data using the following code:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if not os.path.exists(\"./cola_public_1.1.zip\"):\n", " !curl -o ./cola_public_1.1.zip https://nyu-mll.github.io/CoLA/cola_public_1.1.zip\n", "if not os.path.exists(\"./cola_public/\"):\n", " !unzip cola_public_1.1.zip" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get sentences and labels\n", "\n", "Let us take a quick look at our data. First we read in the training data. The only two columns we need are the sentence itself and its label. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\n", " \"./cola_public/raw/in_domain_train.tsv\",\n", " sep=\"\\t\",\n", " header=None,\n", " usecols=[1, 3],\n", " names=[\"label\", \"sentence\"],\n", ")\n", "sentences = df.sentence.values\n", "labels = df.label.values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Printing out a few sentences shows us how sentences are labeled based on their grammatical completeness. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(sentences[20:25])\n", "print(labels[20:25])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then split the dataset for training and testing." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "train, test = train_test_split(df)\n", "train.to_csv(\"./cola_public/train.csv\", index=False)\n", "test.to_csv(\"./cola_public/test.csv\", index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we upload both to Amazon S3 for use later. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "Next, we upload both files to Amazon S3 for use later. The SageMaker Python SDK provides a helpful function for uploading to Amazon S3:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "inputs_train = sagemaker_session.upload_data(\"./cola_public/train.csv\", bucket=bucket, key_prefix=prefix)\n", "inputs_test = sagemaker_session.upload_data(\"./cola_public/test.csv\", bucket=bucket, key_prefix=prefix)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Run training" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training script\n", "\n", "We use the [PyTorch-Transformers library](https://pytorch.org/hub/huggingface_pytorch-transformers), which contains PyTorch implementations and pre-trained model weights for many NLP models, including BERT.\n", "\n", "Our training script should save the model artifacts learned during training to the directory passed as `model_dir`, as stipulated by the SageMaker PyTorch image. When training completes, SageMaker uploads the artifacts saved in `model_dir` to S3, where they become available for deployment.\n", "\n", "We save this script in a file named `train_deploy.py` and put the file in a directory named `code/`. The full training script can be viewed under `code/`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pygmentize code/train_deploy.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train on Amazon SageMaker\n", "\n", "We use Amazon SageMaker to train and deploy a model using our custom PyTorch code. The Amazon SageMaker Python SDK makes it easier to run a PyTorch script in Amazon SageMaker using its PyTorch estimator. After that, we can use the SageMaker Python SDK to deploy the trained model and run predictions. For more information on how to use this SDK with PyTorch, see [the SageMaker Python SDK documentation](https://sagemaker.readthedocs.io/en/stable/using_pytorch.html).\n", "\n", "To start, we use the `PyTorch` estimator class to train our model. When creating our estimator, we make sure to specify a few things:\n", "\n", "* `entry_point`: the name of our PyTorch script. It contains our training script, which loads data from the input channels, configures training with hyperparameters (see the sketch below for how these are passed in), trains a model, and saves the model. It also contains code to load and run the model during inference.\n", "* `source_dir`: the location of our training script and `requirements.txt` file. `requirements.txt` lists the packages you want to use with your script.\n", "* `framework_version`: the PyTorch version we want to use.\n", "\n", "The PyTorch estimator supports multi-machine, distributed PyTorch training. To use this, we just set `instance_count` to be greater than one. Our training script supports distributed training only on GPU instances.\n", "\n", "After creating the estimator, we call `fit()`, which launches a training job. We use the Amazon S3 URIs where we uploaded the training data earlier." ] },
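{ "cell_type": "markdown", "metadata": {}, "source": [ "For context, the following is a minimal sketch of how a SageMaker PyTorch training script typically receives these settings: hyperparameters arrive as command-line arguments, and SageMaker exposes the data channel and model output locations through environment variables. This is illustrative only; the actual argument names in `code/train_deploy.py` may differ." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import argparse\n", "import os\n", "\n", "# Minimal sketch of SageMaker-style argument parsing (illustrative only)\n", "parser = argparse.ArgumentParser()\n", "\n", "# Hyperparameters from the estimator's `hyperparameters` dict arrive as CLI arguments\n", "parser.add_argument(\"--epochs\", type=int, default=1)\n", "parser.add_argument(\"--num_labels\", type=int, default=2)\n", "parser.add_argument(\"--backend\", type=str, default=None)\n", "\n", "# SageMaker exposes these locations through environment variables;\n", "# the fallbacks are the standard paths inside the training container\n", "parser.add_argument(\"--model-dir\", type=str, default=os.environ.get(\"SM_MODEL_DIR\", \"/opt/ml/model\"))\n", "parser.add_argument(\"--train\", type=str, default=os.environ.get(\"SM_CHANNEL_TRAINING\", \"/opt/ml/input/data/training\"))\n", "parser.add_argument(\"--test\", type=str, default=os.environ.get(\"SM_CHANNEL_TESTING\", \"/opt/ml/input/data/testing\"))\n", "\n", "args, _ = parser.parse_known_args()" ] },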
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "from sagemaker.pytorch import PyTorch\n", "\n", "# place to save model artifact\n", "output_path = f\"s3://{bucket}/{prefix}\"\n", "\n", "estimator = PyTorch(\n", " entry_point=\"train_deploy.py\",\n", " source_dir=\"code\",\n", " role=role,\n", " framework_version=\"1.3.1\",\n", " py_version=\"py3\",\n", " instance_count=2, # this script only support distributed training for GPU instances.\n", " instance_type=\"ml.p3.2xlarge\",\n", " output_path=output_path,\n", " hyperparameters={\n", " \"epochs\": 1,\n", " \"num_labels\": 2,\n", " \"backend\": \"gloo\",\n", " },\n", " disable_profiler=True, # disable debugger\n", ")\n", "estimator.fit({\"training\": inputs_train, \"testing\": inputs_test})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Host" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After training our model, we host it on an Amazon SageMaker Endpoint. To make the endpoint load the model and serve predictions, we implement a few methods in `train_deploy.py`.\n", "\n", "* `model_fn()`: function defined to load the saved model and return a model object that can be used for model serving. The SageMaker PyTorch model server loads our model by invoking model_fn.\n", "* `input_fn()`: deserializes and prepares the prediction input. In this example, our request body is first serialized to JSON and then sent to model serving endpoint. Therefore, in `input_fn()`, we first deserialize the JSON-formatted request body and return the input as a `torch.tensor`, as required for BERT.\n", "* `predict_fn()`: performs the prediction and returns the result.\n", "\n", "To deploy our endpoint, we call `deploy()` on our PyTorch estimator object, passing in our desired number of instances and instance type:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor = estimator.deploy(initial_instance_count=1, instance_type=\"ml.m4.xlarge\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then configure the predictor to use `application/json` for the content type when sending requests to our endpoint:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor.serializer = sagemaker.serializers.JSONSerializer()\n", "predictor.deserializer = sagemaker.deserializers.JSONDeserializer()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we use the returned predictor object to call the endpoint:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = predictor.predict(\"Somebody just left - guess who.\")\n", "print(\"predicted class: \", np.argmax(result, axis=1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see the predicted class is 1 as expected because test sentence is a grammatically correct sentence. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before moving on, let's delete the Amazon SageMaker endpoint to avoid charges:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor.delete_endpoint()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use a pretrained model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to reuse pretrained model, you can create a `PyTorchModel` from existing model artifacts. For example,\n", "we can retrieve model artifacts we just trained. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_data = estimator.model_data\n", "print(model_data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "from sagemaker.pytorch.model import PyTorchModel \n", "\n", "pytorch_model = PyTorchModel(model_data=model_data,\n", " role=role,\n", " framework_version=\"1.3.1\",\n", " source_dir=\"code\",\n", " py_version=\"py3\",\n", " entry_point=\"train_deploy.py\")\n", "\n", "predictor = pytorch_model.deploy(initial_instance_count=1, instance_type=\"ml.m4.xlarge\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor.serializer = sagemaker.serializers.JSONSerializer()\n", "predictor.deserializer = sagemaker.deserializers.JSONDeserializer()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = predictor.predict(\"Remember to delete me when are done\")\n", "print(\"predicted class: \", np.argmax(result, axis=1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# batch inference \n", "result = predictor.predict([\n", " \"This is how you do batch inference\", \n", " \"Put several sentences in a list\",\n", " \"Make sure they are shorter than 64 words\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Predicted class: \", np.argmax(result, axis=1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor.delete_endpoint()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Elastic Inference\n", "\n", "Selecting the right instance type for inference requires deciding between different amounts of GPU, CPU, and memory resources, and optimizing for one of these resources on a standalone GPU instance usually leads to under-utilization of other resources. [Amazon Elastic Inference](https://aws.amazon.com/machine-learning/elastic-inference/) solves this problem by enabling us to attach the right amount of GPU-powered inference acceleration to our endpoint. In March 2020, [Elastic Inference support for PyTorch became available](https://aws.amazon.com/blogs/machine-learning/reduce-ml-inference-costs-on-amazon-sagemaker-for-pytorch-models-using-amazon-elastic-inference/) for both Amazon SageMaker and Amazon EC2." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To use Elastic Inference, we must convert our trained model to TorchScript. The location of the model artifacts is `estimator.model_data`. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we create a folder to save model trained model, and download the `model.tar.gz` file to local directory. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%sh -s $estimator.model_data\n", "mkdir model\n", "aws s3 cp $1 model/ \n", "tar xvzf model/model.tar.gz --directory ./model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following code converts our model into the TorchScript format:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import subprocess\n", "import torch\n", "from transformers import BertForSequenceClassification\n", "\n", "model_torchScript = BertForSequenceClassification.from_pretrained(\"model/\", torchscript=True)\n", "device = \"cpu\"\n", "# max length for the sentences: 64\n", "max_len = 64\n", "\n", "for_jit_trace_input_ids = [0] * max_len\n", "for_jit_trace_attention_masks = [0] * max_len\n", "for_jit_trace_input = torch.tensor([for_jit_trace_input_ids])\n", "for_jit_trace_masks = torch.tensor([for_jit_trace_input_ids])\n", "\n", "traced_model = torch.jit.trace(\n", " model_torchScript, [for_jit_trace_input.to(device), for_jit_trace_masks.to(device)]\n", ")\n", "torch.jit.save(traced_model, \"traced_bert.pt\")\n", "\n", "subprocess.call([\"tar\", \"-czvf\", \"traced_bert.tar.gz\", \"traced_bert.pt\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Loading the TorchScript model and using it for prediction require small changes in our model loading and prediction functions. We create a new script `deploy_ei.py` that is slightly different from `train_deploy.py` script." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pygmentize code/deploy_ei.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we upload TorchScript model to S3 and deploy using Elastic Inference. The accelerator_type=`ml.eia2.xlarge` parameter is how we attach the Elastic Inference accelerator to our endpoint." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.pytorch import PyTorchModel\n", "\n", "instance_type = 'ml.m5.large'\n", "accelerator_type = 'ml.eia2.xlarge'\n", "\n", "# TorchScript model\n", "tar_filename = 'traced_bert.tar.gz'\n", "\n", "# Returns S3 bucket URL\n", "print('Upload tarball to S3')\n", "model_data = sagemaker_session.upload_data(path=tar_filename, bucket=bucket, key_prefix=prefix)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time\n", "\n", "endpoint_name = 'bert-ei-traced-{}-{}-{}'.format(instance_type, \n", " accelerator_type, time.time()).replace('.', '').replace('_', '')\n", "\n", "pytorch = PyTorchModel(\n", " model_data=model_data,\n", " role=role,\n", " entry_point='deploy_ei.py',\n", " source_dir='code',\n", " framework_version='1.3.1',\n", " py_version='py3',\n", " sagemaker_session=sagemaker_session\n", ")\n", "\n", "# Function will exit before endpoint is finished creating\n", "predictor = pytorch.deploy(\n", " initial_instance_count=1,\n", " instance_type=instance_type,\n", " accelerator_type=accelerator_type,\n", " endpoint_name=endpoint_name,\n", " wait=True,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor.serializer = sagemaker.serializers.JSONSerializer()\n", "predictor.deserializer = sagemaker.deserializers.JSONDeserializer()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "res = predictor.predict('Please remember to delete me when you are done.')\n", "print(\"Predicted class:\", np.argmax(res, axis=1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Cleanup\n", "\n", "Lastly, please remember to delete the Amazon SageMaker endpoint to avoid charges:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor.delete_endpoint()" ] } ], "metadata": { "kernelspec": { "display_name": "conda_pytorch_p36", "language": "python", "name": "conda_pytorch_p36" } }, "nbformat": 4, "nbformat_minor": 2 }