{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a4bbc1ef",
   "metadata": {},
   "source": [
    "# Deploy a BERT model from Hugging Face Model Hub to Amazon SageMaker for a Fill-Mask use case\n",
    "\n",
    "[Amazon SageMaker](https://aws.amazon.com/sagemaker/) is a fully managed Machine Learning (ML) service that lets you build, train and deploy ML models for any use case with a fully managed infrastructure, tools and workflows.  SageMaker has a feature that enables customers to train, fine-tune and run inference using [Hugging Face](https://huggingface.co/) models for [Natural Language Processing (NLP)](https://en.wikipedia.org/wiki/Natural_language_processing) on SageMaker.\n",
    "\n",
    "This notebook demonstrates how to use the [SageMaker Hugging Face Inference Toolkit](https://github.com/aws/sagemaker-huggingface-inference-toolkit) to deploy [bert-base-uncased](https://huggingface.co/bert-base-uncased) which is a pre-trained [BERT](https://huggingface.co/transformers/v3.0.2/model_doc/bert.html) model for a [Fill-Mask](https://huggingface.co/tasks/fill-mask) use case.\n",
    "\n",
    "**Note:**\n",
    "\n",
    "* This notebook should only be run from within a SageMaker notebook instance as it references SageMaker native APIs.\n",
    "* At the time of writing this notebook, the most relevant latest version of the Jupyter notebook kernel for this notebook was `conda_python3` and this came built-in with SageMaker notebooks.\n",
    "* This notebook uses CPU based instances for training.\n",
    "* This notebook will create resources in the same AWS account and in the same region where this notebook is running.\n",
    "\n",
    "**Table of Contents:**\n",
    "\n",
    "1. [Complete prerequisites](#Complete%20prerequisites)\n",
    "\n",
    "    1. [Check and configure access to the Internet](#Check%20and%20configure%20access%20to%20the%20Internet)\n",
    "\n",
    "    2. [Check and upgrade required software versions](#Check%20and%20upgrade%20required%20software%20versions)\n",
    "    \n",
    "    3. [Check and configure security permissions](#Check%20and%20configure%20security%20permissions)\n",
    "\n",
    "    4. [Organize imports](#Organize%20imports)\n",
    "    \n",
    "    5. [Create common objects](#Create%20common%20objects)\n",
    "    \n",
    "2. [Perform deployment](#Perform%20deployment)\n",
    "\n",
    "    1. [Set the deployment parameters](#Set%20the%20deployment%20parameters)\n",
    "    \n",
    "    2. [(Optional) Delete previously deployed resources](#(Optional)%20Delete%20previously%20deployed%20resources)\n",
    "    \n",
    "    3. [Deploy the model](#Deploy%20the%20model)\n",
    "    \n",
    "3. [Send traffic to endpoint](#Send%20traffic%20to%20endpoint)\n",
    "\n",
    "4. [Cleanup](#Cleanup)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b19ae078",
   "metadata": {},
   "source": [
    "##  1. Complete prerequisites <a id='Complete%20prerequisites'></a>\n",
    "\n",
    "Check and complete the prerequisites."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2282f8e2",
   "metadata": {},
   "source": [
    "###  A. Check and configure access to the Internet <a id='Check%20and%20configure%20access%20to%20the%20Internet'></a>\n",
    "\n",
    "This notebook requires outbound access to the Internet to download the required software updates.  You can either provide direct Internet access (default) or provide Internet access through a VPC.  For more information on this, refer [here](https://docs.aws.amazon.com/sagemaker/latest/dg/appendix-notebook-and-internet-access.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f1758b9f",
   "metadata": {},
   "source": [
    "### B. Check and upgrade required software versions  <a id='Check%20and%20upgrade%20required%20software%20versions'></a>\n",
    "\n",
    "This notebook requires:\n",
    "* [SageMaker Python SDK version 2.x](https://sagemaker.readthedocs.io/en/stable/v2.html)\n",
    "* [Python 3.8.x](https://www.python.org/downloads/release/python-380/)\n",
    "* [Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html)\n",
    "\n",
    "Note: If you get 'module not found' errors in the following cell, then uncomment the appropriate installation commands and install the modules.  Also, uncomment and run the kernel shutdown command.  When the kernel comes back, comment out the installation and kernel shutdown commands and run the following cell.  Now, you should not see any errors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "194625b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "import IPython\n",
    "import sagemaker\n",
    "import sys\n",
    "\n",
    "\"\"\"\n",
    "Last tested versions:\n",
    "SageMaker Python SDK version : 2.117.0\n",
    "Python version : 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51) \n",
    "[GCC 9.4.0]\n",
    "Boto3 version : 1.26.15\n",
    "\"\"\"\n",
    "\n",
    "# Install/upgrade sagemaker, boto3 and tensorflow\n",
    "#!{sys.executable} -m pip install -U sagemaker boto3\n",
    "#IPython.Application.instance().kernel.do_shutdown(True)\n",
    "\n",
    "# Get the current installed version of Sagemaker SDK, Python and Boto3\n",
    "print('SageMaker Python SDK version : {}'.format(sagemaker.__version__))\n",
    "print('Python version : {}'.format(sys.version))\n",
    "print('Boto3 version : {}'.format(boto3.__version__))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63f962ca",
   "metadata": {},
   "source": [
    "###  C. Check and configure security permissions <a id='Check%20and%20configure%20security%20permissions'></a>\n",
    "\n",
    "This notebook uses the IAM role attached to the underlying notebook instance.  This role should have the following permissions,\n",
    "\n",
    "1. Full access to deploy models.\n",
    "2. Access to write to CloudWatch logs and metrics.\n",
    "\n",
    "To view the name of this role, run the following cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "75927555",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(sagemaker.get_execution_role())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6fe85e9d",
   "metadata": {},
   "source": [
    "###  D. Organize imports <a id='Organize%20imports'></a>\n",
    "\n",
    "Organize all the library and module imports for later use."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f38f035",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.huggingface import HuggingFaceModel"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d0abc89",
   "metadata": {},
   "source": [
    "###  E. Create common objects <a id='Create%20common%20objects'></a>\n",
    "\n",
    "Create common objects to be used in future steps in this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30bd176e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the SageMaker Boto3 client\n",
    "sm_client = boto3.client('sagemaker')\n",
    "\n",
    "# Base name to be used to create resources\n",
    "nb_name = 'torch-hugging-face-bert-fill-mask'\n",
    "\n",
    "# Names of various resources\n",
    "model_name = 'model-{}'.format(nb_name)\n",
    "endpoint_name = 'endpt-{}'.format(nb_name)\n",
    "\n",
    "# Hugging Face Model Hub parameters.  Refer https://huggingface.co/models\n",
    "# Here, we will use https://huggingface.co/bert-base-uncased\n",
    "hf_model_id = 'bert-base-uncased'\n",
    "hf_task = 'fill-mask'\n",
    "\n",
    "# Set the inference id\n",
    "inference_id = 'inf-id-{}'.format(nb_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "007e7ee5",
   "metadata": {},
   "source": [
    "##  2. Perform deployment <a id='Perform%20deployment'></a>\n",
    "\n",
    "In this step, we will deploy [bert-base-uncased](https://huggingface.co/bert-base-uncased) which is a pre-trained BERT model from the Hugging Face Model Hub using the SageMaker Hugging Face Inference Toolkit.  The Hugging Face Task will be Fill-Mask.\n",
    "\n",
    "In the below code, the parameters specified while creating an instance of HuggingFaceModel class will determine the appropriate Hugging Face Inference container to be used.  For a full list, refer [here](https://github.com/aws/deep-learning-containers/blob/master/available_images.md)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0946c09c",
   "metadata": {},
   "source": [
    "### A) Set the deployment parameters <a id='Set%20the%20deployment%20parameters'></a>\n",
    "\n",
    "1. Deployment instance details:\n",
    "\n",
    "    1. Instance count\n",
    "    \n",
    "    2. Instance type\n",
    "    \n",
    "    3. The Elastic Inference accelerator type\n",
    "    \n",
    "2. Serializer and deserializer.\n",
    "\n",
    "3. Hugging Face Model Hub configuration for the pre-trained model that is to be deployed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e9a707c2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set the instance count, instance type and other parameters\n",
    "deploy_initial_instance_count = 1\n",
    "deploy_instance_type = 'ml.m5.xlarge'\n",
    "deploy_instance_volume_in_gb = 5\n",
    "accelerator_type = None\n",
    "serializer = None\n",
    "deserializer = None\n",
    "model_data_download_timeout_in_secs = 300\n",
    "container_startup_health_check_timeout_in_secs = 60\n",
    "\n",
    "# Hugging Face Model Hub parameters\n",
    "hub = {\n",
    "  'HF_MODEL_ID':hf_model_id,\n",
    "  'HF_TASK':hf_task\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "739c8cce",
   "metadata": {},
   "source": [
    "### B) (Optional) Delete previously deployed resources <a id='(Optional)%20Delete%20previously%20deployed%20resources'></a>\n",
    "\n",
    "This step deletes the model, endpoint configuration and endpoint.  You may want to run this step if you are running only some parts of this notebook especially the step that deploys the model.\n",
    "\n",
    "Note: You may run into errors if the model, endpoint or endpoint config does not exist.  This may be due to partial deletes in the past.  In this case, comment out the appropriate lines of the code and run the rest.  Alternatively, you can go to the [SageMaker console](https://console.aws.amazon.com/sagemaker/home), switch to the required region and delete these resources."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c52c88ee",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Delete the model, endpoint configuration and endpoint\n",
    "endpoint_config_name = sm_client.describe_endpoint(EndpointName=endpoint_name)['EndpointConfigName']\n",
    "sm_client.delete_model(ModelName=model_name)\n",
    "sm_client.delete_endpoint(EndpointName=endpoint_name)\n",
    "sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b072fe07",
   "metadata": {},
   "source": [
    "### C) Deploy the model <a id='Deploy%20the%20model'></a>\n",
    "\n",
    "Deploy the model from the Hugging Face Model Hub to a SageMaker real-time inference endpoint using the parameters specified in the previous step.\n",
    "\n",
    "Note: This step automatically creates the endpoint configuration before creating the endpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d7da426",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create an instance of the HuggingFaceModel class\n",
    "huggingface_model = HuggingFaceModel(\n",
    "    name=model_name,\n",
    "    transformers_version='4.17.0',\n",
    "    pytorch_version='1.10.2',\n",
    "    py_version='py38',\n",
    "    env=hub,\n",
    "    role=sagemaker.get_execution_role(),\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0e7311d3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Deploy the model to SageMaker Inference\n",
    "predictor = huggingface_model.deploy(initial_instance_count=deploy_initial_instance_count,\n",
    "                                     instance_type=deploy_instance_type,\n",
    "                                     accelerator_type=accelerator_type,\n",
    "                                     serializer=serializer,\n",
    "                                     deserializer=deserializer,\n",
    "                                     endpoint_name=endpoint_name,\n",
    "                                     data_capture_config=None,\n",
    "                                     async_inference_config=None,\n",
    "                                     serverless_inference_config=None,\n",
    "                                     volume_size=deploy_instance_volume_in_gb,\n",
    "                                     model_data_download_timeout=model_data_download_timeout_in_secs,\n",
    "                                     container_startup_health_check_timeout=container_startup_health_check_timeout_in_secs,\n",
    "                                     wait=True\n",
    "                                    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8e03ef14",
   "metadata": {},
   "source": [
    "## 3. Send traffic to endpoint <a id='Send%20traffic%20to%20endpoint'></a>\n",
    "\n",
    "In this step, we will send traffic to the endpoint by calling the `predict()` method on the `predictor` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bea0f311",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set the input data\n",
    "input_data = {\n",
    "    \"inputs\": \"Washington DC is the [MASK] of USA.\"\n",
    "}\n",
    "\n",
    "# Invoke the endpoint to get the prediction\n",
    "predicted_object = predictor.predict(data=input_data,\n",
    "                                     initial_args=None,\n",
    "                                     target_model=None,\n",
    "                                     target_variant=None,\n",
    "                                     inference_id=inference_id)\n",
    "predicted_value = predicted_object[0]\n",
    "\n",
    "# Print the output prediction\n",
    "print(predicted_value)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f68650cd",
   "metadata": {},
   "source": [
    "## 4. Cleanup <a id='Cleanup'></a>\n",
    "\n",
    "As a best practice, you should delete resources when no longer required.  This will help you avoid incurring unncessary costs.\n",
    "\n",
    "This step will cleanup the resources created by this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c50a0b5b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Delete the model, endpoint configuration and endpoint\n",
    "predictor.delete_model()\n",
    "predictor.delete_endpoint()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c8a4b57",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}