# Deploy a BERT model from Hugging Face Model Hub to Amazon SageMaker for a Fill-Mask use case

[Amazon SageMaker](https://aws.amazon.com/sagemaker/) is a fully managed Machine Learning (ML) service that lets you build, train and deploy ML models for any use case with fully managed infrastructure, tools and workflows. SageMaker also lets you train, fine-tune and run inference using [Hugging Face](https://huggingface.co/) models for [Natural Language Processing (NLP)](https://en.wikipedia.org/wiki/Natural_language_processing).

This notebook demonstrates how to use the [SageMaker Hugging Face Inference Toolkit](https://github.com/aws/sagemaker-huggingface-inference-toolkit) to deploy [bert-base-uncased](https://huggingface.co/bert-base-uncased), a pre-trained [BERT](https://huggingface.co/transformers/v3.0.2/model_doc/bert.html) model, for a [Fill-Mask](https://huggingface.co/tasks/fill-mask) use case.

**Note:**

* This notebook should only be run from within a SageMaker notebook instance as it references SageMaker native APIs.
* At the time of writing, the most recent suitable Jupyter kernel for this notebook was `conda_python3`, which is built into SageMaker notebooks.
* This notebook uses a CPU-based instance for inference.
* This notebook will create resources in the same AWS account and in the same region where this notebook is running.

**Table of Contents:**

1. [Complete prerequisites](#Complete%20prerequisites)

 1. [Check and configure access to the Internet](#Check%20and%20configure%20access%20to%20the%20Internet)

 2. [Check and upgrade required software versions](#Check%20and%20upgrade%20required%20software%20versions)
 
 3. [Check and configure security permissions](#Check%20and%20configure%20security%20permissions)

 4. [Organize imports](#Organize%20imports)
 
 5. [Create common objects](#Create%20common%20objects)
 
2. [Perform deployment](#Perform%20deployment)

 1. [Set the deployment parameters](#Set%20the%20deployment%20parameters)
 
 2. [(Optional) Delete previously deployed resources](#(Optional)%20Delete%20previously%20deployed%20resources)
 
 3. [Deploy the model](#Deploy%20the%20model)
 
3. [Send traffic to endpoint](#Send%20traffic%20to%20endpoint)

4. [Cleanup](#Cleanup)

## 1. Complete prerequisites 

Check and complete the prerequisites.

### A. Check and configure access to the Internet 

This notebook requires outbound access to the Internet to download the required software updates. You can either provide direct Internet access (default) or provide Internet access through a VPC. For more information, see the [SageMaker documentation on notebook instances and Internet access](https://docs.aws.amazon.com/sagemaker/latest/dg/appendix-notebook-and-internet-access.html).
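
One way to confirm outbound access before continuing is a quick reachability probe. The following is a minimal sketch using only the Python standard library. The helper name `can_reach` and the example URLs in the comments are illustrative assumptions, not part of the original notebook.

```python
import urllib.error
import urllib.request


def can_reach(url, timeout=5):
    """Return True if an HTTP(S) request to `url` receives any response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered (for example with 403 or 404),
        # so the network path itself works.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout or malformed URL.
        return False


# Example usage inside the notebook (URLs chosen for illustration only):
# for url in ('https://pypi.org', 'https://huggingface.co'):
#     print(url, 'reachable' if can_reach(url) else 'NOT reachable')
```

If the probe reports a site as not reachable, review the notebook instance's network settings before running the installation commands in the next step.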

### B. Check and upgrade required software versions 

This notebook requires:
* [SageMaker Python SDK version 2.x](https://sagemaker.readthedocs.io/en/stable/v2.html)
* [Python 3.8.x](https://www.python.org/downloads/release/python-380/)
* [Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html)

Note: If you get 'module not found' errors in the following cell, then uncomment the appropriate installation commands and install the modules. Also, uncomment and run the kernel shutdown command. When the kernel comes back, comment out the installation and kernel shutdown commands and run the following cell. Now, you should not see any errors.

In [None]:
import boto3
import IPython
import sagemaker
import sys

"""
Last tested versions:
SageMaker Python SDK version : 2.117.0
Python version : 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51) 
[GCC 9.4.0]
Boto3 version : 1.26.15
"""

# Install/upgrade sagemaker and boto3
#!{sys.executable} -m pip install -U sagemaker boto3
#IPython.Application.instance().kernel.do_shutdown(True)

# Get the currently installed versions of the SageMaker SDK, Python and Boto3
print('SageMaker Python SDK version : {}'.format(sagemaker.__version__))
print('Python version : {}'.format(sys.version))
print('Boto3 version : {}'.format(boto3.__version__))
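
The printed versions can also be compared against a minimum programmatically. The sketch below is one possible approach using only the standard library. The helper names `version_tuple` and `meets_minimum` are hypothetical, and the minimum `'2.0.0'` in the usage comment is an assumption derived from the "2.x" requirement above. It is intended for plain dotted version strings such as `sagemaker.__version__`, not for the longer `sys.version` string.

```python
import re


def version_tuple(version):
    """Convert a dotted version string such as '2.117.0' into a tuple of ints."""
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            # Stop at the first component that does not start with a digit.
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`."""
    a = version_tuple(installed)
    b = version_tuple(minimum)
    # Pad the shorter tuple with zeros so that '2' and '2.0' compare as equal.
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


# Example usage in the notebook:
# print(meets_minimum(sagemaker.__version__, '2.0.0'))
```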

### C. Check and configure security permissions 

This notebook uses the IAM role attached to the underlying notebook instance. This role should have the following permissions:

1. Full access to deploy models.
2. Access to write to CloudWatch logs and metrics.

To view the name of this role, run the following cell.

In [None]:
print(sagemaker.get_execution_role())
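
The cell above prints the role's ARN rather than its bare name. If only the name is needed, for example to look the role up in the IAM console, it can be extracted from the ARN. This is a minimal sketch; the helper `role_name_from_arn` and the example ARN in the comment are hypothetical.

```python
def role_name_from_arn(role_arn):
    """Extract the role name from an IAM role ARN.

    An IAM role ARN has the form
    arn:<partition>:iam::<account-id>:role/<optional-path>/<role-name>
    """
    parts = role_arn.split(':', 5)
    if len(parts) != 6 or not parts[5].startswith('role/'):
        raise ValueError('Not an IAM role ARN: {}'.format(role_arn))
    # The role name is the last path segment of the resource part.
    return parts[5].rsplit('/', 1)[1]


# Example usage in the notebook:
# print(role_name_from_arn(sagemaker.get_execution_role()))
#
# With a made-up ARN such as
# 'arn:aws:iam::123456789012:role/service-role/ExampleRole'
# the helper returns 'ExampleRole'.
```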

### D. Organize imports 

Organize all the library and module imports for later use.

In [None]:
from sagemaker.huggingface import HuggingFaceModel

### E. Create common objects 

Create common objects to be used in future steps in this notebook.

In [None]:
# Create the SageMaker Boto3 client
sm_client = boto3.client('sagemaker')

# Base name to be used to create resources
nb_name = 'torch-hugging-face-bert-fill-mask'

# Names of various resources
model_name = 'model-{}'.format(nb_name)
endpoint_name = 'endpt-{}'.format(nb_name)

# Hugging Face Model Hub parameters. See https://huggingface.co/models
# Here, we use https://huggingface.co/bert-base-uncased
hf_model_id = 'bert-base-uncased'
hf_task = 'fill-mask'

# Set the inference id
inference_id = 'inf-id-{}'.format(nb_name)
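
If you change `nb_name`, it can be worth checking the derived resource names before deploying. The sketch below assumes the naming constraints described in the SageMaker API reference for model and endpoint names (at most 63 characters, alphanumeric characters and hyphens, no leading or trailing hyphen). Treat these limits as an assumption and verify them against the current API documentation. The helper name `is_valid_resource_name` is hypothetical.

```python
import re

# Assumed constraints; verify against the SageMaker API reference.
MAX_NAME_LENGTH = 63
NAME_PATTERN = re.compile(r'[a-zA-Z0-9](-*[a-zA-Z0-9])*')


def is_valid_resource_name(name):
    """Return True if `name` satisfies the assumed naming constraints."""
    if len(name) > MAX_NAME_LENGTH:
        return False
    return NAME_PATTERN.fullmatch(name) is not None


# The names used in this notebook pass the check.
for candidate in ('model-torch-hugging-face-bert-fill-mask',
                  'endpt-torch-hugging-face-bert-fill-mask'):
    print(candidate, is_valid_resource_name(candidate))
```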

## 2. Perform deployment 

In this step, we will deploy [bert-base-uncased](https://huggingface.co/bert-base-uncased), a pre-trained BERT model from the Hugging Face Model Hub, using the SageMaker Hugging Face Inference Toolkit. The Hugging Face task will be Fill-Mask.

In the code below, the parameters passed when creating an instance of the `HuggingFaceModel` class determine which Hugging Face Inference container is used. For the full list of available containers, refer [here](https://github.com/aws/deep-learning-containers/blob/master/available_images.md).

### A. Set the deployment parameters

1. Deployment instance details:

 1. Instance count
 
 2. Instance type
 
 3. The Elastic Inference accelerator type
 
2. Serializer and deserializer.

3. Hugging Face Model Hub configuration for the pre-trained model that is to be deployed.

In [None]:
# Set the instance count, instance type and other parameters
deploy_initial_instance_count = 1
deploy_instance_type = 'ml.m5.xlarge'
deploy_instance_volume_in_gb = 5
accelerator_type = None
serializer = None
deserializer = None
model_data_download_timeout_in_secs = 300
container_startup_health_check_timeout_in_secs = 60

# Hugging Face Model Hub parameters
hub = {
 'HF_MODEL_ID':hf_model_id,
 'HF_TASK':hf_task
}

### B. (Optional) Delete previously deployed resources

This step deletes the model, endpoint configuration and endpoint. You may want to run this step if you are re-running only some parts of this notebook, in particular the step that deploys the model, while resources from an earlier run still exist.

Note: You may run into errors if the model, endpoint or endpoint config does not exist. This may be due to partial deletes in the past. In this case, comment out the appropriate lines of the code and run the rest. Alternatively, you can go to the [SageMaker console](https://console.aws.amazon.com/sagemaker/home), switch to the required region and delete these resources.

In [None]:
# Delete the model, endpoint configuration and endpoint
endpoint_config_name = sm_client.describe_endpoint(EndpointName=endpoint_name)['EndpointConfigName']
sm_client.delete_model(ModelName=model_name)
sm_client.delete_endpoint(EndpointName=endpoint_name)
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
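
As an alternative to commenting out lines when a resource is missing, each delete call can be wrapped so that an expected error is reported and skipped. The following is a minimal sketch; the helper `run_ignoring` is hypothetical. The usage comments assume that the Boto3 calls raise `botocore.exceptions.ClientError` for a missing resource. Note that `ClientError` also covers other failures such as missing permissions, so read the printed message.

```python
def run_ignoring(action, ignored_errors, label):
    """Call `action()`; report and skip the listed exception types.

    Returns True if the action succeeded and False if it was skipped.
    Any exception type not listed in `ignored_errors` still propagates.
    """
    try:
        action()
    except ignored_errors as err:
        print('Skipped {}: {}'.format(label, err))
        return False
    print('Deleted {}'.format(label))
    return True


# Example usage in the notebook (assumes sm_client, model_name and
# endpoint_name from the earlier cells):
# from botocore.exceptions import ClientError
# run_ignoring(lambda: sm_client.delete_model(ModelName=model_name),
#              ClientError, 'model')
# run_ignoring(lambda: sm_client.delete_endpoint(EndpointName=endpoint_name),
#              ClientError, 'endpoint')
```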

### C. Deploy the model

Deploy the model from the Hugging Face Model Hub to a SageMaker real-time inference endpoint using the parameters specified in the previous step.

Note: This step automatically creates the endpoint configuration before creating the endpoint.

In [None]:
# Create an instance of the HuggingFaceModel class
huggingface_model = HuggingFaceModel(
 name=model_name,
 transformers_version='4.17.0',
 pytorch_version='1.10.2',
 py_version='py38',
 env=hub,
 role=sagemaker.get_execution_role(),
)

In [None]:
# Deploy the model to SageMaker Inference
predictor = huggingface_model.deploy(initial_instance_count=deploy_initial_instance_count,
 instance_type=deploy_instance_type,
 accelerator_type=accelerator_type,
 serializer=serializer,
 deserializer=deserializer,
 endpoint_name=endpoint_name,
 data_capture_config=None,
 async_inference_config=None,
 serverless_inference_config=None,
 volume_size=deploy_instance_volume_in_gb,
 model_data_download_timeout=model_data_download_timeout_in_secs,
 container_startup_health_check_timeout=container_startup_health_check_timeout_in_secs,
 wait=True
 )

## 3. Send traffic to endpoint 

In this step, we will send traffic to the endpoint by calling the `predict()` method on the `predictor` object.

In [None]:
# Set the input data
input_data = {
 "inputs": "Washington DC is the [MASK] of USA."
}

# Invoke the endpoint to get the prediction
predicted_object = predictor.predict(data=input_data,
 initial_args=None,
 target_model=None,
 target_variant=None,
 inference_id=inference_id)
predicted_value = predicted_object[0]

# Print the output prediction
print(predicted_value)
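
The Hugging Face fill-mask pipeline returns a list of candidate fillings, each a dictionary with `score`, `token`, `token_str` and `sequence` keys. The sketch below shows one way to rank such a response and keep the most likely tokens. The helper `top_predictions` is hypothetical, and the sample response is invented for illustration; its scores and token ids are not real model output.

```python
def top_predictions(predictions, k=3):
    """Return the k highest-scoring (token_str, score) pairs."""
    ranked = sorted(predictions, key=lambda item: item['score'], reverse=True)
    return [(item['token_str'], round(item['score'], 4)) for item in ranked[:k]]


# Invented sample with the same shape as a fill-mask response.
# Scores and token ids are placeholders, not real model output.
sample_response = [
    {'score': 0.06, 'token': 2, 'token_str': 'center',
     'sequence': 'washington dc is the center of usa.'},
    {'score': 0.90, 'token': 1, 'token_str': 'capital',
     'sequence': 'washington dc is the capital of usa.'},
    {'score': 0.01, 'token': 3, 'token_str': 'heart',
     'sequence': 'washington dc is the heart of usa.'},
]

print(top_predictions(sample_response, k=2))
```

In the notebook, the same helper could be applied to `predicted_object` in place of `sample_response`.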

## 4. Cleanup 

As a best practice, you should delete resources when they are no longer required. This will help you avoid incurring unnecessary costs.

This step will clean up the resources created by this notebook.

In [None]:
# Delete the model, endpoint configuration and endpoint
predictor.delete_model()
predictor.delete_endpoint()