# Deploy Transformers model using SageMaker Hugging Face Deep Learning Container


1. [Introduction](#Introduction) 
2. [Setup](#Setup)
3. [Download Hugging Face Pretrained Model](#Download-Hugging-Face-Pretrained-Model)
4. [Package the saved model to tar.gz format](#Package-the-saved-model-to-tar.gz-format)
5. [Upload Pre-trained Model to S3](#Upload-Pre-trained-Model-to-S3)
6. [Deploy the model using `model_data`](#Deploy-the-model-using-model_data) 
7. [Prediction with Amazon SageMaker endpoint](#Prediction-with-Amazon-SageMaker-endpoint)


## Introduction

For inference, you can use your trained Hugging Face model or one of the pretrained Hugging Face models to deploy an inference job with SageMaker. 

### How to deploy an inference job using the Hugging Face Deep Learning Containers
You have two options for running inference with SageMaker. You can run inference using a model that you trained, or deploy a pre-trained Hugging Face model.

* Run inference with your trained model: You can either use a model that you trained from an existing Hugging Face model with the SageMaker Hugging Face Deep Learning Containers, or bring your own existing Hugging Face model and deploy it using SageMaker. If you trained the model with the SageMaker Hugging Face Estimator, you can deploy it immediately after training completes, or upload the trained model to an Amazon S3 bucket and load it when running inference later. If you bring your own existing Hugging Face model, you must upload the trained model to an Amazon S3 bucket and load it from that location when running inference.

* Run inference with a pre-trained Hugging Face model: You can use one of the thousands of pre-trained Hugging Face models to run your inference jobs with no additional training needed. This notebook takes this approach.

In this notebook, we use the Hugging Face Inference DLCs and the Amazon SageMaker Python SDK to deploy a pre-trained transformer model for real-time inference, following the steps listed in the table of contents above.

## Setup

In [None]:
!pip install "sagemaker>=2.48.0" --upgrade
!pip install torch -q
!pip install transformers -q
!pip install ipywidgets -q

#### **Note: Restart the notebook kernel after installing the above packages.**

In [None]:
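from IPython.display import display_html

def restartkernel():
    # Ask the classic Jupyter Notebook front end to restart the kernel
    display_html("<script>Jupyter.notebook.kernel.restart()</script>", raw=True)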

In [None]:
restartkernel()

## Deploy a Hugging Face Transformer model from S3 to SageMaker for inference


### Download Hugging Face Pretrained Model

In this example, we download the pre-trained Hugging Face model `facebook/bart-large-mnli` from the Hugging Face Hub. We will use this model to classify tweets as `POSITIVE` or `NEGATIVE`.

In [None]:
!pip install huggingface-hub -q

In [None]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# download the pre-trained model and its tokenizer
MODEL = 'facebook/bart-large-mnli'
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
tokenizer = AutoTokenizer.from_pretrained(MODEL)

# save both to the same local directory so they can be packaged together
model.save_pretrained('model_token')
tokenizer.save_pretrained('model_token')

### Package the saved model to tar.gz format

Once the model is downloaded, we need to package the tokenizer and model weights into the `.tar.gz` format that Amazon SageMaker expects.

In [None]:
!cd model_token && tar zcvf model.tar.gz * 
!mv model_token/model.tar.gz ./model.tar.gz
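
The same packaging step can be sketched in plain Python with the standard-library `tarfile` module, which also makes it easy to confirm that the files sit at the root of the archive rather than inside a `model_token/` folder. The helper name `package_model_dir` is hypothetical and not part of any SageMaker API; this is a minimal sketch, not a replacement for the shell commands above.

```python
import tarfile
from pathlib import Path
from typing import List


def package_model_dir(model_dir: str, archive_path: str = "model.tar.gz") -> List[str]:
    """Pack every entry of model_dir at the root of a gzip-compressed tar archive.

    Hypothetical helper for illustration; returns the archive's member names.
    """
    # List the entries first so the archive itself is never picked up
    items = sorted(Path(model_dir).iterdir())
    with tarfile.open(archive_path, "w:gz") as archive:
        for item in items:
            # arcname drops the parent directory so files land at the archive root
            archive.add(item, arcname=item.name)
    # Re-open the archive and report what it contains as a sanity check
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()
```

Calling `package_model_dir("model_token")` after the download cell should return the saved file names, such as `config.json`, without a leading directory component.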

### Upload Pre-trained Model to S3

We are going to use the `sagemaker.s3.S3Uploader` API to upload our model to an S3 location. We will provide this S3 path to the `HuggingFaceModel` class during deployment.

In [None]:
import sagemaker
from sagemaker.s3 import S3Uploader, s3_path_join

# create a SageMaker session and look up the execution role and default bucket
sess = sagemaker.Session()
role = sagemaker.get_execution_role()
sagemaker_session_bucket = sess.default_bucket()

# upload model.tar.gz under the "bart_model" prefix of the default bucket
upload_path = s3_path_join("s3://", sagemaker_session_bucket, "bart_model")
print(f"Uploading Model to {upload_path}")
model_uri = S3Uploader.upload('model.tar.gz', upload_path)
print(f"Uploaded model to {model_uri}")

In [None]:
%store model_uri
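
If you later need the bucket name and object key separately, for example to inspect the uploaded archive with another tool, the stored URI can be split with the standard library. The helper `split_s3_uri` and the example URI below are hypothetical and only illustrate the `s3://bucket/key` layout.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # The path starts with "/", which is not part of the object key
    return parsed.netloc, parsed.path.lstrip("/")


# Made-up URI for illustration; use your own model_uri in practice
bucket, key = split_s3_uri("s3://example-bucket/bart_model/model.tar.gz")
print(bucket, key)
```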

### Deploy the model using `model_data`

We will deploy the pre-trained model that we packaged and uploaded to S3 in the previous steps, using the `model_data` argument to specify the S3 location of the tokenizer and model weights.

#### Parameters for `HuggingFaceModel` class
We will use the following parameters in this lab to deploy the model.
* `model_data (str)` – The Amazon S3 location of a SageMaker model data .tar.gz file.

* `role (str)` – An AWS IAM role specified with either the name or full ARN. The Amazon SageMaker training jobs and APIs that create Amazon SageMaker endpoints use this role to access training data and model artifacts. After the endpoint is created, the inference code might use the IAM role, if it needs to access an AWS resource.

* `transformers_version (str)` – Transformers version you want to use for executing your inference code. Defaults to None. Required unless image_uri is provided.

* `pytorch_version (str)` – PyTorch version you want to use for executing your inference code. Defaults to None. Required unless tensorflow_version is provided. List of supported versions: https://github.com/aws/sagemaker-python-sdk#huggingface-sagemaker-estimators.

* `py_version (str)` – Python version you want to use for executing your inference code. Defaults to None. Required unless image_uri is provided.

For details about other parameters, see the [`HuggingFaceModel` documentation](https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/sagemaker.huggingface.html?highlight=huggingfacemodel#sagemaker.huggingface.model.HuggingFaceModel).

In [None]:
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=model_uri,  # S3 URI of the model.tar.gz uploaded above
    role=role,  # IAM role with permissions to create an endpoint
    transformers_version="4.17",  # transformers version used
    pytorch_version="1.10",  # PyTorch version used
    py_version="py38",  # Python version of the DLC
)

In [None]:
# deploy model to SageMaker Inference
import time

# append a UTC timestamp so each deployment gets a unique endpoint name
ts = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
endpoint_name = "bart-large-blog-" + ts

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
    endpoint_name=endpoint_name,
)
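
SageMaker endpoint names may contain only alphanumeric characters and hyphens and are limited to 63 characters, so it can help to check a generated name before calling `deploy`. The sketch below wraps the timestamp logic from the cell above in a hypothetical helper, `make_endpoint_name`, and adds that check; the helper is illustrative and not part of the SageMaker SDK.

```python
import re
import time
from typing import Optional

# Alphanumeric characters separated by hyphens; length is checked separately
_ENDPOINT_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")
_MAX_ENDPOINT_NAME_LENGTH = 63


def make_endpoint_name(prefix: str, now: Optional[float] = None) -> str:
    """Build '<prefix>-<UTC timestamp>' and validate it. Hypothetical helper."""
    # time.gmtime(None) uses the current time; pass seconds since the epoch to override
    ts = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime(now))
    name = f"{prefix}-{ts}"
    if len(name) > _MAX_ENDPOINT_NAME_LENGTH or not _ENDPOINT_NAME_PATTERN.match(name):
        raise ValueError(f"Invalid endpoint name: {name!r}")
    return name


print(make_endpoint_name("bart-large-blog"))
```

A prefix containing an underscore, such as `bart_large`, raises `ValueError` here instead of failing later when the endpoint is created.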

## Prediction with Amazon SageMaker endpoint

In [None]:
# example request, you always need to define "inputs"
data = {
 "inputs": "The new Hugging Face SageMaker DLC makes it super easy to deploy models in production. I love it!"
}

# request
predictor.predict(data)
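
Assuming the endpoint returns the usual text-classification shape, a list of dictionaries with `label` and `score` keys, the highest-scoring entry can be selected as sketched below. The helper `top_prediction` is hypothetical, and the sample response is made up for illustration; the actual label names depend on the deployed model's configuration.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def top_prediction(response: List[Prediction]) -> Prediction:
    """Return the entry with the highest score. Hypothetical helper."""
    if not response:
        raise ValueError("Empty response")
    return max(response, key=lambda item: item["score"])


# Made-up response for illustration only, not real endpoint output
sample_response = [
    {"label": "NEGATIVE", "score": 0.02},
    {"label": "POSITIVE", "score": 0.98},
]
best = top_prediction(sample_response)
print(f"{best['label']} ({best['score']:.2f})")
```

In the notebook you would pass the result of `predictor.predict(data)` instead of `sample_response`. When you are finished, calling `predictor.delete_endpoint()` removes the endpoint so that it stops incurring charges.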