# Deploying GPT-2 and GPT-J

In this notebook, we use Hugging Face models and the SageMaker Hugging Face-specific APIs to deploy both GPT-2 and GPT-J. We also show how to deploy what could be several GPT-2 models fine-tuned on different datasets to the same SageMaker instance as a Multi-Model Endpoint. This lets you get real-time predictions from several models while paying for only one running endpoint instance.

*****
## Deploying GPT-2 to a SageMaker Multi-Model Endpoint

In [None]:
!pip install -U transformers
!pip install -U sagemaker

### Get the SageMaker session, role, and default bucket
If you are going to use SageMaker in a local environment (not SageMaker Studio or Notebook Instances), you need access to an IAM role with the required permissions for SageMaker. You can find more about this [here](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html).

In [None]:
import sagemaker
import boto3

sess = sagemaker.Session()
# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it does not exist
sagemaker_session_bucket=None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)
region = sess.boto_region_name
sm_client = boto3.client('sagemaker')

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {region}")

### Load the GPT-2 model and tokenizer and save them to the same folder with the Transformers `save_pretrained` method

In [None]:
import transformers 
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('gpt2')
model = AutoModelForCausalLM.from_pretrained('gpt2')

model.save_pretrained('gpt2-model/')
tokenizer.save_pretrained('gpt2-model/')

### Tar model and tokenizer artifacts, upload to S3

In [None]:
import tarfile 

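# arcname='.' places the model files at the root of the archive
with tarfile.open('gpt2-model.tar.gz', 'w:gz') as f:
    f.add('gpt2-model/', arcname='.')
# the with block closes the archive, so no explicit close() is needed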

prefix = 'gpt2-hf-workshop/gpt2-test'

Check the file contents and structure of the `gpt2-model.tar.gz` artifact.

In [None]:
! tar -ztvf gpt2-model.tar.gz
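
The same check can be scripted in Python. The sketch below assumes the Hugging Face inference container looks for `config.json` and the weights at the root of the archive, which is why the archive was built with `arcname='.'`. The helper name `list_archive_root` is hypothetical, and the demo builds a small stand-in archive with placeholder files instead of reading the real one.

```python
import os
import tarfile
import tempfile


def list_archive_root(archive_path):
    """Return the sorted names of regular files stored at the top level of a tar.gz archive."""
    names = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            # Entries added with arcname='.' are stored as './config.json' and so on
            name = os.path.normpath(member.name)
            if member.isfile() and os.sep not in name:
                names.append(name)
    return sorted(names)


# Demo on a stand-in archive; the file names mimic a saved model (assumption).
with tempfile.TemporaryDirectory() as tmp:
    model_dir = os.path.join(tmp, "model")
    os.makedirs(model_dir)
    for fname in ("config.json", "pytorch_model.bin", "vocab.json"):
        with open(os.path.join(model_dir, fname), "w") as fh:
            fh.write("placeholder")
    archive = os.path.join(tmp, "model.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(model_dir, arcname=".")
    root_files = list_archive_root(archive)

print(root_files)
```

For the real artifact, calling `list_archive_root('gpt2-model.tar.gz')` and checking that `config.json` is in the result would confirm the layout.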

We will upload the same model package twice under different names to simulate deploying two models to the same endpoint.

In [None]:
! aws s3 cp gpt2-model.tar.gz s3://"$sagemaker_session_bucket"/"$prefix"/gpt2-model1.tar.gz
! aws s3 cp gpt2-model.tar.gz s3://"$sagemaker_session_bucket"/"$prefix"/gpt2-model2.tar.gz
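
If you prefer to stay in Python, the two uploads can be sketched with the boto3 S3 client. The helper name `upload_copies` is hypothetical; it only assumes a client object exposing `upload_file(Filename, Bucket, Key)`, which is the signature of the boto3 S3 client method.

```python
def upload_copies(s3_client, archive_path, bucket, prefix, names):
    """Upload one local archive under several key names and return the resulting S3 URIs."""
    uris = []
    for name in names:
        key = f"{prefix.strip('/')}/{name}"
        # Positional arguments follow the boto3 order: Filename, Bucket, Key
        s3_client.upload_file(archive_path, bucket, key)
        uris.append(f"s3://{bucket}/{key}")
    return uris
```

In this notebook it could be called as `upload_copies(boto3.client('s3'), 'gpt2-model.tar.gz', sagemaker_session_bucket, prefix, ['gpt2-model1.tar.gz', 'gpt2-model2.tar.gz'])`.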

### Get image URI for Hugging Face inference Deep Learning Container

In [None]:
from sagemaker import image_uris

hf_inference_dlc = image_uris.retrieve(framework='huggingface', 
                                region=region, 
                                version='4.12.3', 
                                image_scope='inference', 
                                base_framework_version='pytorch1.9.1', 
                                py_version='py38', 
                                container_version='ubuntu20.04', 
                                instance_type='ml.m5.xlarge')

### Use `MultiDataModel` to set up a multi-model endpoint definition
By setting the `HF_TASK` environment variable, we avoid having to write and test our own inference code. Depending on the task and model you choose, the Hugging Face inference container runs the appropriate code by default.

In [None]:
from sagemaker.multidatamodel import MultiDataModel
from sagemaker.predictor import Predictor

hub = {
    'HF_TASK':'text-generation'
}

mme = MultiDataModel(
    name='gpt2-models',
    model_data_prefix=f's3://{sagemaker_session_bucket}/{prefix}/',
    image_uri=hf_inference_dlc,
    env=hub,
    predictor_cls=Predictor,
    role=role,
    sagemaker_session=sess,
    )

We can see that our model object has already "registered" the model artifacts we uploaded to S3 under the `model_data_prefix`.

In [None]:
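# use a distinct loop variable so the GPT-2 `model` object loaded earlier is not overwritten
for model_path in mme.list_models():
    print(model_path)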
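
Artifacts do not all have to be under the prefix before the model object is created. As a rough sketch, `MultiDataModel.add_model(model_data_source, model_data_path)` copies an artifact to a path relative to `model_data_prefix`. The helper name `register_models` and the source URI in the usage note are hypothetical.

```python
def register_models(mme, sources):
    """Add each (source, relative_path) pair to a MultiDataModel-like object.

    `mme` is assumed to expose add_model(model_data_source, model_data_path),
    as sagemaker.multidatamodel.MultiDataModel does.
    """
    added = []
    for source, relative_path in sources:
        # source may be a local file or an S3 URI; relative_path is resolved
        # against the model_data_prefix of the endpoint definition
        mme.add_model(model_data_source=source, model_data_path=relative_path)
        added.append(relative_path)
    return added
```

A call such as `register_models(mme, [('s3://my-bucket/finetuned/model.tar.gz', 'gpt2-finetuned.tar.gz')])` would then make the new artifact show up in `mme.list_models()`.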

### Deploy Multi-Model Endpoint and send inference requests to both models

In [None]:
import datetime
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

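# the timestamp suffix keeps the endpoint name unique across runs
endpoint_name_gpt2 = 'mme-gpt2-' + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

predictor_gpt2 = mme.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
    endpoint_name=endpoint_name_gpt2,
    wait=False,  # return immediately while the endpoint is created in the background
)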
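
Because `deploy` was called with `wait=False`, the endpoint has to reach the `InService` status before it accepts requests. The sketch below polls `describe_endpoint` and then sends one prompt to each model through the `target_model` argument of `Predictor.predict`. The helper names `wait_until_in_service` and `query_models`, the polling intervals, and the `max_length` default are assumptions made for illustration.

```python
import time


def wait_until_in_service(sm_client, endpoint_name, poll_seconds=30, timeout_seconds=1800):
    """Poll describe_endpoint until the endpoint is InService; raise on failure or timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint {endpoint_name} still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)


def query_models(predictor, target_models, prompt, max_length=30):
    """Send the same prompt to each model hosted behind a multi-model endpoint."""
    payload = {"inputs": prompt, "parameters": {"max_length": max_length}}
    results = {}
    for target in target_models:
        # target_model is the artifact path relative to model_data_prefix
        results[target] = predictor.predict(payload, target_model=target)
    return results
```

In this notebook the calls would be `wait_until_in_service(sm_client, predictor_gpt2.endpoint_name)` followed by `query_models(predictor_gpt2, ['gpt2-model1.tar.gz', 'gpt2-model2.tar.gz'], 'Can you please let us know more details about your ')`. The first request to each model can be slower than later ones, since a multi-model endpoint loads a model when it is first invoked.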


*****


## Deploying GPT-J to a SageMaker Endpoint

In [None]:
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()
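# Hub model configuration, see https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'EleutherAI/gpt-j-6B',
    'HF_TASK': 'text-generation'
}

# create the Hugging Face model class
# GPT-J requires transformers >= 4.11; these versions match the container used for GPT-2 above
huggingface_model = HuggingFaceModel(
    transformers_version='4.12.3',
    pytorch_version='1.9.1',
    py_version='py38',
    env=hub,
    role=role,
)

# deploy the model to SageMaker Inference
# NOTE: GPT-J-6B weights need roughly 24 GB in fp32, more than the 16 GiB of
# ml.m5.xlarge, so a larger instance type is likely required
predictor = huggingface_model.deploy(
    initial_instance_count=1,  # number of instances
    instance_type='ml.m5.xlarge',  # EC2 instance type
    wait=False,  # do not block while the endpoint is created
)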
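
Before choosing an instance type for a model of this size, a back-of-the-envelope estimate of the memory needed for the weights is useful. The sketch below is weights-only arithmetic: it ignores activations and framework overhead, and the parameter count is an assumption rounded from the "6B" in the model name. The helper name `weights_gib` is hypothetical.

```python
def weights_gib(num_parameters, bytes_per_parameter):
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_parameters * bytes_per_parameter / 2**30


GPTJ_PARAMETERS = 6_000_000_000  # assumption: rounded from the "6B" in the model name
M5_XLARGE_MEMORY_GIB = 16        # memory of an ml.m5.xlarge instance

fp32_gib = weights_gib(GPTJ_PARAMETERS, 4)  # 4 bytes per float32 parameter
fp16_gib = weights_gib(GPTJ_PARAMETERS, 2)  # 2 bytes per float16 parameter

print(f"fp32 weights: {fp32_gib:.1f} GiB, fits: {fp32_gib < M5_XLARGE_MEMORY_GIB}")
print(f"fp16 weights: {fp16_gib:.1f} GiB, fits: {fp16_gib < M5_XLARGE_MEMORY_GIB}")
```

Under these assumptions the full-precision weights alone exceed the memory of `ml.m5.xlarge`, so it is worth comparing this figure against the instance type you pick.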



In [None]:
# predictor.predict({
# 	'inputs': "Can you please let us know more details about your "
# })