{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deploying GPT-2 and GPT-J"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook, we will be using Hugging Face models and SageMaker Hugging Face-specific API's to deploy both GPT-2 and GPT-J. We will also showcase how to deploy what would could be GPT2 models fine-tuned on different datasets to the same SageMaker instance as a Multi Model Endpoint. This will allow you to get real-time predictions from several models, while only paying for one running endpoint instance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*****\n",
    "## Deploying GTP-2 to SageMaker Multi-Model Endpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -U transformers\n",
    "!pip install -U sagemaker"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get sagemaker session, role and default bucket\n",
    "If you are going to use Sagemaker in a local environment (not SageMaker Studio or Notebook Instances), you need access to an IAM Role with the required permissions for Sagemaker. You can find more about this [here](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "import boto3\n",
    "\n",
    "sess = sagemaker.Session()\n",
    "# sagemaker session bucket -> used for uploading data, models and logs\n",
    "# sagemaker will automatically create this bucket if it not exists\n",
    "sagemaker_session_bucket=None\n",
    "if sagemaker_session_bucket is None and sess is not None:\n",
    "    # set to default bucket if a bucket name is not given\n",
    "    sagemaker_session_bucket = sess.default_bucket()\n",
    "\n",
    "try:\n",
    "    role = sagemaker.get_execution_role()\n",
    "except ValueError:\n",
    "    iam = boto3.client('iam')\n",
    "    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']\n",
    "\n",
    "sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)\n",
    "region = sess.boto_region_name\n",
    "sm_client = boto3.client('sagemaker')\n",
    "\n",
    "print(f\"sagemaker role arn: {role}\")\n",
    "print(f\"sagemaker bucket: {sess.default_bucket()}\")\n",
    "print(f\"sagemaker session region: {region}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load GPT-2 model and tokenizer, save them to the same folder with Transformers `save_pretrained` utility "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import transformers \n",
    "from transformers import AutoTokenizer, AutoModelForCausalLM\n",
    "\n",
    "tokenizer = AutoTokenizer.from_pretrained('gpt2')\n",
    "model = AutoModelForCausalLM.from_pretrained('gpt2')\n",
    "\n",
    "model.save_pretrained('gpt2-model/')\n",
    "tokenizer.save_pretrained('gpt2-model/')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tar model and tokenizer artifacts, upload to S3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tarfile \n",
    "\n",
    "with tarfile.open('gpt2-model.tar.gz', 'w:gz') as f:\n",
    "    f.add('gpt2-model/',arcname='.')\n",
    "f.close()\n",
    "\n",
    "prefix = 'gpt2-hf-workshop/gpt2-test'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check out the file contents and structure of the model.tar.gz artifact."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! tar -ztvf gpt2-model.tar.gz"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will upload the same model package twice with different names, to simulate deploying 2 models to the same endpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! aws s3 cp gpt2-model.tar.gz s3://\"$sagemaker_session_bucket\"/\"$prefix\"/gpt2-model1.tar.gz\n",
    "! aws s3 cp gpt2-model.tar.gz s3://\"$sagemaker_session_bucket\"/\"$prefix\"/gpt2-model2.tar.gz"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get image URI for Hugging Face inference Deep Learning Container"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker import image_uris\n",
    "\n",
    "hf_inference_dlc = image_uris.retrieve(framework='huggingface', \n",
    "                                region=region, \n",
    "                                version='4.12.3', \n",
    "                                image_scope='inference', \n",
    "                                base_framework_version='pytorch1.9.1', \n",
    "                                py_version='py38', \n",
    "                                container_version='ubuntu20.04', \n",
    "                                instance_type='ml.m5.xlarge')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Use `MultiDataModel`to setup a multi-model endpoint definition\n",
    "By setting the `HF_TASK` environment variable, we avoid having to write and test our own inference code. Depending on the task and model you choose, the Hugging Face inference Container will run the appropriate code by default. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.multidatamodel import MultiDataModel\n",
    "from sagemaker.predictor import Predictor\n",
    "\n",
    "hub = {\n",
    "    'HF_TASK':'text-generation'\n",
    "}\n",
    "\n",
    "mme = MultiDataModel(\n",
    "    name='gpt2-models',\n",
    "    model_data_prefix=f's3://{sagemaker_session_bucket}/{prefix}/',\n",
    "    image_uri=hf_inference_dlc,\n",
    "    env=hub,\n",
    "    predictor_cls=Predictor,\n",
    "    role=role,\n",
    "    sagemaker_session=sess,\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that our model object has already \"registered\" the model artifacts we uploaded to S3 under the `model_data_prefix`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for model in mme.list_models():\n",
    "    print(model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Deploy Multi-Model Endpoint and send inference requests to both models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import datetime\n",
    "from sagemaker.serializers import JSONSerializer\n",
    "from sagemaker.deserializers import JSONDeserializer\n",
    "\n",
    "endpoint_name_gpt2 = 'mme-gpt2-'+datetime.datetime.now().strftime(\n",
    "                     \"%Y-%m-%d-%H-%M-%S\"\n",
    ")\n",
    "\n",
    "predictor_gpt2 = mme.deploy(\n",
    "            initial_instance_count=1,\n",
    "            instance_type='ml.m5.xlarge',\n",
    "            serializer=JSONSerializer(),\n",
    "            deserializer=JSONDeserializer(),\n",
    "            endpoint_name='mme-gpt2',\n",
    "            wait = False\n",
    "            )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "********************************************************************************************************************************************\n",
    "********************************************************************************************************************************************\n",
    "\n",
    "\n",
    "# Deploying GPT-J to SageMaker Endpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.huggingface import HuggingFaceModel\n",
    "import sagemaker\n",
    "\n",
    "role = sagemaker.get_execution_role()\n",
    "# Hub Model configuration. https://huggingface.co/models\n",
    "hub = {\n",
    "\t'HF_MODEL_ID':'EleutherAI/gpt-j-6B',\n",
    "\t'HF_TASK':'text-generation'\n",
    "}\n",
    "\n",
    "# create Hugging Face Model Class\n",
    "huggingface_model = HuggingFaceModel(\n",
    "\ttransformers_version='4.6.1',\n",
    "\tpytorch_version='1.7.1',\n",
    "\tpy_version='py36',\n",
    "\tenv=hub,\n",
    "\trole=role, \n",
    ")\n",
    "\n",
    "# deploy model to SageMaker Inference\n",
    "predictor = huggingface_model.deploy(\n",
    "\tinitial_instance_count=1, # number of instances\n",
    "\tinstance_type='ml.m5.xlarge',\n",
    "    wait = False# ec2 instance type\n",
    ")\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# predictor.predict({\n",
    "# \t'inputs': \"Can you please let us know more details about your \"\n",
    "# })"
   ]
  }
 ],
 "metadata": {
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3 (PyTorch 1.8 Python 3.6 CPU Optimized)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/1.8.1-cpu-py36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}