{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SageMaker Real-Time Inference\n", "## XGBoost Regression Example\n", "\n", "Amazon SageMaker Real-Time Inference is instance-based model hosting; use it for latency-sensitive, high-throughput workloads.\n", "\n", "In this notebook we deploy a pre-trained SageMaker XGBoost model to a real-time endpoint, using the public Abalone regression dataset for the example request." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of Contents\n", "- Setup\n", "- Deployment\n", "    - Model Creation\n", "    - Endpoint Configuration (Prod Variants + Instance Setup)\n", "    - Real-Time Endpoint Creation\n", "    - Endpoint Invocation\n", "- Cleanup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "To run this notebook, your notebook execution role must have the SageMaker Full Access policy attached." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Start by installing/upgrading the SageMaker Python SDK, botocore, boto3, and the AWS CLI\n", "! pip install sagemaker botocore boto3 awscli --upgrade" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Set up clients\n", "import boto3\n", "\n", "client = boto3.client(service_name=\"sagemaker\")\n", "runtime = boto3.client(service_name=\"sagemaker-runtime\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### SageMaker Setup\n", "To begin, we import the AWS SDK for Python (Boto3) and set up our environment, including an IAM role and an S3 bucket to store our data."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import sagemaker\n", "\n", "boto_session = boto3.session.Session()\n", "region = boto_session.region_name\n", "print(region)\n", "\n", "sagemaker_session = sagemaker.Session()\n", "base_job_prefix = \"xgboost-example\"\n", "role = sagemaker.get_execution_role()\n", "print(role)\n", "\n", "default_bucket = sagemaker_session.default_bucket()\n", "s3_prefix = base_job_prefix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deployment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Model Creation\n", "Create a model by providing your model artifacts, the container image URI, environment variables for the container (if applicable), a model name, and the SageMaker IAM role." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_s3_key = f\"{s3_prefix}/model.tar.gz\"\n", "model_url = f\"s3://{default_bucket}/{model_s3_key}\"\n", "print(f\"Uploading Model to {model_url}\")\n", "\n", "with open(\"model/model.tar.gz\", \"rb\") as model_file:\n", "    boto_session.resource(\"s3\").Bucket(default_bucket).Object(model_s3_key).upload_fileobj(model_file)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from time import gmtime, strftime\n", "\n", "model_name = \"xgboost-realtime\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", "print(\"Model name: \" + model_name)\n", "\n", "# Environment variables for the container\n", "byo_container_env_vars = {\"SAGEMAKER_CONTAINER_LOG_LEVEL\": \"20\"}\n", "\n", "inference_instance_type = \"ml.m5.xlarge\"\n", "\n", "# Retrieve the XGBoost container image URI\n", "image_uri = sagemaker.image_uris.retrieve(\n", "    framework=\"xgboost\",\n", "    region=region,\n", "    version=\"1.0-1\",\n", "    py_version=\"py3\",\n", "    instance_type=inference_instance_type,\n", ")\n", "\n", "create_model_response = 
client.create_model(\n", "    ModelName=model_name,\n", "    Containers=[\n", "        {\n", "            \"Image\": image_uri,\n", "            \"Mode\": \"SingleModel\",\n", "            \"ModelDataUrl\": model_url,\n", "            \"Environment\": byo_container_env_vars,\n", "        }\n", "    ],\n", "    ExecutionRoleArn=role,\n", ")\n", "\n", "print(\"Model Arn: \" + create_model_response[\"ModelArn\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Endpoint Configuration Creation\n", "\n", "This is where you can adjust the instance count and instance type for your endpoint." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "xgboost_epc_name = \"xgboost-real-time-epc\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", "\n", "endpoint_config_response = client.create_endpoint_config(\n", "    EndpointConfigName=xgboost_epc_name,\n", "    ProductionVariants=[\n", "        {\n", "            \"VariantName\": \"byoVariant\",\n", "            \"ModelName\": model_name,\n", "            \"InitialInstanceCount\": 1,\n", "            \"InstanceType\": \"ml.m5.xlarge\",\n", "        },\n", "    ],\n", ")\n", "print(\"Endpoint Configuration Arn: \" + endpoint_config_response[\"EndpointConfigArn\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Real-Time Endpoint Creation\n", "Now that we have an endpoint configuration, we can create a real-time endpoint and deploy our model to it. When creating the endpoint, provide the name of your endpoint configuration and a name for the new endpoint." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endpoint_name = \"xgboost-realtime-ep\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", "\n", "create_endpoint_response = client.create_endpoint(\n", "    EndpointName=endpoint_name,\n", "    EndpointConfigName=xgboost_epc_name,\n", ")\n", "\n", "print(\"Endpoint Arn: \" + create_endpoint_response[\"EndpointArn\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wait until the endpoint status is InService before invoking the endpoint."
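] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a convenience, boto3 also ships a built-in `endpoint_in_service` waiter that polls `DescribeEndpoint` for you; the next cell shows the equivalent manual polling loop. This is a minimal sketch that assumes the `client` and `endpoint_name` variables defined above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional: block until the endpoint reaches InService using the built-in waiter.\n", "# Raises a WaiterError if endpoint creation fails or the wait times out.\n", "waiter = client.get_waiter(\"endpoint_in_service\")\n", "waiter.wait(EndpointName=endpoint_name)"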
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Wait for the endpoint to reach a terminal state (InService) using describe_endpoint\n", "import time\n", "\n", "describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)\n", "\n", "while describe_endpoint_response[\"EndpointStatus\"] == \"Creating\":\n", "    describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)\n", "    print(describe_endpoint_response[\"EndpointStatus\"])\n", "    time.sleep(15)\n", "\n", "describe_endpoint_response" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Endpoint Invocation\n", "Invoke the endpoint by sending it a request. The following is a sample data point taken from the CSV file of the public Abalone dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "response = runtime.invoke_endpoint(\n", "    EndpointName=endpoint_name,\n", "    Body=b\".345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0\",\n", "    ContentType=\"text/csv\",\n", ")\n", "\n", "# Decode the streamed response body to print the prediction as text\n", "print(response[\"Body\"].read().decode(\"utf-8\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean Up\n", "Delete any resources you created in this notebook that you no longer wish to use."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# client.delete_model(ModelName=model_name)\n", "# client.delete_endpoint_config(EndpointConfigName=xgboost_epc_name)\n", "# client.delete_endpoint(EndpointName=endpoint_name)" ] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/datascience-1.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 5 }