{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Create an asynchronous SageMaker endpoint to generate synthetic defects with missing components" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Amazon SageMaker Asynchronous Inference is an inference option that queues incoming requests and processes them asynchronously. It reduces costs by letting you autoscale the instance count to zero when there are no requests to process, so you pay only while the endpoint is actually processing requests." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from sagemaker import get_execution_role\n", "from sagemaker.pytorch import PyTorchModel\n", "import boto3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1 Get the execution role" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "role = get_execution_role()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2 Set up environment variables" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# TorchServe settings: raise the request/response size limits and the\n", "# response timeout so that large payloads and long-running inferences succeed\n", "env = dict()\n", "env['TS_MAX_REQUEST_SIZE'] = '1000000000'\n", "env['TS_MAX_RESPONSE_SIZE'] = '1000000000'\n", "env['TS_DEFAULT_RESPONSE_TIMEOUT'] = '1000000'\n", "env['DEFAULT_WORKERS_PER_MODEL'] = '1'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3 Set up the PyTorch Docker image\n", "#### The first step is to download the big-lama model from https://github.com/saic-mdal/lama and save it in S3." 
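] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### The cell below is an illustrative sketch of how the downloaded checkpoint archive could be uploaded to S3. The local file name `big-lama.tar.gz` and the `model/` key prefix are assumptions chosen to match the `model_data` path used in the next section; adjust them to your setup, and see the lama repository for the actual download instructions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: upload the packaged big-lama checkpoint so PyTorchModel can load it.\n", "# 'big-lama.tar.gz' is assumed to exist locally; the bucket matches the one\n", "# used in the deployment cells below.\n", "import boto3\n", "\n", "bucket = 'qualityinspection'\n", "s3 = boto3.client('s3')\n", "s3.upload_file('big-lama.tar.gz', bucket, 'model/big-lama.tar.gz')\n", "print(f'uploaded to s3://{bucket}/model/big-lama.tar.gz')" 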
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Reference: AWS Deep Learning Containers\n", "\n", "https://github.com/aws/deep-learning-containers" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# image_uri pins the deep learning container (PyTorch 1.11, GPU, py38);\n", "# when image_uri is supplied, framework_version/py_version are not needed\n", "model = PyTorchModel(\n", "    entry_point='./inference_defect_gen.py',\n", "    role=role,\n", "    source_dir='./',\n", "    model_data='s3://qualityinspection/model/big-lama.tar.gz',\n", "    image_uri='763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker',\n", "    env=env,\n", "    model_server_workers=1\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4 Specify additional asynchronous-inference-specific configuration parameters and deploy the endpoint." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-----------!" ] } ], "source": [ "from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig\n", "\n", "# Asynchronous inference: results are written to the S3 output path below\n", "bucket = 'qualityinspection'\n", "prefix = 'async-endpoint'\n", "\n", "async_config = AsyncInferenceConfig(\n", "    output_path=f's3://{bucket}/{prefix}/output',\n", "    max_concurrent_invocations_per_instance=10\n", ")\n", "\n", "predictor = model.deploy(\n", "    initial_instance_count=1,\n", "    instance_type='ml.g4dn.xlarge',\n", "    model_server_workers=1,\n", "    async_inference_config=async_config\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Endpoint deployment takes about 6-8 minutes to complete." 
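] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### While waiting, you can check the deployment status with `describe_endpoint` (an added convenience sketch; `predictor.endpoint_name` comes from the deploy cell above):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Check the endpoint status; it reads 'Creating' during deployment and\n", "# 'InService' once the endpoint is ready to accept requests\n", "sm = boto3.client('sagemaker')\n", "status = sm.describe_endpoint(EndpointName=predictor.endpoint_name)['EndpointStatus']\n", "print(status)" 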
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5 Invoke the endpoint with an input file in S3" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/ec2-user/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/boto3/compat.py:88: PythonDeprecationWarning: Boto3 will no longer support Python 3.6 starting May 30, 2022. To continue receiving service updates, bug fixes, and security updates please upgrade to Python 3.7 or later. More information can be found here: https://aws.amazon.com/blogs/developer/python-support-policy-updates-for-aws-sdks-and-tools/\n", " warnings.warn(warning, PythonDeprecationWarning)\n" ] } ], "source": [ "runtime = boto3.client('sagemaker-runtime')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Replace 'endpoint name' with your endpoint's name (e.g. predictor.endpoint_name)\n", "response = runtime.invoke_endpoint_async(\n", "    EndpointName='endpoint name',\n", "    InputLocation='s3://qualityinspection/input/input.txt'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### The inference takes 3-4 minutes to finish on an ml.g4dn.xlarge instance." 
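] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### `invoke_endpoint_async` returns immediately with an `OutputLocation` pointing at the S3 object where the result will be written. The sketch below downloads that result once inference has finished (it raises an error if the output object does not exist yet):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Fetch the asynchronous inference result from the OutputLocation in the response\n", "from urllib.parse import urlparse\n", "\n", "parsed = urlparse(response['OutputLocation'])\n", "boto3.client('s3').download_file(parsed.netloc, parsed.path.lstrip('/'), 'result.out')" 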
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6 Clean up" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Delete the endpoint if you no longer need it." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'ResponseMetadata': {'RequestId': 'e5f2a83c-85ab-4952-9289-cf6efb13062f',\n", " 'HTTPStatusCode': 200,\n", " 'HTTPHeaders': {'x-amzn-requestid': 'e5f2a83c-85ab-4952-9289-cf6efb13062f',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'content-length': '0',\n", " 'date': 'Sat, 01 Oct 2022 03:14:38 GMT'},\n", " 'RetryAttempts': 0}}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Replace 'endpoint name' with your endpoint's name before running\n", "import boto3\n", "sm_boto3 = boto3.client('sagemaker')\n", "sm_boto3.delete_endpoint(EndpointName='endpoint name')" ] } ], "metadata": { "interpreter": { "hash": "f5a3948e370ebd1a06ae51fd25b958ed2a347e6ea1a2dc75289fbcb418a53253" }, "kernelspec": { "display_name": "conda_amazonei_mxnet_p36", "language": "python", "name": "conda_amazonei_mxnet_p36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 4 }