{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install sagemaker xgboost==1.5.1 scikit-learn install sm-serverless-benchmarking -Uqq"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Deploy a pretrained Named Entity Recognition Model\n",
    "In this example, we'll deploy a pretrained Named Entity Recognition (NER) using SageMaker Jumpstart then benchmark the model using the SageMaker Serverless Inference Benchmarking toolkit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "from sagemaker import image_uris, model_uris, script_uris\n",
    "from sagemaker.model import Model\n",
    "import uuid\n",
    "\n",
    "role = (\n",
    "    sagemaker.get_execution_role()\n",
    ")  # manually provide role if using non role based identity\n",
    "sess = sagemaker.Session()\n",
    "region = sess.boto_region_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_id, model_version = (\n",
    "    \"huggingface-ner-distilbert-base-cased-finetuned-conll03-english\",\n",
    "    \"*\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_name = f\"js-huggingface-ner-distilbert-{str(uuid.uuid1())[:5]}\"\n",
    "\n",
    "inference_instance_type = \"ml.m5.xlarge\"  # used to lookup cpu inference container. No instance will be deployed\n",
    "\n",
    "# Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above.\n",
    "deploy_image_uri = image_uris.retrieve(\n",
    "    region=None,\n",
    "    framework=None,  # automatically inferred from model_id\n",
    "    image_scope=\"inference\",\n",
    "    model_id=model_id,\n",
    "    model_version=model_version,\n",
    "    instance_type=inference_instance_type,\n",
    ")\n",
    "\n",
    "# Retrieve the inference script uri. This includes all dependencies and scripts for model loading, inference handling etc.\n",
    "deploy_source_uri = script_uris.retrieve(\n",
    "    model_id=model_id, model_version=model_version, script_scope=\"inference\"\n",
    ")\n",
    "\n",
    "\n",
    "# Retrieve the model uri. This includes the pre-trained model and parameters.\n",
    "model_uri = model_uris.retrieve(\n",
    "    model_id=model_id, model_version=model_version, model_scope=\"inference\"\n",
    ")\n",
    "\n",
    "\n",
    "# Create the SageMaker model instance\n",
    "model = Model(\n",
    "    image_uri=deploy_image_uri,\n",
    "    source_dir=deploy_source_uri,\n",
    "    model_data=model_uri,\n",
    "    entry_point=\"inference.py\",  # entry point file in source_dir and present in deploy_source_uri\n",
    "    role=role,\n",
    "    name=model_name,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.sagemaker_session = sess\n",
    "model.create(instance_type=inference_instance_type)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Validate Endpoint\n",
    "Before launching a full benchmarking job, it is a good idea to first deploy the model on a test endpoint to ensure everything is functioning as it should. Here we will deploy a temporary endpoint and test it with an example payload. Afterwards, the endpoint is deleted. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create a temporary endpoint\n",
    "from sm_serverless_benchmarking.endpoint import ServerlessEndpoint\n",
    "\n",
    "endpoint = ServerlessEndpoint(model_name=model.name, memory_size=6144)\n",
    "endpoint.create_endpoint()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# invoke it with a sample payload and make sure a valid response is returned\n",
    "input_text = \"My name is Wolfgang and I live in Berlin\"\n",
    "response = endpoint.invoke_endpoint(\n",
    "    {\"Body\": input_text, \"ContentType\": \"application/x-text\"}\n",
    ")\n",
    "print(response[\"Body\"].read().decode(\"utf8\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "endpoint.clean_up()  # delete the endpoint"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Launch Benchmarking SageMaker Job"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sm_serverless_benchmarking.utils import convert_invoke_args_to_jsonl\n",
    "from sm_serverless_benchmarking.sagemaker_runner import run_as_sagemaker_job\n",
    "\n",
    "example_invoke_args = [\n",
    "    {\n",
    "        \"Body\": \"My name is Wolfgang and I live in Berlin\",\n",
    "        \"ContentType\": \"application/x-text\",\n",
    "    },\n",
    "    {\n",
    "        \"Body\": \"Amazon.com, Inc. is an American multinational technology company which focuses on e-commerce, cloud computing, digital streaming, and artificial intelligence. It is headquartered in Seattle, WA\",\n",
    "        \"ContentType\": \"application/x-text\",\n",
    "    },\n",
    "    {\n",
    "        \"Body\": \"Simon is attending a machine learning workshop in New York next week\",\n",
    "        \"ContentType\": \"application/x-text\",\n",
    "    },\n",
    "]\n",
    "\n",
    "example_invoke_file = convert_invoke_args_to_jsonl(example_invoke_args)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "processor = run_as_sagemaker_job(\n",
    "    role=role, model_name=model.name, invoke_args_examples_file=example_invoke_file\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\n",
    "    f\"Once the job is finished, the outputs will be uploaded to {processor.latest_job.outputs[0].destination}\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can optionally run the command below to copy all of the benchmark output artifacts into the current directory. The primary report output will be under the `benchmarking_report/` directory"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws s3 cp --recursive {processor.latest_job.outputs[0].destination} ."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run a Local Benchmarking Job [OPTIONAL]\n",
    "You can also run the same benchmark locally "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sm_serverless_benchmarking.benchmark import run_serverless_benchmarks\n",
    "\n",
    "report = run_serverless_benchmarks(\n",
    "    model_name=model.name, invoke_args_examples_file=example_invoke_file\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "37058495916f7ab2e7db9963171426deb73c0dc04073ed3a56b3427789bc2f48"
  },
  "kernelspec": {
   "display_name": "Python 3.8.13 ('serverless-benchmarking')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}