{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install sagemaker xgboost==1.5.1 scikit-learn sm-serverless-benchmarking -Uqq" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deploy a pretrained Named Entity Recognition Model\n", "In this example, we'll deploy a pretrained Named Entity Recognition (NER) model using SageMaker JumpStart, then benchmark it using the SageMaker Serverless Inference Benchmarking toolkit." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "from sagemaker import image_uris, model_uris, script_uris\n", "from sagemaker.model import Model\n", "import uuid\n", "\n", "role = (\n", " sagemaker.get_execution_role()\n", ") # manually provide role if using non role based identity\n", "sess = sagemaker.Session()\n", "region = sess.boto_region_name" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_id, model_version = (\n", " \"huggingface-ner-distilbert-base-cased-finetuned-conll03-english\",\n", " \"*\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_name = f\"js-huggingface-ner-distilbert-{str(uuid.uuid1())[:5]}\"\n", "\n", "inference_instance_type = \"ml.m5.xlarge\" # used to look up the CPU inference container. No instance will be deployed\n", "\n", "# Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above.\n", "deploy_image_uri = image_uris.retrieve(\n", " region=None,\n", " framework=None, # automatically inferred from model_id\n", " image_scope=\"inference\",\n", " model_id=model_id,\n", " model_version=model_version,\n", " instance_type=inference_instance_type,\n", ")\n", "\n", "# Retrieve the inference script uri. 
This includes all dependencies and scripts for model loading, inference handling etc.\n", "deploy_source_uri = script_uris.retrieve(\n", " model_id=model_id, model_version=model_version, script_scope=\"inference\"\n", ")\n", "\n", "\n", "# Retrieve the model uri. This includes the pre-trained model and parameters.\n", "model_uri = model_uris.retrieve(\n", " model_id=model_id, model_version=model_version, model_scope=\"inference\"\n", ")\n", "\n", "\n", "# Create the SageMaker model instance\n", "model = Model(\n", " image_uri=deploy_image_uri,\n", " source_dir=deploy_source_uri,\n", " model_data=model_uri,\n", " entry_point=\"inference.py\", # entry point file in source_dir and present in deploy_source_uri\n", " role=role,\n", " name=model_name,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model.sagemaker_session = sess\n", "model.create(instance_type=inference_instance_type)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Validate Endpoint\n", "Before launching a full benchmarking job, it is a good idea to first deploy the model on a test endpoint to ensure everything is functioning as it should. Here we will deploy a temporary endpoint and test it with an example payload. Afterwards, the endpoint is deleted. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create a temporary endpoint\n", "from sm_serverless_benchmarking.endpoint import ServerlessEndpoint\n", "\n", "endpoint = ServerlessEndpoint(model_name=model.name, memory_size=6144)\n", "endpoint.create_endpoint()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# invoke it with a sample payload and make sure a valid response is returned\n", "input_text = \"My name is Wolfgang and I live in Berlin\"\n", "response = endpoint.invoke_endpoint(\n", " {\"Body\": input_text, \"ContentType\": \"application/x-text\"}\n", ")\n", "print(response[\"Body\"].read().decode(\"utf8\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endpoint.clean_up() # delete the endpoint" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Launch Benchmarking SageMaker Job" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sm_serverless_benchmarking.utils import convert_invoke_args_to_jsonl\n", "from sm_serverless_benchmarking.sagemaker_runner import run_as_sagemaker_job\n", "\n", "example_invoke_args = [\n", " {\n", " \"Body\": \"My name is Wolfgang and I live in Berlin\",\n", " \"ContentType\": \"application/x-text\",\n", " },\n", " {\n", " \"Body\": \"Amazon.com, Inc. is an American multinational technology company which focuses on e-commerce, cloud computing, digital streaming, and artificial intelligence. 
It is headquartered in Seattle, WA\",\n", " \"ContentType\": \"application/x-text\",\n", " },\n", " {\n", " \"Body\": \"Simon is attending a machine learning workshop in New York next week\",\n", " \"ContentType\": \"application/x-text\",\n", " },\n", "]\n", "\n", "example_invoke_file = convert_invoke_args_to_jsonl(example_invoke_args)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "processor = run_as_sagemaker_job(\n", " role=role, model_name=model.name, invoke_args_examples_file=example_invoke_file\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\n", " f\"Once the job is finished, the outputs will be uploaded to {processor.latest_job.outputs[0].destination}\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can optionally run the command below to copy all of the benchmark output artifacts into the current directory. The primary report output will be under the `benchmarking_report/` directory." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!aws s3 cp --recursive {processor.latest_job.outputs[0].destination} ." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run a Local Benchmarking Job [OPTIONAL]\n", "You can also run the same benchmark locally." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sm_serverless_benchmarking.benchmark import run_serverless_benchmarks\n", "\n", "report = run_serverless_benchmarks(\n", " model_name=model.name, invoke_args_examples_file=example_invoke_file\n", ")" ] } ], "metadata": { "interpreter": { "hash": "37058495916f7ab2e7db9963171426deb73c0dc04073ed3a56b3427789bc2f48" }, "kernelspec": { "display_name": "Python 3.8.13 ('serverless-benchmarking')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }