{ "cells": [ { "cell_type": "markdown", "id": "504fbfd4", "metadata": {}, "source": [ "# Using Async Inference and JumpStart models on SageMaker to power pre-labeling workflows" ] }, { "cell_type": "markdown", "id": "b68cd7e7", "metadata": {}, "source": [ "1. [Set Up](#1.-Set-Up)\n", "2. [Run inference on the pre-trained model](#2.-Run-inference-on-the-pre-trained-model)\n", " * [Retrieve JumpStart Artifacts & Deploy an Endpoint](#2.1.-Retrieve-model-artifacts-&-Deploy-to-an-endpoint)\n", " * [Download & process images for annotations](#2.2.-Download-images-for-annotations)\n", " * [Single prediction example](#2.3.-Single-prediction-example)\n", " * [Display model predictions](#2.4.-Display-model-predictions)\n", " * [Send images to Async endpoint](#2.5.-Send-images-to-Async-endpoint)\n", "3. [Convert the model output annotations to SageMaker GroundTruth format](#3.0.-Convert-the-model-output-annotations-to-SageMaker-GroundTruth-format)\n", "4. [Create Boundingbox Verification Job](#4.0.-Create-Boundingbox-verification-job)\n", " * [Execute API call to create the job](#4.1.-Execute-API-call-to-create-the-job)\n", " * [Complete Verification](#4.2.-Complete-verification)\n", "5. [Clean up the endpoint](#5.0.-Clean-up-the-endpoint)" ] }, { "cell_type": "markdown", "id": "b05d37bd", "metadata": {}, "source": [ "Note: This notebook was tested on an ml.t3.medium instance in Amazon SageMaker Studio with the Python 3 (Data Science) kernel, and in an Amazon SageMaker Notebook instance with the conda_python3 kernel." ] }, { "cell_type": "markdown", "id": "60c71ea4", "metadata": {}, "source": [ "### 1. Set Up" ] }, { "cell_type": "code", "execution_count": null, "id": "5a7f3ac8", "metadata": { "tags": [] }, "outputs": [], "source": [ "!pip install sagemaker --upgrade\n", "!pip install awswrangler" ] }, { "cell_type": "code", "execution_count": null, "id": "db71f0bd-906d-4df5-86ef-df5109cbdfb0", "metadata": { "tags": [] }, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "markdown", "id": "2db0d475", "metadata": {}, "source": [ "#### Permissions and environment variables\n", "\n", "---\n", "To train and host on Amazon SageMaker, we need to set up and authenticate the use of AWS services. Here, we use the execution role associated with the current notebook as the AWS account role with SageMaker access. This role has the necessary permissions, including access to your data in S3." ] }, { "cell_type": "code", "execution_count": null, "id": "06ebb3e7", "metadata": { "tags": [] }, "outputs": [], "source": [ "import sagemaker, boto3, json\n", "import awswrangler as wr\n", "from sagemaker import get_execution_role\n", "from utils import *\n", "\n", "\n", "aws_role = get_execution_role()\n", "aws_region = boto3.Session().region_name\n", "sess = sagemaker.Session()" ] }, { "cell_type": "markdown", "id": "e8c55a7d", "metadata": {}, "source": [ "## 2. Run inference on the pre-trained model\n", "\n", "***\n", "Using JumpStart, we can perform inference on the pre-trained model, even without first fine-tuning it on a new dataset." ] }, { "cell_type": "markdown", "id": "6b9b9121", "metadata": { "tags": [] }, "source": [ "### 2.1. Retrieve model artifacts & Deploy to an endpoint\n", "\n", "There are three options for performing inference on an existing pre-trained model:\n", "\n", "* Option A - Create a model from SageMaker JumpStart. Using JumpStart, we can perform inference on the pre-trained model without first fine-tuning it on a new dataset.\n", "\n", "* Option B - Use a model shared with your team or organization. Use this option if you want to use a model developed by one of the teams within your organization (e.g. Perception).\n", "\n", "* Option C - Use an existing endpoint. Use this option if you already have a model deployed in your account.\n", "\n", "The following sections provide details." ] }, { "cell_type": "markdown", "id": "1340d6a6-bc50-47b4-b1e3-ce8f3aa7e6fc", "metadata": { "tags": [] }, "source": [ "#### Option A: Create model from SageMaker JumpStart and deploy to an endpoint\n", "***\n", "\n", "Here, we download the JumpStart model_manifest file from the JumpStart S3 bucket, filter out all the instance segmentation models, and select a model for inference. We retrieve the `deploy_image_uri`, `deploy_source_uri`, and `base_model_uri` for the pre-trained model. To host the pre-trained base model, we create an instance of [`sagemaker.model.Model`](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html) and deploy it.\n", "\n", "This involves the following steps:\n", "\n", "* Create the model\n", "* Set up async inference & deploy the model\n", "* Set up auto scaling\n", "\n", "If you have already created an endpoint and want to use it, proceed to Option C."
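, "\n",
"The \"filter out\" step above boils down to substring matching on model ids. A self-contained toy sketch of the idea (the tiny manifest and ids below are illustrative, not the real models_manifest.json):\n",
"\n",
"```python\n",
"import json\n",
"\n",
"# Toy stand-in for models_manifest.json (the real file comes from the\n",
"# jumpstart-cache-prod-<region> S3 bucket; these entries are made up).\n",
"manifest = json.loads(\n",
"    '[{\"model_id\": \"mxnet-is-mask-rcnn-fpn-resnet101-v1d-coco\"},'\n",
"    ' {\"model_id\": \"mxnet-od-ssd-512-vgg16-atrous-coco\"},'\n",
"    ' {\"model_id\": \"mxnet-is-mask-rcnn-fpn-resnet101-v1d-coco\"}]'\n",
")\n",
"\n",
"# Keep instance segmentation (\"-is-\") model ids, de-duplicated, order preserved.\n",
"is_models = []\n",
"for model in manifest:\n",
"    model_id = model[\"model_id\"]\n",
"    if \"-is-\" in model_id and model_id not in is_models:\n",
"        is_models.append(model_id)\n",
"\n",
"print(is_models)\n",
"```\n"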
] }, { "cell_type": "markdown", "id": "0cac26b8-2739-4429-9d3e-61bdcbd8c2a5", "metadata": {}, "source": [ "##### Create the model" ] }, { "cell_type": "code", "execution_count": null, "id": "758a3fee", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Download the JumpStart model_manifest file.\n", "boto3.client(\"s3\").download_file(\n", "    f\"jumpstart-cache-prod-{aws_region}\", \"models_manifest.json\", \"models_manifest.json\"\n", ")\n", "with open(\"models_manifest.json\", \"rb\") as json_file:\n", "    model_list = json.load(json_file)\n", "\n", "# Filter out all the instance segmentation models from the manifest list.\n", "is_models = []\n", "for model in model_list:\n", "    model_id = model[\"model_id\"]\n", "    if \"-is-\" in model_id and model_id not in is_models:\n", "        is_models.append(model_id)\n", "\n", "is_models" ] }, { "cell_type": "markdown", "id": "025b47a3-c46c-4b46-acb8-7eddd0293238", "metadata": { "tags": [] }, "source": [ "From the above list of models, pick an instance segmentation model for the pre-labeling task.\n", "\n", "NOTE: `model_version=\"*\"` fetches the latest version of the model" ] }, { "cell_type": "code", "execution_count": null, "id": "69253b25", "metadata": { "tags": [] }, "outputs": [], "source": [ "model_id = 'mxnet-is-mask-rcnn-fpn-resnet101-v1d-coco'\n", "model_version = \"*\"" ] }, { "cell_type": "code", "execution_count": null, "id": "5c397245", "metadata": { "tags": [] }, "outputs": [], "source": [ "from sagemaker import image_uris, model_uris, script_uris, hyperparameters\n", "from sagemaker.model import Model\n", "from sagemaker.predictor import Predictor\n", "from sagemaker.utils import name_from_base\n", "\n", "endpoint_name = name_from_base(f\"jumpstart-example-infer-{model_id}\")\n", "inference_instance_type = \"ml.p3.2xlarge\"\n", "\n", "# Retrieve the inference docker container uri\n", "deploy_image_uri = image_uris.retrieve(\n", "    region=None,\n", "    framework=None,  # 
automatically inferred from model_id\n", "    image_scope=\"inference\",\n", "    model_id=model_id,\n", "    model_version=model_version,\n", "    instance_type=inference_instance_type,\n", ")\n", "\n", "# Retrieve the inference script uri. This includes scripts for model loading, inference handling, etc.\n", "deploy_source_uri = script_uris.retrieve(\n", "    model_id=model_id, model_version=model_version, script_scope=\"inference\"\n", ")\n", "\n", "# Retrieve the base model uri\n", "base_model_uri = model_uris.retrieve(\n", "    model_id=model_id, model_version=model_version, model_scope=\"inference\"\n", ")\n", "\n", "# Create the SageMaker model instance\n", "model = Model(\n", "    image_uri=deploy_image_uri,\n", "    source_dir=deploy_source_uri,\n", "    model_data=base_model_uri,\n", "    entry_point=\"inference.py\",  # entry point script within deploy_source_uri\n", "    role=aws_role,\n", "    predictor_cls=Predictor,\n", "    name=endpoint_name,\n", ")\n", "\n", "print(f'Model endpoint name is {endpoint_name}')" ] }, { "cell_type": "markdown", "id": "3d62abb2", "metadata": {}, "source": [ "##### Set up async inference & deploy the model\n", "---" ] }, { "cell_type": "code", "execution_count": null, "id": "22602ccd", "metadata": { "tags": [] }, "outputs": [], "source": [ "from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig\n", "\n", "async_config = AsyncInferenceConfig(\n", "    output_path=f\"s3://{sess.default_bucket()}/asyncinference/output\",\n", "    max_concurrent_invocations_per_instance=4,\n", "    # Optionally specify Amazon SNS topics for success/error notifications\n", "    # notification_config = {\n", "    #     \"SuccessTopic\": \"arn:aws:sns:::\",\n", "    #     \"ErrorTopic\": \"arn:aws:sns:::\",\n", "    # }\n", ")\n", "\n", "base_model_predictor = model.deploy(\n", "    async_inference_config=async_config,\n", "    instance_type=inference_instance_type,\n", "    initial_instance_count=1,\n", "    predictor_cls=Predictor,\n", "    endpoint_name=endpoint_name\n", ")" ] }, { "cell_type": "markdown", "id": 
"17391d2f", "metadata": {}, "source": [ "##### Set up Autoscaling\n", "---\n", "\n", "First, register your endpoint variant with Application Auto Scaling, define a scaling policy, and then apply it. In this configuration, we use a customized metric specification (`CustomizedMetricSpecification`) built around the `ApproximateBacklogSizePerInstance` metric. Please refer to the SageMaker Developer Guide for a detailed list of metrics available for your asynchronous inference endpoint." ] }, { "cell_type": "code", "execution_count": null, "id": "f786a47d", "metadata": { "tags": [] }, "outputs": [], "source": [ "client = boto3.client(\n", "    \"application-autoscaling\"\n", ")  # Common client for Application Auto Scaling across SageMaker and other services\n", "\n", "resource_id = (\n", "    \"endpoint/\" + endpoint_name + \"/variant/\" + \"AllTraffic\"\n", ")  # This is the format in which Application Auto Scaling references the endpoint\n", "\n", "# Register the asynchronous endpoint variant as a scalable target\n", "response = client.register_scalable_target(\n", "    ServiceNamespace=\"sagemaker\",\n", "    ResourceId=resource_id,\n", "    ScalableDimension=\"sagemaker:variant:DesiredInstanceCount\",\n", "    MinCapacity=1,  # Async endpoints also support MinCapacity=0 to scale in to zero instances\n", "    MaxCapacity=5,\n", ")\n", "\n", "response = client.put_scaling_policy(\n", "    PolicyName=\"Invocations-ScalingPolicy\",\n", "    ServiceNamespace=\"sagemaker\",  # The namespace of the AWS service that provides the resource.\n", "    ResourceId=resource_id,  # Endpoint name\n", "    ScalableDimension=\"sagemaker:variant:DesiredInstanceCount\",  # SageMaker supports only Instance Count\n", "    PolicyType=\"TargetTrackingScaling\",  # 'StepScaling'|'TargetTrackingScaling'\n", "    TargetTrackingScalingPolicyConfiguration={\n", "        \"TargetValue\": 5.0,  # The target value for the metric - here ApproximateBacklogSizePerInstance\n", "        \"CustomizedMetricSpecification\": {\n", "            \"MetricName\": \"ApproximateBacklogSizePerInstance\",\n", "            \"Namespace\": \"AWS/SageMaker\",\n", "            \"Dimensions\": [{\"Name\": \"EndpointName\", \"Value\": endpoint_name}],\n", "            \"Statistic\": \"Average\",\n", "        },\n", "        \"ScaleInCooldown\": 300,\n", "        \"ScaleOutCooldown\": 300\n", "    },\n", ")" ] }, { "cell_type": "markdown", "id": "5ab39a35-137d-4df7-b615-79ecb369a520", "metadata": { "tags": [] }, "source": [ "#### Option B: Use a model shared with your team or organization\n", "***\n", "You can also use a model that was developed by another team (e.g. Perception) and shared with your team. Models in an organization can be shared among teams via SageMaker JumpStart. The following screenshots provide some examples. For more details on how to share a JumpStart model, visit the [documentation page.](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-content-sharing.html)\n" ] }, { "cell_type": "markdown", "id": "f6a6fb4a-bdd1-4f9a-89f6-ce0557d35f35", "metadata": { "tags": [] }, "source": [ "* You can find the shared models on the SageMaker Studio JumpStart page, which you can access from the Home menu.\n", "\n", "![Jumpstart-shared-discover.png]()\n", "\n", "* You can choose to deploy the shared model from the model details page. Review the configurations and click Deploy.\n", "\n", "* Deployment takes a few minutes; when it is complete you will see the endpoint status change to \"In Service\". This page also gives you the ARN of the deployed endpoint. You can copy the ARN and move to the next step.
\n" ] }, { "cell_type": "code", "execution_count": null, "id": "c855b2dd-7f3a-4b11-90eb-ad0927a922ae", "metadata": {}, "outputs": [], "source": [ "from sagemaker.predictor import Predictor\n", "\n", "endpoint_name = \"\"  # Paste the endpoint name from the previous step here\n", "base_model_predictor = sagemaker.predictor_async.AsyncPredictor(predictor=Predictor(endpoint_name=endpoint_name, sagemaker_session=sess))" ] }, { "cell_type": "markdown", "id": "9b5102c0-2168-47a3-b9a2-a491e1fe684f", "metadata": {}, "source": [ "#### Option C: Use an existing endpoint\n", "***\n", "If we already have the model deployed, we can get the Predictor from the endpoint name. If not, go to Option A or B above to deploy the model." ] }, { "cell_type": "code", "execution_count": null, "id": "cce69d53-d8dc-4cbd-865b-cad08b3cb11c", "metadata": { "tags": [] }, "outputs": [], "source": [ "from sagemaker.predictor import Predictor\n", "\n", "endpoint_name = 'jumpstart-example-infer-mxnet-is-mask-r-2023-02-27-21-19-55-096'\n", "base_model_predictor = sagemaker.predictor_async.AsyncPredictor(predictor=Predictor(endpoint_name=endpoint_name, sagemaker_session=sess))" ] }, { "cell_type": "markdown", "id": "fc7baf82", "metadata": {}, "source": [ "### 2.2. Download images for annotations\n", "---\n", "In this step, we download images that need to be annotated. For this notebook, we download the Ford Multi-AV Seasonal dataset; you can substitute your own images that need to be labeled. We will use the JumpStart model to label the images.
\n", "\n", "* Reference: https://arxiv.org/abs/2003.07969\n", "* Ford Multi-AV Seasonal Dataset was accessed on DATE from https://registry.opendata.aws/ford-multi-av-seasonal" ] }, { "cell_type": "code", "execution_count": null, "id": "c9a0de15", "metadata": {}, "outputs": [], "source": [ "%%time\n", "!aws s3 cp --no-sign-request s3://ford-multi-av-seasonal/2018-04-17/V2/Log1/2018-04-17-V2-Log1-FL.tar.gz data/" ] }, { "cell_type": "code", "execution_count": null, "id": "ae5b0e37", "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "%%time\n", "!rm -rf data/images data/processedimages data/predictions data/segmentation\n", "!mkdir -p data/images data/processedimages data/predictions data/segmentation\n", "\n", "!tar -xzf data/2018-04-17-V2-Log1-FL.tar.gz -C data/images --no-same-owner" ] }, { "cell_type": "markdown", "id": "10f09c4a", "metadata": {}, "source": [ "#### Start the resize job locally\n", "\n", "This uses mogrify to resize the images in batch.\n", "\n", "Optionally, you can speed up the resizing with GNU Parallel. If you want to use it, install the GNU Parallel library by running the following command in the SageMaker image terminal:\n", "\n", "```\n", "conda install -c conda-forge parallel\n", "```\n" ] }, { "cell_type": "code", "execution_count": null, "id": "8f0d1c01-1aad-4a76-a100-d32f89b8b031", "metadata": {}, "outputs": [], "source": [ "# Serial alternative to the parallel resize below\n", "#%%time\n", "#!mogrify -path data/processedimages -format png -resize 559x536! data/images/*.png" ] }, { "cell_type": "code", "execution_count": null, "id": "1553025f-52d6-47d6-b320-7e6d2fe8ea08", "metadata": { "tags": [] }, "outputs": [], "source": [ "%%time\n", "!ls data/images/*.png | parallel --jobs 3 mogrify -path data/processedimages -format png -resize 559x536! {}" ] }, { "cell_type": "markdown", "id": "5e54ffbe", "metadata": {}, "source": [ "### 2.3. 
Single prediction example\n", "\n", "Note that it may take some time for the inference outputs to be written back to S3 from an async endpoint." ] }, { "cell_type": "code", "execution_count": null, "id": "1e29a1d8", "metadata": { "tags": [] }, "outputs": [], "source": [ "!ls data/processedimages | tail -5" ] }, { "cell_type": "code", "execution_count": null, "id": "359a3a0d", "metadata": { "tags": [] }, "outputs": [], "source": [ "input_1_location = 'data/processedimages/1523946753799396.png'\n", "input_1_s3_location = upload_image(sess, input_1_location, sess.default_bucket())" ] }, { "cell_type": "code", "execution_count": null, "id": "a43ee986", "metadata": { "tags": [] }, "outputs": [], "source": [ "async_response = base_model_predictor.predict_async(input_path=input_1_s3_location)\n", "output_location = async_response.output_path\n", "print(f'Output path for single prediction is {output_location}')" ] }, { "cell_type": "code", "execution_count": null, "id": "1cd4cf70-f695-4b71-831f-3cc71c040f60", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Wait until the object is available in S3\n", "wr.s3.wait_objects_exist([output_location])" ] }, { "cell_type": "code", "execution_count": null, "id": "810a3732", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Copy the object locally\n", "!aws s3 cp {output_location} data/single.out" ] }, { "cell_type": "markdown", "id": "3c571ee1", "metadata": {}, "source": [ "### 2.4. Display model predictions\n", "---\n", "Next, we plot the boxes and masks on top of the image. For this, we adapt a similar function from [GluonCV](https://cv.gluon.ai/_modules/gluoncv/utils/viz/bbox.html#plot_bbox)" ] }, { "cell_type": "code", "execution_count": null, "id": "a21f9feb-479d-42b4-abb6-55d50a6f7492", "metadata": { "tags": [] }, "outputs": [], "source": [ "plot_response('data/single.out')" ] }, { "cell_type": "markdown", "id": "3eb9b98f", "metadata": {}, "source": [ "### 2.5. 
Send images to Async endpoint\n", "\n", "In the next step, we send a batch of images to the async endpoint. You can control how many images are sent by changing the variable `max_images`.\n", "\n", "---" ] }, { "cell_type": "code", "execution_count": null, "id": "9b9eac6d", "metadata": { "tags": [] }, "outputs": [], "source": [ "import glob\n", "import time\n", "\n", "max_images = 10\n", "input_locations, output_locations = [], []\n", "\n", "# Send at most max_images images to the endpoint\n", "for file in glob.glob(\"data/processedimages/*.png\")[:max_images]:\n", "    input_1_s3_location = upload_image(sess, file, sess.default_bucket())\n", "    input_locations.append(input_1_s3_location)\n", "    async_response = base_model_predictor.predict_async(input_path=input_1_s3_location)\n", "    output_locations.append(async_response.output_path)" ] }, { "cell_type": "code", "execution_count": null, "id": "4c7d760c-9769-4d4f-935c-02fb217bb452", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Wait for the objects to be available in S3\n", "wr.s3.wait_objects_exist(output_locations, delay=5, max_attempts=2 * max_images)" ] }, { "cell_type": "markdown", "id": "b45f64e6-ec23-4ea0-b990-2a534e03c5f6", "metadata": {}, "source": [ "### 3.0. Convert the model output annotations to SageMaker GroundTruth format\n", "---\n", "Next, we take the output from the JumpStart model and convert the annotations to SageMaker Ground Truth format."
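, "\n",
"The conversion itself is handled by `convert_to_sm_gt_manifest` from `utils.py` (not shown in this notebook). As a rough, self-contained sketch of the idea, one manifest line pairs the image's S3 URI with a Ground Truth-style bounding-box annotation block (the field names below follow the Ground Truth bounding-box output format; the helper's real output may differ):\n",
"\n",
"```python\n",
"import json\n",
"\n",
"def to_manifest_line(image_uri, boxes, labels, label_name=\"prelabel\"):\n",
"    # One JSON line per image: source-ref plus an annotation block shaped\n",
"    # like Ground Truth bounding-box output (illustrative, simplified).\n",
"    return json.dumps({\n",
"        \"source-ref\": image_uri,\n",
"        label_name: {\n",
"            \"annotations\": [\n",
"                {\"class_id\": c, \"left\": l, \"top\": t,\n",
"                 \"width\": r - l, \"height\": b - t}\n",
"                for (l, t, r, b), c in zip(boxes, labels)\n",
"            ],\n",
"            \"image_size\": [{\"width\": 559, \"height\": 536, \"depth\": 3}],\n",
"        },\n",
"    })\n",
"\n",
"# Hypothetical box (left, top, right, bottom) with class id 0\n",
"line = to_manifest_line(\"s3://bucket/images/example.png\", [(10, 20, 110, 220)], [0])\n",
"print(line)\n",
"```\n"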
] }, { "cell_type": "code", "execution_count": null, "id": "f3cf759e", "metadata": { "tags": [] }, "outputs": [], "source": [ "image_bucket = sess.default_bucket()\n", "image_prefix = \"asyncinference/images\"\n", "manifest_file_name = \"annotations.manifest\"\n", "convert_to_sm_gt_manifest(output_locations, image_bucket, image_prefix, manifest_file_name)" ] }, { "cell_type": "markdown", "id": "450c0215-4ff4-43eb-9378-3538fedad862", "metadata": {}, "source": [ "Upload the manifest file to S3\n", "\n", "---\n", "Next, we upload the generated manifest file to S3, where it can be used for bounding box & label verification.\n", "In the section below, provide the S3 bucket to which the manifest file should be uploaded." ] }, { "cell_type": "code", "execution_count": null, "id": "2f1b41eb-d00b-4f4b-9254-8f5e4d502a88", "metadata": { "tags": [] }, "outputs": [], "source": [ "manifest_bucket = 'sm-gt-label-490491240736'\n", "s3_manifest_file = upload_file(sess, manifest_file_name, manifest_bucket, 'manifest')\n", "print(f\"Labeling manifest file uploaded to {s3_manifest_file}\")" ] }, { "cell_type": "markdown", "id": "65353fe4-7093-43c6-bd55-bd8f038cfc99", "metadata": {}, "source": [ "### 4.0. Create Boundingbox verification job\n", "---\n", "In this section, we create a [bounding box verification job](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-verification-data.html#sms-data-verify-start-api). We upload the SageMaker Ground Truth UI template and the label categories file, and then create the verification job. This uses a private workforce to perform the labeling; change this if you are using another type of workforce. For more details, refer to the CreateLabelingJob API [here.](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateLabelingJob.html)" ] }, { "cell_type": "markdown", "id": "4bea6430-0290-4570-b777-961d2d90e794", "metadata": {}, "source": [ "#### 4.1. 
Execute API call to create the job" ] }, { "cell_type": "code", "execution_count": null, "id": "f3bce21d-86c3-4d05-a20c-ba39806d31e7", "metadata": { "tags": [] }, "outputs": [], "source": [ "ui_template_file = \"instructions.template\"\n", "label_categories_file = \"label_categories.json\"\n", "ui_template_uri = upload_file(sess, ui_template_file, manifest_bucket, 'uitemplate')\n", "label_caegories_json_uri = upload_file(sess, label_categories_file, manifest_bucket, 'uitemplate')\n", "\n", "print(f\"UI template uploaded to {ui_template_uri}\")\n", "print(f\"Label categories uploaded to {label_caegories_json_uri}\")" ] }, { "cell_type": "markdown", "id": "c954c7a4-b85c-40dd-8b97-1689f56d71fd", "metadata": {}, "source": [ "---\n", "In the section below, specify the parameters required to start the verification job:\n", "* labeling_job_iam_role_arn - Specify the IAM role ARN that will be assumed by the verification job\n", "* private_workforce_arn - Specify the ARN of the private workforce that will perform the labeling\n", "* pre_human_lambda - Lambda function that processes each data object before it is sent to the workers. For more details refer to the [documentation](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#SageMaker-Type-HumanTaskConfig-PreHumanTaskLambdaArn).\n", "* annotation_consolidation_lambda - Lambda function that consolidates the annotations from multiple workers. For more details refer to the [documentation](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AnnotationConsolidationConfig.html#SageMaker-Type-AnnotationConsolidationConfig-AnnotationConsolidationLambdaArn)" ] }, { "cell_type": "code", "execution_count": null, "id": "74bc08d0-94c2-4af0-a5a0-c476ade4a7b4", "metadata": { "tags": [] }, "outputs": [], "source": [ "labeling_job_name = 'async-jumpstart-bbox-labeling-job-' + str(int(time.time()))\n", "manifest_label_name = 'prelabel'\n", "labeling_job_iam_role_arn = 'arn:aws:iam::490491240736:role/service-role/AmazonSageMaker-ExecutionRole-20230222T140752'\n", "private_workforce_arn = 'arn:aws:sagemaker:us-west-2:490491240736:workteam/private-crowd/inhouse-team'\n", "label_output = f's3://{manifest_bucket}/gtoutput'\n", "\n", "# Account ids that host the official Ground Truth pre/post-processing Lambdas, per region\n", "arn_region_map = {\n", "    \"us-west-2\": \"081040173940\",\n", "    \"us-east-1\": \"432418664414\",\n", "    \"us-east-2\": \"266458841044\",\n", "    \"eu-west-1\": \"568282634449\",\n", "    \"eu-west-2\": \"487402164563\",\n", "    \"ap-northeast-1\": \"477331159723\",\n", "    \"ap-northeast-2\": \"845288260483\",\n", "    \"ca-central-1\": \"918755190332\",\n", "    \"eu-central-1\": \"203001061592\",\n", "    \"ap-south-1\": \"565803892007\",\n", "    \"ap-southeast-1\": \"377565633583\",\n", "    \"ap-southeast-2\": \"454466003867\",\n", "}\n", "\n", "# Lambda function for preprocessing\n", "# Ref: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#SageMaker-Type-HumanTaskConfig-PreHumanTaskLambdaArn\n", "pre_human_lambda = f'arn:aws:lambda:{aws_region}:{arn_region_map[aws_region]}:function:PRE-AdjustmentBoundingBox'\n", "\n", "# Lambda function for annotation consolidation\n", "# Ref: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AnnotationConsolidationConfig.html#SageMaker-Type-AnnotationConsolidationConfig-AnnotationConsolidationLambdaArn\n", "annotation_consolidation_lambda = 
f'arn:aws:lambda:{aws_region}:{arn_region_map[aws_region]}:function:ACS-AdjustmentBoundingBox'" ] }, { "cell_type": "code", "execution_count": null, "id": "d29416f4-d146-4efb-829b-16d93251e0f1", "metadata": { "tags": [] }, "outputs": [], "source": [ "sagemaker_client = boto3.client(\"sagemaker\")\n", "#Create the labeling job \n", "response = sagemaker_client.create_labeling_job(\n", " LabelingJobName=labeling_job_name,\n", " LabelAttributeName=manifest_label_name,\n", " InputConfig={\n", " 'DataSource': {\n", " 'S3DataSource': {\n", " 'ManifestS3Uri': s3_manifest_file\n", " }\n", " },\n", " 'DataAttributes': {\n", " 'ContentClassifiers': [\n", " 'FreeOfPersonallyIdentifiableInformation','FreeOfAdultContent',\n", " ]\n", " }\n", " },\n", " OutputConfig={\n", " 'S3OutputPath': label_output,\n", " #'KmsKeyId': 'string' If you want to encrypt the output provide KMS key here\n", " },\n", " RoleArn=labeling_job_iam_role_arn,\n", " LabelCategoryConfigS3Uri=label_caegories_json_uri,\n", " StoppingConditions={\n", " 'MaxHumanLabeledObjectCount': 123,\n", " 'MaxPercentageOfInputDatasetLabeled': 100\n", " },\n", " HumanTaskConfig={\n", " 'WorkteamArn': private_workforce_arn,\n", " 'UiConfig': {\n", " 'UiTemplateS3Uri': ui_template_uri\n", " },\n", " 'PreHumanTaskLambdaArn': pre_human_lambda,\n", " 'TaskKeywords': [\n", " 'Bounding Box',\n", " ],\n", " 'TaskTitle': 'Bounding Box task',\n", " 'TaskDescription': 'Draw bounding boxes around objects in an image',\n", " 'NumberOfHumanWorkersPerDataObject': 2,\n", " 'TaskTimeLimitInSeconds': 3600,\n", " #'TaskAvailabilityLifetimeInSeconds': 1000,\n", " 'MaxConcurrentTaskCount': 5,\n", " 'AnnotationConsolidationConfig': {\n", " 'AnnotationConsolidationLambdaArn': annotation_consolidation_lambda\n", " }\n", " }\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "1489aea9-fe0c-41f4-9c16-c4abc2d4f715", "metadata": { "tags": [] }, "outputs": [], "source": [ "if response['ResponseMetadata']['HTTPStatusCode'] == 200:\n", " 
print(f\"Bounding box verification job started successfully. \\nJob ARN {response['LabelingJobArn']}\")\n", "else:\n", " print(\"Error with the verification job. Check the response below.\")\n", " print(response)" ] }, { "cell_type": "markdown", "id": "9ab6afd4-bc27-4862-b21c-a768c25000a9", "metadata": {}, "source": [ "#### 4.2. Complete verification\n", "\n", "In this step, you complete the verification by accessing the labeling portal. For more details, please refer to the SageMaker Ground Truth documentation page [here.](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-getting-started-step3.html)\n", "\n", "When you access the portal as a workforce member, you will see the bounding boxes created by the JumpStart model and can make adjustments as required.\n", "\n", "![GT-Verification.png](images/GT-Verification.png)" ] }, { "cell_type": "markdown", "id": "9e250144", "metadata": {}, "source": [ "### 5.0. Clean up the endpoint\n", "---\n", "This step is optional. We clean up by deleting the endpoint and model configuration, and by removing any locally processed images." ] }, { "cell_type": "code", "execution_count": null, "id": "4f5b20d8", "metadata": {}, "outputs": [], "source": [ "# Delete the SageMaker model and endpoint\n", "base_model_predictor.delete_model()\n", "base_model_predictor.delete_endpoint()" ] }, { "cell_type": "code", "execution_count": null, "id": "1e8319cf", "metadata": {}, "outputs": [], "source": [ "!rm -rf data" ] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": 
"General purpose", "gpuNum": 0, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 128, "name": 
"ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 21, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 28, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 29, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 32, "name": 
"ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, 
"memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 } ], "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/datascience-1.0" }, 
"language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" }, "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } } }, "nbformat": 4, "nbformat_minor": 5 }