{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Amazon SageMaker Ground Truth Demonstration for Image Semantic Segmentation\n", "\n", "1. [Introduction](#Introduction)\n", "2. [Run a Ground Truth labeling job](#Run-a-Ground-Truth-labeling-job)\n", " 1. [Prepare the data](#Prepare-the-data)\n", " 2. [Prepare labeling input manifest file](#Prepare-labeling-input-manifest-file)\n", " 3. [Specify Label categories](#Specify-Labels-Categories)\n", " 4. [Create the instruction template](#Create-A-Worker-Task-Template)\n", " 5. [Specify Parameters for Labeling Job](#Use-the-CreateLabelingJob-API-JOB-1)\n", "3. [Launch an Adjustment Job](#Launch-Second-Level-Reviewer-Tasks-In-Worker-Portal)\n", " 1. [Choose Labeling Job Type for Job 2](#Choose-Labeling-Job-Type-Job-2)\n", " 2. [Specify Label categories for Job 2](#Specify-Label-Categories-for-Job-2)\n", " 3. [Create the instruction template](#Create-A-Worker-Task-Template-for-Job-2)\n", " 4. [Specify Parameters for Labeling Job](#Use-the-CreateLabelingJob-API-to-Create-a-2nd-Labeling-Job)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "\n", "This sample notebook takes you through an end-to-end workflow to demonstrate the functionality Image Semantic Segmentation of SageMaker Ground Truth. We'll start with (1) an unlabeled image data set (2) acquire labels for all the images (3) inital polygon labeling and then (3) an adjustment job per initial job. Before you begin, we highly recommend you start a Ground Truth labeling job through the AWS Console first to familiarize yourself with the workflow. The AWS Console offers less flexibility than the API, but is simple to use. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get latest version of AWS python SDK" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install -q --upgrade pip\n", "!pip install awscli -q --upgrade\n", "!pip install botocore -q --upgrade\n", "!pip install boto3 -q --upgrade\n", "!pip install sagemaker -q --upgrade\n", "\n", "# NOTE: Restart Kernel after the above command" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import botocore\n", "import json\n", "import time\n", "import sagemaker\n", "import re\n", "import os\n", "import s3fs\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prerequisites\n", "\n", "You will create some of the resources you need to launch a Ground Truth streaming labeling job in this notebook. \n", "\n", "A work team - A work team is a group of workers that complete labeling tasks. If you want to preview the worker UI and execute the labeling task you will need to create a private work team, add yourself as a worker to this team, and provide the work team ARN below. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "WORKTEAM_ARN = \"<>\"\n", "\n", "print(f\"This notebook will use the work team ARN: {WORKTEAM_ARN}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Make sure workteam arn is populated if private work team is chosen\n", "assert WORKTEAM_ARN != \"<>\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The IAM execution role you used to create this notebook instance must have the following permissions: \n", " * AWS managed policy [AmazonSageMakerGroundTruthExecution](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonSageMakerGroundTruthExecution). Run the following code-block to see your IAM execution role name. This [GIF](add-policy.gif) demonstrates how to add this policy to an IAM role in the IAM console. You can also find instructions in the IAM User Guide: [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#add-policies-console).\n", " * When you create your role, you specify Amazon S3 permissions. Make sure that your IAM role has access to the S3 bucket that you plan to use in this example. If you do not specify an S3 bucket in this notebook, the default bucket in the AWS region you are running this notebook instance will be used. If you do not require granular permissions, you can attach [AmazonS3FullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonS3FullAccess) to your role." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "role = sagemaker.get_execution_role()\n", "role_name = role.split(\"/\")[-1]\n", "print(\n", " \"IMPORTANT: Make sure this execution role has the AWS Managed policy AmazonGroundTruthExecution attached.\"\n", ")\n", "print(\"********************************************************************************\")\n", "print(\"The IAM execution role name:\", role_name)\n", "print(\"The IAM execution role ARN:\", role)\n", "print(\"********************************************************************************\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run-a-Ground-Truth-labeling-job" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prepare-the-data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The sample images to be labeled in this tutorial are pulled from the publicly available [Caltech 101 dataset](https://data.caltech.edu/records/mzrjq-6wc02) (Li, F.-F., Andreeto, M., Ranzato, M. A., & Perona, P. (2022). Caltech 101 (Version 1.0) [Data set]. CaltechDATA), which contains pictures in 101 object categories. To minimize the cost of this tutorial, you use a sample set of 10 images, with two images from each of the following categories: airplanes, cars, ferries, helicopters, and motorbikes. But the steps to launch a labeling job for a larger dataset are the same as the ones in this tutorial. The sample set of 10 images is already available in the Amazon S3 bucket sagemaker-sample-files." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "\n", "\n", "sess = sagemaker.Session()\n", "bucket = sess.default_bucket()\n", "\n", "!aws s3 sync s3://sagemaker-sample-files/datasets/image/caltech-101/inference/ s3://{bucket}/images/\n", "\n", "print('Copy and paste the below link into a web browser to confirm the ten images were successfully uploaded to your bucket:')\n", "print(f'https://s3.console.aws.amazon.com/s3/buckets/{bucket}/images/')\n", "\n", "print('\\nWhen prompted by Sagemaker to enter the S3 location for input datasets, you can paste in the below S3 URL')\n", "\n", "print(f's3://{bucket}/images/')\n", "\n", "print('\\nWhen prompted by Sagemaker to Specify a new location, you can paste in the below S3 URL')\n", "\n", "print(f's3://{bucket}/labeled-data/')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prepare-labeling-input-manifest-file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SageMaker Ground Truth operates using manifests. When using a modality like image classification, a single image corresponds to a single entry in a manifest and a given manifest will directly contain paths for all of the images to be labeled. To learn how to create an input manifest file, see [Use an Input Manifest File](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-input-data-input-manifest.html). " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s3 = boto3.resource('s3')\n", "INPUT_MANIFEST_S3_PREFIX = \"s3://\" + bucket + \"/\" \n", "print(INPUT_MANIFEST_S3_PREFIX)\n", "INPUT_MANIFEST_FILE_NAME = \"input.manifest\" #Provide an Input manifest filename\n", "INPUT_MANIFEST = INPUT_MANIFEST_S3_PREFIX + INPUT_MANIFEST_FILE_NAME\n", "my_bucket=s3.Bucket(name=bucket)\n", "\n", "\n", "img_list=[]\n", "for obj in my_bucket.objects.filter(Delimiter='/', Prefix='images/'):\n", " img_list.append(obj.key)\n", "\n", "Input_image_s3=[]\n", "for i in range(1,len(img_list)):\n", " Input_image_s3.append(INPUT_MANIFEST_S3_PREFIX+img_list[i])\n", "\n", "manifest_lines = [\n", " {\n", " \"source\": image\n", " }\n", " for image in Input_image_s3\n", "]\n", "\n", "s3 = s3fs.S3FileSystem(anon=False)\n", "with s3.open(f\"{INPUT_MANIFEST}\",'w') as f:\n", " f.writelines([json.dumps(m)+\"\\n\" for m in manifest_lines])\n", "print(f\"Input manifest file created at {INPUT_MANIFEST} with {len(manifest_lines)} tasks.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Labeling-Job-Name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following cells will create a name for your labeling job. This labeling job name and these topics will be used in your CreateLabelingJob request later in this notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Job Name\n", "LABELING_JOB_NAME = \"GroundTruth-Semantic-Seg-\" + str(int(time.time()))\n", "\n", "print(\"Your labeling job name will be :\", LABELING_JOB_NAME)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Choose-Labeling-Job-Built-In-Task-Type\n", "\n", "Ground Truth supports a variety of built-in task types which streamline the process of creating image, text, video, video frame, and 3D point cloud labeling jobs. The image bounding box task type will be used by default for this demonstration. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "task_type = \"Image Semantic Segmentation\"\n", "print(f\"Your task type: {task_type}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "task_type_map = {\n", " \"Image Semantic Segmentation\": \"SemanticSegmentation\"\n", "}\n", "\n", "arn_region_map = {\n", " \"us-west-2\": \"081040173940\",\n", " \"us-east-1\": \"432418664414\",\n", " \"us-east-2\": \"266458841044\",\n", " \"eu-west-1\": \"568282634449\",\n", " \"eu-west-2\": \"487402164563\",\n", " \"ap-northeast-1\": \"477331159723\",\n", " \"ap-northeast-2\": \"845288260483\",\n", " \"ca-central-1\": \"918755190332\",\n", " \"eu-central-1\": \"203001061592\",\n", " \"ap-south-1\": \"565803892007\",\n", " \"ap-southeast-1\": \"377565633583\",\n", " \"ap-southeast-2\": \"454466003867\",\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "region = boto3.session.Session().region_name\n", "task_type_suffix = task_type_map[task_type]\n", "region_account = arn_region_map[region]\n", "PRE_HUMAN_TASK_LAMBDA = f\"arn:aws:lambda:{region}:{region_account}:function:PRE-{task_type_suffix}\"\n", "POST_ANNOTATION_LAMBDA = f\"arn:aws:lambda:{region}:{region_account}:function:ACS-{task_type_suffix}\"\n", "print(PRE_HUMAN_TASK_LAMBDA)\n", "print(POST_ANNOTATION_LAMBDA)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Specify-Labels-Categories\n", "\n", "You specify the labels that you want workers to use to annotate your data in a label category configuration file. Workers can assign one or more attributes to annotations to give more information about that object. \n", "\n", "For all task types, you can use the following cell to identify the labels you use for your labeling job. To create a label category configuration file with label category attributes, see [Create a Labeling Category Configuration File with Label Category Attributes\n", "](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-label-cat-config-attributes.html) in the Amazon SageMaker developer guide. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "LABEL_CATEGORIES = [\"Airplane\", \"Car\", \"Ferry\", \"Helicopter\", \"Motorbike\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following cell will create a label category configuration file using the labels specified above. \n", "\n", "**IMPORTANT**: Make sure you have added label categories above and they appear under `labels` when you run the following cell." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Specify labels and this notebook will upload and a label category configuration file to S3.\n", "json_body = {\n", " \"document-version\": \"2018-11-28\",\n", " \"labels\": [{\"label\": label} for label in LABEL_CATEGORIES],\n", "}\n", "with open(\"class_labels.json\", \"w\") as f:\n", " json.dump(json_body, f)\n", "\n", "print(\"Your label category configuration file:\")\n", "print(\"\\n\", json.dumps(json_body, indent=2))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s3 = boto3.client(\"s3\")\n", "s3.upload_file(\"class_labels.json\", bucket, \"class_labels.json\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "LABEL_CATEGORIES_S3_URI = f\"s3://{bucket}/class_labels.json\"\n", "print(f\"You should now see class_labels.json in {LABEL_CATEGORIES_S3_URI}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create-A-Worker-Task-Template\n", "\n", "Part or all of your images will be annotated by human annotators. It is essential to provide good instructions. Good instructions are:\n", "\n", "1. Concise. We recommend limiting verbal/textual instruction to two sentences and focusing on clear visuals.\n", "2. Visual. In the case of object detection, we recommend providing several labeled examples with different numbers of boxes.\n", "3. When used through the AWS Console, Ground Truth helps you create the instructions using a visual wizard. When using the API, you need to create an HTML template for your instructions. \n", "\n", "NOTE: If you use any images in your template (as we do), they need to be publicly accessible. You can enable public access to files in your S3 bucket through the S3 Console, as described in S3 Documentation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.display import display, HTML\n", "\n", "\n", "def make_template(save_fname=\"instructions.template\"):\n", " template = r\"\"\"\n", " \n", " \n", " \n", "\n", "
    \n", "
  1. Inspect the image
  2. \n", "
  3. Determine if the specified label is/are visible in the picture.
  4. \n", "
\n", " \n", "\n", "
\n", " \n", "

Use the tools to label the requested items in the image

\n", "
\n", " \n", "
\n", "\n", " \"\"\".format()\n", " with open(save_fname, \"w\") as f:\n", " f.write(template)\n", " \n", "make_template(save_fname=\"instructions.template\")\n", "result = s3.upload_file(\"instructions.template\", bucket, \"instructions.template\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Specify-Parameters-for-Labeling-Job\n", "\n", "\n", "To learn more about these parameters, use the following documentation:\n", "* [TaskTitle](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-TaskTitle)\n", "* [TaskDescription](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-TaskDescription)\n", "* [TaskKeywords](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-TaskKeywords)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "TASK_TITLE = \"Semantic Segmentation\"\n", "\n", "TASK_DESCRIPTION = \"Semantic Segmentation\"\n", "\n", "TASK_KEYWORDS = [\"Semantic Segmentation\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The path in Amazon S3 to your worker task template or human task UI\n", "HUMAN_UI = []\n", "\n", "UI_TEMPLATE_S3_URI = f\"s3://{bucket}/instructions.template\"\n", "HUMAN_UI.append(UI_TEMPLATE_S3_URI)\n", "UI_CONFIG_PARAM = \"UiTemplateS3Uri\"\n", "\n", "print(f\"{UI_CONFIG_PARAM} resource that will be used: {HUMAN_UI[0]}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If you want to store your output manifest in a different folder, provide an OUTPUT_PATH.\n", "OUTPUT_FOLDER_PREFIX = \"/gt-demo-output\"\n", "OUTPUT_BUCKET = \"s3://\" + bucket + OUTPUT_FOLDER_PREFIX\n", "print(\"Your output data will be stored in:\", OUTPUT_BUCKET)\n", "\n", "# An IAM role with AmazonGroundTruthExecution policies attached.\n", "# This must be the same role that you used to create this notebook instance.\n", "ROLE_ARN = role" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use-the-CreateLabelingJob-API-JOB-1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "LABEL_ATTRIBUTE_NAME = LABELING_JOB_NAME + \"-ref\"\n", "\n", "human_task_config = {\n", " \"PreHumanTaskLambdaArn\": PRE_HUMAN_TASK_LAMBDA,\n", " \"MaxConcurrentTaskCount\": 100, # Maximum of 100 objects will be available to the workteam at any time\n", " \"NumberOfHumanWorkersPerDataObject\": 1, # We will obtain and consolidate 1 human annotationsfor each image.\n", " \"TaskAvailabilityLifetimeInSeconds\": 21600, # Your workteam has 6 hours to complete all pending tasks.\n", " \"TaskDescription\": TASK_DESCRIPTION,\n", " \"WorkteamArn\": WORKTEAM_ARN,\n", " \"AnnotationConsolidationConfig\": {\"AnnotationConsolidationLambdaArn\": POST_ANNOTATION_LAMBDA},\n", " \"TaskKeywords\": TASK_KEYWORDS,\n", " \"TaskTimeLimitInSeconds\": 600, # Each image must be labeled within 10 minutes.\n", " \"TaskTitle\": TASK_TITLE,\n", " \"UiConfig\": {UI_CONFIG_PARAM: HUMAN_UI[0]},\n", "}\n", "\n", "\n", "human_task_config[\"WorkteamArn\"] = WORKTEAM_ARN\n", "\n", "ground_truth_request = {\n", " 'InputConfig':{\n", " 'DataSource': {\n", " 'S3DataSource': {\n", " 'ManifestS3Uri': INPUT_MANIFEST }\n", " }},\n", " \"HumanTaskConfig\": human_task_config,\n", " \"LabelAttributeName\": LABEL_ATTRIBUTE_NAME,\n", " \"LabelCategoryConfigS3Uri\": 
LABEL_CATEGORIES_S3_URI,\n", " \"LabelingJobName\": LABELING_JOB_NAME,\n", " \"OutputConfig\": {\"S3OutputPath\": OUTPUT_BUCKET},\n", " \"RoleArn\": ROLE_ARN,\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DataAttributes\n", "You should not share explicit, confidential, or personal information or protected health information with the Amazon Mechanical Turk workforce. \n", "\n", "If you are using Amazon Mechanical Turk workforce, you must verify that your data is free of personal, confidential, and explicit content and protected health information using this code cell. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ground_truth_request[\"InputConfig\"][\"DataAttributes\"] = {\n", " \"ContentClassifiers\": [\"FreeOfPersonallyIdentifiableInformation\", \"FreeOfAdultContent\"]\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Your create labeling job request:\\n\", json.dumps(ground_truth_request, indent=4))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sagemaker_client = boto3.client(\"sagemaker\")\n", "sagemaker_client.create_labeling_job(**ground_truth_request)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use the DescribeLabelingJob API to describe Labeling Job" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sagemaker_client.describe_labeling_job(LabelingJobName=LABELING_JOB_NAME)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Launch-Second-Level-Reviewer-Tasks-In-Worker-Portal\n", "\n", "Use the following section to set up your second labeling job. This labeling job will be chained to the first job that you set up above. This means the output data from the first labeling job will be sent to this labeling job as input data. \n", "\n", "Bounding box, semantic segmentation, and all video frame and 3D point cloud labeling job types support an *adjustment* task which you can use to have worker modify and add to the annotations created in the first labeling job for that respective task type. You can select one of these adjustment task types below. \n", "\n", "If you do not choose an adjustment task type, the output data from this second job will contain any new labels that workers add, as well as the labels added in the first labeling job. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Job Name\n", "LABELING_JOB_NAME2 = \"GroundTruth-Semantic-Seg-ADJ-\" + str(int(time.time()))\n", "\n", "print(\"Your labeling job 2 name will be :\", LABELING_JOB_NAME2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Choose-Labeling-Job-Type-Job-2\n", "\n", "Ground Truth supports a variety of built-in task types which streamline the process of creating image, text, video, video frame, and 3D point cloud labeling jobs. \n", "\n", "Specify the S3 URI of your input manifest file below. The S3 URI looks similar to `s3://your-bucket/path-to-input-manifest/input-manifest.manifest`. To learn more about each task type, see [Built-in Task Types](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-task-types.html)." 
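] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Because this second job is chained to the first one, the output manifest of the first labeling job must exist before the adjustment job can be launched. The optional cell below is a minimal sketch that checks the status of the first job; wait until it reports `Completed` before continuing." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional: check the status of the first labeling job before launching the adjustment job.\n", "# The adjustment job consumes the first job's output manifest, so wait for \"Completed\" here.\n", "first_job = sagemaker_client.describe_labeling_job(LabelingJobName=LABELING_JOB_NAME)\n", "print(LABELING_JOB_NAME, \"status:\", first_job[\"LabelingJobStatus\"])" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The following cell sets the adjustment task type for the second labeling job."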
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "task_type2 = \"Adjustment Semantic Segmentation\"\n", "print(f\"Your task type: {task_type2}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following cells will configure the lambda functions Ground Truth uses to pre-process your input data and output data. These cells will configure your PreHumanTaskLambdaArn and AnnotationConsolidationLambdaArn.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "task_type_map2 = {\n", " \"Adjustment Semantic Segmentation\": \"AdjustmentSemanticSegmentation\"\n", "}\n", "\n", "\n", "arn_region_map = {\n", " \"us-west-2\": \"081040173940\",\n", " \"us-east-1\": \"432418664414\",\n", " \"us-east-2\": \"266458841044\",\n", " \"eu-west-1\": \"568282634449\",\n", " \"eu-west-2\": \"487402164563\",\n", " \"ap-northeast-1\": \"477331159723\",\n", " \"ap-northeast-2\": \"845288260483\",\n", " \"ca-central-1\": \"918755190332\",\n", " \"eu-central-1\": \"203001061592\",\n", " \"ap-south-1\": \"565803892007\",\n", " \"ap-southeast-1\": \"377565633583\",\n", " \"ap-southeast-2\": \"454466003867\",\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "region = boto3.session.Session().region_name\n", "task_type_suffix2 = task_type_map2[task_type2]\n", "region_account = arn_region_map[region]\n", "PRE_HUMAN_TASK_LAMBDA2 = (\n", " f\"arn:aws:lambda:{region}:{region_account}:function:PRE-{task_type_suffix2}\"\n", ")\n", "POST_ANNOTATION_LAMBDA2 = (\n", " f\"arn:aws:lambda:{region}:{region_account}:function:ACS-{task_type_suffix2}\"\n", ")\n", "print(PRE_HUMAN_TASK_LAMBDA2)\n", "print(POST_ANNOTATION_LAMBDA2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Specify-Label-Categories-for-Job-2\n", "\n", "You specify the labels that you want workers to use to annotate your data in a label category configuration file. Workers can assign one or more attributes to annotations to give more information about that object. \n", "\n", "For all task types, you can use the following cell to identify the labels you use for your labeling job. To create a label category configuration file with label category attributes, see [Create a Labeling Category Configuration File with Label Category Attributes\n", "](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-label-cat-config-attributes.html) in the Amazon SageMaker developer guide. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Add label categories of your choice\n", "LABEL_CATEGORIES = [\"Airplane\", \"Car\", \"Ferry\", \"Helicopter\", \"Motorbike\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following cell will create a label category configuration file using the labels specified above. \n", "\n", "**IMPORTANT**: Make sure you have added label categories above and they appear under `labels` when you run the following cell." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Specify labels and this notebook will upload and a label category configuration file to S3.\n", "json_body = {\n", " \"document-version\": \"2018-11-28\",\n", " \"labels\": [{\"label\": label} for label in LABEL_CATEGORIES],\n", "}\n", "with open(\"class_labels2.json\", \"w\") as f:\n", " json.dump(json_body, f)\n", "\n", "print(\"Your label category configuration file:\")\n", "print(\"\\n\", json.dumps(json_body, indent=2))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s3.upload_file(\"class_labels2.json\", bucket, \"class_labels2.json\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "LABEL_CATEGORIES_S3_URI2 = f\"s3://{bucket}/class_labels2.json\"\n", "print(f\"You should now see class_labels2.json in {LABEL_CATEGORIES_S3_URI2}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create-A-Worker-Task-Template-for-Job-2\n", "\n", "Part or all of your images will be annotated by human annotators. It is essential to provide good instructions. Good instructions are:\n", "\n", "1. Concise. We recommend limiting verbal/textual instruction to two sentences and focusing on clear visuals.\n", "2. Visual. In the case of object detection, we recommend providing several labeled examples with different numbers of boxes.\n", "3. When used through the AWS Console, Ground Truth helps you create the instructions using a visual wizard. When using the API, you need to create an HTML template for your instructions. \n", "\n", "NOTE: If you use any images in your template (as we do), they need to be publicly accessible. You can enable public access to files in your S3 bucket through the S3 Console, as described in S3 Documentation. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.display import HTML, display\n", "\n", "\n", "def make_template(save_fname=\"instructions2.template\"):\n", " template = r\"\"\"\n", "\n", " \n", " \n", "
<script src=\"https://assets.crowd.aws/crowd-html-elements.js\"></script>\n",
"<crowd-form>\n",
"  <crowd-semantic-segmentation\n",
"    name=\"crowd-semantic-segmentation\"\n",
"    src=\"{{{{ task.input.taskObject | grant_read_access }}}}\"\n",
"    initial-value=\"{{\n",
"      'src': '{{{{ task.input.manifestLine.{label_attribute_name_from_prior_job} | grant_read_access }}}}',\n",
"      'labelMappings': {{\n",
"        {{% for box in task.input.manifestLine.{label_attribute_name_from_prior_job}-metadata.internal-color-map %}}\n",
"          {{% if box[1]['class-name'] != 'BACKGROUND' %}}\n",
"            {{{{ box[1]['class-name'] | to_json }}}}: {{\n",
"              'color': {{{{ box[1]['hex-color'] | to_json }}}}\n",
"            }},\n",
"          {{% endif %}}\n",
"        {{% endfor %}}\n",
"      }}\n",
"    }}\"\n",
"    header=\"Please review and adjust the segmentation masks in this image\"\n",
"    labels=\"{{{{ task.input.labels | to_json | escape }}}}\"\n",
"  >\n",
"    <full-instructions header=\"Segmentation instructions\">\n",
"      <ol>\n",
"        <li>Read the task carefully and inspect the image.</li>\n",
"        <li>Read the options and review the examples provided to understand more about the labels.</li>\n",
"        <li>Choose the appropriate label that best suits the image.</li>\n",
"      </ol>\n",
"    </full-instructions>\n",
"    <short-instructions>\n",
"      <h3>Good example</h3>\n",
"      <p>Enter description to explain a correctly done segmentation</p>\n",
"      <h3>Bad example</h3>\n",
"      <p>Enter description of an incorrectly done segmentation</p>\n",
"    </short-instructions>\n",
"  </crowd-semantic-segmentation>\n",
"</crowd-form>\n",
"
\"\"\".format(\n", " label_attribute_name_from_prior_job=LABEL_ATTRIBUTE_NAME\n", " )\n", " with open(save_fname, \"w\") as f:\n", " f.write(template)\n", "\n", "\n", "make_template(save_fname=\"instructions2.template\")\n", "s3.upload_file(\"instructions2.template\", bucket, \"instructions2.template\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Image, Text, and Custom Labeling Jobs \n", "\n", "For all image and text based built-in task types, you can find a sample worker task template on that task type page. Find the page for your task type on [Built-in Task Types](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-task-types.html). You will see an example template under the section **Create a {Insert-Task-Type} Job (API)**. Ground Truth uses this template to generate your human task UI. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Important**: If you use your own template with the following `make_template` function to create and upload a worker task template to Amazon S3, you must add an extra pair of `{}` brackets around each Liquid element. For example, if the template contains `{{ task.input.labels | to_json | escape }}`, this line should look as follows in the `make_template` variable `template`: `{{{{ task.input.labels | to_json | escape }}}}`. The following semantic segmentation template already includes an extra pair of `{}` brackets around each Liquid element." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Specify-Parameters-for-Labeling-Job-2\n", "\n", "To learn more about these parameters, use the following documentation:\n", "* [TaskTitle](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-TaskTitle)\n", "* [TaskDescription](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-TaskDescription)\n", "* [TaskKeywords](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-TaskKeywords)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "TASK_TITLE2 = \"Adjust Polygons\"\n", "\n", "TASK_DESCRIPTION2 = \"Adjust polygons around specified objects in your images\"\n", "\n", "# Keywords for your task, in a string-array. 
ex) ['image classification', 'image dataset']\n", "TASK_KEYWORDS2 = [\"Semantic Segmentation\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The path in Amazon S3 to your worker task template or human task UI\n", "HUMAN_UI2 = []\n", "\n", "UI_TEMPLATE_S3_URI2 = f\"s3://{bucket}/instructions2.template\"\n", "HUMAN_UI2.append(UI_TEMPLATE_S3_URI2)\n", "UI_CONFIG_PARAM = \"UiTemplateS3Uri\"\n", "\n", "print(f\"{UI_CONFIG_PARAM} resource that will be used: {HUMAN_UI2[0]}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If you want to store your output manifest in a different folder, provide an OUTPUT_PATH.\n", "OUTPUT_FOLDER_PREFIX = \"/gt-adjustment-demo-output\"\n", "OUTPUT_BUCKET = \"s3://\" + bucket + OUTPUT_FOLDER_PREFIX\n", "print(\"Your output data will be stored in:\", OUTPUT_BUCKET)\n", "\n", "# An IAM role with AmazonGroundTruthExecution policies attached.\n", "# This must be the same role that you used to create this notebook instance.\n", "ROLE_ARN = role" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prepare Input Data for the Adjustment Job\n", "\n", "Pass the Output Manifest from the Initial Labeling job as Input to the Adjustment Job" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_output_s3_uri(job_name):\n", " res=sagemaker.describe_labeling_job(LabelingJobName=job_name)\n", " return res['LabelingJobOutput']['OutputDatasetS3Uri']\n", "\n", "def download_output_data(s3_uri): \n", " manifest_contents = S3Downloader.read_file(s3_uri)\n", " manifest_lines = [json.loads(line) for line in manifest_contents.splitlines() if line.strip()]\n", " return manifest_lines" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Get Output manifest from Initial Labeling Job\n", "job_name = LABELING_JOB_NAME" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Prerequisites\n", "from sagemaker.s3 import S3Downloader\n", "import boto3\n", "s3=boto3.client('s3')\n", "sagemaker=boto3.client('sagemaker')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Pass the Output Manifest from the Initial Labeling job as Input to the Adjustment Job\n", "output_manifest_s3_uri=get_output_s3_uri(job_name)\n", "response=download_output_data(output_manifest_s3_uri)\n", "INPUT_MANIFEST_ADJ = output_manifest_s3_uri\n", "print(INPUT_MANIFEST_ADJ)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use-the-CreateLabelingJob-API-to-Create-a-2nd-Labeling-Job" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "LABEL_ATTRIBUTE_NAME2 = LABELING_JOB_NAME2 + \"-ref\"\n", "\n", "human_task_config = {\n", " \"PreHumanTaskLambdaArn\": PRE_HUMAN_TASK_LAMBDA2,\n", " \"MaxConcurrentTaskCount\": 100, # Maximum of 100 objects will be available to the workteam at any time\n", " \"NumberOfHumanWorkersPerDataObject\": 1, # We will obtain and consolidate 1 human annotationsfor each image.\n", " \"TaskAvailabilityLifetimeInSeconds\": 21600, # Your workteam has 6 hours to complete all pending tasks.\n", " \"TaskDescription\": TASK_DESCRIPTION2,\n", " # If using public workforce, specify \"PublicWorkforceTaskPrice\"\n", " \"WorkteamArn\": WORKTEAM_ARN,\n", " \"AnnotationConsolidationConfig\": {\"AnnotationConsolidationLambdaArn\": POST_ANNOTATION_LAMBDA2},\n", " 
\"TaskKeywords\": TASK_KEYWORDS2,\n", " \"TaskTimeLimitInSeconds\": 600, # Each image must be labeled within 10 minutes.\n", " \"TaskTitle\": TASK_TITLE2,\n", " \"UiConfig\": {UI_CONFIG_PARAM: HUMAN_UI2[0]},\n", "}\n", "\n", "human_task_config[\"WorkteamArn\"] = WORKTEAM_ARN\n", "\n", "ground_truth_request2 = {\n", " 'InputConfig':{\n", " 'DataSource': {\n", " 'S3DataSource': {\n", " 'ManifestS3Uri': INPUT_MANIFEST_ADJ }\n", " }},\n", " \"HumanTaskConfig\": human_task_config,\n", " \"LabelAttributeName\": LABEL_ATTRIBUTE_NAME2,\n", " \"LabelCategoryConfigS3Uri\": LABEL_CATEGORIES_S3_URI2,\n", " \"LabelingJobName\": LABELING_JOB_NAME2,\n", " \"OutputConfig\": {\"S3OutputPath\": OUTPUT_BUCKET},\n", " \"RoleArn\": ROLE_ARN,\n", "}\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### DataAttributes\n", "You should not share explicit, confidential, or personal information or protected health information with the Amazon Mechanical Turk workforce. \n", "\n", "If you are using Amazon Mechanical Turk workforce, you must verify that your data is free of personal, confidential, and explicit content and protected health information using this code cell. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ground_truth_request2[\"InputConfig\"][\"DataAttributes\"] = {\n", " \"ContentClassifiers\": [\"FreeOfPersonallyIdentifiableInformation\", \"FreeOfAdultContent\"]\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Your create labeling job request:\\n\", json.dumps(ground_truth_request2, indent=4))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sagemaker_client = boto3.client(\"sagemaker\")\n", "sagemaker_client.create_labeling_job(**ground_truth_request2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use the DescribeLabelingJob API to describe 2nd Labeling Job" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sagemaker_client.describe_labeling_job(LabelingJobName=LABELING_JOB_NAME2)" ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" } }, "nbformat": 4, "nbformat_minor": 4 }