{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Labeling images using Amazon SageMaker Ground Truth" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this lab, you will learn how to create Amazon SageMaker Ground Truth job for labeling images. These labeled images will then be used as dataset for training a classification machine learning model in the following labs.\n", "\n", "## Lab overview\n", "\n", "##### Step 1. Upload the images to Amazon S3 bucket\n", "##### Step 2. Create a labeling job in Amazon SageMaker Ground Truth\n", "##### Step 3. Label images using the labeling tool of Amazon SageMaker Ground Truth\n", "##### Step 4. Download the labeling result and process it for training" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> When the notebook launchs for the first time, select **conda_mxnet_p36** for kernel." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 0: Download bear images from Internet\n", "\n", "Before you begin this lab, first let's download some bear and non-bear images from the Internet." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install tqdm" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!python download-images.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Upload the images to Amazon S3 Bucket\n", "\n", "Replace the **S3_BUCKET** with your bucket name. We will use that S3 bucket to store images to be labelled and to store the labeling outputs. If the bucket does not exist, the below code will create one for you using the S3 bucket name you provide." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "AWS_REGION = 'us-east-1'\n", "\n", "# NOTE: S3 bucket name must begin with \"deeplens-\" for DeepLens deployment\n", "S3_BUCKET = '_replace_this_with_your_Amazon_S3_bucket_name_here_'\n", "S3_PREFIX = 'ground-truth-img-clf'\n", "GT_JOB_NAME = 'reinvent-deeplens-workshop'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we create the labeling job, it is always good to review the dataset. Here we randomly select 4 images and show them for your review." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import random\n", "import glob\n", "import matplotlib.pyplot as plt\n", "import matplotlib.image as mpimg\n", "\n", "img_fnames = glob.glob('./data/*.jpg')\n", "\n", "plt.figure(figsize=(15,10))\n", "\n", "sample_images = random.sample(img_fnames, 4)\n", "\n", "for i in range(len(sample_images)):\n", " img = mpimg.imread(sample_images[i])\n", " ax = plt.subplot(1, len(sample_images), i+1)\n", " plt.tight_layout()\n", " plt.axis('off')\n", " plt.imshow(img)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you are okay with the images shown above, let's upload images to the S3 bucket so that Amazon SageMaker Ground Truth can use them for the labeling job we will soon create." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "from tqdm import tqdm\n", "import sys\n", "import time\n", "\n", "s3 = boto3.resource('s3')\n", "\n", "s3_bucket = s3.Bucket(S3_BUCKET)\n", "\n", "if s3_bucket.creation_date == None:\n", " # create S3 bucket because it does not exist yet\n", " print('Creating S3 bucket {}.'.format(S3_BUCKET))\n", " resp = s3.create_bucket(\n", " ACL='private',\n", " Bucket=S3_BUCKET\n", " )\n", " \n", "print('>> Uploading images to be annotated to Amazon S3 bucket, {}\\n'.format(S3_BUCKET))\n", "time.sleep(1)\n", "for img_fname in tqdm(img_fnames):#, file=sys.stdout):\n", " fname_only = img_fname.split('/')[-1]\n", " s3_key = '{}/images/{}'.format(S3_PREFIX, fname_only)\n", " s3_bucket.upload_file(img_fname, s3_key)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2. Create a labeling job in Amazon SageMaker Ground Truth\n", "\n", "Go to \"Amazon SageMaker\" web console, and select 'Ground Truth' > 'Labeling jobs' in the left menu." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Click on \"Create labeling job\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](./images/l400-lab1-1.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### 2. Fill in the job details\n", "\n", "- **Job name**: labeling job name\n", "- **Image dataset location**: Click on **Create manifest file** link" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](./images/l400-lab1-2.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### 3. Create **Manifest file**\n", "\n", "- **Input dataset location**: s3://YOUR_S3_BUCKET_NAME/ground-truth-img-clf/images/\n", "- **Data type**: Choose 'Images'\n", "- Then, click on \"Create\" button" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](./images/l400-lab1-3.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Click on \"Use this manifest\" button." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](./images/l400-lab1-3-2.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you complete the above tasks, fill in **Output dataset location** with the S3 location where you want to store the labeling result. \n", "\n", "Here let's use; __s3://YOUR_S3_BUCKET/ground-truth-img-clf/outputs__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](./images/l400-lab1-3-3.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we create an IAM Role which is used by Amazon SageMaker Ground Truth service by\n", "\n", "- Choosing 'Create a new role',\n", "- Selecting \"Any S3 Bucket\" in the pop-up window,\n", "- and clicking on 'Create' button\n", "\n", "Once the pop-up window disapears, select the newly created IAM Role from the list.\n", "\n", "> **Note**: If you want to allow SageMaker Ground Truth to access a certain S3 bucket, put the desired bucket name in \"Specific S3 buckets' rather than choosing \"Any S3 Buckets\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](./images/l400-lab1-3-4.png)\n", "![](./images/l400-lab1-3-5.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### 4. Select a take type\n", "\n", "In this lab, we build an image classification model. So, please select 'Image' for **Task category'** and 'Image classification' for **Task selection**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](./images/l400-lab1-4.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### 5. Select workers and configure tool\n", "\n", "* **Worker types**: Select \"Private\" to use your own labeling workers\n", "* **Team name**: Put the team name of your workers\n", "* **Invite private annotators**: Put email addresses of labeling workers seperated by commas\n", "\n", "After this step, you can add more workers via 'Private' > 'Workers' > 'Invite new workers' in **SageMaker > Ground Truth > Labelling workforces**\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](./images/l400-lab1-5-1.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the 'Image classification labeling tool', fill out the description with the labeling guide, and add 3 labels under 'Select an option'. The three labels are \"brown bear\", \"polar bear\" and \"no bear\", one of which will be selected by workers for a given image.\n", "\n", "Once all are filled, click on \"Create\" button at the bottom." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](./images/l400-lab1-5-2.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3. Label images using the labeling tool of Amazon SageMaker Ground Truth\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### 1. Check out the email sent to the worker email\n", "\n", "Once the labeling job is created, workers will receive an email which includes log-in information(user name, temporal password) and URL of the labeling tool. Workers can start labeling using these information.\n", "\n", "URL of the labeling tool can be also found in 'Private workforace summary' under 'Private' tab in **SageMaker > Ground Truth > Labelling workforce** menu." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](./images/l400-lab1-6.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### 2. Annotate images using the labeling tool\n", "\n", "When you login to the labeling tool using the provided URL, the jobs assigned to you will be listed. Select one in the list, and click on \"Start working\" button." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](./images/l400-lab1-7.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An image is shown as below. Select one of \"brown bear\", \"polar bear\", and \"no bear\" based on the bear shown in the image. Then, click on \"Submit\" button at the bottom to move to the next image." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](./images/l400-lab1-7-2.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4. Download the labeling result and process it for training" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### 1. Check the labeling result\n", "\n", "Go to **SageMaker > Ground Truth > Labelling jobs** to see the labeling job progress and the result.\n", "\n", "![](./images/l400-lab1-8.png)\n", "\n", "In the \"Output dataset location\", there are 3 folders; annotation-tool, annotations, and manifests.\n", "\n", "**annotations** folder has all the labeling output done by workers, and **manifests** folder has the final output manifest files. The final manifest file, output.manifest, is in a folder, manifests/output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**manifests/outputs/output.manifest** has the label for each images in JSON format;\n", "\n", "```json\n", "{\n", " \"source-ref\": \"s3://Your-S3-US-EAST1/ground-truth-img-clf/images/0005b6a438d43ecc.jpg\",\n", " \"bear-or-not-bear\": 2,\n", " \"bear-or-not-bear-metadata\":{\n", " \"confidence\":0.71,\n", " \"job-name\":\"labeling-job/bear-or-not-bear\",\n", " \"class-name\":\"no bear\",\n", " \"human-annotated\":\"yes\",\n", " \"creation-date\":\"2019-11-05T07:37:09.461538\",\n", " \"type\":\"groundtruth/image-classification\"\n", " }\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### 2. Use output.manifest to prepare dataset\n", "\n", "Using the final manifest file, you prepare the dataset to the format which is required by your training script.\n", "\n", "\n", "```\n", ".\n", "└── train\n", " ├── brown\n", " ├── no\n", " └── polar\n", "```\n", "\n", "In this workshop, we will provide fully annotated dataset with more images. So, you can move to the next lab now." ] } ], "metadata": { "kernelspec": { "display_name": "conda_mxnet_p36", "language": "python", "name": "conda_mxnet_p36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 4 }