{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Notebook for Evaluating Content Moderation Service\n", "\n", "This notebook is part of the AWS blog series about Evaluating Content Moderation Service, and provides a sample code to streamline steps from creating ground truth labeling job to generating evaluation metrics.\n", " \n", "\n", "Prerequisite: **You must prepare an image dataset (evaluation dataset) ready for evaluation and [upload it to a S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html). The dataset should contain the moderation labels of interest to your use case.** \n", "\n", "Follow the steps in this notebook to evaluate Amazon Rekognition for content moderation:\n", "- [Step 1: Setup Notebook.](#step1)\n", "- [Step 2: Use Amazon SageMaker Ground Truth service to assign ground truth moderation labels to the evaluation dataset.](#step2)\n", "- [Step 3: Use Amazon Rekognition pre-trained moderation API to generate predicted labels for the evaluation dataset.](#step3) \n", "- [Step 4: Assess the performance.](#step4)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Step 1: Setup Notebook " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's get the latest installations of our dependencies." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "!pip install --upgrade pip\n", "!pip install boto3 --upgrade\n", "!pip install sagemaker --upgrade" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to start, it's necessary to create a bucket where to host evaluation dataset, then set proper values for following variables." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import os\n", "import itertools\n", "import json\n", "import time\n", "import boto3\n", "import sagemaker\n", "\n", "BUCKET = '' # S3 bucket holds your evaluation dataset\n", "FILE_PREFIX = '' # The prefix for your evaluation dataset\n", "EXP_NAME = '' # S3 prefix for SageMaker Ground Truth labeling job, please do not add trailing \"/\"\n", "INPUT_MANIFEST = '' # Input manifest filename for SageMaker Ground Truth labeling job e.g. input.manifest\n", "OUTPUT_MANIFEST = '' # Output manifest filename for SageMaker Ground Truth labeling job e.g. output.manifest" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make sure the bucket is in the same region as this notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "role = sagemaker.get_execution_role()\n", "region = boto3.session.Session().region_name\n", "s3 = boto3.client(\"s3\")\n", "bucket_region = s3.head_bucket(Bucket=BUCKET)[\"ResponseMetadata\"][\"HTTPHeaders\"][\n", " \"x-amz-bucket-region\"\n", "]\n", "assert (\n", " bucket_region == region\n", "), \"You S3 bucket {} and this notebook need to be in the same region.\".format(BUCKET)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Use Amazon SageMaker Ground Truth service to assign ground truth moderation labels to the evaluation dataset. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Create input manifest file for Ground Truth job\n", "When listing objects in s3 bucket, the list_objects_v2 API by default return up to 1000 objects. 
If you have more than 1,000 images in your bucket, we recommend checking the [IsTruncated](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html#AmazonS3-ListObjectsV2-response-IsTruncated) value in the response and using a loop to paginate through the complete list of objects; refer to the [ListObjectsV2](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html) documentation for more details. A paginator-based alternative is sketched after the next cell. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate a manifest file that lists every image in the evaluation dataset\n", "list_objects_v2 = s3.list_objects_v2(Bucket=BUCKET, Prefix=FILE_PREFIX, StartAfter=FILE_PREFIX)\n", "objects = list_objects_v2['Contents']\n", "while list_objects_v2['IsTruncated']:\n", "    # ContinuationToken picks up where the previous page left off\n", "    list_objects_v2 = s3.list_objects_v2(Bucket=BUCKET, Prefix=FILE_PREFIX, StartAfter=FILE_PREFIX, ContinuationToken=list_objects_v2['NextContinuationToken'])\n", "    objects.extend(list_objects_v2['Contents'])\n", "\n", "# Keep only non-empty objects (skips the prefix placeholder itself)\n", "filenames = [o['Key'] for o in objects if o['Size'] > 0]\n", "\n", "if os.path.isfile(INPUT_MANIFEST):\n", "    os.remove(INPUT_MANIFEST)\n", "\n", "with open(INPUT_MANIFEST, 'w') as fp:\n", "    for filename in filenames:\n", "        formatted_file = \"s3://{}/{}\".format(BUCKET, filename)\n", "        fp.write('{\"source-ref\": \"' + formatted_file + '\"}\\n')\n", "\n", "s3.upload_file(INPUT_MANIFEST, BUCKET, EXP_NAME + \"/\" + INPUT_MANIFEST)" ] },
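{ "cell_type": "markdown", "metadata": {}, "source": [ "As an alternative to the manual ContinuationToken loop above, the following cell is a minimal sketch that uses boto3's built-in paginator to walk the same listing. It assumes the BUCKET and FILE_PREFIX variables set in Step 1, and should produce the same list of keys as the cell above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A sketch of the same listing using boto3's built-in paginator\n", "paginator = s3.get_paginator('list_objects_v2')\n", "page_iterator = paginator.paginate(Bucket=BUCKET, Prefix=FILE_PREFIX, StartAfter=FILE_PREFIX)\n", "\n", "paginated_filenames = []\n", "for page in page_iterator:\n", "    # Skip zero-byte objects, such as the prefix placeholder itself\n", "    paginated_filenames.extend(o['Key'] for o in page.get('Contents', []) if o['Size'] > 0)\n", "\n", "print('Found {} images'.format(len(paginated_filenames)))" ] },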
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Specify list of moderation labels for Ground Truth job\n", "To run an image classification labeling job, you need to decide on a set of classes the annotators can choose from. In our case, this list is [\"moderation_label_1\", \"moderation_label_2\", \"moderation_label_3\", \"moderation_label_4\", \"moderation_label_5\"]. In your own job you can choose any list of classes up to the [service limit](https://docs.aws.amazon.com/sagemaker/latest/dg/input-data-limits.html#sms-label-quotas). We recommend that the classes be as unambiguous and concrete as possible, and that the categories be mutually exclusive. For content moderation, you can reference the [Amazon Rekognition hierarchical taxonomy](https://docs.aws.amazon.com/rekognition/latest/dg/moderation.html#moderation-api) when creating those labels. In addition, be careful to make the task as objective as possible, unless of course your intention is to obtain subjective labels.\n", "\n", "To work with Ground Truth, this list needs to be converted to a .json file and uploaded to the S3 BUCKET.\n", "\n", "_Note: The ordering of the labels or classes in the template governs the class indices that you will see downstream in the output manifest (this numbering is zero-indexed). In other words, the class that appears second in the template will correspond to class \"1\" in the output._" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Replace the empty strings with the moderation labels of interest to your use case\n", "CLASS_LIST = [\"\", \"\", \"\", \"\", \"\", \"Safe_Content\"]\n", "print(\"Label space is {}\".format(CLASS_LIST))\n", "\n", "json_body = {\"labels\": [{\"label\": label} for label in CLASS_LIST]}\n", "with open(\"class_labels.json\", \"w\") as f:\n", "    json.dump(json_body, f)\n", "\n", "s3.upload_file(\"class_labels.json\", BUCKET, EXP_NAME + \"/class_labels.json\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Create instruction template for Ground Truth Workforce\n", "All images in your evaluation dataset will be annotated by human annotators. It is critical to provide clear and concise instructions that help the annotators understand what you want to achieve. When used through the AWS Console, Ground Truth helps you create the instructions using a visual wizard. When using the API, you need to create an HTML template for your instructions. Below, we prepare a very simple template and upload it to your S3 bucket." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### (Optional) Testing your template\n", "If an invalid template is generated, the labeling job can fail or complete with meaningless results (the annotators may not know what to do, or the instructions may be wrong). We highly recommend that you verify your task renders correctly. The following cell creates and uploads a file called instructions.template to S3. It also creates instructions.html that you can open in a local browser window. Please do so and inspect the resulting web page; it should correspond to what you want your annotators to see (the actual image to annotate will not be visible). Please refer to the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-custom-templates-step2.html) for more details." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def make_template(test_template=False, save_fname=\"instructions.template\"):\n", "    template = r\"\"\"<script src=\"https://assets.crowd.aws/crowd-html-elements.js\"></script>\n", "<crowd-form>\n", "  <crowd-image-classifier\n", "    name=\"crowd-image-classifier\"\n", "    src=\"{{{{ task.input.taskObject | grant_read_access }}}}\"\n", "    header=\"Dear Annotator, please tell me what you can see in the image. Thank you!\"\n", "    categories=\"{categories_str}\"\n", "  >\n", "    <full-instructions header=\"Image classification instructions\">\n", "    </full-instructions>\n", "    <short-instructions>\n", "      <p>Dear Annotator, please tell me what you can see in the image. Thank you!</p>\n", "    </short-instructions>\n", "  </crowd-image-classifier>\n", "</crowd-form>\n",
\"\"\".format(\n", " categories_str=str(CLASS_LIST)\n", " if test_template\n", " else \"{{ task.input.labels | to_json | escape }}\",\n", " )\n", "\n", " with open(save_fname, \"w\") as f:\n", " f.write(template)\n", " if test_template is False:\n", " print(template)\n", "\n", "\n", "make_template(test_template=True, save_fname=\"instructions.html\")\n", "make_template(test_template=False, save_fname=\"instructions.template\")\n", "s3.upload_file(\"instructions.template\", BUCKET, EXP_NAME + \"/instructions.template\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Define pre-built lambda functions for use in the labeling job\n", "Before we submit the job, we need to define the ARNs for key components of the labeling job: 1) the workteam, 2) the annotation consolidation Lambda function, 3) the pre-labeling task Lambda function, These functions are defined by strings with region names and AWS service account numbers, so we will define a mapping below that will enable you to run this notebook in corresponding AWS region (us-east-1 in our example).\n", "\n", "See the official documentation for the available ARNs:\n", "- Set **VERIFY_USING_PRIVATE_WORKFORCE=False** if you choose to use the [public workfofce](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-management-public.html) or set **VERIFY_USING_PRIVATE_WORKFORCE=True** if you elect to use a [private workteam](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-create-private-console.html) and check the corresponding ARN and set variable **private_workteam_arn**.\n", "- [Documentation](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#SageMaker-Type-HumanTaskConfig-PreHumanTaskLambdaArn) for available pre-human ARNs. The AWS account (432418664414) is an AWS managed account that hosts AWS Lambda function used for for labeling job.\n", "- [Documentation](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AnnotationConsolidationConfig.html#SageMaker-Type-AnnotationConsolidationConfig-AnnotationConsolidationLambdaArn) for available annotation consolidation ANRs. \n", "- [Documentation](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UiConfig.html) for available public workforce ARN. The AWS account (394669845002) is an AWS managed account that hosts Amazon SageMaker Ground Truth public workforce resource used for labeling job. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "private_workteam_arn = \"\"\n", "\n", "VERIFY_USING_PRIVATE_WORKFORCE = True\n", "\n", "# Specify ARNs for resources needed to run an image classification job.\n", "ac_arn_map = {\n", " \"us-east-1\": \"432418664414\",\n", "}\n", "\n", "prehuman_arn = \"arn:aws:lambda:{}:{}:function:PRE-ImageMultiClass\".format(\n", " region, ac_arn_map[region]\n", ")\n", "\n", "acs_arn = \"arn:aws:lambda:{}:{}:function:ACS-ImageMultiClass\".format(region, ac_arn_map[region])\n", "\n", "workteam_arn = \"arn:aws:sagemaker:{}:394669845002:workteam/public-crowd/default\".format(region)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. Create and submit the SageMaker Ground Truth job\n", "Make sure your [SageMaker execution role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) has full access to [Amazon Cognito](https://aws.amazon.com/cognito/) as it is used as an identity provider to manager workforce permission in labeling task. The output manifest file is generated after GT job is complete. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "### 4. Create and submit the SageMaker Ground Truth job\n", "Make sure your [SageMaker execution role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) has full access to [Amazon Cognito](https://aws.amazon.com/cognito/), as it is used as an identity provider to manage workforce permissions in the labeling task. The output manifest file is generated after the Ground Truth job is complete. Note the file name and location for later use. You can also adjust [labeling job parameters](https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateLabelingJob.html#API_CreateLabelingJob_RequestParameters) to meet your specific business requirements." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "task_description = \"What do you see: a {}?\".format(\" a \".join(CLASS_LIST))\n", "task_keywords = [\"image\", \"classification\", \"humans\"]\n", "task_title = task_description\n", "job_name = \"ground-truth-cm-\" + str(int(time.time()))\n", "\n", "human_task_config = {\n", "    \"AnnotationConsolidationConfig\": {\n", "        \"AnnotationConsolidationLambdaArn\": acs_arn,\n", "    },\n", "    \"PreHumanTaskLambdaArn\": prehuman_arn,\n", "    \"MaxConcurrentTaskCount\": 200,  # 200 images will be sent at a time to the workteam.\n", "    \"NumberOfHumanWorkersPerDataObject\": 3,  # 3 separate workers will be required to label each image.\n", "    \"TaskAvailabilityLifetimeInSeconds\": 21600,  # Your workteam has 6 hours to complete all pending tasks.\n", "    \"TaskDescription\": task_description,\n", "    \"TaskKeywords\": task_keywords,\n", "    \"TaskTimeLimitInSeconds\": 60,  # Each image must be labeled within 1 minute.\n", "    \"TaskTitle\": task_title,\n", "    \"UiConfig\": {\n", "        \"UiTemplateS3Uri\": \"s3://{}/{}/instructions.template\".format(BUCKET, EXP_NAME),\n", "    },\n", "}\n", "\n", "if not VERIFY_USING_PRIVATE_WORKFORCE:\n", "    human_task_config[\"PublicWorkforceTaskPrice\"] = {\n", "        \"AmountInUsd\": {\n", "            \"Dollars\": 0,\n", "            \"Cents\": 1,\n", "            \"TenthFractionsOfACent\": 2,\n", "        }\n", "    }\n", "    human_task_config[\"WorkteamArn\"] = workteam_arn\n", "else:\n", "    human_task_config[\"WorkteamArn\"] = private_workteam_arn\n", "\n", "ground_truth_request = {\n", "    \"InputConfig\": {\n", "        \"DataSource\": {\n", "            \"S3DataSource\": {\n", "                \"ManifestS3Uri\": \"s3://{}/{}/{}\".format(BUCKET, EXP_NAME, INPUT_MANIFEST),\n", "            }\n", "        },\n", "        \"DataAttributes\": {\n", "            \"ContentClassifiers\": [\"FreeOfPersonallyIdentifiableInformation\", \"FreeOfAdultContent\"]\n", "        },\n", "    },\n", "    \"OutputConfig\": {\n", "        \"S3OutputPath\": \"s3://{}/{}/output/\".format(BUCKET, EXP_NAME),\n", "    },\n", "    \"HumanTaskConfig\": human_task_config,\n", "    \"LabelingJobName\": job_name,\n", "    \"RoleArn\": role,\n", "    \"LabelAttributeName\": \"category\",\n", "    \"LabelCategoryConfigS3Uri\": \"s3://{}/{}/class_labels.json\".format(BUCKET, EXP_NAME),\n", "}\n", "\n", "sagemaker_client = boto3.client(\"sagemaker\")\n", "response = sagemaker_client.create_labeling_job(**ground_truth_request)\n", "labelingjob = response['LabelingJobArn'].split(\"/\")\n", "JOB_NAME = labelingjob[-1]\n", "\n", "OUTPUT_MANIFEST_KEY = \"{}/output/{}/manifests/output/{}\".format(EXP_NAME, JOB_NAME, OUTPUT_MANIFEST)\n", "\n", "print(JOB_NAME)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5. Monitor job progress\n", "A Ground Truth job can take a few hours to complete, depending on the number of images that need to be labeled. One way to monitor the job's progress is via the AWS Console; alternatively, you can run the next cell repeatedly and check the **LabelingJobStatus** value in the JSON response (a polling sketch that does this automatically follows that cell). Wait for the labeling job on the evaluation dataset to complete successfully, then continue to the next step." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sagemaker_client.describe_labeling_job(LabelingJobName=JOB_NAME)" ] },
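{ "cell_type": "markdown", "metadata": {}, "source": [ "If you prefer to wait programmatically instead of re-running the cell above, the following cell is a minimal polling sketch: it checks the job status once a minute until the job leaves the in-progress states. The status values are those documented for [DescribeLabelingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeLabelingJob.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Poll the labeling job status once a minute until it finishes (a sketch)\n", "status = sagemaker_client.describe_labeling_job(LabelingJobName=JOB_NAME)['LabelingJobStatus']\n", "while status in ('Initializing', 'InProgress'):\n", "    print('Labeling job status:', status)\n", "    time.sleep(60)\n", "    status = sagemaker_client.describe_labeling_job(LabelingJobName=JOB_NAME)['LabelingJobStatus']\n", "print('Labeling job finished with status:', status)" ] },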
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Use the Amazon Rekognition pre-trained moderation API to generate predicted labels for the evaluation dataset. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a function that generates predicted moderation labels for the evaluation dataset using the Amazon Rekognition moderation API. Optionally, you can adjust the [MinConfidence](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_DetectModerationLabels.html#rekognition-DetectModerationLabels-request-MinConfidence) threshold that Amazon Rekognition must reach in order to return a moderated content label; see the sketch after the next cell." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "client = boto3.client('rekognition')\n", "\n", "def moderate_image(photo, bucket):\n", "    # Returns the number of moderation labels detected; 0 means Rekognition considers the image safe\n", "    response = client.detect_moderation_labels(Image={'S3Object': {'Bucket': bucket, 'Name': photo}})\n", "    return len(response['ModerationLabels'])" ] },
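{ "cell_type": "markdown", "metadata": {}, "source": [ "As a quick sanity check, the following cell is a sketch that calls the moderation API on the first image from the manifest generated in Step 2, with an explicit **MinConfidence** value (60 here is an arbitrary example; the service default is 50), and prints any labels returned. Raising MinConfidence returns fewer, higher-confidence labels; lowering it returns more." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Inspect moderation labels for a single image with an explicit MinConfidence threshold (a sketch)\n", "sample_key = filenames[0]\n", "sample_response = client.detect_moderation_labels(\n", "    Image={'S3Object': {'Bucket': BUCKET, 'Name': sample_key}},\n", "    MinConfidence=60,  # only return labels detected with at least 60% confidence\n", ")\n", "for label in sample_response['ModerationLabels']:\n", "    print(label['Name'], '|', label['ParentName'], '|', label['Confidence'])" ] },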
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# assume detected unsafe content is positive\n", "gt_exception_str='InternalServiceException'\n", "error_count=0\n", "safe_count=0\n", "unsafe_count=0\n", "gt_exception_count=0\n", "TP=0\n", "TN=0\n", "FP=0\n", "FN=0\n", "\n", "s3.download_file(BUCKET, OUTPUT_MANIFEST_KEY, OUTPUT_MANIFEST)\n", "\n", "f = open(OUTPUT_MANIFEST, \"r\")\n", "print('Processing is in progress')\n", "for x in f:\n", " print('...')\n", " info_list = x.split(\",\")\n", " s3_filename='images/' + info_list[0].split(\"/\")[-1].replace(\"\\\"\",'')\n", " gt_label=info_list[2].split(\":\")[-1].replace(\"\\\"\",'')\n", " cm_label_count=moderate_image(s3_filename, BUCKET)\n", " if gt_label == \"Safe_Content\":\n", " safe_count = safe_count + 1\n", " if cm_label_count == 0:\n", " TN = TN + 1\n", " else:\n", " FP = FP + 1\n", " elif gt_exception_str in gt_label:\n", " gt_exception_count = gt_exception_count + 1\n", " else:\n", " unsafe_count = unsafe_count + 1\n", " if cm_label_count == 0:\n", " FN = FN + 1\n", " else:\n", " TP = TP + 1\n", "\n", "print('Processing is complete')\n", "print(str(gt_exception_count) + \" GT tasks are failed\")\n", "print(\"TN is: \" + str(TN))\n", "print(\"FP is: \" + str(FP))\n", "print(\"FN is: \" + str(FN))\n", "print(\"TP is: \" + str(TP))\n", "\n", "# calculate evaluation metrics\n", "FPR = FP / (FP + TN)\n", "FNR = FN / (FN + TP)\n", "Recall = TP / (TP + FN)\n", "Precision = TP / (TP + FP)\n", "print(\"False Positive Rate is: \" + str(FPR))\n", "print(\"False Negative Rate is: \" + str(FNR))\n", "print(\"True Positive Rate(Recall) is: \" + str(Recall))\n", "print(\"Precision is: \" + str(Precision))" ] } ], "metadata": { "instance_type": "", "kernelspec": { "display_name": "", "language": "", "name": "" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 4 }