# Notebook for Evaluating Content Moderation Service

This notebook is part of the AWS blog series on evaluating content moderation services. It provides sample code that streamlines the workflow from creating a ground truth labeling job to generating evaluation metrics.
 

Prerequisite: **You must prepare an image dataset (the evaluation dataset) and [upload it to an S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html). The dataset should contain the moderation labels of interest to your use case.**

Follow the steps in this notebook to evaluate Amazon Rekognition for content moderation:
- [Step 1: Setup Notebook.](#step1)
- [Step 2: Use Amazon SageMaker Ground Truth service to assign ground truth moderation labels to the evaluation dataset.](#step2)
- [Step 3: Use Amazon Rekognition pre-trained moderation API to generate predicted labels for the evaluation dataset.](#step3) 
- [Step 4: Assess the performance.](#step4)

## Step 1: Setup Notebook <a id="step1"></a>

First, let's get the latest installations of our dependencies.

In [None]:
!pip install --upgrade pip
!pip install boto3 --upgrade
!pip install sagemaker --upgrade

Before you start, create an S3 bucket to host the evaluation dataset, then set the following variables.

In [None]:
import os
import itertools
import json
import time
import boto3
import sagemaker

BUCKET = '<YOUR S3 BUCKET NAME>'          # S3 bucket that holds your evaluation dataset
FILE_PREFIX = '<IMAGE PREFIX>'              # The prefix for your evaluation dataset
EXP_NAME = '<JOB PREFIX>'                   # S3 prefix for SageMaker Ground Truth labeling job, please do not add trailing "/"
INPUT_MANIFEST = '<INPUT FILENAME>'         # Input manifest filename for SageMaker Ground Truth labeling job e.g. input.manifest
OUTPUT_MANIFEST = '<OUTPUT FILENAME>'       # Output manifest filename for SageMaker Ground Truth labeling job e.g. output.manifest

Make sure the bucket is in the same region as this notebook.

In [None]:
role = sagemaker.get_execution_role()
region = boto3.session.Session().region_name
s3 = boto3.client("s3")
bucket_region = s3.head_bucket(Bucket=BUCKET)["ResponseMetadata"]["HTTPHeaders"][
    "x-amz-bucket-region"
]
assert (
    bucket_region == region
), "Your S3 bucket {} and this notebook need to be in the same region.".format(BUCKET)

## Step 2: Use Amazon SageMaker Ground Truth service to assign ground truth moderation labels to the evaluation dataset. <a id="step2"></a>

### 1. Create input manifest file for Ground Truth job
When listing objects in an S3 bucket, the list_objects_v2 API returns up to 1,000 objects per call by default. If you have more than 1,000 images in your bucket, check the [IsTruncated](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html#AmazonS3-ListObjectsV2-response-IsTruncated) value in the response and loop with the continuation token until you have the complete list of objects, as the cell below does. Refer to the [ListObjectsV2](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html) documentation for more details.

In [None]:
# Generate a manifest file that lists every image in the evaluation dataset
list_objects_v2 = s3.list_objects_v2(Bucket=BUCKET, Prefix=FILE_PREFIX, StartAfter=FILE_PREFIX)
objects = list_objects_v2['Contents']
while list_objects_v2['IsTruncated']:
    list_objects_v2 = s3.list_objects_v2(Bucket=BUCKET, Prefix=FILE_PREFIX, StartAfter=FILE_PREFIX, ContinuationToken=list_objects_v2['NextContinuationToken'])
    objects.extend(list_objects_v2['Contents'])

filenames = [o['Key'] for o in objects if o['Size'] > 0]

if os.path.isfile(INPUT_MANIFEST):
    os.remove(INPUT_MANIFEST)

with open(INPUT_MANIFEST, 'w') as fp:
    for filename in filenames:
        formatted_file = "s3://{}/{}".format(BUCKET, filename)
        # json.dumps escapes any special characters in the object key
        fp.write(json.dumps({"source-ref": formatted_file}) + "\n")
            
s3.upload_file(INPUT_MANIFEST, BUCKET, EXP_NAME + "/" + INPUT_MANIFEST)
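
As an alternative to the manual `IsTruncated` loop, boto3 also offers paginators that follow continuation tokens for you. The sketch below is one possible way to collect the image keys with `get_paginator("list_objects_v2")`. The helper name `list_image_keys` is our own and not part of the original notebook, and the client is passed in as an argument so the function can be exercised without AWS access.

```python
def list_image_keys(s3_client, bucket, prefix):
    """Return the keys of all non-empty objects under `prefix`.

    `s3_client` is expected to be a boto3 S3 client, for example
    boto3.client("s3"). The paginator requests further pages as needed.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is missing from a page when no objects match.
        for obj in page.get("Contents", []):
            if obj["Size"] > 0:  # skip zero-byte "folder" placeholder objects
                keys.append(obj["Key"])
    return keys
```

In this notebook it could be called as `filenames = list_image_keys(s3, BUCKET, FILE_PREFIX)`.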

### 2. Specify list of moderation labels for Ground Truth job
To run an image classification labeling job, you need to decide on a set of classes the annotators can choose from. In our case, this list is ["moderation_label_1", "moderation_label_2", "moderation_label_3", "moderation_label_4", "moderation_label_5"]. In your own job you can choose any list of labels, up to the [service limit](https://docs.aws.amazon.com/sagemaker/latest/dg/input-data-limits.html#sms-label-quotas). We recommend making the classes as unambiguous and concrete as possible. The categories should be mutually exclusive. For content moderation, you can reference the [Amazon Rekognition hierarchical taxonomy](https://docs.aws.amazon.com/rekognition/latest/dg/moderation.html#moderation-api) when creating those labels. In addition, be careful to make the task as objective as possible, unless of course your intention is to obtain subjective labels.

To work with Ground Truth, this list needs to be converted to a .json file and uploaded to the S3 bucket.

_Note: The ordering of the labels or classes in the template governs the class indices that you will see downstream in the output manifest (this numbering is zero-indexed). In other words, the class that appears second in the template will correspond to class "1" in the output._

In [None]:
CLASS_LIST = ["<moderation_label_1>", "<moderation_label_2>", "<moderation_label_3>", "<moderation_label_4>", "<moderation_label_5>", "Safe_Content"]
print("Label space is {}".format(CLASS_LIST))

json_body = {"labels": [{"label": label} for label in CLASS_LIST]}
with open("class_labels.json", "w") as f:
    json.dump(json_body, f)

s3.upload_file("class_labels.json", BUCKET, EXP_NAME + "/class_labels.json")
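
To illustrate the zero-indexed ordering described in the note above, here is a minimal sketch that maps class indices back to label names. The label names are placeholders chosen for the example, not labels from the original notebook.

```python
# Hypothetical labels, used only for illustration.
class_list = ["label_a", "label_b", "Safe_Content"]

# Same structure the notebook writes to class_labels.json.
json_body = {"labels": [{"label": label} for label in class_list]}

# The position of a label in the list is its zero-based class index.
index_to_label = {i: entry["label"] for i, entry in enumerate(json_body["labels"])}

print(index_to_label[1])  # prints: label_b (the second label has index 1)
```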

### 3. Create instruction template for Ground Truth Workforce
Your entire evaluation dataset will be annotated by human annotators. It is critical to provide clear and concise instructions that help the annotators understand what you want to achieve. When used through the AWS Console, Ground Truth helps you create the instructions using a visual wizard. When using the API, you need to create an HTML template for your instructions. Below, we prepare a very simple template and upload it to your S3 bucket.

#### (Optional) Testing your template
An invalid template can cause the labeling job to fail or to complete with meaningless results (the annotators may not know what to do, or the instructions may be wrong). We highly recommend that you verify that your template renders correctly. The following cell creates a file called instructions.template and uploads it to S3. It also creates instructions.html, which you can open in a local browser window. Inspect the resulting web page; it should correspond to what you want your annotators to see (the actual image to annotate will not be visible). Refer to the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-custom-templates-step2.html) for more details.

In [None]:
def make_template(test_template=False, save_fname="instructions.template"):
    template = r"""<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
    <crowd-form>
      <crowd-image-classifier
        name="crowd-image-classifier"
        src="{{{{ task.input.taskObject | grant_read_access }}}}"
        header="Dear Annotator, please tell me what you can see in the image. Thank you!"
        categories="{categories_str}"
      >
        <full-instructions header="Image classification instructions">
        </full-instructions>

        <short-instructions>
          <p>Dear Annotator, please tell me what you can see in the image. Thank you!</p>
        </short-instructions>

      </crowd-image-classifier>
    </crowd-form>""".format(
        categories_str=str(CLASS_LIST)
        if test_template
        else "{{ task.input.labels | to_json | escape }}",
    )

    with open(save_fname, "w") as f:
        f.write(template)
    if test_template is False:
        print(template)


make_template(test_template=True, save_fname="instructions.html")
make_template(test_template=False, save_fname="instructions.template")
s3.upload_file("instructions.template", BUCKET, EXP_NAME + "/instructions.template")
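
One detail of `make_template` that is easy to miss is the brace escaping. Because the template string is passed through `str.format`, the quadruple braces collapse to the double braces that the Liquid template syntax expects, while the value substituted for `categories_str` is inserted verbatim. The shortened snippet below is only an illustration of that behavior, not the full template.

```python
# Shortened stand-in for the template string, for illustration only.
snippet = (
    'src="{{{{ task.input.taskObject | grant_read_access }}}}" '
    'categories="{categories_str}"'
)

# str.format turns "{{{{" into "{{" and "}}}}" into "}}";
# the substituted value itself is not processed again.
rendered = snippet.format(
    categories_str="{{ task.input.labels | to_json | escape }}"
)

print(rendered)
```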

#### Define pre-built lambda functions for use in the labeling job
Before we submit the job, we need to define the ARNs for key components of the labeling job: 1) the workteam, 2) the annotation consolidation Lambda function, and 3) the pre-labeling task Lambda function. These ARNs are strings that embed the region name and an AWS service account number, so we define a mapping below that lets you run this notebook in the corresponding AWS Region (us-east-1 in our example).

See the official documentation for the available ARNs:
- Set **VERIFY_USING_PRIVATE_WORKFORCE=False** if you choose to use the [public workforce](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-management-public.html), or set **VERIFY_USING_PRIVATE_WORKFORCE=True** if you elect to use a [private workteam](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-create-private-console.html); in that case, look up the corresponding ARN and set the variable **private_workteam_arn**.
- [Documentation](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#SageMaker-Type-HumanTaskConfig-PreHumanTaskLambdaArn) for available pre-human ARNs. The AWS account (432418664414) is an AWS managed account that hosts the AWS Lambda functions used for the labeling job.
- [Documentation](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AnnotationConsolidationConfig.html#SageMaker-Type-AnnotationConsolidationConfig-AnnotationConsolidationLambdaArn) for available annotation consolidation ARNs.
- [Documentation](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UiConfig.html) for the available public workforce ARN. The AWS account (394669845002) is an AWS managed account that hosts the Amazon SageMaker Ground Truth public workforce resource used for the labeling job.

In [None]:
private_workteam_arn = "<ARN OF PRIVATE WORKFORCE>"

VERIFY_USING_PRIVATE_WORKFORCE = True

# Specify ARNs for resources needed to run an image classification job.
ac_arn_map = {
    "us-east-1": "432418664414",
}

prehuman_arn = "arn:aws:lambda:{}:{}:function:PRE-ImageMultiClass".format(
    region, ac_arn_map[region]
)

acs_arn = "arn:aws:lambda:{}:{}:function:ACS-ImageMultiClass".format(region, ac_arn_map[region])

workteam_arn = "arn:aws:sagemaker:{}:394669845002:workteam/public-crowd/default".format(region)
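
Because `ac_arn_map` above only contains an entry for us-east-1, running the cell in another region raises a bare `KeyError`. The following sketch shows one way to build the same two Lambda ARNs with a clearer error message. The names `LAMBDA_ACCOUNT_BY_REGION` and `build_lambda_arns` are our own, and only the account ID that already appears in this notebook is listed; entries for other regions would have to come from the documentation linked above.

```python
# Only the region/account pair used in this notebook is included here.
LAMBDA_ACCOUNT_BY_REGION = {"us-east-1": "432418664414"}


def build_lambda_arns(region):
    """Return (pre-human ARN, annotation consolidation ARN) for `region`."""
    try:
        account = LAMBDA_ACCOUNT_BY_REGION[region]
    except KeyError:
        raise ValueError(
            "No pre-built Lambda account is configured for region {!r}; "
            "add it to LAMBDA_ACCOUNT_BY_REGION.".format(region)
        ) from None
    prefix = "arn:aws:lambda:{}:{}:function:".format(region, account)
    return prefix + "PRE-ImageMultiClass", prefix + "ACS-ImageMultiClass"


prehuman_example, acs_example = build_lambda_arns("us-east-1")
print(prehuman_example)
print(acs_example)
```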

### 4. Create and submit the SageMaker Ground Truth job
Make sure your [SageMaker execution role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) has full access to [Amazon Cognito](https://aws.amazon.com/cognito/), which is used as the identity provider to manage workforce permissions in the labeling task. The output manifest file is generated after the Ground Truth job completes. Note the file name and location for later use. You can also adjust the [labeling job parameters](https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateLabelingJob.html#API_CreateLabelingJob_RequestParameters) to meet your specific business requirements.

In [None]:
task_description = "What do you see: a {}?".format(" a ".join(CLASS_LIST))
task_keywords = ["image", "classification", "humans"]
task_title = task_description
job_name = "ground-truth-cm-" + str(int(time.time()))

human_task_config = {
    "AnnotationConsolidationConfig": {
        "AnnotationConsolidationLambdaArn": acs_arn,
    },
    "PreHumanTaskLambdaArn": prehuman_arn,
    "MaxConcurrentTaskCount": 200,               # 200 images will be sent at a time to the workteam.
    "NumberOfHumanWorkersPerDataObject": 3,      # 3 separate workers will be required to label each image.
    "TaskAvailabilityLifetimeInSeconds": 21600,  # Your workteam has 6 hours to complete all pending tasks.
    "TaskDescription": task_description,
    "TaskKeywords": task_keywords,
    "TaskTimeLimitInSeconds": 60,                # Each image must be labeled within 1 minute.
    "TaskTitle": task_title,
    "UiConfig": {
        "UiTemplateS3Uri": "s3://{}/{}/instructions.template".format(BUCKET, EXP_NAME),
    },
}

if not VERIFY_USING_PRIVATE_WORKFORCE:
    human_task_config["PublicWorkforceTaskPrice"] = {
        "AmountInUsd": {
            "Dollars": 0,
            "Cents": 1,
            "TenthFractionsOfACent": 2,
        }
    }
    human_task_config["WorkteamArn"] = workteam_arn
else:
    human_task_config["WorkteamArn"] = private_workteam_arn

ground_truth_request = {
    "InputConfig": {
        "DataSource": {
            "S3DataSource": {
                "ManifestS3Uri": "s3://{}/{}/{}".format(BUCKET, EXP_NAME, INPUT_MANIFEST),
            }
        },
        "DataAttributes": {
            "ContentClassifiers": ["FreeOfPersonallyIdentifiableInformation", "FreeOfAdultContent"]
        },
    },
    "OutputConfig": {
        "S3OutputPath": "s3://{}/{}/output/".format(BUCKET, EXP_NAME),
    },
    "HumanTaskConfig": human_task_config,
    "LabelingJobName": job_name,
    "RoleArn": role,
    "LabelAttributeName": "category",
    "LabelCategoryConfigS3Uri": "s3://{}/{}/class_labels.json".format(BUCKET, EXP_NAME),
}

sagemaker_client = boto3.client("sagemaker")
response = sagemaker_client.create_labeling_job(**ground_truth_request)
labelingjob = response['LabelingJobArn'].split("/")
JOB_NAME = labelingjob[-1]

OUTPUT_MANIFEST_KEY = "{}/output/{}/manifests/output/{}".format(EXP_NAME, JOB_NAME, OUTPUT_MANIFEST)

print(JOB_NAME)
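
If you use the public workforce, the `PublicWorkforceTaskPrice` above works out to 0 dollars + 1 cent + 2 tenths of a cent, that is $0.012 per task. The sketch below gives a rough estimate of the worker payment for a dataset. The helper names and the figure of 1,000 images are assumptions made for this example, and the estimate covers only the per-task price, not any other charges that may apply.

```python
from decimal import Decimal


def task_price_usd(amount_in_usd):
    """Convert an AmountInUsd-style dict to a Decimal dollar amount."""
    return (
        Decimal(amount_in_usd.get("Dollars", 0))
        + Decimal(amount_in_usd.get("Cents", 0)) / 100
        + Decimal(amount_in_usd.get("TenthFractionsOfACent", 0)) / 1000
    )


def estimate_worker_payment(num_images, workers_per_image, amount_in_usd):
    """Per-task price multiplied by the total number of labeling tasks."""
    return num_images * workers_per_image * task_price_usd(amount_in_usd)


price = {"Dollars": 0, "Cents": 1, "TenthFractionsOfACent": 2}
# Hypothetical dataset of 1,000 images with 3 workers per image.
print(task_price_usd(price))
print(estimate_worker_payment(1000, 3, price))
```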

### 5. Monitor job progress
A Ground Truth job can take a few hours to complete, depending on the number of images that need to be labeled. You can monitor the job's progress in the AWS Console, or run the next cell repeatedly and check the **LabelingJobStatus** value in the JSON response. Wait for the labeling job on the evaluation dataset to complete successfully before continuing to the next step.

In [None]:
sagemaker_client.describe_labeling_job(LabelingJobName=JOB_NAME)
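
Instead of re-running the cell by hand, you could poll the status in a loop. The helper below is a minimal sketch of that idea. The function name, the 60-second default interval, and the injectable `sleep` argument are our own choices, while `describe_labeling_job` and the `LabelingJobStatus` field are the same ones used above.

```python
import time

# Statuses after which the job will not change any further.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_labeling_job(sagemaker_client, job_name, poll_seconds=60, sleep=time.sleep):
    """Poll the labeling job until it reaches a terminal status and return it."""
    while True:
        response = sagemaker_client.describe_labeling_job(LabelingJobName=job_name)
        status = response["LabelingJobStatus"]
        print("Labeling job status:", status)
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
```

In this notebook it could be called as `wait_for_labeling_job(sagemaker_client, JOB_NAME)`. Continue to the next step only if the returned status is `Completed`.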

## Step 3: Use Amazon Rekognition pre-trained moderation API to generate predicted labels for the evaluation dataset. <a id="step3"></a>

Create a function to generate predicted moderation labels for the evaluation dataset using the Amazon Rekognition moderation API. Optionally, you can adjust [MinConfidence](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_DetectModerationLabels.html#rekognition-DetectModerationLabels-request-MinConfidence), the minimum confidence that Amazon Rekognition must have in order to return a moderation label.

In [None]:
client=boto3.client('rekognition')

def moderate_image(photo, bucket):
    response = client.detect_moderation_labels(Image={'S3Object':{'Bucket':bucket,'Name':photo}})
    return len(response['ModerationLabels'])
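
To experiment with the `MinConfidence` setting mentioned above, a variant of `moderate_image` could accept the threshold as a parameter. The sketch below is one way to do that. The function name `count_moderation_labels` is our own, and the default of 50 mirrors the value Amazon Rekognition applies when `MinConfidence` is omitted.

```python
def count_moderation_labels(rekognition_client, photo, bucket, min_confidence=50.0):
    """Count the moderation labels returned at or above `min_confidence` percent.

    `rekognition_client` is expected to be boto3.client("rekognition").
    """
    response = rekognition_client.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": photo}},
        MinConfidence=min_confidence,
    )
    return len(response["ModerationLabels"])
```

A higher threshold tends to reduce false positives at the cost of more false negatives, so it can be worth re-running the evaluation in Step 4 with a few different values.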

## Step 4: Assess the performance. <a id="step4"></a>

You first retrieve the ground truth moderation labels from the SageMaker Ground Truth labeling job results for the evaluation dataset, then run the Amazon Rekognition moderation API to get predicted moderation labels for the same dataset. Because this is a binary classification problem (safe vs. unsafe content), we're going to calculate the following metrics (assuming unsafe content is positive):

- [True Positive (TP)](https://en.wikipedia.org/wiki/False_positives_and_false_negatives#true_positive)
- [False Positive (FP)](https://en.wikipedia.org/wiki/False_positives_and_false_negatives#False_positive_error)
- [True Negative (TN)](https://en.wikipedia.org/wiki/False_positives_and_false_negatives#true_negative)
- [False Negative (FN)](https://en.wikipedia.org/wiki/False_positives_and_false_negatives#False_negative_error)

and corresponding evaluation metrics such as: 

- [False Positive Rate (FPR)](https://en.wikipedia.org/wiki/False_positive_rate)
- [False Negative Rate (FNR)](https://en.wikipedia.org/wiki/False_positives_and_false_negatives#false_negative_rate)
- [Recall](https://en.wikipedia.org/wiki/Precision_and_recall)
- [Precision](https://en.wikipedia.org/wiki/Precision_and_recall)

Depending on the size of your evaluation dataset, this step can take some time to complete. Keep monitoring the progress output until the "Processing is complete" message is displayed.

In [None]:
# assume detected unsafe content is positive
gt_exception_str='InternalServiceException'
error_count=0
safe_count=0
unsafe_count=0
gt_exception_count=0
TP=0
TN=0
FP=0
FN=0

s3.download_file(BUCKET, OUTPUT_MANIFEST_KEY, OUTPUT_MANIFEST)

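with open(OUTPUT_MANIFEST, "r") as f:
    print('Processing is in progress')
    for line in f:
        print('...')
        record = json.loads(line)  # each manifest line is one JSON object
        # Strip "s3://<bucket>/" so the full object key, including its prefix, is kept
        s3_filename = record["source-ref"].split("/", 3)[-1]
        # LabelAttributeName is "category", so label details are under "category-metadata";
        # failed tasks are assumed to carry a "failure-reason" entry there instead of a class name
        metadata = record.get("category-metadata", {})
        gt_label = metadata.get("class-name") or metadata.get("failure-reason", "")
        cm_label_count = moderate_image(s3_filename, BUCKET)
        if gt_label == "Safe_Content":
            safe_count = safe_count + 1
            if cm_label_count == 0:
                TN = TN + 1
            else:
                FP = FP + 1
        elif gt_exception_str in gt_label:
            gt_exception_count = gt_exception_count + 1
        else:
            unsafe_count = unsafe_count + 1
            if cm_label_count == 0:
                FN = FN + 1
            else:
                TP = TP + 1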

print('Processing is complete')
print(str(gt_exception_count) + " GT tasks failed")
print("TN is: " + str(TN))
print("FP is: " + str(FP))
print("FN is: " + str(FN))
print("TP is: " + str(TP))

# calculate evaluation metrics
FPR = FP / (FP + TN)
FNR = FN / (FN + TP)
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
print("False Positive Rate is: " + str(FPR))
print("False Negative Rate is: " + str(FNR))
print("True Positive Rate(Recall) is: " + str(Recall))
print("Precision is: " + str(Precision))
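
Note that the metric calculations above raise `ZeroDivisionError` when a denominator is zero, for example when the evaluation dataset contains no safe images. The sketch below shows one possible way to guard against that by returning `None` for undefined metrics. The helper names and the example counts are hypothetical and only serve to illustrate the formulas.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def evaluation_metrics(tp, tn, fp, fn):
    """Compute the four metrics used above from the confusion-matrix counts."""
    return {
        "FPR": safe_ratio(fp, fp + tn),        # safe images flagged as unsafe
        "FNR": safe_ratio(fn, fn + tp),        # unsafe images that were missed
        "Recall": safe_ratio(tp, tp + fn),     # unsafe images that were caught
        "Precision": safe_ratio(tp, tp + fp),  # flagged images that were truly unsafe
    }


# Hypothetical counts, for illustration only.
metrics = evaluation_metrics(tp=40, tn=50, fp=5, fn=5)
for name, value in metrics.items():
    print(name, "is:", value)
```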