# Amazon Augmented AI (Amazon A2I) integration with Amazon Translate [Example]

## Introduction

Amazon Translate is constantly learning and evolving to provide the “perfect” output. In domain-sensitive applications such as legal, medical, construction, and engineering, customers can improve translation quality by using custom terminology (https://aws.amazon.com/blogs/machine-learning/introducing-amazon-translate-custom-terminology/). This approach works well in most cases, but some outliers might require light post-editing by human teams. Post-editing helps businesses better understand the needs of their customers by capturing nuances of the local language that can be lost in translation.

For businesses and organizations that want to augment the output of Amazon Translate (and other Amazon AI services), Amazon Augmented AI (https://aws.amazon.com/augmented-ai/) (A2I) provides a managed approach to building human-driven post-editing workflows. Amazon A2I brings human review to all developers, removing the undifferentiated heavy lifting associated with building human review systems or managing large numbers of human reviewers.

In this tutorial, we show how you can use **Amazon Augmented AI (A2I) and Amazon Translate to create a human review workflow that allows your private workforce to effectively review, correct, and tag the documents translated by Amazon Translate, at scale**.

To incorporate A2I in your Amazon Translate workflows, you will need the following resources:

1. An **S3 Bucket** to store the files that you need to translate, and to hold the output generated by the Human Review Workflow after the Human Loop has completed.

2. A **Worker Team** to review and improve the translations done using Amazon Translate. To learn more about Private Worker Teams, see https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-private.html

3. A **Worker Task Template** to create a worker UI. The worker UI displays your input data, such as documents or images, and instructions to workers. It also provides interactive tools that the worker uses to complete your tasks. For more information, see https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-instructions-overview.html

4. A **Human Review Workflow**, also referred to as a flow definition. You use the flow definition to configure your human workforce and provide information about how to accomplish the human review task. You can create a flow definition in the Amazon Augmented AI console or with Amazon A2I APIs. To learn more about both of these options, see https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-create-flow-definition.html
    
When using a custom task type, as this tutorial will show, you start a human loop using the Amazon Augmented AI Runtime API. When you call `start_human_loop()` in your custom application, a task is sent to human reviewers.

## Prerequisite Setup

In [None]:
# First, let's get the latest installations of our dependencies
!pip install --upgrade pip
!pip install boto3 --upgrade
!pip install -U botocore

### Environment Setup

We need to set up the following data:
* `REGION` - Region to call A2I.
* `BUCKET_NAME` - An S3 bucket accessible by the given role
    * Used to store the input files and output results
    * Must be within the same region A2I is called from
* `WORKTEAM_ARN` - To create your **Private Workteam**, visit the instructions here: https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-private.html After you have created your workteam, replace *\<YOUR-WORKTEAM-ARN\>* below
* `ROLE` - The IAM role used as part of StartHumanLoop. By default, this notebook will use the execution role. You can learn more about IAM Policies here https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html
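Before running the cells that follow, it can help to confirm that the placeholders have actually been replaced. The helper below is only a sketch: `parse_workteam_arn` and `check_settings` are hypothetical names, not part of any AWS SDK, and the check that the workteam's region matches `REGION` is an assumption about how you intend to set things up.

```python
def parse_workteam_arn(arn):
    """Split a workteam ARN into its region, account ID and team name."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"Not a SageMaker ARN: {arn!r}")
    resource = parts[5]
    if not resource.startswith("workteam/"):
        raise ValueError(f"Not a workteam ARN: {arn!r}")
    return {
        "region": parts[3],
        "account": parts[4],
        # Same idea as taking everything after the last '/'
        "name": resource.rsplit("/", 1)[-1],
    }


def check_settings(region, bucket_name, workteam_arn):
    """Return a list of problems found; an empty list means nothing obvious is wrong."""
    problems = []
    settings = (("REGION", region), ("BUCKET_NAME", bucket_name),
                ("WORKTEAM_ARN", workteam_arn))
    for label, value in settings:
        # Placeholders in this notebook look like '<SOMETHING>'
        if not value or (value.startswith("<") and value.endswith(">")):
            problems.append(f"{label} still holds a placeholder value")
    if problems:
        return problems
    try:
        team = parse_workteam_arn(workteam_arn)
    except ValueError as err:
        problems.append(str(err))
    else:
        # Assumption: the workteam should live in the region A2I is called from
        if team["region"] != region:
            problems.append(
                f"Workteam region {team['region']} differs from REGION {region}")
    return problems
```

Calling `check_settings(REGION, BUCKET_NAME, WORKTEAM_ARN)` after the next cell should return an empty list when all three values have been filled in consistently.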

In [None]:
REGION = '<REGION-ID>'
BUCKET_NAME = '<BUCKET-NAME>'
WORKTEAM_ARN= "<YOUR-WORKTEAM-ARN>"

### Role and Permissions

The AWS IAM Role used to execute the notebook needs to have the following policies attached:

* AmazonSageMakerFullAccess
* TranslateFullAccess  

In [None]:
from sagemaker import get_execution_role
import sagemaker

# Setting Role to the default SageMaker Execution Role
ROLE = get_execution_role()
display(ROLE)

### Setup Bucket and Paths

In [None]:
import os
import boto3
import botocore

sess = sagemaker.Session()

### Client Setup

Let's set up the clients for Amazon S3, Amazon SageMaker, the Amazon A2I Runtime, and Amazon Translate.

In [None]:
import io
import json
import time
import uuid

import boto3
import botocore

# Amazon SageMaker client
sagemaker = boto3.client('sagemaker', REGION)

# Amazon Translate client
translate = boto3.client('translate', REGION)

# S3 client
s3 = boto3.client('s3', REGION)

# A2I Runtime client
a2i_runtime_client = boto3.client('sagemaker-a2i-runtime', REGION)

Set up a pretty printer for the AWS SDK responses:

In [None]:
import pprint

# Pretty print setup
pp = pprint.PrettyPrinter(indent=2)

# Function to pretty-print AWS SDK responses
def print_response(response):
    if 'ResponseMetadata' in response:
        del response['ResponseMetadata']
    pp.pprint(response)

## Sample Data

Let's create some sample text to test our translation with and store it in S3.

In [None]:
translation_text = """
Just then another visitor entered the drawing room: Prince Andrew Bolkónski, the little princess’ husband. He was a very handsome young man, of medium height, with firm, clearcut features. Everything about him, from his weary, bored expression to his quiet, measured step, offered a most striking contrast to his quiet, little wife. It was evident that he not only knew everyone in the drawing room, but had found them to be so tiresome that it wearied him to look at or listen to them. And among all these faces that he found so tedious, none seemed to bore him so much as that of his pretty wife. He turned away from her with a grimace that distorted his handsome face, kissed Anna Pávlovna’s hand, and screwing up his eyes scanned the whole company.
"""

key = "input/test.txt"

s3.put_object(Bucket=BUCKET_NAME, Key=key, Body=translation_text)

## Create Control Plane Resources

### Create a Worker Task Template

Create a human task UI resource from a UI template written in Liquid HTML. This template is rendered for the human workers whenever a human loop is required.

For over 70 pre-built UIs, see https://github.com/aws-samples/amazon-a2i-sample-task-uis.

We will use the [translation review and correction UI](https://github.com/aws-samples/amazon-a2i-sample-task-uis/blob/master/text/translation-review-and-correction.liquid.html), which shows each original sentence next to its translation and lets the worker edit the translation and rate it.

In [None]:
template = """

<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>

<style>
  table, tr, th, td {
    border: 1px solid black;
    border-collapse: collapse;
    padding: 5px;
  }
</style>

<crowd-form>
    <div>
        <h1>Instructions</h1>
        <p>Please review the below translations and make corrections and improvements.</p>
        <p>Your corrections should:
          <ol>
           <li>Make the translated text more accurately express the meaning of the original text</li>
           <li>Make the translated text read more like something a person would write rather than an automated translation</li>
          </ol>
        </p>
    </div>

    <table>
      <tr>
        <th>Original</th>
        <th>Translation</th>
        <th style="width: 70px">Rating</th>
      </tr>

      {% for pair in task.input.translationPairs %}

        <tr>
          <td>{{ pair.originalText }}</td>
          <td><crowd-text-area name="translation{{ forloop.index }}" value="{{ pair.translation }}"></crowd-text-area></td>
          <td>
            <p>
              <input type="radio" id="good{{ forloop.index }}" name="rating{{ forloop.index }}" value="good" required>
              <label for="good{{ forloop.index }}">Good</label>
            </p>
            <p>
              <input type="radio" id="bad{{ forloop.index }}" name="rating{{ forloop.index }}" value="bad" required>
              <label for="bad{{ forloop.index }}">Bad</label>       
            </p>
          </td>
        </tr>

      {% endfor %}

    </table>
</crowd-form>

"""
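The template loops over `task.input.translationPairs` and names its form fields with Liquid's `forloop.index`, which starts at 1. The sketch below shows the input shape the template reads and the field names it will render. `build_task_input` and `expected_field_names` are hypothetical helpers written for illustration only; they are not part of the A2I API.

```python
import json


def build_task_input(pairs):
    """Build the dict the template reads, from (original, translation) tuples."""
    return {
        "translationPairs": [
            {"originalText": original, "translation": translation}
            for original, translation in pairs
        ]
    }


def expected_field_names(task_input):
    """List the form field names the template renders for this input.

    forloop.index is 1-based in Liquid, so the first row produces
    'translation1' and 'rating1', not 'translation0'.
    """
    names = []
    for index in range(1, len(task_input["translationPairs"]) + 1):
        names.append(f"translation{index}")
        names.append(f"rating{index}")
    return names


task_input = build_task_input([("Hello.", "Hola."), ("Goodbye.", "Adiós.")])
print(json.dumps(task_input, ensure_ascii=False, indent=2))
print(expected_field_names(task_input))
```

Keeping the 1-based numbering in mind matters later, when the worker's answers are read back by field name.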

### Create a Worker Task Template Creator Function

This function is a higher-level abstraction over the SageMaker client's `create_human_task_ui` method. It creates the worker task template that we will use in the next step to create a human review workflow.

In [None]:
def create_task_ui(task_ui_name, template):
    '''
    Creates a Human Task UI resource.

    Returns:
    dict: the CreateHumanTaskUi response, which contains HumanTaskUiArn
    '''
    response = sagemaker.create_human_task_ui(
        HumanTaskUiName=task_ui_name,
        UiTemplate={'Content': template})
    return response

In [None]:
# Task UI name - this value is unique per account and region. You can also provide your own value here.
taskUIName = 'a2i-translate-test-01-ue-1'

# Create task UI
humanTaskUiResponse = create_task_ui(taskUIName, template)
humanTaskUiArn = humanTaskUiResponse['HumanTaskUiArn']
print(humanTaskUiArn)

### Creating the Flow Definition

In this section, we create a flow definition. Flow definitions allow us to specify:

* The workforce that your tasks will be sent to.
* The instructions that your workforce will receive. This is called a worker task template.
* Where your output data will be stored.

This demo is going to use the API, but you can optionally create this workflow definition in the console as well. 

For more details and instructions, see: https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-create-flow-definition.html.

In [None]:
def create_flow_definition(flow_definition_name):
    '''
    Creates a Flow Definition resource

    Returns:
    struct: FlowDefinitionArn
    '''
    response = sagemaker.create_flow_definition(
            FlowDefinitionName= flow_definition_name,
            RoleArn= ROLE,
            HumanLoopConfig= {
                "WorkteamArn": WORKTEAM_ARN,
                "HumanTaskUiArn": humanTaskUiArn,
                "TaskCount": 1,
                "TaskDescription": "Please review the translations done using Amazon Translate and make corrections and improvements.",
                "TaskTitle": "Review and Improve translations."
            },
            OutputConfig={
                "S3OutputPath" : "s3://"+BUCKET_NAME+"/"
            }
        )
    
    return response['FlowDefinitionArn']

Now we are ready to create our flow definition

In [None]:
# Flow definition name - this value is unique per account and region. You can also provide your own value here.
uniqueId = str(uuid.uuid4())
flowDefinitionName = f'translate-a2i-{uniqueId}' 

flowDefinitionArn = create_flow_definition(flowDefinitionName)
print(flowDefinitionArn)
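A newly created flow definition may spend a short time initializing before it can be used. The sketch below polls `describe_flow_definition` until the status becomes `Active`. The function name `wait_for_flow_definition`, its timeout defaults, and the injectable `sleep` argument are assumptions made for illustration; the client is passed in as a parameter so the helper can be tried against a stub.

```python
import time


def wait_for_flow_definition(sm_client, name, timeout_s=60.0, poll_s=2.0,
                             sleep=time.sleep):
    """Poll until the flow definition is Active; raise on failure or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        resp = sm_client.describe_flow_definition(FlowDefinitionName=name)
        status = resp["FlowDefinitionStatus"]
        if status == "Active":
            return status
        if status == "Failed":
            reason = resp.get("FailureReason", "no reason reported")
            raise RuntimeError(f"Flow definition {name} failed: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Flow definition {name} is still {status}")
        sleep(poll_s)
```

In this notebook the call would look like `wait_for_flow_definition(sagemaker, flowDefinitionName)`, where `sagemaker` is the boto3 SageMaker client created earlier.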

### Translate Documents

Now that we have the Human Review Workflow set up, we can translate our documents and pass them over to a Human Loop for review.

In [None]:
# Get file from S3 and load it into a variable
file_contents = s3.get_object(Bucket=BUCKET_NAME, Key=key)['Body'].read().decode("utf-8", 'ignore')

# Get just the filename without prefix or suffix
fileName = key[key.rindex('/')+1:key.rindex('.')]

# Create the human loop input JSON object
humanLoopInput = {
    'SourceLanguage' : 'English',
    'TargetLanguage' : 'Spanish',
    'sourceLanguageCode':'en',
    'targetLanguageCode' : 'es',
    'translationPairs' : [],
    'rowCount': 0,
    'bucketName': BUCKET_NAME,
    'keyName': key
}

translatedText = ''
rowCount = 0

print('Splitting file and performing translation')    

# split the body by period to get individual sentences
for sentence in file_contents.split('.'):
    if len(sentence.lstrip()) > 0:
        # call translation
        translate_response = translate.translate_text(
                                Text=sentence + '.',
                                SourceLanguageCode='en',
                                TargetLanguageCode='es'
                            )

        translatedSentence = translate_response['TranslatedText']

        translationPair = {
                            'originalText': sentence + '.',
                            'translation': translatedSentence
                            }
        humanLoopInput['translationPairs'].append(translationPair)
        rowCount+=1
        translatedText = translatedText + translatedSentence + ' '

humanLoopInput['rowCount'] = rowCount

# Human loop names are documented as lowercase letters, digits and hyphens only
humanLoopName = 'translate-a2i-text-' + str(int(round(time.time() * 1000)))
print('Starting human loop - ' + humanLoopName)
response = a2i_runtime_client.start_human_loop(
                            HumanLoopName=humanLoopName,
                            FlowDefinitionArn= flowDefinitionArn,
                            HumanLoopInput={
                                'InputContent': json.dumps(humanLoopInput)
                                }
                            )

# write the machine translated file to S3 bucket.
targetKey = ('machine_output/MO-{0}.txt').format(fileName)
print ('Writing translated text to '+ BUCKET_NAME + '/' + targetKey)
s3.put_object(Bucket=BUCKET_NAME, Key=targetKey, Body=translatedText.encode('utf-8'))
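Splitting on every period is simple, but it also cuts text at decimal points such as `2.5` and ignores sentences ending in `?` or `!`. A slightly more careful splitter, sketched below with a hypothetical `split_sentences` helper, breaks only where sentence punctuation is followed by whitespace. It is still a heuristic: abbreviations such as "Mr. Smith" will be split, and a dedicated sentence tokenizer would be a better fit for production text.

```python
import re

# Split at whitespace that directly follows '.', '!' or '?'
_SENTENCE_BOUNDARY = re.compile(r'(?<=[.!?])\s+')


def split_sentences(text):
    """Return the sentences in text, keeping their closing punctuation."""
    pieces = _SENTENCE_BOUNDARY.split(text.strip())
    return [piece.strip() for piece in pieces if piece.strip()]


sample = "Version 2.5 is out. Is it good? Yes!"
print(split_sentences(sample))
```

Because each piece keeps its own punctuation, a loop built on this helper would pass the sentence to `translate_text` as-is instead of appending `'.'`.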

### Check Status of Human Loop

Let's check the status of the human loop we just started.

In [None]:
resp = a2i_runtime_client.describe_human_loop(HumanLoopName=humanLoopName)
print(f'HumanLoop Name: {humanLoopName}')
print(f'HumanLoop Status: {resp["HumanLoopStatus"]}')
# HumanLoopOutput may be absent until the loop has completed, so use .get()
print(f'HumanLoop Output Destination: {resp.get("HumanLoopOutput")}')
print('\n')

humanLoopStatus = resp["HumanLoopStatus"]
outputFilePath = resp.get("HumanLoopOutput")
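Rather than re-running the status cell by hand, the check can be wrapped in a polling loop that returns once the human loop reaches a final state. This is a minimal sketch: `wait_for_human_loop`, its default timeout, and its polling interval are illustrative choices, and the set of final states is taken to be `Completed`, `Failed` and `Stopped`.

```python
import time

# States after which the human loop will not change any further
FINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_human_loop(a2i_client, loop_name, timeout_s=3600.0, poll_s=30.0,
                        sleep=time.sleep):
    """Poll describe_human_loop until a final status; return the last response."""
    deadline = time.monotonic() + timeout_s
    while True:
        resp = a2i_client.describe_human_loop(HumanLoopName=loop_name)
        if resp["HumanLoopStatus"] in FINAL_STATUSES:
            return resp
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Human loop {loop_name} is still {resp['HumanLoopStatus']}")
        sleep(poll_s)
```

Since a human has to complete the task, a long timeout is usually appropriate; `wait_for_human_loop(a2i_runtime_client, humanLoopName)` would block until the reviewer submits the work or the loop fails or is stopped.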

### Wait For Work Team to Complete Task

In [None]:
workteamName = WORKTEAM_ARN[WORKTEAM_ARN.rfind('/') + 1:]
print("Navigate to the private worker portal and do the tasks. Make sure you've invited yourself to your workteam!")
print('https://' + sagemaker.describe_workteam(WorkteamName=workteamName)['Workteam']['SubDomain'])

### Check Status of Human Loop Again and process Task Results

Once the Human Loop status has changed to `Completed`, you can post-process the results to build the final file, with the human-reviewed corrections, for future use.

In [None]:
resp = a2i_runtime_client.describe_human_loop(HumanLoopName=humanLoopName)
humanLoopStatus = resp["HumanLoopStatus"]

if humanLoopStatus == "Completed":
    # The output location is only read once the loop has completed
    outputFilePath = resp["HumanLoopOutput"]['OutputS3Uri']

    # Remove s3:// from S3 File Path
    outputFilePath = outputFilePath.replace("s3://", "")

    # recreate the output text document, including post edits.
    tmsFile = s3.get_object(Bucket=outputFilePath.split('/')[0],
                                Key="/".join(outputFilePath.split('/')[1:]))['Body'].read()

    tmsFile = json.loads(tmsFile.decode('utf-8'))
    inputContent = tmsFile['inputContent']
    rowcount = inputContent['rowCount']
    answerContent = tmsFile['humanAnswers'][0]['answerContent']
    editedContent = ''
    # Liquid's forloop.index is 1-based, so answers are keyed translation1..translation<rowcount>
    for index in range(1, rowcount + 1):
        editedContent += (answerContent['translation'+str(index)] + " ")

    # extract the file name
    targetKeyName = inputContent['keyName']
    targetKeyName = targetKeyName[targetKeyName.index('/') + 1: len(targetKeyName)]

    # save the file.
    s3.put_object(Bucket=BUCKET_NAME,
                      Key='post_edits/PO-{0}'.format(targetKeyName),
                    Body=editedContent.encode('utf-8'))

    print("Output File successfully stored in s3://{0}/post_edits/PO-{1}".format(BUCKET_NAME,targetKeyName))
elif humanLoopStatus == "InProgress":
    print("Navigate to the private worker portal and do the tasks. Make sure you've invited yourself to your workteam!")
    print('https://' + sagemaker.describe_workteam(WorkteamName=workteamName)['Workteam']['SubDomain'])
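The parsing steps in the cell above can also be factored into two small pure functions, which makes them easy to try without touching S3. This is a sketch under the assumption that the output document has the `inputContent` and `humanAnswers` structure used above; `split_s3_uri` and `rebuild_translation` are hypothetical helper names.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def rebuild_translation(output_doc):
    """Join the reviewed sentences from an A2I output document, in row order."""
    row_count = output_doc["inputContent"]["rowCount"]
    answers = output_doc["humanAnswers"][0]["answerContent"]
    # Field names are 1-based: translation1 .. translation<row_count>
    sentences = (answers[f"translation{index}"].strip()
                 for index in range(1, row_count + 1))
    return " ".join(sentences)
```

Unlike the cell above, `rebuild_translation` joins the sentences with single spaces and leaves no trailing space at the end of the text.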

Your translated and human-reviewed files are now available in your S3 bucket.

## The End