# Amazon Translate Content Moderation with Profanity Mask
Amazon Translate typically chooses clean words for your translation output. In some situations, however, you need to make sure that words commonly considered profane do not appear in the translated output, for example when you translate video captions or subtitles, or enable in-game chat, and the translated content must be age-appropriate. For these cases, Amazon Translate lets you mask profane words and phrases with its profanity masking setting.

You can learn more about profanity masking with Amazon Translate [here](https://docs.aws.amazon.com/translate/latest/dg/customizing-translations-profanity.html).

You can learn about Amazon Translate [here](https://docs.aws.amazon.com/translate/latest/dg/what-is.html).

In this tutorial, we will learn how to apply the profanity mask to a real-time translation call.

![Translate-text-profanity](../images/TranslateModeration.png)

- [Step 1: Set up Notebook](#step1)
- [Step 2: Set up input text file & Run Translate without profanity masking](#step2)
- [Step 3: Use Profanity mask Settings with Translate](#step3)


# Step 1: Set up Notebook
Run the cell below to install or update the Python dependencies if you run the lab in a local IDE. This step is optional if you use a SageMaker Studio Jupyter Notebook, whose kernel already includes the dependencies.

In [None]:
%pip install -qU pip
%pip install boto3 -qU

Set up variables and import packages.

In [None]:
#import packages
import boto3
import sagemaker as sm
import os
import datetime
import time
import json

# variables
data_bucket = sm.Session().default_bucket()
region = boto3.session.Session().region_name

os.environ["BUCKET"] = data_bucket
os.environ["REGION"] = region
role = sm.get_execution_role()
# The role needs SageMaker and Amazon Translate permissions (e.g. AmazonSageMakerFullAccess and TranslateFullAccess)
print(f"SageMaker role is: {role}\nDefault SageMaker Bucket: s3://{data_bucket}")

s3=boto3.client('s3')
translate_client=boto3.client('translate', region_name=region)

# Step 2: Set up input text file & Run Translate without profanity masking
You can open the `translate-input-text.txt` file in the datasets directory to see the input text. Run the cell below to upload the sample text file to the default S3 bucket, so that the following cells can read it from there.

In [None]:
s3_key = 'content-moderation-im/translate-text-moderation/translate-input-text.txt'
s3.upload_file('../datasets/translate-input-text.txt', data_bucket, s3_key)
file_uri = 's3://'+data_bucket+'/'+s3_key
print(file_uri)

Call the Amazon Translate **TranslateText** API to translate each line of the text from the source language (English, detected automatically because `SourceLanguageCode` is set to `'auto'`) to the target language (French).

In [None]:
# Set the output language to French. You can change this to the desired output language.
OUTPUT_LANG_CODE = 'fr'
input_text = s3.get_object(Bucket=data_bucket, Key=s3_key)

# Read the file body and iterate over it line by line
for line in input_text["Body"].read().splitlines():
    each_line = line.decode('utf-8')
    print("Input Text:")
    print(each_line)
    print()

    # Translate the line without any profanity settings
    translated_text = translate_client.translate_text(
        Text=each_line,
        SourceLanguageCode='auto',
        TargetLanguageCode=OUTPUT_LANG_CODE
    )
    print("Translated Text:{}".format(translated_text['TranslatedText']))

The translation looks good, but the input text contains profane words, and these are carried over into the French output.

# Step 3: Use Profanity mask Settings with Translate 

In the cell below, we call the same `TranslateText` API and pass an additional `Profanity` parameter as part of `Settings`. This enables profanity masking, so Amazon Translate masks profane words and phrases in the translation output.

To mask profane words and phrases, Amazon Translate replaces them with the grawlix string `?$#@$`. This 5-character sequence is used for each profane word or phrase, regardless of its length or number of words.
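Because every masked word or phrase is replaced by the same fixed string, you can check a translated string for masking with plain string operations. The sketch below is illustrative only: the helper names `count_masked` and `is_masked` are hypothetical and are not part of boto3 or Amazon Translate, and the approach assumes the grawlix string does not otherwise occur in your text.

```python
# Grawlix string that Amazon Translate uses as the profanity mask (from the text above)
GRAWLIX = "?$#@$"


def count_masked(translated_text: str) -> int:
    """Return how many times the grawlix mask appears in a translated string.

    Hypothetical helper: each masked word or phrase counts once,
    since the mask has the same length regardless of what it replaced.
    """
    return translated_text.count(GRAWLIX)


def is_masked(translated_text: str) -> bool:
    """Return True if the translated string contains at least one mask."""
    return GRAWLIX in translated_text
```

You could call `count_masked(translated_text['TranslatedText'])` on a `TranslateText` response to log how many terms were masked in each line.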

Amazon Translate doesn't detect profanity in all of its supported languages. For languages that support profanity detection, see [Supported Languages and Language Codes in the Amazon Translate Developer Guide](https://docs.aws.amazon.com/translate/latest/dg/what-is.html).

In [None]:
# Set the output language to French. You can change this to the desired output language.
OUTPUT_LANG_CODE = 'fr'
input_text = s3.get_object(Bucket=data_bucket, Key=s3_key)

# Read the file body and iterate over it line by line
for line in input_text["Body"].read().splitlines():
    each_line = line.decode('utf-8')
    print("Input Text:")
    print(each_line)
    print()

    # Same call as before, with profanity masking enabled through Settings
    translated_text = translate_client.translate_text(
        Text=each_line,
        SourceLanguageCode='auto',
        TargetLanguageCode=OUTPUT_LANG_CODE,
        Settings={'Profanity': 'MASK'}
    )
    print("Translated Text with profanity masked:{}".format(translated_text['TranslatedText']))

Amazon Translate now translates the text to French and masks the profane word(s) with the grawlix string `?$#@$`.
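The two cells in this lab differ only in the `Settings` argument, so the call can be wrapped in one small function with a switch for masking. The following is a minimal sketch, not part of the original lab: the function name `translate_lines` and its parameters are hypothetical, and it skips blank lines on the assumption that `TranslateText` rejects empty `Text`. It accepts any client object that exposes a boto3-style `translate_text` method.

```python
def translate_lines(client, lines, target_lang, mask_profanity=False):
    """Translate each non-empty line and return (source, translation) pairs.

    Hypothetical helper: `client` is expected to behave like
    boto3.client('translate'); only its translate_text method is used.
    """
    results = []
    for line in lines:
        text = line.strip()
        if not text:
            # Skip blank lines (assumption: empty Text is not accepted)
            continue
        kwargs = {
            "Text": text,
            "SourceLanguageCode": "auto",
            "TargetLanguageCode": target_lang,
        }
        if mask_profanity:
            # Same setting as in the cell above
            kwargs["Settings"] = {"Profanity": "MASK"}
        response = client.translate_text(**kwargs)
        results.append((text, response["TranslatedText"]))
    return results
```

With the variables from this notebook, a call might look like `translate_lines(translate_client, body.decode('utf-8').splitlines(), 'fr', mask_profanity=True)`, where `body` is the bytes object read from the S3 response. Running it once with and once without `mask_profanity` gives a side-by-side comparison of the two outputs.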

# Conclusion 
In this lab, we learned how to use profanity masking with Amazon Translate to keep unsuitable and profane words out of translated text.