{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Amazon Translate Content Moderation with Profanity Mask\n", "Amazon Translate typically chooses clean words for your translation output. In some situations, however, you want to prevent words that are commonly considered profane from appearing in the translated output. For example, when you translate video captions or subtitles, or enable in-game chat, and the translated content must be age appropriate and free of profanity, you can use the profanity masking setting to mask profane words and phrases.\n", "\n", "You can learn more about profanity masking with Amazon Translate [here](https://docs.aws.amazon.com/translate/latest/dg/customizing-translations-profanity.html).\n", "\n", "You can learn about Amazon Translate [here](https://docs.aws.amazon.com/translate/latest/dg/what-is.html).\n", "\n", "In this tutorial, we will learn how to apply profanity masking to a real-time translation call.\n", "\n", "![Translate-text-profanity](../images/TranslateModeration.png)\n", "\n", "- [Step 1: Setup Notebook](#step1)\n", "- [Step 2: Setup input text file & Run Translate without profanity masking](#step2)\n", "- [Step 3: Use Profanity mask Settings with Translate](#step3)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 1: Setup Notebook \n", "Run the cell below to install or update the Python dependencies if you run the lab in a local IDE. This step is optional in a SageMaker Studio Jupyter notebook, whose kernel already includes the dependencies. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install -qU pip\n", "%pip install boto3 -qU" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set up variables and import packages." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# import packages\n", "import boto3\n", "import sagemaker as sm\n", "import os\n", "import datetime\n", "import time\n", "import json\n", "\n", "# variables\n", "data_bucket = sm.Session().default_bucket()\n", "region = boto3.session.Session().region_name\n", "\n", "os.environ[\"BUCKET\"] = data_bucket\n", "os.environ[\"REGION\"] = region\n", "role = sm.get_execution_role()\n", "# The role should have the AmazonSageMakerFullAccess and TranslateFullAccess policies attached\n", "print(f\"SageMaker role is: {role}\\nDefault SageMaker Bucket: s3://{data_bucket}\")\n", "\n", "s3 = boto3.client('s3')\n", "translate_client = boto3.client('translate', region_name=region)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 2: Setup input text file & Run Translate without profanity masking \n", "You can open the `translate-input-text.txt` file in the `datasets` directory to see the input text. Run the cell below to upload this sample text file to the default S3 bucket, from which the following cells read it." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s3_key = 'content-moderation-im/translate-text-moderation/translate-input-text.txt'\n", "s3.upload_file('../datasets/translate-input-text.txt', data_bucket, s3_key)\n", "file_uri = 's3://'+data_bucket+'/'+s3_key\n", "print(file_uri)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Call the Amazon Translate **TranslateText** API to translate the text from the source language (English, detected automatically) to the target language (French)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#setting output language as French. 
You can change this to the desired output language\n", "OUTPUT_LANG_CODE = 'fr'\n", "input_text = s3.get_object(Bucket=data_bucket, Key=s3_key)\n", "\n", "# Read the text file line by line using splitlines()\n", "for line in input_text[\"Body\"].read().splitlines():\n", "    each_line = line.decode('utf-8')\n", "    print(\"Input Text:\")\n", "    print(each_line)\n", "    print()\n", "\n", "    # SourceLanguageCode='auto' lets Amazon Translate detect the source language\n", "    translated_text = translate_client.translate_text(\n", "        Text=each_line,\n", "        SourceLanguageCode='auto',\n", "        TargetLanguageCode=OUTPUT_LANG_CODE\n", "    )\n", "    print(\"Translated Text:{}\".format(translated_text['TranslatedText']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The translation reads well, but the input text contains profane words, and they are carried over into the French output." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 3: Use Profanity mask Settings with Translate \n", "\n", "In the cell below, we call the same `TranslateText` API and pass an additional `Profanity` parameter as part of `Settings`. With `Profanity` set to `MASK`, Amazon Translate masks profane words and phrases in the translation output.\n", "\n", "To mask profane words and phrases, Amazon Translate replaces them with the grawlix string `?$#@$`. This 5-character sequence is used for each profane word or phrase, regardless of the length or number of words.\n", "\n", "Amazon Translate doesn't detect profanity in all of its supported languages. For languages that support profanity detection, see [Supported Languages and Language Codes in the Amazon Translate Developer Guide](https://docs.aws.amazon.com/translate/latest/dg/what-is.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#setting output language as French. 
You can change this to the desired output language\n", "OUTPUT_LANG_CODE = 'fr'\n", "input_text = s3.get_object(Bucket=data_bucket, Key=s3_key)\n", "\n", "# Read the text file line by line using splitlines()\n", "for line in input_text[\"Body\"].read().splitlines():\n", "    each_line = line.decode('utf-8')\n", "    print(\"Input Text:\")\n", "    print(each_line)\n", "    print()\n", "\n", "    # Settings={'Profanity': 'MASK'} turns on profanity masking for this call\n", "    translated_text = translate_client.translate_text(\n", "        Text=each_line,\n", "        SourceLanguageCode='auto',\n", "        TargetLanguageCode=OUTPUT_LANG_CODE,\n", "        Settings={'Profanity': 'MASK'}\n", "    )\n", "    print(\"Translated Text with profanity masked:{}\".format(translated_text['TranslatedText']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Amazon Translate now translates the text to French and masks the profane word(s) with the grawlix string `?$#@$`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Conclusion \n", "In this lab, we learned how to use profanity masking with Amazon Translate to mask profane words and phrases in the translated output." ] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" }, "vscode": { "interpreter": { "hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49" } } }, "nbformat": 4, "nbformat_minor": 5 }