# Machine Translation English-German Example Using SageMaker Seq2Seq

1. [Introduction](#Introduction)
2. [Setup](#Setup)
3. [Download dataset and preprocess](#Download-dataset-and-preprocess)
4. [Training the Machine Translation model](#Training-the-Machine-Translation-model)
5. [Inference](#Inference)

## Introduction

Welcome to our Machine Translation end-to-end example! In this demo, we will take a pretrained English-German translation model and deploy it to an endpoint that an internet-facing app can call.

The SageMaker Seq2Seq algorithm is built on top of [Sockeye](https://github.com/awslabs/sockeye), a sequence-to-sequence framework for Neural Machine Translation based on MXNet. SageMaker Seq2Seq implements state-of-the-art encoder-decoder architectures, which can be used for tasks such as Abstractive Summarization in addition to Machine Translation.

To get started, we need to set up the environment with a few prerequisite steps, for permissions, configurations, and so on.

## Setup

Let's start by specifying:
- The S3 bucket and prefix that you want to use for training and model data. **This should be within the same region as the Notebook Instance, training, and hosting.**
- The IAM role ARN used to give training and hosting access to your data. See the documentation for how to create these. Note that if more than one role is required for notebook instances, training, and/or hosting, you should replace the boto regexp in the cell below with the appropriate full IAM role ARN string(s).
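
If you paste a role ARN by hand instead of relying on `get_execution_role()`, a quick format check can catch copy-paste mistakes before a job fails. The sketch below is only illustrative: the helper `is_role_arn`, its regular expression, and the example account ID and role name are assumptions, not part of the SageMaker SDK.

```python
import re

# Hypothetical pattern for a full IAM role ARN (an assumption for illustration):
# arn:<partition>:iam::<12-digit account id>:role/<optional path/><role name>
ROLE_ARN_PATTERN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


def is_role_arn(value):
    """Return True if value looks like a full IAM role ARN."""
    return bool(ROLE_ARN_PATTERN.match(value))


# Placeholder account ID and role name, not real resources.
example_role = "arn:aws:iam::123456789012:role/service-role/ExampleSageMakerRole"
print(is_role_arn(example_role))            # a full ARN
print(is_role_arn("ExampleSageMakerRole"))  # a bare role name, not an ARN
```

A check like this only validates the shape of the string; it does not confirm that the role exists or has the right permissions.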

### Lab time
This notebook will take about 12 to 15 minutes to complete.

In [None]:
import timeit
start_time = timeit.default_timer()

In [None]:
import boto3
import re
import sagemaker
from sagemaker import get_execution_role

role = get_execution_role()

# S3 bucket and prefix
bucket = ''  # set this to the name of an existing S3 bucket in the same region
prefix = 'sagemaker/seq2seq/eng-german'  # S3 key prefix for the model data

Next, we'll import the Python libraries we'll need for the remainder of the exercise.

In [None]:
from time import gmtime, strftime
import time
import numpy as np
import os
import json

# For plotting attention matrix later on
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
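def upload_to_s3(bucket, prefix, channel, file):
    """Upload a local file to s3://<bucket>/<prefix>/<channel>/<file name>."""
    s3 = boto3.resource('s3')
    key = prefix + "/" + channel + '/' + os.path.basename(file)
    # Use a context manager so the file handle is closed after the upload
    with open(file, "rb") as data:
        s3.Bucket(bucket).put_object(Key=key, Body=data)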
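
To see where `upload_to_s3` places a file, the following sketch rebuilds the object key and the matching `s3://` URI without touching S3. The helper names `build_s3_key` and `build_s3_uri` and the bucket name `example-bucket` are made up for illustration.

```python
import os


def build_s3_key(prefix, channel, file):
    """Mirror the key layout used by upload_to_s3: <prefix>/<channel>/<file name>."""
    return prefix + "/" + channel + "/" + os.path.basename(file)


def build_s3_uri(bucket, key):
    """Format the S3 URI that points at an uploaded object."""
    return "s3://{}/{}".format(bucket, key)


key = build_s3_key("sagemaker/seq2seq/eng-german", "pretrained_model", "/tmp/model.tar.gz")
print(key)
# "example-bucket" is a placeholder bucket name
print(build_s3_uri("example-bucket", key))
```

The URI has the same shape as the `model_data` value built later in the notebook from `bucket` and `prefix`.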

In [None]:
region_name = boto3.Session().region_name

In [None]:
from sagemaker.amazon.amazon_estimator import get_image_uri
container = get_image_uri(boto3.Session().region_name, 'seq2seq', "latest")

print('Using SageMaker Seq2Seq container: {} ({})'.format(container, region_name))

## Inference

### Use a pretrained model

In [None]:
use_pretrained_model = True
model_name = "pretrained-en-de-model"
!curl https://s3.ap-northeast-2.amazonaws.com/pilho-immersionday-public-material/download/model.tar.gz > /tmp/model.tar.gz
!curl https://s3.ap-northeast-2.amazonaws.com/pilho-immersionday-public-material/download/vocab.src.json > /tmp/vocab.src.json
!curl https://s3.ap-northeast-2.amazonaws.com/pilho-immersionday-public-material/download/vocab.trg.json > /tmp/vocab.trg.json

In [None]:
upload_to_s3(bucket, prefix, 'pretrained_model', '/tmp/model.tar.gz')
model_data = "s3://{}/{}/pretrained_model/model.tar.gz".format(bucket, prefix)

In [None]:
%%time

sage = boto3.client('sagemaker')

if not use_pretrained_model:
 info = sage.describe_training_job(TrainingJobName=job_name)
 model_name=job_name
 model_data = info['ModelArtifacts']['S3ModelArtifacts']

print(model_name)
print(model_data)

primary_container = {
 'Image': container,
 'ModelDataUrl': model_data
}

create_model_response = sage.create_model(
 ModelName = model_name,
 ExecutionRoleArn = role,
 PrimaryContainer = primary_container)

print(create_model_response['ModelArn'])

### Create endpoint configuration
Use the model to create an endpoint configuration. The endpoint configuration also contains information about the type and number of EC2 instances to use when hosting the model.

Since SageMaker Seq2Seq is based on neural networks, we could use an ml.p2.xlarge (GPU) instance, but for this example we will use a free-tier-eligible ml.m4.xlarge.

In [None]:
from time import gmtime, strftime

endpoint_config_name = 'Seq2SeqEndpointConfig-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print(endpoint_config_name)
create_endpoint_config_response = sage.create_endpoint_config(
 EndpointConfigName = endpoint_config_name,
 ProductionVariants=[{
 'InstanceType':'ml.m4.xlarge',
 'InitialInstanceCount':1,
 'ModelName':model_name,
 'VariantName':'AllTraffic'}])

print("Endpoint Config Arn: " + create_endpoint_config_response['EndpointConfigArn'])
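
Because the instance type and count live in the `ProductionVariants` entry, switching between CPU and GPU hosting only changes that dictionary. A minimal sketch, assuming a hypothetical helper `make_production_variant` that returns the same keys used in the cell above:

```python
def make_production_variant(model_name, instance_type="ml.m4.xlarge", instance_count=1):
    """Build one ProductionVariants entry (hypothetical helper, not an SDK function)."""
    return {
        "InstanceType": instance_type,
        "InitialInstanceCount": instance_count,
        "ModelName": model_name,
        "VariantName": "AllTraffic",
    }


# Same model, two hosting choices: the CPU default and a GPU instance.
cpu_variant = make_production_variant("pretrained-en-de-model")
gpu_variant = make_production_variant("pretrained-en-de-model", instance_type="ml.p2.xlarge")
print(cpu_variant["InstanceType"], gpu_variant["InstanceType"])
```

Either dictionary could then be passed as `ProductionVariants=[cpu_variant]` to `sage.create_endpoint_config`.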

### Create endpoint
Lastly, we create the endpoint that serves the model by specifying the name and configuration defined above. The end result is an endpoint that can be validated and incorporated into production applications. This takes 10-15 minutes to complete.

In [None]:
%%time
import time

endpoint_name = 'Seq2SeqEndpoint-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print(endpoint_name)
create_endpoint_response = sage.create_endpoint(
 EndpointName=endpoint_name,
 EndpointConfigName=endpoint_config_name)
print(create_endpoint_response['EndpointArn'])

resp = sage.describe_endpoint(EndpointName=endpoint_name)
status = resp['EndpointStatus']
print("Status: " + status)

# wait until the status has changed
sage.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)

# print the status of the endpoint
endpoint_response = sage.describe_endpoint(EndpointName=endpoint_name)
status = endpoint_response['EndpointStatus']
print('Endpoint creation ended with EndpointStatus = {}'.format(status))

if status != 'InService':
 raise Exception('Endpoint creation failed.')
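
The waiter in the cell above hides the polling loop. The sketch below shows roughly what such a loop looks like so the wait-then-check logic is easier to follow; it is not the boto3 implementation. The function `wait_for_endpoint`, its parameters, and the fake `describe` function used for the offline demo are all assumptions.

```python
import time


def wait_for_endpoint(describe_endpoint, endpoint_name, poll_seconds=30, max_attempts=60):
    """Poll describe_endpoint until the endpoint is InService, fails, or we give up."""
    for _ in range(max_attempts):
        status = describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError("Endpoint creation failed.")
        time.sleep(poll_seconds)
    raise TimeoutError("Endpoint did not reach InService in time.")


# Offline stand-in for sage.describe_endpoint so the sketch runs without AWS access.
_statuses = iter(["Creating", "Creating", "InService"])


def fake_describe_endpoint(EndpointName):
    return {"EndpointStatus": next(_statuses)}


final_status = wait_for_endpoint(fake_describe_endpoint, "example-endpoint", poll_seconds=0)
print(final_status)
```

Against a real endpoint the call would be `wait_for_endpoint(sage.describe_endpoint, endpoint_name)`, although the built-in waiter is usually the simpler choice.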

If you see the message,
> Endpoint creation ended with EndpointStatus = InService

then congratulations! You now have a functioning inference endpoint. You can confirm the endpoint configuration and status by navigating to the "Endpoints" tab in the Amazon SageMaker console.

We will finally create a runtime object from which we can invoke the endpoint.

In [None]:
runtime = boto3.client(service_name='sagemaker-runtime')

# Perform Inference

### Using JSON format for inference (Suggested for a single or small number of data instances)

#### Note that in JSON mode you don't have to convert the input text to integer token IDs using the vocabulary mapping

In [None]:
sentences = ["you are so good !",
 "can you drive a car ?",
 "i want to watch a movie ."
 ]

payload = {"instances" : []}
for sent in sentences:
 payload["instances"].append({"data" : sent})

response = runtime.invoke_endpoint(EndpointName=endpoint_name, 
 ContentType='application/json', 
 Body=json.dumps(payload))

response = response["Body"].read().decode("utf-8")
response = json.loads(response)
print(response)
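
The request and response handling above can be wrapped in two small helpers. This is a sketch under stated assumptions: `build_payload` and `extract_translations` are hypothetical names, the response shape (`predictions` entries with a `target` field) is taken from how the attention example in this notebook parses the reply, and the response body below is a placeholder rather than real model output.

```python
import json


def build_payload(sentences):
    """Wrap each sentence in the {"instances": [{"data": ...}]} request format."""
    return {"instances": [{"data": sentence} for sentence in sentences]}


def extract_translations(response_body):
    """Pull the "target" string out of every prediction in a decoded JSON body."""
    return [prediction["target"] for prediction in json.loads(response_body)["predictions"]]


payload = build_payload(["you are so good !", "can you drive a car ?"])
print(json.dumps(payload))

# Placeholder response body, used only to exercise the parsing code offline.
fake_body = json.dumps({"predictions": [{"target": "<translation 1>"},
                                        {"target": "<translation 2>"}]})
translations = extract_translations(fake_body)
print(translations)
```

With a live endpoint, `json.dumps(payload)` would be the `Body` argument to `runtime.invoke_endpoint`, and `extract_translations` would take `response["Body"].read().decode("utf-8")`.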

### Retrieving the Attention Matrix

Passing `"attention_matrix":"true"` in `configuration` of the data instance will return the attention matrix.

In [None]:
sentence = 'can you drive a car ?'

payload = {"instances" : [{
 "data" : sentence,
 "configuration" : {"attention_matrix":"true"}
 }
 ]}

response = runtime.invoke_endpoint(EndpointName=endpoint_name, 
 ContentType='application/json', 
 Body=json.dumps(payload))

response = response["Body"].read().decode("utf-8")
response = json.loads(response)['predictions'][0]

source = sentence
target = response["target"]
attention_matrix = np.array(response["matrix"])

print("Source: %s \nTarget: %s" % (source, target))

In [None]:
# Define a function for plotting the attention matrix
def plot_matrix(attention_matrix, target, source):
 source_tokens = source.split()
 target_tokens = target.split()
 assert attention_matrix.shape[0] == len(target_tokens)
 plt.imshow(attention_matrix.transpose(), interpolation="nearest", cmap="Greys")
 plt.xlabel("target")
 plt.ylabel("source")
 plt.gca().set_xticks([i for i in range(0, len(target_tokens))])
 plt.gca().set_yticks([i for i in range(0, len(source_tokens))])
 plt.gca().set_xticklabels(target_tokens)
 plt.gca().set_yticklabels(source_tokens)
 plt.tight_layout()

In [None]:
plot_matrix(attention_matrix, target, source)
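
Besides plotting, the attention matrix can be read row by row: each row belongs to one target token, and the largest weight in that row points at the source token the model attended to most. The sketch below assumes that columns follow source-token order, which matches how `plot_matrix` labels its axes; the helper name `align_tokens`, the placeholder target tokens, and the matrix values are invented for illustration.

```python
def align_tokens(attention_matrix, target, source):
    """Pair each target token with the source token that has the highest attention weight."""
    source_tokens = source.split()
    target_tokens = target.split()
    assert len(attention_matrix) == len(target_tokens)
    pairs = []
    for target_token, row in zip(target_tokens, attention_matrix):
        # Assumption: ignore any extra columns beyond the source tokens
        weights = list(row)[:len(source_tokens)]
        best = max(range(len(weights)), key=weights.__getitem__)
        pairs.append((target_token, source_tokens[best]))
    return pairs


# Made-up weights and placeholder target tokens, not output from the model.
example_matrix = [[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.2, 0.6]]
pairs = align_tokens(example_matrix, "tgt0 tgt1 tgt2", "can you drive")
print(pairs)
```

With the notebook's variables, the call would be `align_tokens(attention_matrix.tolist(), target, source)`.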

# Stop / Close the Endpoint (Optional)

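Finally, delete the endpoint before closing the notebook so that it stops incurring charges. The call in the cell below is commented out; uncomment it once you are done with the endpoint.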

In [None]:
#sage.delete_endpoint(EndpointName=endpoint_name)
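
Deleting the endpoint leaves the endpoint configuration and the model definition in place. If you also want to remove those, a cleanup helper along the following lines is one option. It goes beyond what this notebook does: `delete_endpoint_resources` is a hypothetical name, and the recording client exists only so the sketch runs without AWS access.

```python
def delete_endpoint_resources(client, endpoint_name, endpoint_config_name, model_name):
    """Delete the endpoint, then its configuration, then the model definition."""
    client.delete_endpoint(EndpointName=endpoint_name)
    client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    client.delete_model(ModelName=model_name)


class RecordingClient:
    """Offline stand-in for boto3.client('sagemaker') that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def delete_endpoint(self, EndpointName):
        self.calls.append(("delete_endpoint", EndpointName))

    def delete_endpoint_config(self, EndpointConfigName):
        self.calls.append(("delete_endpoint_config", EndpointConfigName))

    def delete_model(self, ModelName):
        self.calls.append(("delete_model", ModelName))


# Placeholder resource names for the offline demo.
client = RecordingClient()
delete_endpoint_resources(client, "example-endpoint", "example-config", "example-model")
print(client.calls)
```

In the notebook, the equivalent call would be `delete_endpoint_resources(sage, endpoint_name, endpoint_config_name, model_name)`.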

In [None]:
# Report the total time spent in this notebook, in minutes
elapsed = timeit.default_timer() - start_time
print(elapsed/60)