---

# Contents

7. [How to Build the Custom Sagemaker Container for Model Deployment](#7.-How-to-Build-the-Custom-Sagemaker-Container-for-Model-Deployment)
8. [How to Deploy Models as Sagemaker Multi Model Endpoint and Invoke the Endpoint](#8.-How-to-Deploy-Models-as-Sagemaker-Multi-Model-Endpoint-and-Invoke-the-Endpoint)
9. [How to Do Batch Transform in the Multi Model Server Framework](#9.-How-to-Do-Batch-Transform-in-the-Multi-Model-Server-Framework)
10. [Clean up the resources](#10.-Clean-up-the-resources)
11. [Conclusion](#11.-Conclusion)

---

# 7. How to Build the Custom Sagemaker Container for Model Deployment

Inspired by [an example of bringing your own container for deployment to a multi-model endpoint](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/advanced_functionality/multi_model_bring_your_own), we use the [Multi Model Server](https://github.com/awslabs/multi-model-server) framework and the [SageMaker Inference Toolkit](https://github.com/aws/sagemaker-inference-toolkit) to host multiple forecasting models at the same time behind a single endpoint:

- Multi Model Server (MMS) is an open source framework for serving machine learning models. MMS supports a pluggable custom backend handler where you can implement your own algorithm. It provides the HTTP frontend and model management capabilities required by multi-model endpoints to host multiple models within a single container, load models into and unload models out of the container dynamically, and perform inference on a specified loaded model. MMS supports [various settings](https://github.com/awslabs/multi-model-server/blob/master/docker/advanced_settings.md#description-of-config-file-settings) for the frontend server it starts.
- [SageMaker Inference Toolkit](https://github.com/aws/sagemaker-inference-toolkit) is a library that bootstraps MMS in a way that is compatible with SageMaker multi-model endpoints, while still allowing you to tweak important performance parameters, such as the number of workers per model.
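
To make the idea of loading and unloading models dynamically more concrete, here is a toy sketch of an in-memory registry that loads a model on first request and drops the least recently used one when a limit is reached. It is purely illustrative: the class name, the `loader` callable, the `max_loaded` limit, and the eviction policy are all assumptions, not how MMS or SageMaker actually manage memory.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class ModelRegistry:
    """Toy registry: loads models on demand, evicts the least recently used."""

    def __init__(self, loader: Callable[[str], Any], max_loaded: int = 2) -> None:
        self._loader = loader          # hypothetical: maps a model name to a model object
        self._max_loaded = max_loaded  # hypothetical memory budget, counted in models
        self._loaded: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, name: str) -> Any:
        if name in self._loaded:
            self._loaded.move_to_end(name)    # mark as most recently used
            return self._loaded[name]
        if len(self._loaded) >= self._max_loaded:
            self._loaded.popitem(last=False)  # unload the least recently used model
        model = self._loader(name)            # load on first request
        self._loaded[name] = model
        return model

    def loaded_models(self) -> List[str]:
        return list(self._loaded)


registry = ModelRegistry(loader=lambda name: f"<model {name}>", max_loaded=2)
registry.get("MeanPredictor")
registry.get("DeepAREstimator")
registry.get("MeanPredictor")      # already loaded, no reload
registry.get("ProphetPredictor")   # limit reached, DeepAREstimator is unloaded
print(registry.loaded_models())    # ['MeanPredictor', 'ProphetPredictor']
```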

In this way, we can compare all the model forecasts in real time more efficiently and save the cost of creating multiple endpoints.

The main steps for building the custom SageMaker container are described below.

## Step 1: Import libraries
Before beginning, import all the Python modules needed.

In [1]:
import boto3
import jsonlines
import json
import time

from sagemaker import get_execution_role
from time import gmtime, strftime

## Step 2: Define model handler
The code snippet __`container/model_handler.py`__ below shows how we define a custom handler that supports loading and inference for the GluonTS models.
- The `initialize` method will be called when a model is loaded into memory. In this example, it loads the model artifacts at `model_dir` into the GluonTS Predictor class.

- The `handle` method will be called when invoking the model. In this example, it validates the input payload and then forwards the input to the GluonTS Predictor class, returning the output. This handler class is instantiated for every model loaded into the container, so state in the handler is not shared across models.

In [2]:
!cat container/model_handler.py

"""
ModelHandler defines an example model handler for load and inference requests for MXNet CPU models
"""
from collections import namedtuple
import glob
import json
import logging
import io
import os
import re

import mxnet as mx
import numpy as np
import sys

from pathlib import Path
from gluonts.model.predictor import Predictor
from gluonts.dataset.common import ListDataset

class ModelHandler(object):
    """
    A sample Model handler implementation.
    """

    def __init__(self):
        self.initialized = False
        self.mx_model = None
        self.shapes = None
    
    def load_model(self, model_path):
        try:
            predictor = Predictor.deserialize(Path(model_path))
            print('Model loaded from %s'%model_path)
        except:
            print('Unable to load the model %s'%model_path)
            sys.exit(1)
        return predictor

    def initialize(self, context):
        """
        Initialize model. This will be called during model loading time


## Step 3: Unit testing for the model handler
Before we build the custom Docker container, it is a good habit to run some unit tests (__`container/test_model_handler.py`__), as shown below.

In [3]:
%%bash

cd container
pytest -v test_model_handler.py

platform linux -- Python 3.6.10, pytest-5.4.3, py-1.9.0, pluggy-0.13.1 -- /home/ec2-user/SageMaker/timeseries_blog_fork_yin/2_Predict_electricity_demand_with_the_GluonTS_and_SageMaker_custom_containers/.myenv/miniconda/envs/gluonts-multimodel/bin/python
cachedir: .pytest_cache
rootdir: /home/ec2-user/SageMaker/timeseries_blog_fork_yin/2_Predict_electricity_demand_with_the_GluonTS_and_SageMaker_custom_containers/container
collecting ... collected 5 items

test_model_handler.py::test_load_model PASSED                            [ 20%]
test_model_handler.py::test_initialize PASSED                            [ 40%]
test_model_handler.py::test_preprocess PASSED                            [ 60%]
test_model_handler.py::test_handle[5-quantiles0] PASSED                  [ 80%]
test_model_handler.py::test_handle[25-quantiles1] PASSED                 [100%]



## Step 4: Define Docker Entrypoint
The inference container in this example uses the Inference Toolkit to start MMS, as can be seen in the __`container/dockerd-entrypoint.py`__ file below.

In [4]:
!cat container/dockerd-entrypoint.py

import subprocess
import sys
import shlex
import os
from retrying import retry
from subprocess import CalledProcessError
from sagemaker_inference import model_server

def _retry_if_error(exception):
    return isinstance(exception, (CalledProcessError, OSError))

@retry(stop_max_delay=1000 * 50,
       retry_on_exception=_retry_if_error)
def _start_mms():
    # by default the number of workers per model is 1, but we can configure it through the
    # environment variable below if desired.
    # os.environ['SAGEMAKER_MODEL_SERVER_WORKERS'] = '2'
    model_server.start_model_server(handler_service='/home/model-server/model_handler.py:handle')

def main():
    if sys.argv[1] == 'serve':
        _start_mms()
    else:
        subprocess.check_call(shlex.split(' '.join(sys.argv[1:])))

    # prevent docker exit
    subprocess.call(['tail', '-f', '/dev/null'])
    
main()
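
A retry predicate such as `_retry_if_error` is meant to match more than one exception type. In Python that requires passing a tuple to `isinstance`; an expression of the form `A or B` on two classes evaluates to just `A`, so only the first type would be matched. The short sketch below shows the difference with two hypothetical predicate functions.

```python
from subprocess import CalledProcessError

# `A or B` returns the first truthy operand, and a class is always truthy.
only_first = CalledProcessError or OSError
print(only_first is CalledProcessError)  # True


def retry_on_first_only(exc):
    # Matches CalledProcessError only, despite mentioning OSError.
    return isinstance(exc, CalledProcessError or OSError)


def retry_on_either(exc):
    # A tuple makes isinstance check every listed type.
    return isinstance(exc, (CalledProcessError, OSError))


os_error = OSError("port not ready")
proc_error = CalledProcessError(returncode=1, cmd="serve")

print(retry_on_first_only(os_error), retry_on_either(os_error))      # False True
print(retry_on_first_only(proc_error), retry_on_either(proc_error))  # True True
```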


## Step 5: Building and registering a container

The shell script below first builds a custom Docker image that uses MMS as the front end (configured through the SageMaker Inference Toolkit in `container/dockerd-entrypoint.py`) and the `container/model_handler.py` shown above as the backend handler. It then uploads the image to an ECR repository in your account. This step may take a while the first time it runs.
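
The script that follows relies on `aws ecr get-login`, which is available in version 1 of the AWS CLI; version 2 replaces it with `aws ecr get-login-password`. As a rough sketch of what the naming and login steps amount to, the Python below builds the image URI and decodes an ECR authorization token into a user name and password. The helper names and all sample values are made up for illustration; the commented lines show where a real token would come from with boto3.

```python
import base64
from typing import Tuple


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Compose the full image name used for `docker tag` and `docker push`."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def decode_ecr_token(authorization_token: str) -> Tuple[str, str]:
    """Split a base64-encoded 'user:password' token into its two parts."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    user, _, password = decoded.partition(":")
    return user, password


# Made-up values, for illustration only.
print(ecr_image_uri("123456789012", "us-west-2", "demo-sagemaker-multimodel-gluonts"))
fake_token = base64.b64encode(b"AWS:not-a-real-password").decode("ascii")
print(decode_ecr_token(fake_token))

# With credentials available, a real token could be fetched like this:
#   import boto3
#   auth = boto3.client("ecr").get_authorization_token()["authorizationData"][0]
#   user, password = decode_ecr_token(auth["authorizationToken"])
#   registry = auth["proxyEndpoint"]
```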

In [5]:
%%bash

# The name of our algorithm
algorithm_name=demo-sagemaker-multimodel-gluonts

cd container

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
$(aws ecr get-login --region ${region} --no-include-email)

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -q -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Login Succeeded
sha256:d3bb3ca1cfc27a37f663d3f723c60d9977427eeb273a65beda1dcb6f8e94503e
The push refers to repository [783128296767.dkr.ecr.ap-southeast-2.amazonaws.com/demo-sagemaker-multimodel-gluonts]
757c1ae22515: Preparing
413a6f04b09c: Preparing
37d91a3c4cee: Preparing
12db22adf339: Preparing
6d7e581173db: Preparing
97d7225ac80a: Preparing
db7a601aadeb: Preparing
9012222a80e1: Preparing
e40ff1b74835: Preparing
f383861831ea: Preparing
96cb223ef3e4: Preparing
ebb3daf902ad: Preparing
dcc0cc99372e: Preparing
87c128261339: Preparing
41a253a417e6: Preparing
e06660e80cf4: Preparing
97d7225ac80a: Waiting
db7a601aadeb: Waiting
9012222a80e1: Waiting
dcc0cc99372e: Waiting
e40ff1b74835: Waiting
f383861831ea: Waiting
96cb223ef3e4: Waiting
87c128261339: Waiting
ebb3daf902ad: Waiting
41a253a417e6: Waiting
757c1ae22515: Layer already exists
12db22adf339: Layer already exists
6d7e581173db: Layer already exists
413a6f04b09c: Layer already exists
37d91a3c4cee: Layer already exists
97d7225ac80a: Lay




# 8. How to Deploy Models as Sagemaker Multi Model Endpoint and Invoke the Endpoint

After building and registering the custom SageMaker container, we can deploy the models as a SageMaker multi-model endpoint and invoke the endpoint. The main steps are outlined below:

## Step 1: Set up the environment

First, we need to define the S3 bucket and prefix of the model artifacts that will be invoked by the multi-model endpoint. We also need to define the IAM role that gives SageMaker access to the model artifacts and to the ECR image created above.

In [6]:
sm_client = boto3.client(service_name='sagemaker')
runtime_sm_client = boto3.client(service_name='sagemaker-runtime')

account_id = boto3.client('sts').get_caller_identity()['Account']
region = boto3.Session().region_name

bucket = 'sagemaker-{}-{}'.format(region, account_id)
prefix = 'demo-multimodel-gluonts-endpoint'

role = get_execution_role()

models_dir = "models"

## Step 2: Create a multi-model endpoint
### Step 2-1: Import models into hosting
When creating the Model entity for multi-model endpoints, the container's `ModelDataUrl` is the S3 prefix where the model artifacts that are invokable by the endpoint are located. The rest of the S3 path will be specified when invoking the model.

The container's `Mode` is set to `MultiModel` to signify that the container will host multiple models.

In [7]:
model_name = 'DEMO-MultiModelGluonTSModel' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
model_url = 'https://s3-{}.amazonaws.com/{}/{}/{}/'.format(region, bucket, prefix, models_dir)
container = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account_id, region, 'demo-sagemaker-multimodel-gluonts')

print('Model name: ' + model_name)
print('Model data Url: ' + model_url)
print('Container image: ' + container)

container = {
    'Image': container,
    'ModelDataUrl': model_url,
    'Mode': 'MultiModel'
}

create_model_response = sm_client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = role,
    Containers = [container])

print("Model Arn: " + create_model_response['ModelArn'])

Model name: DEMO-MultiModelGluonTSModel2020-09-10-01-18-56
Model data Url: https://s3-ap-southeast-2.amazonaws.com/sagemaker-ap-southeast-2-783128296767/demo-multimodel-gluonts-endpoint/models/
Container image: 783128296767.dkr.ecr.ap-southeast-2.amazonaws.com/demo-sagemaker-multimodel-gluonts:latest
Model Arn: arn:aws:sagemaker:ap-southeast-2:783128296767:model/demo-multimodelgluontsmodel2020-09-10-01-18-56


### Step 2-2: Create endpoint configuration
Endpoint config creation works the same way as it does for single-model endpoints.

In [8]:
endpoint_config_name = 'DEMO-MultiModelGluonTSEndpointConfig-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print('Endpoint config name: ' + endpoint_config_name)

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': 'ml.m5.xlarge',
        'InitialInstanceCount': 2,
        'InitialVariantWeight': 1,
        'ModelName': model_name,
        'VariantName': 'AllTraffic'}])

print("Endpoint config Arn: " + create_endpoint_config_response['EndpointConfigArn'])

Endpoint config name: DEMO-MultiModelGluonTSEndpointConfig-2020-09-10-01-18-57
Endpoint config Arn: arn:aws:sagemaker:ap-southeast-2:783128296767:endpoint-config/demo-multimodelgluontsendpointconfig-2020-09-10-01-18-57


### Step 2-3: Create the multi model endpoint
Similarly, endpoint creation works the same way as for single model endpoints.

In [9]:
endpoint_name = 'DEMO-MultiModelGluonTSEndpoint-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print('Endpoint name: ' + endpoint_name)

create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)
print('Endpoint Arn: ' + create_endpoint_response['EndpointArn'])

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp['EndpointStatus']
print("Endpoint Status: " + status)

print('Waiting for {} endpoint to be in service...'.format(endpoint_name))
waiter = sm_client.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=endpoint_name)

Endpoint name: DEMO-MultiModelGluonTSEndpoint-2020-09-10-01-18-57
Endpoint Arn: arn:aws:sagemaker:ap-southeast-2:783128296767:endpoint/demo-multimodelgluontsendpoint-2020-09-10-01-18-57
Endpoint Status: Creating
Waiting for DEMO-MultiModelGluonTSEndpoint-2020-09-10-01-18-57 endpoint to be in service...


## Step 3: Invoke models
Now we invoke the models that we uploaded to S3 previously. The first invocation of a model may be slow because, behind the scenes, SageMaker downloads the model artifacts from S3 to the instance and loads them into the container.

### Invoke the Mean Model

First we prepare two time series as the payload, then call `InvokeEndpoint` to get a forecast from the Mean model. The `TargetModel` field is concatenated with the S3 prefix specified in `ModelDataUrl` when creating the model, to generate the location of the model in S3.
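
The concatenation of `ModelDataUrl` and `TargetModel` described above can be pictured with a small helper. This is only a sketch of the path arithmetic: the function name and the bucket and prefix in the sample URL are made up, and SageMaker performs the actual resolution on the service side.

```python
def resolve_model_location(model_data_url: str, target_model: str) -> str:
    """Join the endpoint's S3 prefix and the per-request TargetModel value."""
    return model_data_url.rstrip("/") + "/" + target_model.lstrip("/")


# Made-up prefix, for illustration only.
prefix_url = "https://s3-us-west-2.amazonaws.com/example-bucket/demo/models/"

for target in ("MeanPredictor.tar.gz", "DeepAREstimator.tar.gz"):
    print(resolve_model_location(prefix_url, target))
```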

In [10]:
def read_data(file_path):
    data = []
    with jsonlines.open(file_path) as reader:
        for obj in reader:
            data.append(obj)
    return data

payload_jsonline = read_data('data/test.json')

In [11]:
n_time_series = 2 # select 2 time series for quick response
payload_list = []
for p in payload_jsonline[:n_time_series]:
    payload_list.append(json.dumps(p))
payload = '\n'.join(payload_list)

In [12]:
%%time

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    TargetModel='MeanPredictor.tar.gz', # this is the rest of the S3 path where the model artifacts are located
    Body=payload)

print(response['Body'].read().decode("utf-8"), sep = '\n')

{"item_id": "0", "quantiles": {"0.1": [1.6836708774500941, 1.3459220587634255, 1.2794438968620294, 1.5685811548861057, 1.4762431789454158, 1.5028084040413205, 1.8883717942488332, 1.6330475467354955, 1.5684890987332918, 1.4826444264501741, 1.7830962003789306, 1.5939098498840787], "0.2": [1.920602424203765, 1.9718953001761617, 1.6995023159565712, 1.8263504590140518, 1.902724896314878, 1.8205976567242148, 2.0653552536703534, 1.9735749351493324, 1.9506841365559424, 1.888436017604459, 2.0472556393961723, 1.9082684966100345], "0.3": [2.1623539369328393, 2.1639666802506414, 2.03170568575243, 2.0323324750050675, 2.1709550421378645, 2.1627820452559514, 2.2317351500392477, 2.312439851736791, 2.1969237389391787, 2.188039879769499, 2.1810808210984067, 2.141758071199021], "0.4": [2.535394003539569, 2.4028181606280374, 2.271119473993988, 2.2963657866435074, 2.399004529626709, 2.4504841027529, 2.376205256756106, 2.5649156033833234, 2.434253137431879, 2.368485043253767, 2.3594630672039796, 2.281536606

When we invoke the same model a __`2nd`__ time, it has already been downloaded to the instance and loaded into the container, so __`inference is faster`__.

In [13]:
%%time

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    TargetModel='MeanPredictor.tar.gz',
    Body=payload)

print(response['Body'].read().decode("utf-8"), sep = '\n')

{"item_id": "0", "quantiles": {"0.1": [1.423646103635795, 1.4244855612323308, 1.7955984645359282, 1.6185297296529932, 1.4971168910949333, 1.409791466128483, 1.5487292063807596, 1.489198716173964, 1.6668053206242415, 1.4101198318373516, 1.6540691627134603, 1.5076646914320924], "0.2": [1.805119108074844, 1.9672950606047408, 2.100561623173293, 1.979248640570927, 1.7915881498935398, 1.7220097630362785, 1.9769095657259328, 1.8605283661047287, 1.9150668139811335, 1.962038534074229, 1.9466996698154517, 1.9075864550234036], "0.3": [2.089584708878582, 2.15403457693724, 2.3803876996449014, 2.131217995525174, 2.1257642882637264, 1.8946281606558881, 2.206056796087563, 2.193483912385975, 2.2115369082876377, 2.235224572942, 2.1619749927372847, 2.0664737174619656], "0.4": [2.299868019461437, 2.2765289159240627, 2.549544892279797, 2.3037481683350904, 2.22877167733201, 2.187668900749337, 2.287585380384888, 2.415537593353405, 2.485359781056723, 2.502719394770962, 2.3651967944838463, 2.2932707341393552],
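
The `%%time` magic used in these cells is only available in a notebook. Outside of one, the difference between the first (cold) and second (warm) invocation could be measured with a small timing helper such as the sketch below. The `FakeEndpoint` class merely simulates a one-off load delay so the example runs without AWS access; it is not part of the project.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


class FakeEndpoint:
    """Stand-in endpoint: the first call per model pays a simulated load cost."""

    def __init__(self, load_seconds: float = 0.05) -> None:
        self._loaded = set()
        self._load_seconds = load_seconds

    def invoke(self, target_model: str) -> str:
        if target_model not in self._loaded:
            time.sleep(self._load_seconds)  # pretend to download and load the model
            self._loaded.add(target_model)
        return "ok"


endpoint = FakeEndpoint()
_, cold = timed_call(endpoint.invoke, "MeanPredictor.tar.gz")
_, warm = timed_call(endpoint.invoke, "MeanPredictor.tar.gz")
print(f"cold: {cold:.3f}s, warm: {warm:.3f}s")
```

To time the real endpoint, `endpoint.invoke` would be replaced by a small function that wraps `runtime_sm_client.invoke_endpoint(...)` with the arguments shown in the cells above.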

### Invoke other models
Exercising the power of a multi-model endpoint, we can specify different models (e.g., DeepAREstimator.tar.gz) as `TargetModel` and perform inference on them using the same endpoint.

In [14]:
%%time

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    TargetModel='DeepAREstimator.tar.gz',
    Body=payload)

print(response['Body'].read().decode("utf-8"), sep = '\n')

{"item_id": "0", "quantiles": {"0.1": [2.0058059692382812, 2.0848548412323, 2.1319191455841064, 1.7626827955245972, 2.1028003692626953, 2.1341395378112793, 2.153954029083252, 1.4223262071609497, 1.8531743288040161, 2.1007485389709473, 2.023390293121338, 1.9111007452011108], "0.2": [2.250101089477539, 2.2457194328308105, 2.2871243953704834, 1.9063371419906616, 2.3267104625701904, 2.3346588611602783, 2.345787286758423, 1.5654994249343872, 2.0083627700805664, 2.2628791332244873, 2.159290075302124, 2.080110549926758], "0.3": [2.3288064002990723, 2.4233856201171875, 2.3600692749023438, 1.9544258117675781, 2.4445018768310547, 2.5232415199279785, 2.528930187225342, 1.7129929065704346, 2.116767168045044, 2.4026944637298584, 2.377838134765625, 2.2108047008514404], "0.4": [2.41412615776062, 2.476562738418579, 2.456324577331543, 2.087014675140381, 2.514258623123169, 2.5901424884796143, 2.676973342895508, 1.8229917287826538, 2.2375881671905518, 2.5167603492736816, 2.456094264984131, 2.299031496047

In [15]:
%%time

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    TargetModel='RForecastPredictor.tar.gz',
    Body=payload)

print(response['Body'].read().decode("utf-8"), sep = '\n')

{"item_id": null, "quantiles": {"0.1": [0.18148276617262526, 0.3198412483517252, 0.11849932864266854, 0.411343906972496, 0.004403061488035487, -0.04578059200794696, -0.27764912872607517, -0.09400640819804756, -0.504227514273526, -0.13732209137043672, -0.15145200349425147, -0.847737190886434], "0.2": [1.2438235772273658, 1.0318727403506809, 0.8608082994869592, 1.2409827683028785, 0.6357459088267154, 0.8255816648621153, 0.7127809167764643, 0.9434364976374717, 0.6198691127527405, 0.5638587350804194, 0.3933465240425702, 0.5061232380146841], "0.3": [1.752839928106999, 1.5651237848625699, 1.2734625513551545, 1.6892401039977127, 1.1456370615316522, 1.369433178060636, 1.3880850771848765, 1.4925396445319707, 1.1776492815623814, 1.3361671086434446, 1.0216989782950103, 0.9802164539115574], "0.4": [2.2029614546021365, 1.9157150441236104, 1.7357722803468945, 1.9374095438189973, 2.016817544158278, 1.8715178133726045, 2.123673137681477, 1.9625478317091734, 2.073336239688927, 1.7778421123843846, 1.895

In [16]:
%%time

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    TargetModel='ProphetPredictor.tar.gz',
    Body=payload)

print(response['Body'].read().decode("utf-8"), sep = '\n')

{"item_id": null, "quantiles": {"0.1": [0.5930952134767706, 0.6731662001592016, 0.3695854392420037, 0.7794866983198159, 0.903604868377399, 0.8471579536059082, 0.8699702973042449, -0.03352734240836752, 0.06541579696189292, 0.35512063650832904, 0.14095854195632262, 0.381850419260251], "0.2": [1.14577455973552, 1.3339385008257056, 0.7803199516421508, 1.2918720851876282, 1.398997204940614, 1.2620057989794327, 1.3358825084023809, 0.7742404827857581, 0.5588027572296927, 0.5569706272283665, 0.661942395575535, 0.8351638852697262], "0.3": [1.4987660422398452, 1.668511016064593, 1.272430998864056, 1.6955004024231497, 1.880906929608446, 1.563464021059021, 1.657705629441228, 1.3925979695603954, 0.7537146597888734, 1.063703569386945, 1.1928283117285745, 1.3469376591821351], "0.4": [1.7925902986900557, 1.9173754001515324, 1.4178542972431227, 2.0751247003088777, 2.0594082907028235, 2.0187464565195827, 2.061139721812967, 1.6784624410738136, 0.9965063927024583, 1.2894270974807087, 1.3429899793948428, 1

In [17]:
%%time

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    TargetModel='SeasonalNaivePredictor.tar.gz',
    Body=payload)

print(response['Body'].read().decode("utf-8"), sep = '\n')

{"item_id": null, "quantiles": {"0.1": [2.855329990386963, 2.855329990386963, 2.855329990386963, 2.062182664871216, 3.4898476600646973, 2.5380711555480957, 1.7449238300323486, 1.4276649951934814, 2.2208120822906494, 2.5380711555480957, 2.062182664871216, 2.379441738128662], "0.2": [2.855329990386963, 2.855329990386963, 2.855329990386963, 2.062182664871216, 3.4898476600646973, 2.5380711555480957, 1.7449238300323486, 1.4276649951934814, 2.2208120822906494, 2.5380711555480957, 2.062182664871216, 2.379441738128662], "0.3": [2.855329990386963, 2.855329990386963, 2.855329990386963, 2.062182664871216, 3.4898476600646973, 2.5380711555480957, 1.7449238300323486, 1.4276649951934814, 2.2208120822906494, 2.5380711555480957, 2.062182664871216, 2.379441738128662], "0.4": [2.855329990386963, 2.855329990386963, 2.855329990386963, 2.062182664871216, 3.4898476600646973, 2.5380711555480957, 1.7449238300323486, 1.4276649951934814, 2.2208120822906494, 2.5380711555480957, 2.062182664871216, 2.37944173812866
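
Since every model returns the same JSON Lines shape (one object per time series with a `quantiles` mapping), the responses can be lined up side by side for comparison. The sketch below shows one way to do that; the helper names are hypothetical and the response bodies are shortened, made-up values rather than real endpoint output.

```python
import json
from typing import Dict, List


def parse_forecasts(body: str) -> List[dict]:
    """Parse a JSON Lines response body into one dict per time series."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def quantile_by_model(responses: Dict[str, str], quantile: str,
                      series_index: int = 0) -> Dict[str, List[float]]:
    """Pick the same quantile of the same series from each model's response."""
    table = {}
    for model, body in responses.items():
        forecasts = parse_forecasts(body)
        table[model] = forecasts[series_index]["quantiles"][quantile]
    return table


# Made-up, shortened responses keyed by model name.
responses = {
    "MeanPredictor": (
        '{"item_id": "0", "quantiles": {"0.1": [1.7, 1.3], "0.5": [2.6, 2.5]}}\n'
        '{"item_id": "1", "quantiles": {"0.1": [0.9, 0.8], "0.5": [1.4, 1.5]}}'
    ),
    "SeasonalNaivePredictor": (
        '{"item_id": null, "quantiles": {"0.1": [2.9, 2.9], "0.5": [2.9, 2.9]}}\n'
        '{"item_id": null, "quantiles": {"0.1": [1.2, 1.1], "0.5": [1.2, 1.1]}}'
    ),
}

for model, values in quantile_by_model(responses, "0.5").items():
    print(f"{model:<24} {values}")
```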

# 9. How to Do Batch Transform in the Multi Model Server Framework

MMS does not support batch transform directly. To perform batch transform, we need to create a separate SageMaker model for each forecasting model and run the batch transform jobs one by one. The example below shows how to do batch transform for one model.
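
Running the jobs "one by one" mostly means repeating the same two requests with a different model name. The sketch below builds the request dictionaries for several models without calling AWS; the function name, the `stamp` argument, and the sample account, bucket, and role values are assumptions, while the field names mirror the cells in this section.

```python
def batch_requests_for(model, *, account_id, region, bucket, prefix, role, stamp,
                       image="demo-sagemaker-multimodel-gluonts", models_dir="models"):
    """Return (create_model kwargs, create_transform_job kwargs) for one model."""
    model_name = f"DEMO-GluonTSModel-{model}-{stamp}"
    create_model = {
        "ModelName": model_name,
        "ExecutionRoleArn": role,
        "Containers": [{
            "Image": f"{account_id}.dkr.ecr.{region}.amazonaws.com/{image}:latest",
            "ModelDataUrl": (f"https://s3-{region}.amazonaws.com/"
                             f"{bucket}/{prefix}/{models_dir}/{model}.tar.gz"),
            "Mode": "SingleModel",
        }],
    }
    create_transform_job = {
        "TransformJobName": f"DEMO-GluonTS-{model}-BT-{stamp}",
        "ModelName": model_name,
        "BatchStrategy": "SingleRecord",
        "TransformInput": {
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://{bucket}/{prefix}/data/test.json",
            }},
            "ContentType": "application/json",
            "CompressionType": "None",
            "SplitType": "Line",
        },
        "TransformOutput": {
            "S3OutputPath": f"s3://{bucket}/{prefix}/inference-results/{model}",
        },
        "TransformResources": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
    }
    return create_model, create_transform_job


# Made-up account details, for illustration only.
common = dict(account_id="123456789012", region="us-west-2", bucket="example-bucket",
              prefix="demo-multimodel-gluonts-endpoint",
              role="arn:aws:iam::123456789012:role/ExampleRole", stamp="2020-09-10-01-27-50")

for name in ("MeanPredictor", "RForecastPredictor"):
    model_kwargs, job_kwargs = batch_requests_for(name, **common)
    print(model_kwargs["ModelName"], "->", job_kwargs["TransformOutput"]["S3OutputPath"])
    # With a client available, the requests would be sent with:
    #   sm_client.create_model(**model_kwargs)
    #   sm_client.create_transform_job(**job_kwargs)
```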

## Step 1: Create the SageMaker Model from the Model Artifact

In [18]:
from time import gmtime, strftime

model = 'RForecastPredictor'
model_name_bt = 'DEMO-GluonTSModel-{}-'.format(model) + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
model_url = 'https://s3-{}.amazonaws.com/{}/{}/{}/{}.tar.gz'.format(region, bucket, prefix, models_dir, model)
container = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account_id, region, 'demo-sagemaker-multimodel-gluonts')

print('Model name: ' + model_name_bt)
print('Model data Url: ' + model_url)
print('Container image: ' + container)

container = {
    'Image': container,
    'ModelDataUrl': model_url,
    'Mode': 'SingleModel'
}

create_model_response = sm_client.create_model(
    ModelName = model_name_bt,
    ExecutionRoleArn = role,
    Containers = [container])

print("Model Arn: " + create_model_response['ModelArn'])

Model name: DEMO-GluonTSModel-RForecastPredictor-2020-09-10-01-27-50
Model data Url: https://s3-ap-southeast-2.amazonaws.com/sagemaker-ap-southeast-2-783128296767/demo-multimodel-gluonts-endpoint/models/RForecastPredictor.tar.gz
Container image: 783128296767.dkr.ecr.ap-southeast-2.amazonaws.com/demo-sagemaker-multimodel-gluonts:latest
Model Arn: arn:aws:sagemaker:ap-southeast-2:783128296767:model/demo-gluontsmodel-rforecastpredictor-2020-09-10-01-27-50


## Step 2: Start the Batch Transform Job Using the Model Created Above

In [19]:
test_data_s3_path = "s3://{}/{}/data/test.json".format(bucket, prefix)
transform_job_name = 'DEMO-GluonTS-{}-BT-'.format(model) + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

transform_input = {
        'DataSource': {
            'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': test_data_s3_path
            }
        },
        'ContentType': 'application/json',
        'CompressionType': 'None',
        'SplitType': 'Line'
    }

transform_output = {
        'S3OutputPath': 's3://{}/{}/inference-results/{}'.format(bucket,prefix, model),
    }

transform_resources = {
        'InstanceType': 'ml.m5.xlarge',
        'InstanceCount': 1
    }

sm_client.create_transform_job(TransformJobName = transform_job_name,
                        ModelName = model_name_bt,
                        BatchStrategy='SingleRecord',
                        TransformInput = transform_input,
                        TransformOutput = transform_output,
                        TransformResources = transform_resources
)

{'TransformJobArn': 'arn:aws:sagemaker:ap-southeast-2:783128296767:transform-job/demo-gluonts-rforecastpredictor-bt-2020-09-10-01-27-50',
 'ResponseMetadata': {'RequestId': '4ab27e57-38d9-4e3d-8325-7925f4f1610e',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '4ab27e57-38d9-4e3d-8325-7925f4f1610e',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '136',
   'date': 'Thu, 10 Sep 2020 01:27:50 GMT'},
  'RetryAttempts': 0}}

## Step 3: Check the Batch Transform Job Status

In [20]:
print ('JobStatus')
print('----------')
from time import sleep

describe_response = sm_client.describe_transform_job(TransformJobName = transform_job_name)
job_run_status = describe_response['TransformJobStatus']
print (job_run_status)

while job_run_status not in ('Failed', 'Completed', 'Stopped'):
    describe_response = sm_client.describe_transform_job(TransformJobName = transform_job_name)
    job_run_status = describe_response['TransformJobStatus']
    print (job_run_status)
    sleep(30)

JobStatus
----------
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
InProgress
Completed
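
The polling loop above can be generalized into a small helper with a timeout. The sketch below takes the status lookup as a callable so it can be exercised without AWS; the function name, the default intervals, and the canned status sequence are assumptions. boto3 also provides a `transform_job_completed_or_stopped` waiter, which may be preferable when no custom logging is needed.

```python
import time

TERMINAL_STATES = ("Completed", "Failed", "Stopped")


def wait_for_job(get_status, poll_seconds=30.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_status() until a terminal state is reached or the timeout expires."""
    waited = 0.0
    while True:
        status = get_status()
        print(status)
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still {status} after {waited:.0f}s")
        sleep(poll_seconds)  # only sleep when another poll is actually needed
        waited += poll_seconds


# Canned statuses stand in for describe_transform_job responses.
statuses = iter(["InProgress", "InProgress", "Completed"])
final_status = wait_for_job(lambda: next(statuses), poll_seconds=0.0)
```

Against the real service, `get_status` could be `lambda: sm_client.describe_transform_job(TransformJobName=transform_job_name)['TransformJobStatus']`.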


## Step 4: Inspect Batch Transform Results

In [21]:
s3_client = boto3.client('s3')
s3_client.download_file(Filename='data/test.json.out',
                        Bucket=bucket,
                        Key='{}/inference-results/{}/test.json.out'.format(prefix, model))
test_out_jsonline = read_data('data/test.json.out')
print(test_out_jsonline[:2])

[{'item_id': None, 'quantiles': {'0.1': [0.7315686527001264, 0.2856001165278528, 0.4323040259171994, 0.5833599570818291, -0.14797111589962553, 0.021587770106187287, -0.6077946952864486, -0.37748098928734525, -0.07952885636610768, -0.34318818769052273, -0.35792203168404557, -0.8223312373679962], '0.2': [1.4602484228793993, 1.073056784408827, 1.277245460243779, 1.0832357513623236, 0.3662195058442699, 0.7424413306138271, 0.4812303578001802, 0.5028512296478762, 1.0906918621359172, 0.5945405772812749, 0.5022932516142155, -0.013354661040992255], '0.3': [1.795786897086039, 1.465913012538786, 1.6618651577573946, 1.7628516387142779, 1.4342109215517789, 1.6406403571578272, 1.239871620102564, 1.029297049008235, 1.773293261366086, 1.3789020161223795, 1.4274630019458348, 0.7861460896701702], '0.4': [2.193155776216826, 1.8968683076991943, 2.0824539126703323, 2.106138158544148, 2.041541277690376, 2.0448217534102824, 1.8788850607500716, 1.4543228960864762, 2.2623898337815005, 1.8990658798677011, 2.071

# 10. Clean up the resources

## (Optional) Delete the hosting resources

In [22]:
sm_client.delete_endpoint(EndpointName=endpoint_name)
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sm_client.delete_model(ModelName=model_name)
sm_client.delete_model(ModelName=model_name_bt)

{'ResponseMetadata': {'RequestId': 'd1435880-de88-4807-bccf-62d6b26f6d7c',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'd1435880-de88-4807-bccf-62d6b26f6d7c',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Thu, 10 Sep 2020 01:35:52 GMT'},
  'RetryAttempts': 1}}
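
If one of the delete calls fails (for example because a resource was already removed), the calls after it never run. A possible way around that, sketched below under the assumption that each deletion can be wrapped in a zero-argument callable, is to attempt every step and report the failures at the end. The helper name and the fake steps are hypothetical.

```python
def run_cleanup(steps):
    """Run every (label, action) step and collect failures instead of stopping early."""
    failures = []
    for label, action in steps:
        try:
            action()
        except Exception as exc:  # keep going so later resources are still deleted
            failures.append((label, str(exc)))
    return failures


# Fake steps so the sketch runs without AWS access.
deleted = []


def failing_step():
    raise RuntimeError("endpoint config not found")


steps = [
    ("endpoint", lambda: deleted.append("endpoint")),
    ("endpoint-config", failing_step),
    ("model", lambda: deleted.append("model")),
]

failures = run_cleanup(steps)
print(deleted)   # ['endpoint', 'model']
print(failures)  # [('endpoint-config', 'endpoint config not found')]
```

With the real client, a step would look like `("endpoint", lambda: sm_client.delete_endpoint(EndpointName=endpoint_name))`.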

# 11. Conclusion

Time series data is a highly valuable data source for many businesses, and the ability to forecast such data is critical to making optimal and accurate business decisions. Instead of using AWS built-in services or algorithms, this tutorial has demonstrated how to use Amazon SageMaker to build your own custom forecasting algorithm and deploy multiple forecast models behind one SageMaker endpoint. This helps businesses compare state-of-the-art algorithms more efficiently and effectively, and makes it possible to reach smarter decisions based on the forecasts.

We have covered other use cases related to time series data as well. You can find the other topics below:

- Forecast air pollution with SageMaker processing and the AWS Open Data Registry by Eric Greene
- Automate sales projections with Amazon Forecast, QuickSight and AWS Lambda by Yoshiyuki Ito
- Detect DDoS Attacks with Kinesis Data Streams and SageMaker Isolation Forest by Seongmoon Kang