# Serving JPMML-based Tree-based models on Amazon SageMaker


---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. 

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/inference|structured|realtime|byoc|byoc-mme-java|JPMML_Models_SageMaker.ipynb)

---


This example notebook demonstrates how to bring your own Java-based container and serve PMML-based models on Amazon SageMaker for inference.

The parts that handle PMML (the model, loading the model, and predicting) were inspired by [this example](https://github.com/hkropp/jpmml-iris-example/blob/master/src/main/resources/sample/Iris.csv) and [this blog post](https://henning.kropponline.de/2015/09/06/jpmml-example-random-forest/).

This example shows how to serve a pre-trained, PMML-based random forest model on Amazon SageMaker using the bring-your-own-container approach.
SageMaker lets you bring your own model packaged as a Docker container. More information and examples on how to bring your own algorithms can be found [here](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html).

Update the SageMaker Python SDK

In [None]:
!pip install --upgrade sagemaker

In [None]:
!pip install sagemaker-studio-image-build

Create a SageMaker session and get an IAM role.

In [None]:
import boto3
import numpy as np
import os
import pandas as pd
import re
import sagemaker

from sagemaker.utils import S3DataConfig
from datetime import datetime


import shutil
import tarfile


role = sagemaker.get_execution_role()
sm_session = sagemaker.Session()
bucket_name = sm_session.default_bucket()
prefix = "demo-multimodel-endpoint"

bucket_name

## When should I build my own algorithm container?
You may not need to create a container to bring your own code to Amazon SageMaker. When you are using a framework such as Apache MXNet or TensorFlow that has direct support in SageMaker, you can simply supply the Python code that implements your algorithm using the SDK entry points for that framework. This set of supported frameworks is regularly added to, so you should check the current list to determine whether your algorithm is written in one of these common machine learning environments.

Even if there is direct SDK support for your environment or framework, you may find it more effective to build your own container. If the code that implements your algorithm is quite complex, or you need special additions to the framework, building your own container may be the right choice.

Some reasons to build an already supported framework container are:

- A specific version isn't supported.
- Configure and install your dependencies and environment.
- Use a different training/hosting solution than provided.

This walkthrough shows that it is quite straightforward to build your own container. So you can still use SageMaker even if your use case is not covered by the deep learning containers that we've built for you.

## Permissions
Running this notebook requires permissions in addition to the normal SageMakerFullAccess permissions, because it creates new repositories on Amazon ECR. The easiest way to add these permissions is to attach the managed policy AmazonEC2ContainerRegistryFullAccess to the role that you used to start your notebook instance. There's no need to restart your notebook instance when you do this; the new permissions are available immediately.
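If you prefer to attach the policy programmatically rather than through the console, a minimal sketch with boto3 might look like the following. It assumes the caller is allowed to call `iam:AttachRolePolicy`, which a notebook execution role often is not, so an administrator would typically run it. The helper names `role_name_from_arn` and `attach_ecr_full_access` are illustrative and not part of this notebook.

```python
ECR_FULL_ACCESS_ARN = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess"


def role_name_from_arn(role_arn):
    """Return the role name, i.e. the last path segment of an IAM role ARN."""
    # e.g. arn:aws:iam::123456789012:role/service-role/MyRole -> MyRole
    return role_arn.split("/")[-1]


def attach_ecr_full_access(role_arn):
    """Attach the AmazonEC2ContainerRegistryFullAccess managed policy to the role."""
    import boto3  # imported here so role_name_from_arn works without AWS access

    iam = boto3.client("iam")
    iam.attach_role_policy(
        RoleName=role_name_from_arn(role_arn),
        PolicyArn=ECR_FULL_ACCESS_ARN,
    )
```

With the `role` variable defined earlier in the notebook, the call would be `attach_ecr_full_access(role)`.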

## The example
In this example we show how to package a custom Java Spring Boot container that serves a PMML-based random forest model on SageMaker.

## Part 1: Packaging and Uploading your Algorithm for use with Amazon SageMaker
### An overview of Docker
If you're familiar with Docker already, you can skip ahead to the next section.

For many data scientists, Docker containers are a new technology. But they are not difficult to use and can significantly simplify the deployment of your software packages.

Docker provides a simple way to package arbitrary code into an image that is totally self-contained. Once you have an image, you can use Docker to run a container based on that image. Running a container is just like running a program on the machine except that the container creates a fully self-contained environment for the program to run. Containers are isolated from each other and from the host environment, so the way your program is set up is the way it runs, no matter where you run it.

Docker is more powerful than environment managers like conda or virtualenv because (a) it is completely language independent and (b) it comprises your whole operating environment, including startup commands and environment variables.

A Docker container is like a virtual machine, but it is much lighter weight. For example, a program running in a container can start in less than a second and many containers can run simultaneously on the same physical or virtual machine instance.

Docker uses a simple file called a Dockerfile to specify how the image is assembled. An example is provided below. You can build your Docker images based on Docker images built by yourself or by others, which can simplify things quite a bit.

Docker has become very popular in programming and devops communities due to its flexibility and its well-defined specification of how code can be run in its containers. It is the underpinning of many services built in the past few years, such as Amazon ECS.

Amazon SageMaker uses Docker to allow users to train and deploy arbitrary algorithms.

In Amazon SageMaker, Docker containers are invoked in one way for training and in another, slightly different, way for hosting. The following sections outline how to build containers for the SageMaker environment.

Some helpful links:

- [Docker home page](http://www.docker.com/)
- [Getting started with Docker](https://docs.docker.com/get-started/)
- [Dockerfile reference](https://docs.docker.com/engine/reference/builder/)
- [docker run reference](https://docs.docker.com/engine/reference/run/)

## How Amazon SageMaker runs your Docker container
Because you can run the same image in training or hosting, Amazon SageMaker runs your container with the argument train or serve. How your container processes this argument depends on the container.

- In this example, we don't define an ENTRYPOINT in the Dockerfile, so Docker runs the command train at training time and serve at serving time. In this example, we define these as executable Python scripts, but they could be any program that we want to start in that environment.
- If you specify a program as an ENTRYPOINT in the Dockerfile, that program will be run at startup and its first argument will be train or serve. The program can then look at that argument and decide what to do.
- If you are building separate containers for training and hosting (or building only for one or the other), you can define a program as an ENTRYPOINT in the Dockerfile and ignore (or verify) the first argument passed in.
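The second option in the list above, a single entry point that inspects its first argument, can be sketched as follows. This is only an illustration of the dispatch idea; the function name `select_mode` is hypothetical and the code is not taken from this example's container.

```python
def select_mode(argv):
    """Return 'train' or 'serve' based on the first container argument."""
    if not argv:
        raise ValueError("expected 'train' or 'serve' as the first argument")
    mode = argv[0]
    if mode not in ("train", "serve"):
        raise ValueError(f"unsupported mode: {mode!r}")
    return mode
```

An entry point script would call `select_mode(sys.argv[1:])` and then start either the training routine or the web server accordingly.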

### Running your container during hosting
Hosting has a very different model than training because hosting responds to inference requests that come in via HTTP. In this example, we use a Java Spring Boot application; however, the hosting solution can be customized. One example is the Python serving stack within the scikit-learn example.

Amazon SageMaker uses two URLs in the container:

- /ping receives GET requests from the infrastructure. Your program returns 200 if the container is up and accepting requests.
- /invocations is the endpoint that receives client inference POST requests. The format of the request and the response is up to the algorithm. If the client supplied ContentType and Accept headers, these are passed in as well.

The container has the model files in the same place that they were written to during training:

    /opt/ml
    `-- model
        `-- <model files>
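To make the two-URL contract concrete, here is a minimal sketch using only the Python standard library. The container in this notebook is a Java Spring Boot application, so this is an illustration of the contract rather than the code that actually runs; the names `route`, `InferenceHandler`, and `serve` are hypothetical, and the "prediction" is a placeholder that only counts the records received.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def route(method, path, body=b""):
    """Return (status, response_bytes) for a request to the container."""
    if method == "GET" and path == "/ping":
        return 200, b""  # health check: 200 means "up and accepting requests"
    if method == "POST" and path == "/invocations":
        request = json.loads(body or b"{}")
        # Placeholder for real inference: report how many records arrived.
        reply = {"records": len(request.get("data", []))}
        return 200, json.dumps(reply).encode("utf-8")
    return 404, b""


class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self._send(*route("GET", self.path))

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self._send(*route("POST", self.path, self.rfile.read(length)))

    def _send(self, status, body):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the server; SageMaker sends requests to port 8080 in the container."""
    HTTPServer(("0.0.0.0", port), InferenceHandler).serve_forever()
```

Keeping the routing logic in a plain function makes it easy to check the behavior without opening a socket.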

## The Dockerfile
 
The Dockerfile describes the image that we want to build. You can think of it as describing the complete operating system installation of the system that you want to run. A Docker container running is quite a bit lighter than a full operating system, however, because it takes advantage of Linux on the host machine for the basic operations.

For this example, the image packages the Java Spring Boot application that loads the PMML model and serves predictions, together with the runtime it needs. The Dockerfile adds the code that implements our serving logic to the container and sets up the right environment for it to run under.

Let's look at the Dockerfile for this example.

In [None]:
!cat Dockerfile

### Building and registering the container
The following shell code shows how to build the container image using docker build and push the container image to ECR using docker push. The image and its ECR repository are both named random_forest, as set by the algorithm_name variable.

This code looks for an ECR repository in the account you're using and the current default region (if you're using a SageMaker notebook instance, this is the region where the notebook instance was created). If the repository doesn't exist, the script will create it.

In [None]:
%%sh

# The name of our algorithm
algorithm_name=random_forest

#cd sagemaker-byoc-pmml-example

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-east-2 if none defined)
region=$(aws configure get region)
region=${region:-us-east-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
 aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get a temporary password from ECR and log Docker in to the account's registry
aws ecr get-login-password --region "${region}" | docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"

#sm-docker build .
# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}
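The "create the repository only if it is missing" step of the script above can also be expressed with boto3. The following is a sketch under the assumption that an ECR client is passed in; the helper names `ensure_repository` and `image_uri` are illustrative and do not appear elsewhere in this notebook.

```python
def image_uri(account, region, name, tag="latest"):
    """Build the full ECR image name used for docker tag and docker push."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{name}:{tag}"


def ensure_repository(ecr_client, name):
    """Create the ECR repository if it does not exist. Return True if created."""
    try:
        ecr_client.describe_repositories(repositoryNames=[name])
        return False
    except ecr_client.exceptions.RepositoryNotFoundException:
        ecr_client.create_repository(repositoryName=name)
        return True
```

A typical call would be `ensure_repository(boto3.client("ecr"), "random_forest")`.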

### Upload model artifacts to S3

In this example we will use a pre-trained random forest model defined in PMML format. First, we package the models as tar.gz archives and then upload them to S3.

In [None]:
account = sm_session.boto_session.client("sts").get_caller_identity()["Account"]
region = sm_session.boto_session.region_name
serving_image = "{}.dkr.ecr.{}.amazonaws.com/random_forest:latest".format(
 account,
 region
 # serving_image = "{}.dkr.ecr.{}.amazonaws.com/sagemaker-studio-d-v8zbzuweo1qc:ds-fetch-user".format(
 # account, region
)

serving_image

In [None]:
import tarfile

with tarfile.open("data/iris_rf_1.tar.gz", "w:gz") as tar:
    tar.add("data/iris_rf_1.pmml", arcname=".")

with tarfile.open("data/iris_rf_2.tar.gz", "w:gz") as tar:
    tar.add("data/iris_rf_2.pmml", arcname=".")
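Before uploading, it can be useful to check what an archive actually contains. The sketch below packages a file and lists the archive members. Note one deliberate difference from the cell above: it stores the file under its own base name instead of `arcname="."`. Whether the serving container expects one layout or the other is an assumption you should verify against the container code. The helper names are hypothetical.

```python
import os
import tarfile


def package_model(pmml_path, archive_path):
    """Write pmml_path into a gzip-compressed tar archive at archive_path."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # Assumption: store the file at the archive root under its own name.
        tar.add(pmml_path, arcname=os.path.basename(pmml_path))


def list_archive(archive_path):
    """Return the member names stored in the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

For example, `list_archive("data/iris_rf_1.tar.gz")` shows the member names of the archive created above.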

In [None]:
from botocore.exceptions import ClientError
import os

s3 = boto3.resource("s3")

# Create the bucket if it does not exist yet (or is not accessible)
try:
    s3.meta.client.head_bucket(Bucket=bucket_name)
except ClientError:
    s3.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={"LocationConstraint": region})

models = {"iris_rf_2.tar.gz", "iris_rf_1.tar.gz"}

# Upload each model archive under the shared S3 prefix
for model in models:
    key = os.path.join(prefix, model)
    with open("data/" + model, "rb") as file_obj:
        s3.Bucket(bucket_name).Object(key).upload_fileobj(file_obj)

### Define Amazon SageMaker Model

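Next, we define the Amazon SageMaker Model that we will serve from an Amazon SageMaker endpoint. It points to the serving image and to the S3 prefix that holds the model archives, and it sets the container mode to MultiModel.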



In [None]:
model_url = "https://s3-{}.amazonaws.com/{}/{}/".format(region, bucket_name, prefix)

serving_container_def = {"Image": serving_image, "ModelDataUrl": model_url, "Mode": "MultiModel"}
model_name = "pmml-random-forest"

create_model_response = sm_session.create_model(
 name=model_name, role=role, container_defs=serving_container_def
)

print(create_model_response)
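The container definition used above can be factored into a small helper. This sketch builds the same three fields but writes the model location as an `s3://` URI, which the `ModelDataUrl` field also accepts; the function name `multi_model_container` is illustrative.

```python
def multi_model_container(image_uri, bucket, prefix):
    """Build a container definition for a multi-model endpoint."""
    return {
        "Image": image_uri,
        # An S3 prefix (not a single archive) under which all model archives live.
        "ModelDataUrl": f"s3://{bucket}/{prefix.strip('/')}/",
        "Mode": "MultiModel",
    }
```

With the variables from this notebook, `multi_model_container(serving_image, bucket_name, prefix)` could be passed as `container_defs` to `sm_session.create_model`.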

Next, we set the name of the Amazon SageMaker hosted service endpoint configuration.



In [None]:
endpoint_config_name = f"{model_name}-endpoint-config"
print(endpoint_config_name)

Next, we create the Amazon SageMaker hosted service endpoint configuration that uses one instance of ml.m5.large to serve the model.



In [None]:
epc = sm_session.create_endpoint_config(
 name=endpoint_config_name,
 model_name=model_name,
 initial_instance_count=1,
 instance_type="ml.m5.large",
)
print(epc)

Next, we specify the Amazon SageMaker endpoint name for the endpoint used to serve the model.



In [None]:
endpoint_name = f"{model_name}-endpoint-{datetime.now().strftime('%Y%m-%d%H-%M%S')}"
print(endpoint_name)

Next, we create the Amazon SageMaker endpoint using the endpoint configuration we created above.



In [None]:
ep = sm_session.create_endpoint(
 endpoint_name=endpoint_name, config_name=endpoint_config_name, wait=True
)
print(ep)
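The call above blocks because of `wait=True`. If you create the endpoint without waiting, or want to check on it from another session, a polling loop along these lines is one option. It is a sketch that assumes a boto3 SageMaker client is passed in; the function name `wait_for_endpoint` and its default timings are illustrative.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30, max_polls=60):
    """Poll describe_endpoint until the endpoint leaves the 'Creating' state."""
    for _ in range(max_polls):
        response = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status != "Creating":
            return status  # e.g. "InService" or "Failed"
        time.sleep(poll_seconds)
    raise TimeoutError(f"{endpoint_name} still creating after {max_polls} polls")
```

A typical call would be `wait_for_endpoint(boto3.client("sagemaker"), endpoint_name)`. boto3 also ships an `endpoint_in_service` waiter on the SageMaker client that serves the same purpose.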

Now that the Amazon SageMaker endpoint is in service, we will use the endpoint to do inference.



In [None]:
import boto3
import base64
import json


client = boto3.client("sagemaker-runtime")

payload = '{"data": [{ "features": ["5.1","3.5","1.4","0.2","Iris-setosa"]}]}'

response = client.invoke_endpoint(
 EndpointName=endpoint_name,
 ContentType="application/json",
 TargetModel="iris_rf_1.tar.gz", # this is the rest of the S3 path where the model artifacts are located
 Body=payload,
)
body = response["Body"].read()
print(body)
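Writing the JSON payload by hand is error-prone once there is more than one record. A small helper can build the same structure from Python values. This is a sketch that assumes the container expects exactly the shape used in the request above, with every feature sent as a string; the name `build_payload` is hypothetical.

```python
import json


def build_payload(rows):
    """Serialize feature rows into the JSON shape used by the request above."""
    return json.dumps({"data": [{"features": [str(v) for v in row]} for row in rows]})
```

For example, `build_payload([[5.1, 3.5, 1.4, 0.2, "Iris-setosa"]])` produces a string that can be passed as `Body` to `invoke_endpoint`.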

### Add models to the endpoint
We can add more models to the endpoint without having to update the endpoint. To demonstrate hosting multiple models behind the endpoint, this model is duplicated 30 times with a slightly different name in S3. In a more realistic scenario, these could be 30 different models.

In [None]:
with tarfile.open("data/iris_rf.tar.gz", "w:gz") as tar:
    tar.add("data/iris_rf.pmml", arcname=".")

file = "data/iris_rf.tar.gz"

# Upload 30 copies of the archive under different names and track them in `models`
for x in range(0, 30):
    s3_file_name = "demo-subfolder/iris_rf_{}.tar.gz".format(x)
    key = os.path.join(prefix, s3_file_name)
    with open(file, "rb") as file_obj:
        s3.Bucket(bucket_name).Object(key).upload_fileobj(file_obj)
    models.add(s3_file_name)

print("Number of models: {}".format(len(models)))
print("Models: {}".format(models))

After uploading the models to S3, we invoke the endpoint 30 times, randomly choosing one of the 32 models behind the S3 prefix for each invocation and keeping a count of each distinct response.

In [None]:
%%time

import random
from collections import defaultdict

results = defaultdict(int)

for x in range(0, 30):
    target_model = random.choice(tuple(models))
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        TargetModel=target_model,
        Body=payload,
    )

    # Count each distinct response body returned by the endpoint
    results[response["Body"].read().decode("utf-8")] += 1

print(*results.items(), sep="\n")
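The random-invocation loop can be separated from the AWS call so that it is easy to try out locally. In the sketch below, `invoke` is any callable that takes a model name and returns a hashable result such as the response text. The function name `tally_invocations` and the `seed` parameter are illustrative additions.

```python
import random
from collections import Counter


def tally_invocations(invoke, model_names, n_requests, seed=None):
    """Call invoke() on randomly chosen models and count the distinct results."""
    rng = random.Random(seed)
    names = sorted(model_names)  # sets have no stable order; sort for reproducibility
    counts = Counter()
    for _ in range(n_requests):
        counts[invoke(rng.choice(names))] += 1
    return counts
```

Against the endpoint, `invoke` could wrap `client.invoke_endpoint(...)` and return `response["Body"].read().decode("utf-8")`.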

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/inference|structured|realtime|byoc|byoc-mme-java|JPMML_Models_SageMaker.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/inference|structured|realtime|byoc|byoc-mme-java|JPMML_Models_SageMaker.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/inference|structured|realtime|byoc|byoc-mme-java|JPMML_Models_SageMaker.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/inference|structured|realtime|byoc|byoc-mme-java|JPMML_Models_SageMaker.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/inference|structured|realtime|byoc|byoc-mme-java|JPMML_Models_SageMaker.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/inference|structured|realtime|byoc|byoc-mme-java|JPMML_Models_SageMaker.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/inference|structured|realtime|byoc|byoc-mme-java|JPMML_Models_SageMaker.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/inference|structured|realtime|byoc|byoc-mme-java|JPMML_Models_SageMaker.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/inference|structured|realtime|byoc|byoc-mme-java|JPMML_Models_SageMaker.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/inference|structured|realtime|byoc|byoc-mme-java|JPMML_Models_SageMaker.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/inference|structured|realtime|byoc|byoc-mme-java|JPMML_Models_SageMaker.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/inference|structured|realtime|byoc|byoc-mme-java|JPMML_Models_SageMaker.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/inference|structured|realtime|byoc|byoc-mme-java|JPMML_Models_SageMaker.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/inference|structured|realtime|byoc|byoc-mme-java|JPMML_Models_SageMaker.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/inference|structured|realtime|byoc|byoc-mme-java|JPMML_Models_SageMaker.ipynb)
