To deploy a model in Amazon SageMaker hosting services, you can use either the Amazon SageMaker Python SDK or the AWS SDK for Python (Boto 3). This exercise provides code examples for both libraries.
The Amazon SageMaker Python SDK abstracts several implementation details and is easy to use. If you’re a first-time Amazon SageMaker user, we recommend that you use it. For more information, see https://sagemaker.readthedocs.io/en/stable/overview.html.
Topics
+ Deploy the Model to Amazon SageMaker Hosting Services (Amazon SageMaker Python SDK)
+ Deploy the Model to Amazon SageMaker Hosting Services (AWS SDK for Python (Boto 3))
Deploy the Model to Amazon SageMaker Hosting Services (Amazon SageMaker Python SDK)

Deploy the model that you trained in Create and Run a Training Job (Amazon SageMaker Python SDK) by calling the deploy method of the sagemaker.estimator.Estimator object. This is the same object that you used to train the model. When you call the deploy method, specify the number and type of ML instances that you want to use to host the endpoint.
xgb_predictor = xgb_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge'
)
The deploy method creates the deployable model, configures the Amazon SageMaker hosting services endpoint, and launches the endpoint to host the model. For more information, see https://sagemaker.readthedocs.io/en/stable/estimators.html#sagemaker.estimator.Estimator.deploy. It also returns a sagemaker.predictor.RealTimePredictor object, which you can use to get inferences from the model. For more information, see https://sagemaker.readthedocs.io/en/stable/predictors.html#sagemaker.predictor.RealTimePredictor.
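As a quick illustration (not part of the original exercise), here is a minimal sketch of invoking the endpoint through the returned predictor. It assumes the endpoint expects CSV input, as the built-in XGBoost algorithm does; the feature values are hypothetical placeholders.

# A minimal sketch, assuming CSV input as expected by the built-in XGBoost
# algorithm; the feature values below are hypothetical placeholders.
from sagemaker.predictor import csv_serializer

xgb_predictor.content_type = 'text/csv'
xgb_predictor.serializer = csv_serializer

prediction = xgb_predictor.predict('0.5,1.2,3.4')
print(prediction)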
Next Step
Step 7: Validate the Model
Deploy the Model to Amazon SageMaker Hosting Services (AWS SDK for Python (Boto 3))

Deploying a model using the AWS SDK for Python (Boto 3) is a three-step process:
1. Create a model in Amazon SageMaker – Send a CreateModel request to provide information such as the location of the S3 bucket that contains your model artifacts and the registry path of the image that contains inference code.
2. Create an endpoint configuration – Send a CreateEndpointConfig request to provide the resource configuration for hosting, including the type and number of ML compute instances to launch to deploy the model.
3. Create an endpoint – Send a CreateEndpoint request to create the endpoint. Amazon SageMaker launches the ML compute instances and deploys the model. Applications can then send inference requests to the endpoint.
To deploy the model (AWS SDK for Python (Boto 3))
For each of the following steps, paste the code into a cell in the Jupyter notebook that you created in Step 3: Create a Jupyter Notebook, and then run the cell.
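The cells below assume that the variables sm, role, container, and training_job_name are already defined from the earlier steps. If you are starting from a fresh notebook, a minimal sketch of that setup (with a placeholder training job name) might look like this:

import boto3
import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri

sm = boto3.client('sagemaker')                  # low-level SageMaker client
role = sagemaker.get_execution_role()           # IAM role of the notebook instance
container = get_image_uri(boto3.Session().region_name, 'xgboost')  # XGBoost image used for training
training_job_name = 'xgboost-2019-01-01-00-00-00-000'  # placeholder; use your own training job name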
1. Create a deployable model by identifying the location of model artifacts and the Docker image that contains the inference code.
model_name = training_job_name + '-mod'

info = sm.describe_training_job(TrainingJobName=training_job_name)
model_data = info['ModelArtifacts']['S3ModelArtifacts']
print(model_data)

primary_container = {
    'Image': container,
    'ModelDataUrl': model_data
}

create_model_response = sm.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    PrimaryContainer=primary_container)

print(create_model_response['ModelArn'])
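Optionally, as a quick sanity check (not part of the original exercise), you can describe the model that you just created to confirm that it points to the expected artifacts:

# Optional check: confirm the model registration and its artifact location
model_desc = sm.describe_model(ModelName=model_name)
print(model_desc['PrimaryContainer']['ModelDataUrl'])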
2. Create an Amazon SageMaker endpoint configuration by specifying the ML compute instances that you want to deploy your model to.
from time import gmtime, strftime

endpoint_config_name = 'DEMO-XGBoostEndpointConfig-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print(endpoint_config_name)

create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': 'ml.m4.xlarge',
        'InitialVariantWeight': 1,
        'InitialInstanceCount': 1,
        'ModelName': model_name,
        'VariantName': 'AllTraffic'}])

print("Endpoint Config Arn: " + create_endpoint_config_response['EndpointConfigArn'])
3. Create an Amazon SageMaker endpoint.
%%time
import time

endpoint_name = 'DEMO-XGBoostEndpoint-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print(endpoint_name)

create_endpoint_response = sm.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)
print(create_endpoint_response['EndpointArn'])

resp = sm.describe_endpoint(EndpointName=endpoint_name)
status = resp['EndpointStatus']
print("Status: " + status)

while status == 'Creating':
    time.sleep(60)
    resp = sm.describe_endpoint(EndpointName=endpoint_name)
    status = resp['EndpointStatus']
    print("Status: " + status)

print("Arn: " + resp['EndpointArn'])
print("Status: " + status)
This code repeatedly calls describe_endpoint in a while loop until the endpoint either fails or is in service, printing the endpoint status on each iteration. When the status changes to InService, the endpoint is ready to serve inference requests.
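Alternatively, instead of writing the polling loop by hand, you can use the built-in Boto 3 waiter for this status; a minimal sketch:

# Alternative to the manual polling loop: Boto 3's built-in waiter blocks
# until the endpoint reaches InService (it raises an error if creation fails).
waiter = sm.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=endpoint_name)
print("Status: " + sm.describe_endpoint(EndpointName=endpoint_name)['EndpointStatus'])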
Next Step
Step 7: Validate the Model