### Grab data

Commentary:

We will use the popular [Abalone](https://archive.ics.uci.edu/ml/datasets/Abalone) data set, originally from the UCI Machine Learning Repository \[1\].

> \[1\] Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

In [None]:
from pathlib import Path
import boto3

for p in ['raw_data', 'training_data', 'validation_data']:
    Path(p).mkdir(exist_ok=True)

s3 = boto3.client('s3')
s3.download_file('sagemaker-sample-files', 'datasets/tabular/uci_abalone/abalone.libsvm', 'raw_data/abalone')

### Prepare training and validation data

In [None]:
from sklearn.datasets import load_svmlight_file, dump_svmlight_file
from sklearn.model_selection import train_test_split

X, y = load_svmlight_file('raw_data/abalone')
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=1984, shuffle=True)

dump_svmlight_file(x_train, y_train, 'training_data/abalone.train')
dump_svmlight_file(x_test, y_test, 'validation_data/abalone.test')
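
For reference, each line of a libsvm file holds a label followed by sparse `index:value` pairs, which is the layout that `load_svmlight_file` reads and `dump_svmlight_file` writes. Below is a minimal pure-Python sketch of parsing one such line. The `parse_libsvm_line` helper and the sample line are illustrative assumptions, not taken from the Abalone file or from scikit-learn.

```python
def parse_libsvm_line(line):
    """Parse one 'label idx:val idx:val ...' line into (label, {idx: val})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        # libsvm feature indices are conventionally 1-based
        features[int(index)] = float(value)
    return label, features


# Hypothetical sample line with 8 features (values are for illustration only).
sample = "15 1:1 2:0.455 3:0.365 4:0.095 5:0.514 6:0.2245 7:0.101 8:0.15"
label, features = parse_libsvm_line(sample)
print(label, len(features))
```

Features that are absent from a line are treated as zero by libsvm readers, which is why the format suits sparse data.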


### Train model

Commentary:

Notice that the [SageMaker XGBoost container](https://github.com/aws/sagemaker-xgboost-container) framework version is set to `1.2-1`. This is extremely important: the older `0.90-2` version will NOT work with SageMaker Neo out of the box. In February 2021, the SageMaker Neo team updated their XGBoost library to version `1.2`, and backwards compatibility was not kept.

Moreover, notice that we are using the open source XGBoost algorithm, so we must provide our own training script and model loading function. Both required components are defined in `entrypoint.py`, which is part of the `neo-blog` repository. The training script is very basic and was inspired by another sample notebook [here](https://github.com/aws/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_abalone_dist_script_mode.ipynb). Note also that `instance_count` and `instance_type` are set to `1` and `local`, respectively, which means that the training job will run locally on our notebook instance. This is beneficial because it avoids the startup time that remote training instances would incur.

Finally, notice that the number of boosting rounds has been set to 10,000. This means that the model will consist of 10,000 individual trees and will be computationally expensive to run, which is what we want for load testing purposes. A side effect is that the model will severely overfit the training data, but that is acceptable since accuracy is not a priority here. A computationally expensive model could also have been achieved by increasing the `max_depth` parameter.
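
To make the cost concrete, here is a rough back-of-the-envelope sketch of the inference work implied by these settings. It gives an upper bound only: it assumes every tree is grown to the full `max_depth`, which pruning parameters such as `gamma` and `min_child_weight` may prevent. The figure of 125 rows per request comes from the load test script later in this notebook.

```python
num_round = 10_000       # boosting rounds, i.e. trees in the ensemble
max_depth = 5            # maximum depth of each tree
rows_per_request = 125   # rows sent per request in the load test

# A binary tree of depth d has at most 2**d leaves and 2**(d + 1) - 1 nodes.
max_leaves_per_tree = 2 ** max_depth
max_nodes_per_tree = 2 ** (max_depth + 1) - 1

# Scoring one row walks one root-to-leaf path in every tree,
# so it needs at most max_depth comparisons per tree.
max_comparisons_per_row = num_round * max_depth
max_comparisons_per_request = max_comparisons_per_row * rows_per_request

print(max_nodes_per_tree, max_comparisons_per_row, max_comparisons_per_request)
```

Raising `max_depth` increases the per-tree path length linearly but the potential node count exponentially, which is why it is an alternative way to make the model expensive.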


In [None]:
import sagemaker
from sagemaker.xgboost.estimator import XGBoost
from sagemaker.session import Session
from sagemaker.inputs import TrainingInput

bucket = Session().default_bucket()
role = sagemaker.get_execution_role()

# initialize hyperparameters
hyperparameters = {
    "max_depth": "5",
    "eta": "0.2",
    "gamma": "4",
    "min_child_weight": "6",
    "subsample": "0.7",
    "verbosity": "1",
    "objective": "reg:squarederror",
    "num_round": "10000"
}

# construct a SageMaker XGBoost estimator
# specify the entry_point to your xgboost training script
estimator = XGBoost(
    entry_point="entrypoint.py",
    framework_version='1.2-1',  # 1.x MUST be used
    hyperparameters=hyperparameters,
    role=role,
    instance_count=1,
    instance_type='local',
    output_path=f's3://{bucket}/neo-demo'  # gets saved in bucket/neo-demo/job_name/model.tar.gz
)

# define the data type and paths to the training and validation datasets
content_type = "libsvm"
train_input = TrainingInput('file://training_data', content_type=content_type)
validation_input = TrainingInput('file://validation_data', content_type=content_type)

# execute the XGBoost training job
estimator.fit({'train': train_input, 'validation': validation_input}, logs=['Training'])


### Deploy unoptimized model

Commentary:

There are two interesting things to note here. First, although the training job ran locally, the model artifact was still stored in [Amazon S3](https://aws.amazon.com/s3/) upon job completion. Second, we must create an `XGBoostModel` object and use its `deploy` method, rather than calling the `deploy` method of the estimator itself. Because we ran the training job in local mode, the estimator is not aware of any “official” training job that is viewable in the SageMaker console and associable with the model artifact, so it will error out if its own `deploy` method is used.

Notice also that we will host the model on a c5 (compute-optimized) instance type. This instance type is particularly well suited for hosting the XGBoost model, since XGBoost runs on CPU by default and is CPU-bound at inference time (during training, by contrast, XGBoost is memory-bound). The c5.large instance type is also marginally cheaper to run in the us-east-1 region, at $0.119 per hour compared to $0.1299 per hour for a t2.large.


In [None]:
from sagemaker.xgboost.model import XGBoostModel

# grab the model artifact that was written out by the local training job
s3_model_artifact = estimator.latest_training_job.describe()['ModelArtifacts']['S3ModelArtifacts']

# we have to switch from local mode to remote mode
xgboost_model = XGBoostModel(
    model_data=s3_model_artifact,
    role=role,
    entry_point="entrypoint.py",
    framework_version='1.2-1',
)

unoptimized_endpoint_name = 'unoptimized-c5'

xgboost_model.deploy(
    initial_instance_count=1,
    instance_type='ml.c5.large',
    endpoint_name=unoptimized_endpoint_name
)

### Optimize model with SageMaker Neo

In [None]:
job_name = s3_model_artifact.split("/")[-2]
neo_model = xgboost_model.compile(
    target_instance_family="ml_c5",
    role=role,
    input_shape=f'{{"data": [1, {X.shape[1]}]}}',
    output_path=f's3://{bucket}/neo-demo/{job_name}',  # gets saved in bucket/neo-demo/job_name/model-ml_c5.tar.gz
    framework="xgboost",
    job_name=job_name  # what it shows up as in console
)
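
The two derived values in the cell above, the job name and the `input_shape` string, can be checked in isolation. The sketch below uses a made-up artifact URI (`example_artifact`) that follows the `bucket/neo-demo/job_name/model.tar.gz` layout, and hard-codes 8 features to stand in for `X.shape[1]`; both are assumptions for illustration.

```python
import json
from urllib.parse import urlparse

# Hypothetical artifact URI; the real one comes from the training job description.
example_artifact = "s3://example-bucket/neo-demo/example-job-2021-03-01/model.tar.gz"

parsed = urlparse(example_artifact)
bucket_name = parsed.netloc                      # "example-bucket"
key_parts = parsed.path.lstrip("/").split("/")   # ["neo-demo", "<job name>", "model.tar.gz"]
job_name = key_parts[-2]                         # same result as split("/")[-2] on the full URI

# The compile step expects input_shape as a JSON string: one row by num_features columns.
num_features = 8  # stands in for X.shape[1]
input_shape = json.dumps({"data": [1, num_features]})

print(job_name)
print(input_shape)
```

Building the string with `json.dumps` yields the same text as the doubled-brace f-string used in the cell, and avoids the brace escaping.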

### Deploy Neo model

In [None]:

optimized_endpoint_name = 'neo-optimized-c5'

neo_model.deploy(
    initial_instance_count=1,
    instance_type='ml.c5.large',
    endpoint_name=optimized_endpoint_name
)


### Validate that endpoints are working

In [None]:
import boto3

smr = boto3.client('sagemaker-runtime')

# a single Abalone row in CSV form (8 features, no label)
payload = b'2,0.675,0.55,0.175,1.689,0.694,0.371,0.474'

resp = smr.invoke_endpoint(EndpointName=optimized_endpoint_name, Body=payload, ContentType='text/csv')
print('neo-optimized model response: ', resp['Body'].read())
resp = smr.invoke_endpoint(EndpointName=unoptimized_endpoint_name, Body=payload, ContentType='text/csv')
print('unoptimized model response: ', resp['Body'].read())
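
The request body above is a single CSV row. As a sketch of how a multi-row body could be assembled (for example, the 125 rows per request used in the load test later on), the hypothetical `rows_to_csv` helper below joins rows with newlines. It assumes the endpoint accepts newline-separated rows for `text/csv`; it is not part of the notebook or of `processor.js`.

```python
def rows_to_csv(rows):
    """Serialize feature rows to a text/csv request body, one row per line."""
    lines = (",".join(str(value) for value in row) for row in rows)
    return "\n".join(lines).encode("utf-8")


row = [2, 0.675, 0.55, 0.175, 1.689, 0.694, 0.371, 0.474]

single = rows_to_csv([row])        # same bytes as the literal payload used above
batch = rows_to_csv([row] * 125)   # 125 identical rows in one request body

print(single)
print(len(batch.splitlines()))
```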

### Create CloudWatch dashboard for monitoring performance

In [None]:
import json

cw = boto3.client('cloudwatch')

dashboard_name = 'NeoDemo'
region = Session().boto_region_name # get region we're currently in

body = {
    "widgets": [
        {
            "type": "metric",
            "x": 0,
            "y": 0,
            "width": 24,
            "height": 12,
            "properties": {
                "metrics": [
                    [ "AWS/SageMaker", "Invocations", "EndpointName", optimized_endpoint_name, "VariantName", "AllTraffic", { "stat": "Sum", "yAxis": "left" } ],
                    [ "...", unoptimized_endpoint_name, ".", ".", { "stat": "Sum", "yAxis": "left" } ],
                    [ ".", "ModelLatency", ".", ".", ".", "." ],
                    [ "...", optimized_endpoint_name, ".", "." ],
                    [ "/aws/sagemaker/Endpoints", "CPUUtilization", ".", ".", ".", ".", { "yAxis": "right" } ],
                    [ "...", unoptimized_endpoint_name, ".", ".", { "yAxis": "right" } ]
                ],
                "view": "timeSeries",
                "stacked": False,
                "region": region,
                "stat": "Average",
                "period": 60,
                "title": "Performance Metrics",
                "start": "-PT1H",
                "end": "P0D"
            }
        }
    ]
}

cw.put_dashboard(DashboardName=dashboard_name, DashboardBody=json.dumps(body))

print('link to dashboard:')
print(f'https://console.aws.amazon.com/cloudwatch/home?region={region}#dashboards:name={dashboard_name}')
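
The `metrics` array in the dashboard body uses CloudWatch's shorthand, where `"."` repeats the previous row's value at the same position and `"..."` repeats the leading values of the previous row. The sketch below expands that shorthand into explicit rows so the six plotted series are easier to read. The `expand_metrics` helper is hypothetical, and its handling of `"..."` (filling as many leading fields as needed to match the previous row's length) is an assumption about the shorthand rather than an official implementation.

```python
def expand_metrics(metrics):
    """Expand CloudWatch dashboard shorthand ("." and "...") into explicit rows."""
    expanded = []
    prev = []
    for row in metrics:
        # Split off a trailing rendering-options dict, if present.
        options = row[-1] if row and isinstance(row[-1], dict) else None
        fields = list(row[:-1]) if options is not None else list(row)
        if fields and fields[0] == "...":
            # "..." stands for however many leading fields of the previous
            # row are needed so that both rows end up the same length.
            keep = len(prev) - (len(fields) - 1)
            fields = prev[:keep] + fields[1:]
        # "." repeats the previous row's value at the same position.
        fields = [prev[i] if f == "." else f for i, f in enumerate(fields)]
        expanded.append(fields + ([options] if options is not None else []))
        prev = fields
    return expanded


opt, unopt = "neo-optimized-c5", "unoptimized-c5"
metrics = [
    ["AWS/SageMaker", "Invocations", "EndpointName", opt, "VariantName", "AllTraffic", {"stat": "Sum"}],
    ["...", unopt, ".", ".", {"stat": "Sum"}],
    [".", "ModelLatency", ".", ".", ".", "."],
    ["...", opt, ".", "."],
    ["/aws/sagemaker/Endpoints", "CPUUtilization", ".", ".", ".", ".", {"yAxis": "right"}],
    ["...", unopt, ".", ".", {"yAxis": "right"}],
]
expanded = expand_metrics(metrics)
for row in expanded:
    print(row[:6])
```

Read this way, the widget plots `Invocations`, `ModelLatency`, and `CPUUtilization` once for each of the two endpoints.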

### Install node.js

In [None]:
%conda install -c conda-forge nodejs 

### Validate successful installation

In [None]:
!node --version

### Install Serverless framework and Serverless Artillery

In [None]:
!npm install -g serverless@1.80.0 serverless-artillery@0.4.9

### Validate successful installations

In [None]:
!serverless --version

In [None]:
!slsart --version

### Deploy Serverless Artillery

Commentary:

The most important file in the load-generating function under the `serverless_artillery` directory is `processor.js`, which is responsible for generating the payload body and signed headers of each request that gets sent to the SageMaker endpoints. Please take a moment to review the file’s contents. In it, you’ll see that we’re manually signing our requests using the AWS Signature Version 4 algorithm. When you use an AWS SDK like [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html), your requests are automatically signed for you by the library. Here, however, we are interacting directly with AWS’s SageMaker API endpoints, so we must sign requests ourselves. The access keys and session token of the load-generating Lambda function’s role are used to sign the request, and the role is given permission to invoke SageMaker endpoints in its role statements (defined in `serverless.yml` on line 18). When a request is sent, AWS first validates the signed headers, then validates that the assumed role has permission to invoke endpoints, and finally lets the request from the Lambda function pass through.
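
To give a feel for what manual signing involves, here is a minimal Python sketch of the key-derivation and signing steps of Signature Version 4. It is not the code in `processor.js`: the credentials, date, and string to sign are placeholders, the service name `sagemaker` is an assumption, and a real request additionally requires building the canonical request and the `Authorization` header.

```python
import hashlib
import hmac


def _hmac_sha256(key, message):
    """HMAC-SHA256 of a text message under a bytes key, returned as raw bytes."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service):
    """Derive the SigV4 signing key scoped to one day, region, and service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key, string_to_sign):
    """Return the hex signature that is placed in the Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder values, for illustration only.
signing_key = derive_signing_key("EXAMPLE-SECRET-KEY", "20210301", "us-east-1", "sagemaker")
string_to_sign = "\n".join([
    "AWS4-HMAC-SHA256",                             # algorithm
    "20210301T000000Z",                             # request timestamp
    "20210301/us-east-1/sagemaker/aws4_request",    # credential scope
    "<hex SHA-256 hash of the canonical request>",  # placeholder
])
signature = sign(signing_key, string_to_sign)
print(signature)
```

Because the signing key is scoped to a date, region, and service, a signature computed for one endpoint's region cannot be replayed against another region or service.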


In [None]:
!cd serverless_artillery && npm install && slsart deploy --stage dev

### Create Serverless Artillery load test script

In [None]:
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
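def writefilewithvariables(line, cell):
    # write the cell body to the file named on the magic line,
    # filling {name} placeholders from the notebook's global variables
    with open(line, 'w') as f:
        f.write(cell.format(**globals()))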

# Get region that we're currently in
region = Session().boto_region_name
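
Because the magic runs the cell body through `str.format`, single braces are treated as Python placeholders and doubled braces as literal braces. Artillery's own `{{ variable }}` template syntax therefore has to be written with four braces on each side in the script cell. A small standalone sketch of that escaping, using a made-up two-line template:

```python
# Two-line stand-in for the script cell: one Python placeholder, one Artillery variable.
template = (
    "target: 'https://runtime.sagemaker.{region}.amazonaws.com'\n"
    "url: '/endpoints/{{{{ endpointName }}}}/invocations'\n"
)

# str.format fills {region} and collapses each doubled brace into a single one.
rendered = template.format(region="us-east-1")
print(rendered)
# target: 'https://runtime.sagemaker.us-east-1.amazonaws.com'
# url: '/endpoints/{{ endpointName }}/invocations'
```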

In [None]:
%%writefilewithvariables script.yaml

config:
  variables:
    unoptimizedEndpointName: {unoptimized_endpoint_name} # the xgboost model has 10000 trees
    optimizedEndpointName: {optimized_endpoint_name} # the xgboost model has 10000 trees
    numRowsInRequest: 125 # Each request to the endpoint contains 125 rows
  target: 'https://runtime.sagemaker.{region}.amazonaws.com'
  phases:
    - duration: 120
      arrivalRate: 20 # 1200 total invocations per minute (600 per endpoint)
    - duration: 120
      arrivalRate: 40 # 2400 total invocations per minute (1200 per endpoint)
    - duration: 120
      arrivalRate: 60 # 3600 total invocations per minute (1800 per endpoint)
    - duration: 120
      arrivalRate: 80 # 4800 invocations per minute (2400 per endpoint... this is the max of the unoptimized endpoint)
    - duration: 120
      arrivalRate: 120 # only the neo endpoint can handle this load...
    - duration: 120
      arrivalRate: 160

  processor: './processor.js'

scenarios:
  - flow:
      - post:
          url: '/endpoints/{{{{ unoptimizedEndpointName }}}}/invocations'
          beforeRequest: 'setRequest'
  - flow:
      - post:
          url: '/endpoints/{{{{ optimizedEndpointName }}}}/invocations'
          beforeRequest: 'setRequest'
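
The per-minute figures in the script comments follow directly from the phase settings, since `arrivalRate` is expressed per second. The sketch below reproduces that arithmetic and adds the totals for the whole test. It assumes traffic splits evenly between the two scenarios on average, as the comments in the script imply.

```python
# (duration in seconds, arrivalRate in new requests per second) for each phase
phases = [(120, 20), (120, 40), (120, 60), (120, 80), (120, 120), (120, 160)]
rows_per_request = 125
num_endpoints = 2  # assumed even split across the two scenarios

for duration, rate in phases:
    per_minute = rate * 60
    print(f"{rate:>3} req/s: {per_minute} invocations/min total, "
          f"{per_minute // num_endpoints} per endpoint")

total_requests = sum(duration * rate for duration, rate in phases)
total_minutes = sum(duration for duration, _ in phases) / 60
peak_rate_per_endpoint = max(rate for _, rate in phases) // num_endpoints
peak_rows_per_second_per_endpoint = peak_rate_per_endpoint * rows_per_request

print(total_requests, total_minutes, peak_rows_per_second_per_endpoint)
```

Under these assumptions the full run lasts 12 minutes and sends 57,600 requests, with each endpoint asked to score up to 10,000 rows per second in the final phase.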


### Perform load tests

In [None]:
!slsart invoke --stage dev --path script.yaml

In [None]:
print("Here's the link to the dashboard again:")
print(f'https://console.aws.amazon.com/cloudwatch/home?region={region}#dashboards:name={dashboard_name}')

### Clean up resources

In [None]:

# delete endpoints and endpoint configurations

sm = boto3.client('sagemaker')

for name in [unoptimized_endpoint_name, optimized_endpoint_name]:
    sm.delete_endpoint(EndpointName=name)
    sm.delete_endpoint_config(EndpointConfigName=name)


In [None]:

# remove serverless artillery resources

!slsart remove --stage dev
