# MLOps Demo
This notebook walks you through a few of the features of this MLOps SageMaker template.

For details on the use case and the high-level architecture, check the [README](README.md).

## Prepare the Environment

In [None]:
import logging

import requests
import sagemaker
from utils.get_datasets import get_and_upload_data

In [None]:
logging.basicConfig(level=logging.INFO)  # make logger.info() messages visible in the notebook
logger = logging.getLogger(name='project')
sagemaker_session = sagemaker.Session()
boto_session = sagemaker_session.boto_session
sagemaker_client = boto_session.client('sagemaker')
region = sagemaker_session.boto_region_name
ssm = boto_session.client('ssm')

In [None]:
project_name = "" # <--- fill here

## Upload Demo Data

Retrieve from SSM Parameter Store the S3 URIs where the example data files will be stored.

In [None]:
claims_raw_uri = ssm.get_parameter(Name=f"/sagemaker-{project_name}/{project_name}-claims")['Parameter']['Value']
customers_raw_uri = ssm.get_parameter(Name=f"/sagemaker-{project_name}/{project_name}-customers")['Parameter']['Value']

logger.info(f"Claims dataset URI: {claims_raw_uri}")
logger.info(f"Customers dataset URI: {customers_raw_uri}")
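Every SSM parameter looked up in this notebook follows the same naming pattern, `/sagemaker-{project_name}/{project_name}-{suffix}`. A small hypothetical helper (`ssm_param_name` is not part of the template; the pattern is inferred from the f-strings in this notebook) makes the convention explicit and avoids typos in the repeated f-strings:

```python
def ssm_param_name(project_name: str, suffix: str) -> str:
    """Build an SSM parameter name using the convention seen in this notebook.

    Hypothetical helper: the pattern is inferred from the notebook's f-strings,
    not taken from the template's documentation.
    """
    if not project_name:
        raise ValueError("project_name is empty; fill it in before running the notebook")
    return f"/sagemaker-{project_name}/{project_name}-{suffix}"


# Suffixes used in this notebook ("my-project" is a placeholder project name).
for suffix in ("claims", "customers", "xgboost", "batch-transform"):
    print(ssm_param_name("my-project", suffix))
```

For example, `ssm.get_parameter(Name=ssm_param_name(project_name, "claims"))['Parameter']['Value']` is equivalent to the first lookup in the cell above.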

Define the source URLs of the datasets in the Amazon SageMaker Examples GitHub repository.

In [None]:
base_url = "https://raw.githubusercontent.com/aws/amazon-sagemaker-examples/main/end_to_end/fraud_detection/data/"
file_list = ["claims.csv", "customers.csv"]
uri_list = [claims_raw_uri, customers_raw_uri]

Download the datasets and upload them to the designated S3 URIs.

In [None]:
for file_name, s3_uri in zip(file_list, uri_list):
    get_and_upload_data(base_url + file_name, s3_uri)
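The implementation of `get_and_upload_data` lives in `utils/get_datasets.py` and is not shown in this notebook. To give a rough idea of what such a step involves, here is a minimal sketch under two assumptions: the SSM values are `s3://bucket/key` URIs, and each file is small enough to hold in memory. `split_s3_uri` and `download_and_upload` are hypothetical names, and the S3 client is passed in rather than created inside the function.

```python
from typing import Tuple
from urllib.parse import urlparse
from urllib.request import urlopen


def split_s3_uri(s3_uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key')."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"Not an s3://bucket/key URI: {s3_uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_and_upload(source_url: str, s3_uri: str, s3_client) -> None:
    """Download a file into memory and write it to the given S3 URI.

    Hypothetical stand-in for utils.get_datasets.get_and_upload_data;
    s3_client is expected to be a boto3 S3 client (anything with put_object).
    """
    bucket, key = split_s3_uri(s3_uri)
    with urlopen(source_url) as response:  # assumes the file fits in memory
        body = response.read()
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
```

Usage would look like `download_and_upload(base_url + "claims.csv", claims_raw_uri, boto_session.client("s3"))`. Passing the client in keeps the function independent of how the boto3 session was created and makes it easy to test with a fake client.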

The project creates an EventBridge rule for each Feature Ingestion pipeline. During the next scheduled pipeline execution, the data will be transformed and uploaded to the Feature Store.

The pipelines are scheduled to run every 12 hours. If you don't want to wait that long, you can trigger the pipeline runs manually with the code below.

In [None]:
customers_pipeline_name = f'{project_name}-customers-preprocessing'
claims_pipeline_name = f'{project_name}-claims-preprocessing'

customers_pipeline_execution = sagemaker_client.start_pipeline_execution(
    PipelineName=customers_pipeline_name,
    PipelineExecutionDisplayName="ManualExecution",
    PipelineParameters=[
        {"Name": "InputDataUrl", "Value": customers_raw_uri},
    ],
)

claims_pipeline_execution = sagemaker_client.start_pipeline_execution(
    PipelineName=claims_pipeline_name,
    PipelineExecutionDisplayName="ManualExecution",
    PipelineParameters=[
        {"Name": "InputDataUrl", "Value": claims_raw_uri},
    ],
)

⬅️ You can observe the progress of the pipeline by double-clicking on the pipeline name in the `Pipelines` panel on the left-hand side.
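If you prefer to wait from the notebook instead of watching the UI, you can poll `describe_pipeline_execution` until the execution reaches a terminal status (`Succeeded`, `Failed` or `Stopped`). This is a minimal sketch: `wait_for_pipeline_execution` is a hypothetical helper, and the poll interval and timeout are arbitrary defaults.

```python
import time

TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def wait_for_pipeline_execution(client, execution_arn, poll_seconds=30,
                                timeout_seconds=3600, sleep=time.sleep):
    """Poll a SageMaker pipeline execution until it finishes and return its final status.

    client is a boto3 SageMaker client; sleep is injectable so the loop can be tested.
    Raises TimeoutError if the execution is still running after timeout_seconds.
    """
    waited = 0
    while True:
        status = client.describe_pipeline_execution(
            PipelineExecutionArn=execution_arn
        )["PipelineExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"{execution_arn} still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

`start_pipeline_execution` returns the execution ARN under `PipelineExecutionArn`, so a call could look like `wait_for_pipeline_execution(sagemaker_client, claims_pipeline_execution["PipelineExecutionArn"])`.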

Once the pipeline executions have completed, you can confirm that the data is in the Feature Store.

In [None]:
featurestore_runtime = boto_session.client(
    service_name="sagemaker-featurestore-runtime", region_name=region
)
claims_fg_name = f"{project_name}-claims"
customers_fg_name = f"{project_name}-customers"

In [None]:
featurestore_runtime.get_record(
    FeatureGroupName=claims_fg_name,
    RecordIdentifierValueAsString="9",
)['Record']

In [None]:
featurestore_runtime.get_record(
    FeatureGroupName=customers_fg_name,
    RecordIdentifierValueAsString="9",
)['Record']
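`get_record` returns the record as a list of `{"FeatureName": ..., "ValueAsString": ...}` entries. To read it more easily, you can flatten it into a dictionary. `record_to_dict` is a hypothetical helper that takes the whole `get_record` response and, defensively, treats a response without a `Record` key as an empty record.

```python
def record_to_dict(get_record_response: dict) -> dict:
    """Flatten a Feature Store get_record response into {feature_name: value}.

    Values stay as strings (or lists of strings for collection features),
    exactly as the runtime API returns them; no type conversion is attempted.
    """
    features = {}
    for item in get_record_response.get("Record", []):
        if "ValueAsString" in item:
            features[item["FeatureName"]] = item["ValueAsString"]
        else:
            features[item["FeatureName"]] = item.get("ValueAsStringList")
    return features
```

For example: `record_to_dict(featurestore_runtime.get_record(FeatureGroupName=claims_fg_name, RecordIdentifierValueAsString="9"))`.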

## Model Building
With the data in the Feature Store, you can now start the model building pipeline. You can leave the default parameter values.

In [None]:
xgboost_pipeline_name = f"{project_name}-build-xgboost"

In [None]:
sagemaker_client.start_pipeline_execution(
    PipelineName=xgboost_pipeline_name,
    PipelineExecutionDisplayName="ManualExecution",
)

⬅️ You can observe the progress of the pipeline by double-clicking on the pipeline name in the `Pipelines` panel on the left-hand side.

Once the model building pipeline execution is completed, you can check the model training metrics from the model registry.

⬅️ You can access the `Model registry` from the panel on the left.

## Model Deployment

The template deploys an EventBridge rule that starts an AWS CodePipeline execution when a new version of the model is approved.
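The rule itself is defined by the template and is not shown in this notebook. For illustration only, an EventBridge pattern matching approval events for this model package group could look like the dictionary below; this is an assumption about the rule, not a copy of it. `matches` is a deliberately simplified matcher (exact-value lists and nested objects only) that shows which events would fire such a rule; real EventBridge pattern matching supports many more operators.

```python
def approval_event_pattern(model_package_group_name: str) -> dict:
    """Hypothetical EventBridge pattern for 'model version approved' events."""
    return {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
        "detail": {
            "ModelPackageGroupName": [model_package_group_name],
            "ModelApprovalStatus": ["Approved"],
        },
    }


def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge matching: a list means 'any of these values' and
    nested dicts are matched recursively. Other pattern operators are not supported."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True
```

With this pattern, a version that is merely registered (`PendingManualApproval`) does not match, while the same event with `ModelApprovalStatus` set to `Approved` does, which is why approval is the step that kicks off deployment.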

⬅️ You can approve the model from the `Model registry` panel...

⬇️ ...or run the cell below to approve the most recent model version pending manual approval.

In [None]:
model_package_group_name = f"{project_name}-fraud-classification-xgboost"

# Most recent model version still waiting for manual approval
model_package_arn = sagemaker_client.list_model_packages(
    ModelPackageGroupName=model_package_group_name,
    ModelApprovalStatus='PendingManualApproval',
    SortBy='CreationTime',
    SortOrder='Descending'
)['ModelPackageSummaryList'][0]['ModelPackageArn']
sagemaker_client.update_model_package(
    ModelPackageArn=model_package_arn,
    ModelApprovalStatus='Approved'
)

logger.info(f"{model_package_arn} Approved")
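The cell above raises an `IndexError` if no model version is pending approval, for example when the build pipeline has not finished yet. A slightly more defensive variant, sketched as a hypothetical `approve_latest_pending` helper, returns `None` in that case. It uses the same two API calls, adding `MaxResults=1` because only the newest version is needed.

```python
def approve_latest_pending(client, model_package_group_name):
    """Approve the newest PendingManualApproval version in a model package group.

    client is a boto3 SageMaker client. Returns the approved ModelPackageArn,
    or None if no version is pending approval.
    """
    summaries = client.list_model_packages(
        ModelPackageGroupName=model_package_group_name,
        ModelApprovalStatus="PendingManualApproval",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )["ModelPackageSummaryList"]
    if not summaries:
        return None
    arn = summaries[0]["ModelPackageArn"]
    client.update_model_package(ModelPackageArn=arn, ModelApprovalStatus="Approved")
    return arn
```

Calling `approve_latest_pending(sagemaker_client, model_package_group_name)` a second time returns `None`, since the version it approved is no longer pending.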

### Testing the Real-Time Endpoint

In [None]:
try:
    live_endpoint = ssm.get_parameter(Name=f"/sagemaker-{project_name}/{project_name}-xgboost")['Parameter']['Value']
except ssm.exceptions.ParameterNotFound:
    logger.exception("The real-time endpoint has possibly not been deployed yet", exc_info=False)
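If the parameter is missing, the cell above only logs the error and `live_endpoint` stays undefined, so the next cell fails with a `NameError`. A minimal alternative, using a hypothetical `get_parameter_or_none` helper, binds the variable to `None` instead, which later cells can check explicitly.

```python
import logging

logger = logging.getLogger("project")


def get_parameter_or_none(ssm_client, name):
    """Return the value of an SSM parameter, or None if it does not exist.

    ssm_client is a boto3 SSM client; only ParameterNotFound is swallowed,
    so permission or throttling errors still propagate.
    """
    try:
        return ssm_client.get_parameter(Name=name)["Parameter"]["Value"]
    except ssm_client.exceptions.ParameterNotFound:
        logger.warning("SSM parameter %s not found; the stack may not be deployed yet", name)
        return None
```

For example, `live_endpoint = get_parameter_or_none(ssm, f"/sagemaker-{project_name}/{project_name}-xgboost")`, followed by an `if live_endpoint is None:` check before sending requests.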

In [None]:
%timeit requests.get(live_endpoint, params=dict(policy_id=1)).json()

In [None]:
preds = [
    requests.get(live_endpoint, params=dict(policy_id=k)).json()
    for k in range(1, 6)
]

preds
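This cell and the cached-predictions cell further down repeat the same request loop. Below is a small sketch of a shared helper. `fetch_predictions` is hypothetical: it expects a session object such as `requests.Session()` (reusing one connection pool), sets a timeout, and raises on HTTP errors instead of trying to decode an error page as JSON. It assumes, as in these cells, that the endpoint accepts a `policy_id` query parameter and returns JSON.

```python
def fetch_predictions(session, url, policy_ids, timeout=10):
    """GET url?policy_id=<id> for each id and return the decoded JSON bodies.

    session is expected to behave like requests.Session (get, raise_for_status, json).
    """
    results = []
    for policy_id in policy_ids:
        response = session.get(url, params={"policy_id": policy_id}, timeout=timeout)
        response.raise_for_status()  # surface 4xx/5xx instead of parsing an error body
        results.append(response.json())
    return results
```

Usage: `preds = fetch_predictions(requests.Session(), live_endpoint, range(1, 6))`, and likewise with `ddb_serving` for the cached predictions.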

### Testing Batch Inference

To load the predictions into DynamoDB, you can trigger the batch inference pipeline, either via the Studio UI or with the code below.

In [None]:
batch_pipeline_name = f'{project_name}-batch-transform'
try:
    batch_pipeline_execution = sagemaker_client.start_pipeline_execution(
        PipelineName=batch_pipeline_name,
        PipelineExecutionDisplayName="ManualExecution",
    )
except sagemaker_client.exceptions.ResourceNotFound:
    logger.exception("Possibly the Batch transform stack has not been deployed yet", exc_info=False)

Once the pipeline execution has completed, you can access the cached predictions via the REST endpoint.

In [None]:
try:
    ddb_serving = ssm.get_parameter(Name=f"/sagemaker-{project_name}/{project_name}-batch-transform")['Parameter']['Value']
except ssm.exceptions.ParameterNotFound:
    logger.exception("The serving stack might not have been deployed yet", exc_info=False)

In [None]:
%timeit requests.get(ddb_serving, params=dict(policy_id=3))
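`%timeit` works only inside IPython and picks the number of loops itself, so it may send many requests per measurement. To compare the cached and real-time endpoints with a fixed number of calls, you could time them directly with `time.perf_counter`. `median_latency` is a hypothetical helper, and the default of 20 calls is arbitrary.

```python
import statistics
import time


def median_latency(call, n_calls=20):
    """Invoke call() n_calls times and return the median wall-clock latency in seconds."""
    if n_calls < 1:
        raise ValueError("n_calls must be at least 1")
    durations = []
    for _ in range(n_calls):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)
```

For example, `median_latency(lambda: requests.get(ddb_serving, params=dict(policy_id=3)).json())` versus the same call against `live_endpoint`. The median is less sensitive than the mean to an occasional slow request, such as the first call on a cold connection.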

In [None]:
preds_cached = [
    requests.get(ddb_serving, params=dict(policy_id=k)).json()
    for k in range(1, 6)
]
preds_cached

## Cleanup

Uncomment the code in the following cells and run it to delete the resources created by the SageMaker Project.

Remove all model versions (model packages) and the model package group.

In [None]:
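# # Collect every model version first (paginated), then delete them
# paginator = sagemaker_client.get_paginator('list_model_packages')
# model_package_arns = [
#     package['ModelPackageArn']
#     for page in paginator.paginate(ModelPackageGroupName=model_package_group_name)
#     for package in page['ModelPackageSummaryList']
# ]
# for arn in model_package_arns:
#     sagemaker_client.delete_model_package(ModelPackageName=arn)
# sagemaker_client.delete_model_package_group(ModelPackageGroupName=model_package_group_name)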

You may also want to delete the three CloudFormation stacks created by the project from the CloudFormation console:

[CloudFormation Console](https://console.aws.amazon.com/cloudformation/home)
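Alternatively, you can delete the stacks from the notebook. The stack names are not listed in this notebook, so `stack_names` is a placeholder you would fill in from the console. The sketch calls CloudFormation `delete_stack` for each stack and blocks on the `stack_delete_complete` waiter before moving on; `delete_stacks` is a hypothetical helper.

```python
def delete_stacks(cfn_client, stack_names):
    """Delete CloudFormation stacks one at a time, waiting for each to finish.

    cfn_client is a boto3 CloudFormation client. Stacks are deleted sequentially,
    in the order given, in case one stack depends on another's exported outputs.
    """
    waiter = cfn_client.get_waiter("stack_delete_complete")
    for name in stack_names:
        cfn_client.delete_stack(StackName=name)
        waiter.wait(StackName=name)
```

Usage: `delete_stacks(boto_session.client("cloudformation"), ["<stack-1>", "<stack-2>", "<stack-3>"])`, with the placeholders replaced by the real stack names. Deleting sequentially is slower than issuing all requests at once, but it avoids failures when a stack still imports values exported by another.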

Once the stacks have been deleted, you can delete the SageMaker Project. This also triggers the deletion of the project's own CloudFormation stack.

In [None]:
# sagemaker_client.delete_project(ProjectName=project_name)