
# Getting Started

MLOps is gaining popularity. This example showcases a key piece you can use to build your automation pipeline. As the architecture diagram below shows, you will deploy an AWS Step Functions workflow containing AWS Lambda functions that call the Amazon S3, Amazon Personalize, and Amazon SNS APIs.


This package contains the source code of a Step Functions pipeline that is able to perform 
multiple actions within **Amazon Personalize**, including the following:

- Dataset Group creation
- Datasets creation and import
- Solution creation
- Solution version creation
- Campaign creation

**Note**: This notebook is an example of a [Custom Dataset Group and associated resources](https://docs.aws.amazon.com/personalize/latest/dg/custom-dataset-groups.html). Please refer to the documentation for more information on [Domain Dataset Groups and Recommenders](https://docs.aws.amazon.com/personalize/latest/dg/domain-dataset-groups.html).

Once the steps are completed, the state machine notifies users of its completion through
an SNS topic.

The diagram below describes the architecture of the solution:

![Architecture Diagram](../../static/imgs/ml_ops_architecture.png)

The diagram below shows the Step Functions workflow definition:

![stepfunction definition](../../static/imgs/step_functions.png)



## Uploading data

Let's get the name of the bucket that our CloudFormation stack deployed. We will upload our data to this bucket, along with the configuration file that triggers the automation.

In [None]:
bucket = !aws cloudformation describe-stacks --stack-name id-ml-ops --query "Stacks[0].Outputs[?OutputKey=='InputBucketName'].OutputValue" --output text
bucket_name = bucket[0]
print(bucket_name)
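
If you would rather stay in Python than shell out to the CLI, the same lookup can be sketched with boto3. The helper name `get_stack_output` is hypothetical and not part of the original notebook; the stack name `id-ml-ops` and the output key `InputBucketName` come from the command above.

```python
def get_stack_output(stacks_response, output_key):
    """Return the value of one CloudFormation stack output, or None if absent."""
    # describe_stacks returns {"Stacks": [{"Outputs": [{"OutputKey": ..., "OutputValue": ...}]}]}
    outputs = stacks_response["Stacks"][0].get("Outputs", [])
    for output in outputs:
        if output["OutputKey"] == output_key:
            return output["OutputValue"]
    return None


# Usage in the notebook (requires AWS credentials and the deployed stack):
# import boto3
# cfn = boto3.client("cloudformation")
# response = cfn.describe_stacks(StackName="id-ml-ops")
# bucket_name = get_stack_output(response, "InputBucketName")
```

Returning `None` for a missing key, rather than raising, makes it easy to detect a stack that has not finished deploying yet.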

Now that we have the bucket name, let's copy over our Media data so we can explore it and upload it to S3.

In [None]:
!cp -R /home/ec2-user/SageMaker/amazon-personalize-immersion-day/automation/ml_ops/domain/Media ./example

In [None]:
# Import Dependencies

import boto3
import json
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import time
import requests
import csv
import sys
import botocore
import uuid
from collections import defaultdict
import random

from packaging import version
from botocore.exceptions import ClientError
from pathlib import Path

%matplotlib inline

# Setup Clients

personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')
personalize_events = boto3.client('personalize-events')

# We will upload our training data in these files:
raw_items_filename = "example/data/Items/items.csv" # Do Not Change
raw_users_filename = "example/data/Users/users.csv" # Do Not Change
raw_interactions_filename = "example/data/Interactions/interactions.csv" # Do Not Change
items_filename = "items.csv" # Do Not Change
users_filename = "users.csv" # Do Not Change
interactions_filename = "interactions.csv" # Do Not Change


In [None]:
interactions_df = pd.read_csv(raw_interactions_filename)
interactions_df.head()

There are two ways to upload your datasets to S3:
1. Using the boto3 SDK
2. Using the CLI

In this example we are going to use the CLI.

In [None]:
!aws s3 sync ./example/data s3://$bucket_name
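
For comparison, the boto3 route mentioned above could look roughly like the sketch below. The function names `iter_upload_targets` and `upload_directory` are hypothetical helpers, not part of the original notebook. Unlike `aws s3 sync`, this sketch uploads every file unconditionally instead of only the ones that changed.

```python
from pathlib import Path


def iter_upload_targets(local_dir, prefix=""):
    """Yield (local_path, s3_key) pairs that mirror the local folder layout."""
    root = Path(local_dir)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Use forward slashes in the key regardless of the local OS.
            yield path, prefix + path.relative_to(root).as_posix()


def upload_directory(s3_client, local_dir, bucket, prefix=""):
    """Upload every file under local_dir and return the list of keys written."""
    uploaded = []
    for path, key in iter_upload_targets(local_dir, prefix):
        s3_client.upload_file(Filename=str(path), Bucket=bucket, Key=key)
        uploaded.append(key)
    return uploaded


# Usage in the notebook (requires AWS credentials):
# import boto3
# upload_directory(boto3.client("s3"), "./example/data", bucket_name)
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real bucket.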

## Starting the State Machine Execution

To execute the MLOps pipeline, we need to provide a parameters file that tells our state machine which names and configurations we want in our Amazon Personalize deployment.

Let's create a parameters.json file and define the Amazon Personalize resources we want our MLOps pipeline to deploy.

In [None]:
params = {
    "datasetGroup": {
        "name": "AP-ML-Ops-1"
    },
    "datasets": {
        "Interactions": {
            "name": "InteractionsDataset",
            "schema": {
                "fields": [
                    {
                        "name": "USER_ID",
                        "type": "string"
                    },
                    {
                        "name": "ITEM_ID",
                        "type": "string"
                    },
                    {
                        "name": "EVENT_TYPE",
                        "type": "string"
                    },
                    {
                        "name": "TIMESTAMP",
                        "type": "long"
                    }
                ],
                "name": "Interactions",
                "namespace": "com.amazonaws.personalize.schema",
                "type": "record",
                "version": "1.0"
            }
        },
        "Items": {
            "name": "ItemsDataset",
            "schema": {
                "fields": [
                    {
                        "name": "ITEM_ID",
                        "type": "string"
                    },
                    {
                        "categorical": True,
                        "name": "GENRES",
                        "type": "string"
                    },
                    {
                        "name": "YEAR",
                        "type": "int"
                    }
                ],
                "name": "Items",
                "namespace": "com.amazonaws.personalize.schema",
                "type": "record",
                "version": "1.0"
            }
        }
    },
    "solutions": {
        "sims": {
            "name": "na-simsCampaign-1",
            "recipeArn": "arn:aws:personalize:::recipe/aws-sims"
        }
    },
    "campaigns": {
        "simsCampaign": {
            "minProvisionedTPS": 1,
            "name": "na-simsCampaign-1"
        }
    },
    "eventTracker": {
        "name": "AutomationImmersionDayEventTracker-1"
    }
}

In [None]:
print(json.dumps(params, indent=4, sort_keys=True))

This parameters file creates a dataset group containing a campaign that exposes a solution trained with the SIMS recipe (`aws-sims`).
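
Before uploading, a quick sanity check of the schemas can catch typos early. The sketch below is a hypothetical helper (`missing_required_fields` and the `REQUIRED_FIELDS` table are not part of the pipeline). It assumes the Amazon Personalize rule that an Interactions schema must declare `USER_ID`, `ITEM_ID`, and `TIMESTAMP`, an Items schema must declare `ITEM_ID`, and a Users schema must declare `USER_ID`.

```python
# Minimum field names each dataset type is assumed to need (illustrative table).
REQUIRED_FIELDS = {
    "Interactions": {"USER_ID", "ITEM_ID", "TIMESTAMP"},
    "Items": {"ITEM_ID"},
    "Users": {"USER_ID"},
}


def missing_required_fields(params):
    """Map each dataset type to the sorted list of required fields it lacks."""
    problems = {}
    for dataset_type, dataset in params.get("datasets", {}).items():
        declared = {field["name"] for field in dataset["schema"]["fields"]}
        missing = REQUIRED_FIELDS.get(dataset_type, set()) - declared
        if missing:
            problems[dataset_type] = sorted(missing)
    return problems


# Usage in the notebook: an empty dict means no required field is missing.
# print(missing_required_fields(params))
```

This only checks field names; it does not validate types or any other schema rule.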

## Updating and uploading your parameters file to S3

First, let's write the file locally.

In [None]:
with open('example/params.json', 'w') as outfile:
    json.dump(params, outfile)

Now we can upload this file to S3. We will use the CLI to do so.

In [None]:
!aws s3 cp ./example/params.json s3://$bucket_name

## Validating your MLOps pipeline

Let's take a look at the Step Functions execution.

In [None]:
client = boto3.client('stepfunctions')
stateMachineArn = !aws cloudformation describe-stacks --stack-name id-ml-ops --query "Stacks[0].Outputs[?OutputKey=='DeployStateMachineArn'].OutputValue" --output text
stateMachineArn = stateMachineArn[0]
stateMachineArn

In [None]:
executions_response = client.list_executions(
    stateMachineArn=stateMachineArn,
    statusFilter='RUNNING',
    maxResults=2
)
print(json.dumps(executions_response, indent=4, sort_keys=True, default=str))

This step will take at least 30 minutes to complete. 

You can check the status of the state machine execution in the console by:

1. Navigate to the [Step Functions console](https://console.aws.amazon.com/states/home). 


2. Click on the number **1** under the **Running** column

![stepfunction definition](../../static/imgs/step_functions_console.png)

3. Click on the **current execution** that is named after the date

![stepfunction definition](../../static/imgs/step_functions_console_execution.png)

4. Here you can see the steps that are currently executing, highlighted in blue

![stepfunction definition](../../static/imgs/step_functions_in_progress.png)


This example state machine definition automatically retries each step by querying the service's describe APIs with a backoff rate of 1.5. On each retry, a new Lambda function invocation checks whether the given step has succeeded or failed.
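
To get a feel for what a backoff rate of 1.5 means, note that the wait before each retry grows geometrically from the initial interval. The sketch below computes that schedule. The 30-second initial interval and the four attempts are illustrative assumptions, not values taken from the deployed state machine.

```python
def backoff_schedule(interval_seconds, backoff_rate, max_attempts):
    """Return the wait (in seconds) before each retry attempt."""
    return [interval_seconds * backoff_rate ** attempt for attempt in range(max_attempts)]


# Assumed values for illustration: 30 s initial interval, 4 retry attempts.
waits = backoff_schedule(30, 1.5, 4)
print(waits)  # [30.0, 45.0, 67.5, 101.25]
```

Each wait is 1.5 times the previous one, so polling starts frequent and becomes sparser as a long-running step (such as training) continues.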

The state machine takes around 20 minutes to finish executing, which includes importing the datasets, training a SIMS solution, and deploying a campaign. **Note:** we are only training a SIMS model due to time constraints.


In [None]:
# Poll once a minute until no execution is in the RUNNING state
while len(client.list_executions(
    stateMachineArn=stateMachineArn,
    statusFilter='RUNNING',
    maxResults=2
)['executions']) > 0:
    print('State Machine is running...')
    time.sleep(60)
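
The loop above waits indefinitely. A variant with a timeout might look like the sketch below. The function name `wait_for_executions`, its parameters, and the one-hour default timeout are assumptions for illustration, not part of the original notebook.

```python
import time


def wait_for_executions(count_running, poll_seconds=60, timeout_seconds=3600, sleep=time.sleep):
    """Poll until count_running() returns 0. Return True, or False on timeout."""
    waited = 0
    while count_running() > 0:
        if waited >= timeout_seconds:
            return False
        print("State machine is running...")
        sleep(poll_seconds)
        waited += poll_seconds
    return True


# Usage in the notebook (client and stateMachineArn as defined earlier):
# finished = wait_for_executions(
#     lambda: len(client.list_executions(
#         stateMachineArn=stateMachineArn,
#         statusFilter="RUNNING",
#         maxResults=2,
#     )["executions"])
# )
```

Injecting `count_running` and `sleep` keeps the polling logic independent of boto3, so it can be tried out without waiting on a real execution.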


### Let's look at the succeeded execution

Once your step functions are done executing, you can list the executions and describe them

In [None]:
executions_response = client.list_executions(
    stateMachineArn=stateMachineArn,
    statusFilter='SUCCEEDED',
    maxResults=2
)
print(json.dumps(executions_response, indent=4, sort_keys=True, default=str))

You can validate your Amazon Personalize deployment by navigating to the [Service Console](https://console.aws.amazon.com/personalize/home) and looking for the dataset group called **AP-ML-Ops-1**.

### Let's look at the input that was delivered to the State Machine

As we can see below, this is the input from the parameters file we uploaded to S3. This input JSON was then passed to the Lambda functions in the state machine for use across the Amazon Personalize APIs.

In [None]:
describe_executions_response = client.describe_execution(
    executionArn=executions_response['executions'][0]['executionArn']
)
print(json.dumps(json.loads(describe_executions_response['input']), indent=4, sort_keys=True, default=str))

### Let's look at the time stamps

The execution description also includes the start and stop dates of the execution, which we can use to compute the elapsed time.

In [None]:
print("Start Date:")
print(json.dumps(describe_executions_response['startDate'], indent=4, sort_keys=True, default=str))
print("Stop Date:")
print(json.dumps(describe_executions_response['stopDate'], indent=4, sort_keys=True, default=str))
print("Elapsed Time: ")
elapsed_time = describe_executions_response['stopDate'] - describe_executions_response['startDate']
print(elapsed_time)
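
If you prefer the elapsed time in minutes and seconds, a small formatter such as the sketch below can help. The name `format_elapsed` is hypothetical; it assumes `startDate` and `stopDate` are `datetime` objects, as boto3 returns them.

```python
from datetime import datetime, timedelta


def format_elapsed(start, stop):
    """Format the gap between two datetimes as whole minutes and seconds."""
    total_seconds = int((stop - start).total_seconds())
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes} min {seconds} s"


# Usage in the notebook:
# print(format_elapsed(describe_executions_response['startDate'],
#                      describe_executions_response['stopDate']))
```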

As we see above, the whole process takes a significant amount of time, but all the steps are now fully automated!

If you are interested in deploying this example in your environment, visit our [GitHub Samples Page](https://github.com/aws-samples/amazon-personalize-samples/tree/master/next_steps/operations/ml_ops) to download the latest codebase.