# CPG Industry - Personalization Workshop

Welcome to the CPG Industry Personalization Workshop. In this module we will add three core personalization features powered by [Amazon Personalize](https://aws.amazon.com/personalize/): related product recommendations on the product detail page, personalized product recommendations, and personalized ranking of items. These features let us give our users targeted recommendations based on their activity.
This workshop reuses a lot of code and behavior from the Retail Demo Store. If you want to explore retail-related use cases, take a look at: https://github.com/aws-samples/retail-demo-store

Recommended Time: 2 Hours

## Setup

To run this notebook, you need to have run the previous notebook, 02_Training_Layer, where you created a dataset and imported interaction data into Amazon Personalize. At the end of that notebook, you saved some of the variable values, which you now need to load into this notebook.

In [None]:
%store -r

### Import Dependencies and Setup Boto3 Python Clients

Throughout this workshop we will need access to some common libraries and clients for connecting to AWS services. We also have to retrieve Uid from a SageMaker notebook instance tag.

In [None]:
# Import Dependencies

import boto3
import json
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import time
import requests
import csv
import sys
import botocore
import uuid

from packaging import version
from random import randint
from botocore.exceptions import ClientError

%matplotlib inline

# Setup Clients

personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')
personalize_events = boto3.client('personalize-events')
s3 = boto3.client('s3')

### Implement some visualization functions for displaying information of the products in a dataframe

Throughout this workshop we will need to look up product information several times. These functions let us do that without repeating the same code.

In [None]:
# Load the datasets before searching. For users, we use the original file, which includes all the customer data, for easier visualization.
users_df = pd.read_csv('../../automation/ml_ops/domain/CPG/data/metadata/users-origin.csv')

In [None]:
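def search_items_in_dataframe(item_list):
    # Look up each recommended item and stack the matching product rows in order.
    # DataFrame.append was removed in pandas 2.0, so collect the rows and use pd.concat.
    rows = [get_product_from_id(int(item['itemId'])) for item in item_list]
    pd.set_option('display.max_rows', 10)
    if not rows:
        return pd.DataFrame()
    return pd.concat(rows, ignore_index=True)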

In [None]:
def get_product_from_id(prod_id):
    # Return the display columns for the product with the given id.
    temp = products_df.loc[products_df['id'] == prod_id][['name', 'category', 'type', 'size', 'sugar']]
    return temp

## Use Campaigns

Now that our campaigns have been created in the previous notebook, let's test each campaign and evaluate the results.

### Get recommendations using the Related Product Recommendations Campaign

Let's look at the recommendations made by the related items/products campaign by selecting a product from the products dataset.

#### Select a Product

We'll just pick a random product for simplicity. Feel free to change the `product_id` below and execute the following cells with a different product to get a sense for how the recommendations change.

In [None]:
product_id = 10

get_product_from_id ( product_id )

#### Get Related Product Recommendations for Product

Now let's call Amazon Personalize to get related item/product recommendations for our product from the related item campaign.

In [None]:
get_recommendations_response = personalize_runtime.get_recommendations(
 campaignArn = related_campaign_arn,
 itemId = str(product_id),
 numResults = 5
)

item_list = get_recommendations_response['itemList']
print(json.dumps(item_list, indent=4))

In [None]:
# A list of IDs does not tell us much about the items, so let's look them up.
search_items_in_dataframe(item_list)

Based on the random product selected above, do the similar item recommendations from Personalize make sense? Keep in mind that the similar item recommendations from the SIMS recipe are based on the interactions we generated as input to the solution creation process in the previous notebook.

### Get recommendations using the Product Recommendations Campaign

Let's look at the recommendations made by the product recommendations campaign by selecting a user from the users dataset and requesting item recommendations for that user.

#### Select a User

We'll just pick a random user for simplicity. Feel free to change the `user_id` below and execute the following cells with a different user to get a sense for how the recommendations change.

In [None]:
user_id = 50
users_df.loc[users_df['id'] == user_id]

**Take note of the `persona` value for the user above. We should see recommendations for products consistent with this persona since we generated historical interactions for products in the categories represented in the persona.**

#### Get Product Recommendations for User

Now let's call Amazon Personalize to get recommendations for our user from the product recommendations campaign.

In [None]:
get_recommendations_response = personalize_runtime.get_recommendations(
 campaignArn = recommend_campaign_arn,
 userId = str(user_id),
 numResults = 5
)

item_list = get_recommendations_response['itemList']
print(json.dumps(item_list, indent=4))

# Use display() because only the last expression in a cell is rendered automatically.
display(search_items_in_dataframe(item_list))

# saving to compare later
pre_real_time_event_recommendations = item_list.copy()

Are the recommended products consistent with the persona? Note that this is a rather contrived example using a limited amount of generated interaction data without model parameter tuning. The purpose of this notebook is to give you hands-on experience building models and retrieving inferences from Amazon Personalize.
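
To put a rough number on "consistent with the persona", the share of recommended items whose category appears in the persona string can be computed. This is only a sketch and not part of the original notebook: `persona_match_ratio` is a hypothetical helper, and it assumes that a persona is a list of category names joined by underscores (as in `spirits_beers_sparkling`) and that those names match the product category values exactly. The sample categories are made up.

```python
def persona_match_ratio(persona, recommended_categories):
    """Fraction of recommended items whose category appears in the persona string.

    Assumption: the persona is a set of category names joined by underscores.
    """
    persona_categories = set(persona.split('_'))
    if not recommended_categories:
        return 0.0
    matches = sum(
        1 for category in recommended_categories if category in persona_categories
    )
    return matches / len(recommended_categories)


# Made-up categories for illustration only.
sample_categories = ['beers', 'spirits', 'water', 'beers']
print(persona_match_ratio('spirits_beers_sparkling', sample_categories))
```

In the notebook, the list of categories could come from the `category` column of the dataframe returned by `search_items_in_dataframe(item_list)`.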

### Get recommendations using the Personalized Ranking Campaign

Next let's evaluate the results of the personalized ranking campaign. As a reminder, given a list of items and a user, this campaign will rerank the items based on the preferences of the user.

#### Get Featured Products List

First let's get a list of products from the Products dataset.

In [None]:
# Sample 10 distinct product IDs (as strings) from the products dataset.
product_list = products_dataset_df['ITEM_ID'].sample(n=10).astype(str).tolist()

print(product_list)

#### ReRank Featured Products

Using the featured products list just retrieved, we'll call the personalized ranking campaign and send the list of item IDs that we want to rerank for a specific user. This reranking will allow us to provide ranked products based on the user's behavior. These behaviors should be consistent with the same persona that was mentioned above (since we're going to use the same `user_id`).

In [None]:
users_df.loc[users_df['id'] == user_id]

Now let's have Personalize rank the featured product IDs for our selected user.

In [None]:
response = personalize_runtime.get_personalized_ranking(
 campaignArn=ranking_campaign_arn,
 inputList=product_list,
 userId=str(user_id)
)
print(json.dumps(response['personalizedRanking'], indent = 4))

In [None]:
item_list = response['personalizedRanking']

search_items_in_dataframe(item_list)

Are the reranked results in a different order than the original product list? Notice that each item also comes with a score. Because only the items in the input list are considered, these scores reflect the relative ranking within that list rather than across the whole catalog. Experiment with a different `user_id` in the cells above to see how the item ranking changes.
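
One way to make the effect of the reranking easier to see is to compare each item's position before and after the call. The following is a minimal sketch, not part of the original notebook: `rank_changes` is a hypothetical helper, and the sample IDs and scores are made up. It assumes only that `personalizedRanking` is a list of dictionaries with an `itemId` key, as printed above.

```python
def rank_changes(original_ids, personalized_ranking):
    """Return (item_id, old_position, new_position) tuples in the new order.

    Positions are 1-based. `personalized_ranking` is expected to look like
    response['personalizedRanking']: a list of dicts with an 'itemId' key.
    """
    old_position = {item_id: i for i, item_id in enumerate(original_ids, start=1)}
    changes = []
    for new_pos, entry in enumerate(personalized_ranking, start=1):
        item_id = entry['itemId']
        # Items not found in the original list get None as their old position.
        changes.append((item_id, old_position.get(item_id), new_pos))
    return changes


# Made-up data for illustration only.
sample_input = ['12', '7', '31']
sample_ranking = [
    {'itemId': '31', 'score': 0.61},
    {'itemId': '12', 'score': 0.27},
    {'itemId': '7', 'score': 0.12},
]

for item_id, old_pos, new_pos in rank_changes(sample_input, sample_ranking):
    print(f'item {item_id}: position {old_pos} -> {new_pos}')
```

In the notebook this could be called as `rank_changes(product_list, response['personalizedRanking'])`.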

## Event Tracking - Keeping up with evolving user intent

Up to this point we have trained and deployed three Amazon Personalize campaigns based on historical data that we generated in this workshop. This allows us to recommend related products, recommend products to users, and rerank product lists based on already observed behavior of our users. However, user intent often changes in real time, so the products a user is interested in now may differ from what they were interested in a week ago, a day ago, or even a few minutes ago. Making recommendations that keep up with evolving user intent is one of the more difficult challenges with personalization. Fortunately, Amazon Personalize has a mechanism for this exact case.

Amazon Personalize supports sending real-time user event data (i.e., clickstream data) into the service. Personalize uses this event data to improve recommendations. It also saves these events and automatically includes them when solutions for the same dataset group are re-created (i.e., model retraining).

### Create Personalize Event Tracker

Let's start by creating an event tracker for our dataset group.

In [None]:

event_tracker_response = personalize.create_event_tracker(
 datasetGroupArn=dataset_group_arn,
 name='cpg-event-tracker'
)

event_tracker_arn = event_tracker_response['eventTrackerArn']
event_tracking_id = event_tracker_response['trackingId']

print('Event Tracker ARN: ' + event_tracker_arn)
print('Event Tracking ID: ' + event_tracking_id)

### Wait for Event Tracker Status to Become ACTIVE

The event tracker should take a minute or so to become active.

In [None]:
status = None
max_time = time.time() + 60*60 # 1 hour
while time.time() < max_time:
    describe_event_tracker_response = personalize.describe_event_tracker(
        eventTrackerArn = event_tracker_arn
    )
    status = describe_event_tracker_response["eventTracker"]["status"]
    print("EventTracker: {}".format(status))

    if status == "ACTIVE" or status == "CREATE FAILED":
        break

    time.sleep(15)
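
This poll-until-done pattern recurs whenever a Personalize resource is created. As a sketch only, the loop could be factored into a reusable function. `wait_for_status` and its parameters are hypothetical names, not part of boto3 or the original notebook; the caller supplies a function that returns the current status string, and a fake status source stands in for the AWS call here.

```python
import time


def wait_for_status(get_status, done_statuses=('ACTIVE', 'CREATE FAILED'),
                    timeout_seconds=60 * 60, poll_seconds=15, sleep=time.sleep):
    """Poll get_status() until it returns one of done_statuses or time runs out.

    Returns the last status seen (None if no poll happened). `sleep` is
    injectable so the function can be exercised without real waiting.
    """
    deadline = time.time() + timeout_seconds
    status = None
    while time.time() < deadline:
        status = get_status()
        print('Status: {}'.format(status))
        if status in done_statuses:
            break
        sleep(poll_seconds)
    return status


# Illustration with a fake status source instead of a real AWS call.
fake_statuses = iter(['CREATE PENDING', 'CREATE IN_PROGRESS', 'ACTIVE'])
final_status = wait_for_status(lambda: next(fake_statuses), sleep=lambda seconds: None)
print('Final status:', final_status)
```

For the event tracker, `get_status` might be `lambda: personalize.describe_event_tracker(eventTrackerArn=event_tracker_arn)['eventTracker']['status']`.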

### Simulate a user event
Now we will send "ProductViewed" events to the tracker to simulate user interest in a few products.
Use the same user from the previous interactions so you can compare the recommendations before and after the "ProductViewed" events.

In [None]:
# get some random products

product_ids_to_view = products_dataset_df['ITEM_ID'].sample(n=5, random_state=1)
product_ids_to_view

In [None]:
for product_id_to_view in product_ids_to_view:
    itemSugarLevel = products_dataset_df.loc[products_dataset_df['ITEM_ID'] == product_id_to_view]['SUGAR'].iloc[0]
    event = {
        "itemId": str(product_id_to_view),
        "itemSugarLevel": itemSugarLevel
    }

    event_json = json.dumps(event)
    print ("sending product", event_json)
    display (products_dataset_df.loc[products_dataset_df['ITEM_ID'] == product_id_to_view])

    response = personalize_events.put_events(
        trackingId = event_tracking_id,
        userId = str(user_id),
        sessionId = str(uuid.uuid4()),
        eventList = [
            {
                'eventId': str(uuid.uuid4()),
                'eventType': 'ProductViewed',
                'sentAt': int(time.time()),
                'properties': event_json
            }
        ]
    )

    # Wait for ProductViewed event to become consistent.
    time.sleep(5)
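
The shape of each `eventList` entry can be easier to reason about when it is built by a small function. This is a minimal sketch rather than the notebook's own code: `build_product_viewed_event` is a hypothetical helper, the property names mirror the cell above, and the sample values are made up.

```python
import json
import time
import uuid


def build_product_viewed_event(item_id, sugar_level, sent_at=None):
    """Build one eventList entry in the shape used by the put_events call above."""
    properties = {
        'itemId': str(item_id),
        # str() guards against values json.dumps cannot encode, such as NumPy scalars.
        'itemSugarLevel': str(sugar_level),
    }
    return {
        'eventId': str(uuid.uuid4()),
        'eventType': 'ProductViewed',
        'sentAt': int(time.time()) if sent_at is None else sent_at,
        # The properties are passed as a JSON-encoded string.
        'properties': json.dumps(properties),
    }


# Made-up values for illustration only.
event = build_product_viewed_event(42, 'REGULAR', sent_at=1700000000)
print(json.dumps(event, indent=4))
```

The returned dictionary could then be passed as `eventList=[event]` to `personalize_events.put_events`.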

Let's look at the recommendations we got before sending the events:

In [None]:
search_items_in_dataframe(pre_real_time_event_recommendations)

In [None]:
get_recommendations_response = personalize_runtime.get_recommendations(
 campaignArn = recommend_campaign_arn,
 userId = str(user_id),
 numResults = 5
)

item_list = get_recommendations_response['itemList']
print(json.dumps(item_list, indent=4))

search_items_in_dataframe(item_list)

As you can see, the recommendations have updated to reflect the more recent user intent.
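
To check this claim without comparing two tables by eye, the two responses can be diffed by item ID. The helper below is a sketch under stated assumptions: `diff_recommendations` is a hypothetical name, the sample lists are made up, and it relies only on each entry having an `itemId` key.

```python
def diff_recommendations(before, after):
    """Compare two itemList responses by item ID.

    Returns a dict with the IDs that were added, removed, and kept, each in
    the order they appear in the list they come from.
    """
    before_ids = [item['itemId'] for item in before]
    after_ids = [item['itemId'] for item in after]
    return {
        'added': [i for i in after_ids if i not in before_ids],
        'removed': [i for i in before_ids if i not in after_ids],
        'kept': [i for i in after_ids if i in before_ids],
    }


# Made-up responses for illustration only.
sample_before = [{'itemId': '5'}, {'itemId': '8'}, {'itemId': '13'}]
sample_after = [{'itemId': '8'}, {'itemId': '21'}, {'itemId': '5'}]
print(diff_recommendations(sample_before, sample_after))
```

In the notebook this could be called as `diff_recommendations(pre_real_time_event_recommendations, item_list)`.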

## Contextual recommendations

Now let's explore passing contextual information to the recommendation call. Context can be any attribute included in the Interactions dataset used to train the solution; in our case, "ITEM_SUGAR_LEVEL". Other useful contextual information includes the device or trade channel used to interact, weather information, and the like. For more information, see this [blog post](https://aws.amazon.com/blogs/machine-learning/increasing-the-relevance-of-your-amazon-personalize-recommendations-by-leveraging-contextual-information/).

Try the next section with and without the `context` parameter in the `get_recommendations` call and observe how the results change.
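
Switching the context on and off is simpler when the request arguments are assembled in one place. The function below is a sketch, not the notebook's code: `build_recommendation_request` is a hypothetical helper and the ARN is a placeholder. The keys it produces (`campaignArn`, `userId`, `numResults`, `context`, `filterArn`) are the keyword arguments this notebook passes to `get_recommendations`.

```python
def build_recommendation_request(campaign_arn, user_id, num_results=5,
                                 context=None, filter_arn=None):
    """Assemble keyword arguments for personalize_runtime.get_recommendations."""
    request = {
        'campaignArn': campaign_arn,
        'userId': str(user_id),
        'numResults': num_results,
    }
    # Only include optional parameters when a value was supplied.
    if context:
        request['context'] = dict(context)
    if filter_arn:
        request['filterArn'] = filter_arn
    return request


# The ARN below is a placeholder, not a real campaign.
placeholder_arn = 'arn:aws:personalize:us-east-1:123456789012:campaign/example'
with_context = build_recommendation_request(
    placeholder_arn, 169, context={'ITEM_SUGAR_LEVEL': 'REGULAR'})
without_context = build_recommendation_request(placeholder_arn, 169)
print(with_context)
print(without_context)
```

Either dictionary could then be unpacked into the call, for example `personalize_runtime.get_recommendations(**with_context)`.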

In [None]:
user_id = 169
users_df.loc[users_df['id'] == user_id]

In [None]:
## Recommendations with the sugar-level context set to 'REGULAR'.

get_recommendations_response = personalize_runtime.get_recommendations(
 campaignArn = recommend_campaign_arn,
 userId = str(user_id),
 numResults = 5,
 context = {
 'ITEM_SUGAR_LEVEL': 'REGULAR'
 }

)

item_list = get_recommendations_response['itemList']
print(json.dumps(item_list, indent=4))
display(search_items_in_dataframe(item_list))

In [None]:
## Recommendations without any context, for comparison.

get_recommendations_response = personalize_runtime.get_recommendations(
 campaignArn = recommend_campaign_arn,
 userId = str(user_id),
 numResults = 5
)

item_list = get_recommendations_response['itemList']
print(json.dumps(item_list, indent=4))
display(search_items_in_dataframe(item_list))

## Create Purchased Products Filter

Amazon Personalize supports [filters](https://docs.aws.amazon.com/personalize/latest/dg/filter.html) that exclude items matching a filter expression from recommendations. For example, we can use a filter to exclude alcoholic beverages from the recommendations for an underage customer.

In [None]:
response = personalize.create_filter(
 name = 'cpg-filter-purchased-products',
 datasetGroupArn = dataset_group_arn,
 filterExpression = 'EXCLUDE itemId WHERE ITEMS.CATEGORY in ("beers", "spirits")'
)
 
filter_arn = response['filterArn']
print(f'Filter ARN: {filter_arn}')

### Wait for Filter Status to Become ACTIVE

The filter should take a minute or so to become active.

In [None]:
status = None
max_time = time.time() + 60*60 # 1 hour
while time.time() < max_time:
    describe_filter_response = personalize.describe_filter(
        filterArn = filter_arn
    )
    status = describe_filter_response["filter"]["status"]
    print("Filter: {}".format(status))

    if status == "ACTIVE" or status == "CREATE FAILED":
        break

    time.sleep(15)

### Test Purchased Products Filter

To test the filter, we will request recommendations for user '88'. This user's persona is spirits_beers_sparkling, so their default recommendations are full of alcoholic beverages.

In [None]:
# Pick a user ID in the range of test users and fetch 5 recommendations.
user_id = '88'
get_recommendations_response = personalize_runtime.get_recommendations(
 campaignArn = recommend_campaign_arn,
 userId = user_id,
 numResults = 5
)

item_list = get_recommendations_response['itemList']
print(json.dumps(item_list, indent=2))
display(search_items_in_dataframe(item_list))

Now, let's retrieve recommendations for the user again but this time specifying the filter to exclude items from beers and spirits categories. We do this by passing the filter's ARN via the `filterArn` parameter.

In [None]:
get_recommendations_response = personalize_runtime.get_recommendations(
 campaignArn = recommend_campaign_arn,
 userId = user_id,
 numResults = 5,
 filterArn = filter_arn
)

item_list = get_recommendations_response['itemList']
print(json.dumps(item_list, indent=2))
display(search_items_in_dataframe(item_list))

You can see the items recommended are consistent with our filter. 
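
Rather than confirming this by eye, the filtered response can be checked against the catalog. The following is a minimal sketch with made-up data: `find_filter_violations` is a hypothetical helper, and the mapping from item ID to category is an assumption about how the catalog might be represented.

```python
def find_filter_violations(item_list, category_by_item_id, excluded_categories):
    """Return the IDs of recommended items whose category should have been excluded."""
    excluded = set(excluded_categories)
    return [
        item['itemId']
        for item in item_list
        if category_by_item_id.get(item['itemId']) in excluded
    ]


# Made-up catalog and response for illustration only.
sample_categories = {'1': 'beers', '2': 'water', '3': 'spirits', '4': 'juices'}
sample_items = [{'itemId': '2'}, {'itemId': '4'}]
violations = find_filter_violations(sample_items, sample_categories, ['beers', 'spirits'])
print('Violations:', violations)
```

In the notebook, the mapping could be built with `dict(zip(products_df['id'].astype(str), products_df['category']))`, and an empty result would mean the filter behaved as expected.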

## Workshop Complete

Congratulations! You have completed the CPG Personalization Workshop.

### Cleanup

If you are working on a personal AWS account **AND** you're done with all workshops, make sure to delete all of the Amazon Personalize resources created by this workshop. You can use the notebook `05_Clean_Up`.

If you are participating in an AWS managed event such as a workshop and using an AWS provided temporary account, you can skip the cleanup workshop unless otherwise instructed.