{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# CPG Industry - Personalization Workshop\n",
    "\n",
    "Welcome to the CPG Industry Personalization Workshop. In this module we're going to be adding three core personalization features powered by [Amazon Personalize](https://aws.amazon.com/personalize/): related product recommendations on the product detail page, personalized recommendations, and personalized ranking of items. This will allow us to give our users targeted recommendations based on their activity.\n",
    "This workshop reuse a lot of code and behaviour from Retail Demo Store, if you want to expand to explore retail related cases take a look at: https://github.com/aws-samples/retail-demo-store\n",
    "\n",
    "Recommended Time: 2 Hours"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "To run this notebook, you need to have run the previous notebook, 02_Training_Layer, where you created a dataset and imported interaction data into Amazon Personalize. At the end of that notebook, you saved some of the variable values, which you now need to load into this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store -r"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import Dependencies and Setup Boto3 Python Clients\n",
    "\n",
    "Throughout this workshop we will need access to some common libraries and clients for connecting to AWS services. We also have to retrieve Uid from a SageMaker notebook instance tag."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import Dependencies\n",
    "\n",
    "import boto3\n",
    "import json\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "import time\n",
    "import requests\n",
    "import csv\n",
    "import sys\n",
    "import botocore\n",
    "import uuid\n",
    "\n",
    "from packaging import version\n",
    "from random import randint\n",
    "from botocore.exceptions import ClientError\n",
    "\n",
    "%matplotlib inline\n",
    "\n",
    "# Setup Clients\n",
    "\n",
    "personalize = boto3.client('personalize')\n",
    "personalize_runtime = boto3.client('personalize-runtime')\n",
    "personalize_events = boto3.client('personalize-events')\n",
    "s3 = boto3.client('s3')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Implement some visualization functions for displaying information of the products in a dataframe\n",
    "\n",
    "Throughout this workshop we will need to search information of products several times, this function will help us to do it without repeating the same code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Load all the dataset before searching. For users we will use the original that include all the customer data for easier visualization. \n",
    "users_df = pd.read_csv('../../automation/ml_ops/domain/CPG/data/metadata/users-origin.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def search_items_in_dataframe(item_list):\n",
    "    df = pd.DataFrame() \n",
    "    for x in range(len(item_list)):\n",
    "        temp =  get_product_from_id( int(item_list[x]['itemId']) )\n",
    "        df = df.append(temp, ignore_index=True)\n",
    "    pd.set_option('display.max_rows', 10)\n",
    "    return df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_product_from_id ( prod_id ):\n",
    "    temp =  products_df.loc[products_df['id'] == prod_id ][[ 'name', 'category','type', 'size', 'sugar']]\n",
    "    return temp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use Campaigns\n",
    "\n",
    "Now that our campaigns have been created in the previous notebook, let's test each campaign and evaluate the results."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get recommendations using the Related Product Recommendations Campaign\n",
    "\n",
    "Let's look at the recommendations made by the related items/products campaign by selecting a product from the products dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Select a Product\n",
    "\n",
    "We'll just pick a random product for simplicity. Feel free to change the `product_id` below and execute the following cells with a different product to get a sense for how the recommendations change."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "product_id = 10\n",
    "\n",
    "get_product_from_id ( product_id )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Get Related Product Recommendations for Product\n",
    "\n",
    "Now let's call Amazon Personalize to get related item/product recommendations for our product from the related item campaign."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "    campaignArn = related_campaign_arn,\n",
    "    itemId = str(product_id),\n",
    "    numResults = 5\n",
    ")\n",
    "\n",
    "item_list = get_recommendations_response['itemList']\n",
    "print(json.dumps(item_list, indent=4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A list of ids does not tell a lot from the items, lets find out what they are. \n",
    "search_items_in_dataframe(item_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Based on the random product selected above, do the similar item recommendations from Personalize make sense? Keep in mind that the similar item recommendations from the SIMS recipe are based on the interactions we generated as input into the solution creation process above."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get recommendations using the Product Recommendations Campaign\n",
    "\n",
    "Let's look at the recommendations made by the product recommendations campaign by selecting a user from the users dataset and requesting item recommendations for that user."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Select a User\n",
    "\n",
    "We'll just pick a random user for simplicity. Feel free to change the `user_id` below and execute the following cells with a different user to get a sense for how the recommendations change."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_id = 50\n",
    "users_df.loc[users_df['id'] == user_id]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Take note of the `persona` value for the user above. We should see recommendations for products consistent with this persona since we generated historical interactions for products in the categories represented in the persona.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Get Product Recommendations for User\n",
    "\n",
    "Now let's call Amazon Personalize to get recommendations for our user from the product recommendations campaign."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "    campaignArn = recommend_campaign_arn,\n",
    "    userId = str(user_id),\n",
    "    numResults = 5\n",
    ")\n",
    "\n",
    "item_list = get_recommendations_response['itemList']\n",
    "print(json.dumps(item_list, indent=4))\n",
    "\n",
    "search_items_in_dataframe(item_list)\n",
    "\n",
    "# saving to compare later\n",
    "pre_real_time_event_recommendations = item_list.copy()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Are the recommended products consistent with the persona? Note that this is a rather contrived example using a limited amount of generated interaction data without model parameter tuning. The purpose of this notebook is to give you hands on experience building models and retrieving inferences from Amazon Personalize. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get recommendations using the Personalized Ranking Campaign\n",
    "\n",
    "Next let's evaluate the results of the personalized ranking campaign. As a reminder, given a list of items and a user, this campaign will rerank the items based on the preferences of the user."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Get Featured Products List\n",
    "\n",
    "First let's get a list of products from the Products dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "product_list =[]\n",
    "for x in range(10):\n",
    "    product_list.append(str(products_dataset_df.sample().iloc[0][0]))\n",
    "    \n",
    "print(product_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### ReRank Featured Products\n",
    "\n",
    "Using the featured products list just retrieved, first we'll call the personalized raking campaign and send the list of item IDs that we want to rerank for a specific user. This reranking will allow us to provide ranked products based on the user's behavior. These behaviors should be consistent the same persona that was mentioned above (since we're going to use the same `user_id`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "users_df.loc[users_df['id'] == user_id]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's have Personalize rank the featured product IDs based on our random user."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "response = personalize_runtime.get_personalized_ranking(\n",
    "    campaignArn=ranking_campaign_arn,\n",
    "    inputList=product_list,\n",
    "    userId=str(user_id)\n",
    ")\n",
    "print(json.dumps(response['personalizedRanking'], indent = 4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "item_list = response['personalizedRanking']\n",
    "\n",
    "search_items_in_dataframe(item_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Are the reranked results different than the original results from the Search service? Notice that we are also given a score that indicates the recommended ranking across all items in the catalog. Experiment with a different `user_id` in the cells above to see how the item ranking changes."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Event Tracking - Keeping up with evolving user intent\n",
    "\n",
    "Up to this point we have trained and deployed three Amazon Personalize campaigns based on historical data that we generated in this workshop. This allows us to make related product, user recommendations, and rerank product lists based on already observed behavior of our users. However, user intent often changes in real-time such that what products the user is interested in now may be different than what they were interested in a week ago, a day ago, or even a few minutes ago. Making recommendations that keep up with evolving user intent is one of the more difficult challenges with personalization. Fortunately, Amazon Personalize has a mechanism for this exact case.\n",
    "\n",
    "Amazon Personalize supports the ability to send real-time user events (i.e. clickstream) data into the service. Personalize uses this event data to improve recommendations. It will also save these events and automatically include them when solutions for the same dataset group are re-created (i.e. model retraining)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create Personalize Event Tracker\n",
    "\n",
    "Let's start by creating an event tracker for our dataset group."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "event_tracker_response = personalize.create_event_tracker(\n",
    "    datasetGroupArn=dataset_group_arn,\n",
    "    name='cpg-event-tracker'\n",
    ")\n",
    "\n",
    "event_tracker_arn = event_tracker_response['eventTrackerArn']\n",
    "event_tracking_id = event_tracker_response['trackingId']\n",
    "\n",
    "print('Event Tracker ARN: ' + event_tracker_arn)\n",
    "print('Event Tracking ID: ' + event_tracking_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Wait for Event Tracker Status to Become ACTIVE\n",
    "\n",
    "The event tracker should take a minute or so to become active."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "status = None\n",
    "max_time = time.time() + 60*60 # 1 hours\n",
    "while time.time() < max_time:\n",
    "    describe_event_tracker_response = personalize.describe_event_tracker(\n",
    "        eventTrackerArn = event_tracker_arn\n",
    "    )\n",
    "    status = describe_event_tracker_response[\"eventTracker\"][\"status\"]\n",
    "    print(\"EventTracker: {}\".format(status))\n",
    "    \n",
    "    if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
    "        break\n",
    "        \n",
    "    time.sleep(15)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Simulate a user event\n",
    "Now we will send to the tracker a \"ProductViewed\" event, to simulate user interest on a product.\n",
    "Use the same user from previous interactions so you can compare the results of recomendations before and after the \"ProductViewed\" event"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# get some random products\n",
    "\n",
    "product_ids_to_view = products_dataset_df['ITEM_ID'].sample(n=5, random_state=1)\n",
    "product_ids_to_view"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for product_id_to_view in product_ids_to_view:\n",
    "    itemSugarLevel = products_dataset_df.loc[products_dataset_df['ITEM_ID'] == product_id_to_view]['SUGAR'].iloc[0]\n",
    "    event = {\n",
    "        \"itemId\": str(product_id_to_view),\n",
    "        \"itemSugarLevel\": itemSugarLevel\n",
    "        }\n",
    "\n",
    "    event_json = json.dumps(event)\n",
    "    print (\"sending product\", event_json)\n",
    "    display (products_dataset_df.loc[products_dataset_df['ITEM_ID'] == product_id_to_view])\n",
    "    \n",
    "    response = personalize_events.put_events(\n",
    "    trackingId = event_tracking_id,\n",
    "    userId = str(user_id),\n",
    "    sessionId = str(uuid.uuid4()),\n",
    "    eventList = [\n",
    "            {\n",
    "                'eventId': str(uuid.uuid4()),\n",
    "                'eventType': 'ProductViewed',\n",
    "                'sentAt': int(time.time()),\n",
    "                'properties': event_json\n",
    "            }\n",
    "        ]\n",
    "    )\n",
    "    \n",
    "    # Wait for ProductViewed event to become consistent.\n",
    "    time.sleep(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look at the recommendations we got before sending the events:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "search_items_in_dataframe(pre_real_time_event_recommendations)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "    campaignArn = recommend_campaign_arn,\n",
    "    userId = str(user_id),\n",
    "    numResults = 5\n",
    ")\n",
    "\n",
    "item_list = get_recommendations_response['itemList']\n",
    "print(json.dumps(item_list, indent=4))\n",
    "\n",
    "search_items_in_dataframe(item_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, the recommendations have updated to reflect the more recent user intent."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Contextual recomendations\n",
    "\n",
    "Now lets explore the possibility of passing contextual information to the recomendation call. Context can be any attribute included in the Interactions dataset used to train the solution, in our case: \"ITEM_SUGAR_LEVEL\". Other useful contextual informacion can be the device or trade channel used to interact, weather information and alike. More information in the [documebntation](https://aws.amazon.com/blogs/machine-learning/increasing-the-relevance-of-your-amazon-personalize-recommendations-by-leveraging-contextual-information/)\n",
    "\n",
    "Try the next section with and without the context parameter in the get_recommendation call and observe the changes in results. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_id = 169\n",
    "users_df.loc[users_df['id'] == user_id]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Recommendations of products with 0 % Sugar. \n",
    "\n",
    "get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "    campaignArn = recommend_campaign_arn,\n",
    "    userId = str(user_id),\n",
    "    numResults = 5,\n",
    "    context = {\n",
    "      'ITEM_SUGAR_LEVEL': 'REGULAR'\n",
    "    }\n",
    "\n",
    ")\n",
    "\n",
    "item_list = get_recommendations_response['itemList']\n",
    "print(json.dumps(item_list, indent=4))\n",
    "display(search_items_in_dataframe(item_list))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Recommendations of products with 0 % Sugar. \n",
    "\n",
    "get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "    campaignArn = recommend_campaign_arn,\n",
    "    userId = str(user_id),\n",
    "    numResults = 5\n",
    ")\n",
    "\n",
    "item_list = get_recommendations_response['itemList']\n",
    "print(json.dumps(item_list, indent=4))\n",
    "display(search_items_in_dataframe(item_list))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Purchased Products Filter\n",
    "\n",
    "Amazon Personalize supports the ability to create [filters](https://docs.aws.amazon.com/personalize/latest/dg/filter.html) that can be used to exclude items from being recommended that meet a filter expression. For example, we can use a filter to exclude alcoholic beverages in the recomendation for a under age customer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "response = personalize.create_filter(\n",
    "    name = 'cpg-filter-purchased-products',\n",
    "    datasetGroupArn = dataset_group_arn,\n",
    "    filterExpression = 'EXCLUDE itemId WHERE ITEMS.CATEGORY in (\"beers\", \"spirits\")'\n",
    ")\n",
    " \n",
    "filter_arn = response['filterArn']\n",
    "print(f'Filter ARN: {filter_arn}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Wait for Filter Status to Become ACTIVE\n",
    "\n",
    "The filter should take a minute or so to become active."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "status = None\n",
    "max_time = time.time() + 60*60 # 1 hours\n",
    "while time.time() < max_time:\n",
    "    describe_filter_response = personalize.describe_filter(\n",
    "        filterArn = filter_arn\n",
    "    )\n",
    "    status = describe_filter_response[\"filter\"][\"status\"]\n",
    "    print(\"Filter: {}\".format(status))\n",
    "    \n",
    "    if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
    "        break\n",
    "        \n",
    "    time.sleep(15)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Test Purchased Products Filter\n",
    "\n",
    "To test our purchased products filter, we will request recommendations for user '88'. Persona is spirits_beers_sparkling so her default recomendations is full of alcoholic beverages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Pick a user ID in the range of test users and fetch 5 recommendations.\n",
    "user_id = '88'\n",
    "get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "    campaignArn = recommend_campaign_arn,\n",
    "    userId = user_id,\n",
    "    numResults = 5\n",
    ")\n",
    "\n",
    "item_list = get_recommendations_response['itemList']\n",
    "print(json.dumps(item_list, indent=2))\n",
    "display(search_items_in_dataframe(item_list))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's retrieve recommendations for the user again but this time specifying the filter to exclude items from beers and spirits categories. We do this by passing the filter's ARN via the `filterArn` parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "    campaignArn = recommend_campaign_arn,\n",
    "    userId = user_id,\n",
    "    numResults = 5,\n",
    "    filterArn = filter_arn\n",
    ")\n",
    "\n",
    "item_list = get_recommendations_response['itemList']\n",
    "print(json.dumps(item_list, indent=2))\n",
    "display(search_items_in_dataframe(item_list))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see the items recommended are consistent with our filter. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Workshop Complete\n",
    "\n",
    "Congratulations! You have completed the CPG Personalization Workshop.\n",
    "\n",
    "### Cleanup\n",
    "\n",
    "If you are working on a personal AWS account **AND** you're done with all workshops, make sure to delete all of the Amazon Personalize resources created by this workshop. You can use the notebook `05_Clean_Up`.\n",
    "\n",
    "If you are participating in an AWS managed event such as a workshop and using an AWS provided temporary account, you can skip the cleanup workshop unless otherwise instructed."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_amazonei_mxnet_p36",
   "language": "python",
   "name": "conda_amazonei_mxnet_p36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}