{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CPG Industry - Personalization Workshop\n", "\n", "Welcome to the CPG Industry Personalization Workshop. In this module we're going to be adding three core personalization features powered by [Amazon Personalize](https://aws.amazon.com/personalize/): related product recommendations on the product detail page, personalized recommendations, and personalized ranking of items. This will allow us to give our users targeted recommendations based on their activity.\n", "This workshop reuse a lot of code and behaviour from Retail Demo Store, if you want to expand to explore retail related cases take a look at: https://github.com/aws-samples/retail-demo-store\n", "\n", "Recommended Time: 2 Hours" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "To run this notebook, you need to have run the previous notebook, 01_Data_Layer, where you created a dataset and imported interaction data into Amazon Personalize. At the end of that notebook, you saved some of the variable values, which you now need to load into this notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%store -r" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import Dependencies and Setup Boto3 Python Clients\n", "\n", "Throughout this workshop we will need access to some common libraries and clients for connecting to AWS services. We also have to retrieve Uid from a SageMaker notebook instance tag." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import Dependencies\n", "\n", "import boto3\n", "import json\n", "import pandas as pd\n", "import numpy as np\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "import time\n", "import requests\n", "import csv\n", "import sys\n", "import botocore\n", "import uuid\n", "\n", "from packaging import version\n", "from random import randint\n", "from botocore.exceptions import ClientError\n", "\n", "%matplotlib inline\n", "\n", "# Setup Clients\n", "\n", "personalize = boto3.client('personalize')\n", "personalize_runtime = boto3.client('personalize-runtime')\n", "personalize_events = boto3.client('personalize-events')\n", "s3 = boto3.client('s3')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create Solutions\n", "\n", "With our three datasets imported into our dataset group, we can now turn to training models. As a reminder, we will be training three models in this workshop to support three different personalization use-cases. One model will be used to make related product recommendations on the product detail view/page, another model will be used to make personalized product recommendations to users on the homepage, and the last model will be used to rerank product lists on the category and featured products page. In Amazon Personalize, training a model involves creating a Solution and Solution Version. So when we are finished we will have three solutions and a solution version for each solution. \n", "\n", "When creating a solution, you provide your dataset group and the recipe for training. Let's declare the recipes that we will need for our solutions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### List Recipes\n", "\n", "First, let's list all available recipes." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "list_recipes_response = personalize.list_recipes()\n", "list_recipes_response" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see above, there are several recipes to choose from. Let's declare the recipes for each Solution." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Declare Personalize Recipe for Related Products\n", "\n", "On the product detail page we want to display related products so we'll create a campaign using the [SIMS](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-sims.html) recipe.\n", "\n", "> The Item-to-item similarities (SIMS) recipe is based on the concept of collaborative filtering. A SIMS model leverages user-item interaction data to recommend items similar to a given item. In the absence of sufficient user behavior data for an item, this recipe recommends popular items." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "related_recipe_arn = \"arn:aws:personalize:::recipe/aws-sims\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Declare Personalize Recipe for Product Recommendations\n", "\n", "Since we are providing metadata for users and items, we will be using the [HRNN-Metadata](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-hrnn-metadata.html) recipe for our product recommendations solution.\n", "\n", "> The HRNN-Metadata recipe predicts the items that a user will interact with. It is similar to the HRNN recipe, with additional features derived from contextual, user, and item metadata (from Interactions, Users, and Items datasets, respectively). HRNN-Metadata provides accuracy benefits over non-metadata models when high quality metadata is available. Using this recipe might require longer training times." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "recommend_recipe_arn = \"arn:aws:personalize:::recipe/aws-user-personalization\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Declare Personalize Recipe for Personalized Ranking\n", "\n", "In use-cases where we have a curated list of products, we can use the [Personalized-Ranking](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-search.html) recipe to reorder the products for the current user.\n", "\n", "> The Personalized-Ranking recipe generates personalized rankings. A personalized ranking is a list of recommended items that are re-ranked for a specific user." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ranking_recipe_arn = \"arn:aws:personalize:::recipe/aws-personalized-ranking\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create Solutions and Solution Versions\n", "\n", "With our recipes defined, we can now create our solutions and solution versions. \n", "\n", "First you create a solution using the recipe. Although you provide the dataset ARN in this step, the model is not yet trained. See this as an identifier instead of a trained model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create Related Products Solution" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_solution_response = personalize.create_solution(\n", " name = \"cpg-related-products\",\n", " datasetGroupArn = dataset_group_arn,\n", " recipeArn = related_recipe_arn\n", ")\n", "\n", "related_solution_arn = create_solution_response['solutionArn']\n", "print(json.dumps(create_solution_response, indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create Related Products Solution Version\n", "\n", "Once you have a solution, you need to create a version in order to complete the model training." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_solution_version_response = personalize.create_solution_version(\n", " solutionArn = related_solution_arn\n", ")\n", "\n", "related_solution_version_arn = create_solution_version_response['solutionVersionArn']\n", "print(json.dumps(create_solution_version_response, indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create Product Recommendation Solution" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_solution_response = personalize.create_solution(\n", " name = \"cpg-product-personalization\",\n", " datasetGroupArn = dataset_group_arn,\n", " recipeArn = recommend_recipe_arn\n", ")\n", "\n", "recommend_solution_arn = create_solution_response['solutionArn']\n", "print(json.dumps(create_solution_response, indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create Product Recommendation Solution Version\n", "\n", "Once you have a solution, you need to create a version in order to complete the model training." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_solution_version_response = personalize.create_solution_version(\n", " solutionArn = recommend_solution_arn\n", ")\n", "\n", "recommend_solution_version_arn = create_solution_version_response['solutionVersionArn']\n", "print(json.dumps(create_solution_version_response, indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create Personalized Ranking Solution" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_solution_response = personalize.create_solution(\n", " name = \"cpg-personalized-ranking\",\n", " datasetGroupArn = dataset_group_arn,\n", " recipeArn = ranking_recipe_arn\n", ")\n", "\n", "ranking_solution_arn = create_solution_response['solutionArn']\n", "print(json.dumps(create_solution_response, indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create Personalized Ranking Solution Version\n", "\n", "Once you have a solution, you need to create a version in order to complete the model training." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_solution_version_response = personalize.create_solution_version(\n", " solutionArn = ranking_solution_arn\n", ")\n", "\n", "ranking_solution_version_arn = create_solution_version_response['solutionVersionArn']\n", "print(json.dumps(create_solution_version_response, indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Wait for Solution Versions to Complete\n", "\n", "It can take 40-60 minutes for all solution versions to be created. During this process a model is being trained and tested with the data contained within your datasets. The duration of training jobs can increase based on the size of the dataset, training parameters and using AutoML vs. manually selecting a recipe. 
We submitted the requests for all three solutions and solution versions at once, so they are trained in parallel; below we will wait for all three to finish.\n", "\n", "While you are waiting for this process to complete you can learn more about solutions here: https://docs.aws.amazon.com/personalize/latest/dg/training-deploying-solutions.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View solution creation status in the console\n", "\n", "You can view the status updates in the Amazon Personalize console:\n", "\n", "* In another browser tab you should already have the AWS Console up from opening this notebook instance.\n", "* Switch to that tab and search at the top for the service `Personalize`, then go to that service page.\n", "* Click `View dataset groups`.\n", "* Click the name of your dataset group, the one you created in the previous notebook.\n", "* Click `Solutions and recipes`.\n", "* You will now see a list of all of the solutions you created above, including a column with the status of the solution versions. Once it is `Active`, your solution is ready to be reviewed and can also be deployed."
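, "\n", "You can also check the same status programmatically with the `describe_solution_version` API. A minimal sketch for a single solution version (the wait loop below handles all three):\n", "\n", "```python\n", "response = personalize.describe_solution_version(\n", "    solutionVersionArn = related_solution_version_arn\n", ")\n", "print(response['solutionVersion']['status'])\n", "```\n"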
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Wait for Related Products Solution Version to Have ACTIVE Status" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "soln_ver_arns = [ related_solution_version_arn, recommend_solution_version_arn, ranking_solution_version_arn ]\n", "\n", "max_time = time.time() + 3*60*60 # 3 hours\n", "while time.time() < max_time:\n", " for soln_ver_arn in reversed(soln_ver_arns):\n", " soln_ver_response = personalize.describe_solution_version(\n", " solutionVersionArn = soln_ver_arn\n", " )\n", " status = soln_ver_response[\"solutionVersion\"][\"status\"]\n", "\n", " if status == \"ACTIVE\":\n", " print(f'Solution version {soln_ver_arn} successfully completed')\n", " soln_ver_arns.remove(soln_ver_arn)\n", " elif status == \"CREATE FAILED\":\n", " print(f'Solution version {soln_ver_arn} failed')\n", " if soln_ver_response.get('failureReason'):\n", " print(' Reason: ' + soln_ver_response['failureReason'])\n", " soln_ver_arns.remove(soln_ver_arn)\n", "\n", " if len(soln_ver_arns) > 0:\n", " print('At least one solution version is still in progress')\n", " time.sleep(60)\n", " else:\n", " print(\"All solution versions have completed\")\n", " break" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Hyperparameter tuning\n", "\n", "Personalize offers the option of running hyperparameter tuning when creating a solution. Because of the additional computation required to perform hyperparameter tuning, this feature is turned off by default. Therefore, the solutions we created above, will simply use the default values of the hyperparameters for each recipe. 
For more information about hyperparameter tuning, see the [documentation](https://docs.aws.amazon.com/personalize/latest/dg/customizing-solution-config-hpo.html).\n", "\n", "If you have settled on the correct recipe to use and are ready to run hyperparameter tuning, the following code shows how you would do so, using SIMS (declared above as `related_recipe_arn`) as an example.\n", "\n", "```python\n", "sims_create_solution_response = personalize.create_solution(\n", "    name = \"personalize-poc-sims-hpo\",\n", "    datasetGroupArn = dataset_group_arn,\n", "    recipeArn = related_recipe_arn,\n", "    performHPO=True\n", ")\n", "\n", "sims_solution_arn = sims_create_solution_response['solutionArn']\n", "print(json.dumps(sims_create_solution_response, indent=2))\n", "```\n", "\n", "If you already know the values you want to use for a specific hyperparameter, you can also set this value when you create the solution. The code below shows how you could set the value for the `popularity_discount_factor` for the SIMS recipe.\n", "\n", "```python\n", "sims_create_solution_response = personalize.create_solution(\n", "    name = \"personalize-poc-sims-set-hp\",\n", "    datasetGroupArn = dataset_group_arn,\n", "    recipeArn = related_recipe_arn,\n", "    solutionConfig = {\n", "        'algorithmHyperParameters': {\n", "            'popularity_discount_factor': '0.7'\n", "        }\n", "    }\n", ")\n", "\n", "sims_solution_arn = sims_create_solution_response['solutionArn']\n", "print(json.dumps(sims_create_solution_response, indent=2))\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Evaluate Offline Metrics for Solution Versions\n", "\n", "Amazon Personalize provides [offline metrics](https://docs.aws.amazon.com/personalize/latest/dg/working-with-training-metrics.html#working-with-training-metrics-metrics) that allow you to evaluate the performance of the solution version before you deploy the model in your application. 
Metrics can also be used to view the effects of modifying a Solution's hyperparameters, or to compare the metrics between solutions that use the same training data but were created with different recipes.\n", "\n", "Let's retrieve the metrics for the solution versions we just created." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Related Products Metrics" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "get_solution_metrics_response = personalize.get_solution_metrics(\n", "    solutionVersionArn = related_solution_version_arn\n", ")\n", "\n", "print(json.dumps(get_solution_metrics_response, indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Product Recommendations Metrics" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "get_solution_metrics_response = personalize.get_solution_metrics(\n", "    solutionVersionArn = recommend_solution_version_arn\n", ")\n", "\n", "print(json.dumps(get_solution_metrics_response, indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Personalized Ranking Metrics" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "get_solution_metrics_response = personalize.get_solution_metrics(\n", "    solutionVersionArn = ranking_solution_version_arn\n", ")\n", "\n", "print(json.dumps(get_solution_metrics_response, indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We recommend reading [the documentation](https://docs.aws.amazon.com/personalize/latest/dg/working-with-training-metrics.html) to understand the metrics, but we have also copied parts of the documentation below for convenience.\n", "\n", "You need to understand the following terms regarding evaluation in Personalize:\n", "\n", "* *Relevant recommendation* refers to a recommendation that matches a value in the testing data for the particular user.\n", "* *Rank* refers to the position of a recommended item in the list of recommendations. Position 1 (the top of the list) is presumed to be the most relevant to the user.\n", "* *Query* refers to the internal equivalent of a GetRecommendations call.\n", "\n", "The metrics produced by Personalize are:\n", "* **coverage**: The proportion of unique recommended items from all queries out of the total number of unique items in the training data (includes both the Items and Interactions datasets).\n", "* **mean_reciprocal_rank_at_25**: The [mean of the reciprocal ranks](https://en.wikipedia.org/wiki/Mean_reciprocal_rank) of the first relevant recommendation out of the top 25 recommendations over all queries. This metric is appropriate if you're interested in the single highest ranked recommendation.\n", "* **normalized_discounted_cumulative_gain_at_K**: Discounted gain assumes that recommendations lower on a list of recommendations are less relevant than higher recommendations. Therefore, each recommendation is discounted (given a lower weight) by a factor dependent on its position. To produce the [cumulative discounted gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) (DCG) at K, each relevant discounted recommendation in the top K recommendations is summed together. The normalized discounted cumulative gain (NDCG) is the DCG divided by the ideal DCG such that NDCG is between 0 and 1. (The ideal DCG is where the top K recommendations are sorted by relevance.) Amazon Personalize uses a weighting factor of 1/log(1 + position), where the top of the list is position 1. This metric rewards relevant items that appear near the top of the list, because the top of a list usually draws more attention.\n", "* **precision_at_K**: The number of relevant recommendations out of the top K recommendations divided by K. This metric rewards precise recommendation of the relevant items."
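, "\n", "To build intuition for these definitions, here is a toy computation (illustrative only; it mirrors the formulas above rather than Personalize's internal implementation, and the logarithm base is an assumption):\n", "\n", "```python\n", "import math\n", "\n", "# Top-5 recommendations for one user; items B and D are the relevant ones\n", "recommended = ['A', 'B', 'C', 'D', 'E']\n", "relevant = {'B', 'D'}\n", "\n", "K = 5\n", "hits = [1 if item in relevant else 0 for item in recommended[:K]]  # [0, 1, 0, 1, 0]\n", "\n", "precision_at_k = sum(hits) / K  # 2 relevant items in the top 5 -> 0.4\n", "\n", "# Reciprocal rank of the first relevant recommendation (position 2 -> 0.5)\n", "reciprocal_rank = next((1 / (i + 1) for i, h in enumerate(hits) if h), 0)\n", "\n", "# NDCG at K with a 1/log(1 + position) discount, positions starting at 1\n", "dcg = sum(h / math.log(1 + i + 1) for i, h in enumerate(hits))\n", "idcg = sum(h / math.log(1 + i + 1) for i, h in enumerate(sorted(hits, reverse=True)))\n", "ndcg_at_k = dcg / idcg\n", "```\n"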
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using evaluation metrics \n", "\n", "It is important to use evaluation metrics carefully. There are a number of factors to keep in mind.\n", "\n", "* If there is an existing recommendation system in place, this will have influenced the user's interaction history which you use to train your new solutions. This means the evaluation metrics are biased to favor the existing solution. If you work to push the evaluation metrics to match or exceed the existing solution, you may just be pushing the User Personalization to behave like the existing solution and might not end up with something better.\n", "* The HRNN Coldstart recipe is difficult to evaluate using the metrics produced by Amazon Personalize. The aim of the recipe is to recommend items which are new to your business. Therefore, these items will not appear in the existing user transaction data which is used to compute the evaluation metrics. As a result, HRNN Coldstart will never appear to perform better than the other recipes, when compared on the evaluation metrics alone. Note: The User Personalization recipe also includes improved cold start functionality\n", "\n", "Keeping in mind these factors, the evaluation metrics produced by Personalize are generally useful for two cases:\n", "1. Comparing the performance of solution versions trained on the same recipe, but with different values for the hyperparameters and features (impression data etc)\n", "1. Comparing the performance of solution versions trained on different recipes (except HRNN Coldstart). Here also keep in mind that the recipes answer different use cases and comparing them to each other might not make sense in your solution.\n", "\n", "Properly evaluating a recommendation system is always best done through A/B testing while measuring actual business outcomes. 
Since recommendations generated by a system usually influence the user behavior which it is based on, it is better to run small experiments and apply A/B testing for longer periods of time. Over time, the bias from the existing model will fade." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create Campaigns\n", "\n", "Once we're satisfied with our solution versions, we need to create Campaigns for each solution version.\n", "\n", "A campaign is a hosted solution version; an endpoint which you can query for recommendations.\n", "\n", "When creating a campaign you specify the minimum transactions per second (`minProvisionedTPS`) that you expect to make against the service for this campaign. Personalize will automatically scale the inference endpoint up and down for the campaign to match demand but will never scale below `minProvisionedTPS`. Pricing is set by estimating throughput capacity (requests from users for personalization per second). For more information, see the [pricing page](https://aws.amazon.com/personalize/pricing/).\n", "\n", "Let's create campaigns for our three solution versions with each set at `minProvisionedTPS` of 1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create Related Products Campaign\n", "\n", "Deploy a campaign for your SIMS solution version." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_campaign_response = personalize.create_campaign(\n", " name = \"cpg-related-products\",\n", " solutionVersionArn = related_solution_version_arn,\n", " minProvisionedTPS = 1\n", ")\n", "\n", "related_campaign_arn = create_campaign_response['campaignArn']\n", "print(json.dumps(create_campaign_response, indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create Product Recommendation Campaign\n", "\n", "Deploy a campaign for your User Personalization solution version." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_campaign_response = personalize.create_campaign(\n", " name = \"cpg-product-personalization\",\n", " solutionVersionArn = recommend_solution_version_arn,\n", " minProvisionedTPS = 1\n", ")\n", "\n", "recommend_campaign_arn = create_campaign_response['campaignArn']\n", "print(json.dumps(create_campaign_response, indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create Personalized Ranking Campaign\n", "\n", "Deploy a campaign for your personalized ranking solution version." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_campaign_response = personalize.create_campaign(\n", " name = \"cpg-personalized-ranking\",\n", " solutionVersionArn = ranking_solution_version_arn,\n", " minProvisionedTPS = 1\n", ")\n", "\n", "ranking_campaign_arn = create_campaign_response['campaignArn']\n", "print(json.dumps(create_campaign_response, indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Wait for Related Products Campaign to Have ACTIVE Status\n", "\n", "It can take 20-30 minutes for the campaigns to be fully created. \n", "\n", "While you are waiting for this to complete you can learn more about campaigns in the [documentation](https://docs.aws.amazon.com/personalize/latest/dg/campaigns.html)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "campaign_arns = [ related_campaign_arn, recommend_campaign_arn, ranking_campaign_arn ]\n", "\n", "max_time = time.time() + 3*60*60 # 3 hours\n", "while time.time() < max_time:\n", " for campaign_arn in reversed(campaign_arns):\n", " campaign_response = personalize.describe_campaign(\n", " campaignArn = campaign_arn\n", " )\n", " status = campaign_response[\"campaign\"][\"status\"]\n", "\n", " if status == \"ACTIVE\":\n", " print(f'Campaign {campaign_arn} successfully completed')\n", " campaign_arns.remove(campaign_arn)\n", " elif status == \"CREATE FAILED\":\n", " print(f'Campaign {campaign_arn} failed')\n", " if campaign_response.get('failureReason'):\n", " print(' Reason: ' + campaign_response['failureReason'])\n", " campaign_arns.remove(campaign_arn)\n", "\n", " if len(campaign_arns) > 0:\n", " print('At least one campaign is still in progress')\n", " time.sleep(60)\n", " else:\n", " print(\"All campaigns have completed\")\n", " break" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Congratulations you finished the training layer notebook\n", "\n", "Now, lets store all the values needed to continue on the next notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%store related_campaign_arn\n", "%store recommend_campaign_arn\n", "%store ranking_campaign_arn" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "conda_amazonei_mxnet_p36", "language": "python", "name": "conda_amazonei_mxnet_p36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 4 }