{ "cells": [ { "cell_type": "markdown", "id": "f760989d", "metadata": {}, "source": [ "# Measuring the business impact of your recommendations\n", "\n", "This notebook will walk you through an example of using [metric attributions](https://docs.aws.amazon.com/personalize/latest/dg/metric-attribution-requirements.html) in [Amazon Personalize](https://aws.amazon.com/personalize/) to measure how your personalized recommendaations are helping you achieve your business goals. \n", "\n", "Personalized recommendations increase user engagement with your websites or apps. If you want to meassure the impact of your recommendations on your KPIs [Amazon Personalize metric attributions](https://docs.aws.amazon.com/personalize/latest/dg/UPDATE) can help. For example, a video on demand application may want to see the impact of recommendations on click through rate (CTR) or total time watched. A retail application may want to track a click-through rates and margins.\n", "\n", "![iagram showing how metrics from user interactions can be tracked and exported to Amazon S3 or Amazon CloudWatch](images/metrics-overview.png \"Diagram showing how metrics from user interactions can be tracked and exported to Amazon S3 or Amazon CloudWatch\")\n", "\n", "Figure 1. Feature Overview. The interactions dataset is used to train a recommender or campaign. Then, when users interact with recommended items, these interactions are sent to Amazon Personalize and attributed to the corresponding recommender or campaign. These metrics are then exported to Amazon S3 and Amazon CloudWatch so you can monitor them and compare the metrics of each recommender or campaign.\n", "\n", "![Diagram showing how metrics from user interactions with two scenarios can be tracked and exported to Amazon S3 or Amazon CloudWatch](images/metrics-overview-scenarios.png \"Diagram showing how metrics from user interactions with two scenarios can be tracked and exported to Amazon S3 or Amazon CloudWatch\")\n", "\n", "Figure 2. 
Measuring the business impact of recommendations in two scenarios. The interactions dataset is used to train two recommenders or campaigns, in this case designated “Blue” and “Orange”. Then, when users interact with recommended items, these interactions are sent to Amazon Personalize and attributed to the corresponding recommender, campaign, or scenario the user was exposed to when they interacted with the item. These metrics are then exported to Amazon S3 and Amazon CloudWatch so you can monitor them and compare the metrics of each recommender or campaign.\n", "\n", "First, we'll follow the steps to build a Domain dataset group. Then we will create a metric attribution to track two metrics:\n", "\n", "1. the click-through rate for recommendations\n", "1. the total margin from purchases\n", "\n", "Then we will upload synthetic data based on a dataset generated for [a fictitious retail store](https://github.com/aws-samples/retail-demo-store). We will train a recommender that returns product recommendations. \n", "\n", "We will then interact with this recommender and see how metrics are exported to Amazon S3.\n", "\n", "The goal is to recommend products that are relevant for each particular user and to measure the impact of the recommendations.\n", "\n", "Finally, we'll clean up all of the resources we created, using the notebook [Clean_Up_Resources.ipynb](Clean_Up_Resources.ipynb), so we avoid incurring costs for resources that are no longer in use. \n", "\n", "The estimated time to run through this notebook is about 40 minutes. \n", "\n", "## How to use the Notebook\n", "\n", "The code is broken up into cells like the one below. 
There's a triangular Run button at the top of this page that you can click to execute each cell and move on to the next, or you can press `Shift` + `Enter` while in the cell to execute it and move on to the next one.\n", "\n", "While a cell is executing, the indicator to its left shows an `*`; once the cell finishes, the indicator updates to a number that marks the order in which the cells completed.\n", "\n", "Simply follow the instructions below and execute the cells to get started.\n", "\n", "## Introduction to Amazon Personalize\n", "\n", "[Amazon Personalize](https://aws.amazon.com/personalize/) makes it easy for customers to develop applications with a wide array of personalization use cases, including real-time product recommendations and customized direct marketing. Amazon Personalize brings the same machine learning technology used by Amazon.com to everyone for use in their applications – with no machine learning experience required. Amazon Personalize customers pay for what they use, with no minimum fees or upfront commitment. \n", "\n", "You can start using Amazon Personalize with a simple three-step process, which only takes a few clicks in the AWS console, or a set of simple API calls. \n", "\n", "* First, point Amazon Personalize to your user data, catalog data, and activity stream of views, clicks, purchases, etc. in Amazon S3, or upload the data using a simple API call. \n", "\n", "* Second, with a single click in the console or an API call, train a private recommendation model for your data. 
\n", "\n", "* Third, retrieve personalized recommendations for any user by creating a recommender, and using the GetRecommendations API.\n", "\n", "If you are not familiar with Amazon Personalize, you can learn more about the service on by looking at [Github Sample Notebooks](https://github.com/aws-samples/amazon-personalize-samples) and [Product Documentation](https://docs.aws.amazon.com/personalize/latest/dg/what-is-personalize.html)." ] }, { "cell_type": "markdown", "id": "6a7ca987", "metadata": {}, "source": [ "## Imports\n", "Python ships with a broad collection of libraries and we need to import those as well as the ones installed to help us like [boto3](https://aws.amazon.com/sdk-for-python/) (AWS SDK for python) and [Pandas](https://pandas.pydata.org/)/[Numpy](https://numpy.org/) which are core data science tools." ] }, { "cell_type": "code", "execution_count": null, "id": "86eb0420", "metadata": {}, "outputs": [], "source": [ "# Get the latest version of botocore to ensure we have the latest features in the SDK\n", "import sys\n", "!{sys.executable} -m pip install --upgrade pip\n", "!{sys.executable} -m pip install --upgrade --no-deps --force-reinstall botocore" ] }, { "cell_type": "code", "execution_count": null, "id": "fca0fd16", "metadata": {}, "outputs": [], "source": [ "# Imports\n", "import boto3\n", "import json\n", "import numpy as np\n", "import pandas as pd\n", "import time\n", "import uuid\n", "import datetime\n", "import matplotlib.pyplot as plt\n", "import matplotlib.dates as md" ] }, { "cell_type": "markdown", "id": "96c423e7", "metadata": {}, "source": [ "## Specify the AWS region we will use\n", "This is the region where the Dataset Group will be Created" ] }, { "cell_type": "code", "execution_count": null, "id": "cbbac3b9", "metadata": {}, "outputs": [], "source": [ "# Sets the same region as current Amazon SageMaker Notebook\n", "with open('/opt/ml/metadata/resource-metadata.json') as notebook_info:\n", " data = 
json.load(notebook_info)\n", " resource_arn = data['ResourceArn']\n", " region = resource_arn.split(':')[3]\n", "print('region:', region)\n", "\n", "# or you can set your own region using:\n", "# region = \"us-west-2\"" ] }, { "cell_type": "code", "execution_count": null, "id": "9af0b2bb", "metadata": {}, "outputs": [], "source": [ "# Configure the SDK to Personalize:\n", "personalize = boto3.client('personalize')\n", "personalize_runtime = boto3.client('personalize-runtime')\n", "\n", "# Establish a connection to Personalize's event streaming\n", "personalize_events = boto3.client(service_name='personalize-events')" ] }, { "cell_type": "markdown", "id": "12983cae", "metadata": {}, "source": [ "Next you will want to validate that your environment can communicate successfully with Amazon Personalize; the cell below does just that." ] }, { "cell_type": "code", "execution_count": null, "id": "df13c737", "metadata": {}, "outputs": [], "source": [ "personalize.list_dataset_groups()" ] }, { "cell_type": "markdown", "id": "479e3ff5", "metadata": {}, "source": [ "Set up the names of the items and interactions files to use later." ] }, { "cell_type": "code", "execution_count": null, "id": "e44a59b7", "metadata": {}, "outputs": [], "source": [ "interactions_file_path = 'cleaned_interactions_training_data.csv'\n", "items_file_path = 'cleaned_item_training_data.csv'" ] }, { "cell_type": "markdown", "id": "413f3e37", "metadata": {}, "source": [ "## Create the Amazon Personalize role\n", "\n", "Amazon Personalize needs the ability to assume roles in AWS in order to have the permissions to execute certain tasks. Let's create an IAM role and attach the required policies to it so it can access data from S3. The code below attaches very permissive policies; please use more restrictive policies for any production application.\n", "\n", "Note: Make sure the role you are using to run the code in this notebook has the necessary permissions to create a role."
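 ] }, { "cell_type": "markdown", "id": "chk0a1b2", "metadata": {}, "source": [ "As an optional sanity check (a sketch, not part of the original tutorial), the cell below prints the identity this notebook runs as and, where IAM permits, simulates whether that identity can create and configure roles. Note that `simulate_principal_policy` itself requires extra permissions and may reject assumed-role session ARNs, so treat any failure here as inconclusive and simply continue." ] }, { "cell_type": "code", "execution_count": null, "id": "chk0c3d4", "metadata": {}, "outputs": [], "source": [ "# Optional sketch: confirm which identity will make the IAM calls below.\n", "import boto3\n", "\n", "identity = boto3.client('sts').get_caller_identity()\n", "print('Running as:', identity['Arn'])\n", "\n", "# Simulating iam:CreateRole may fail for assumed-role session ARNs;\n", "# a failure here is inconclusive, not a blocker.\n", "try:\n", " sim = boto3.client('iam').simulate_principal_policy(\n", " PolicySourceArn=identity['Arn'],\n", " ActionNames=['iam:CreateRole', 'iam:AttachRolePolicy']\n", " )\n", " for result in sim['EvaluationResults']:\n", " print(result['EvalActionName'], '->', result['EvalDecision'])\n", "except Exception as e:\n", " print('Could not simulate permissions:', e)"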
] }, { "cell_type": "code", "execution_count": null, "id": "365fd4fc", "metadata": {}, "outputs": [], "source": [ "iam = boto3.Session().client(\n", " service_name='iam', region_name=region)\n", "account_id = boto3.Session().client(\n", " service_name='sts', region_name=region).get_caller_identity().get('Account')" ] }, { "cell_type": "code", "execution_count": null, "id": "77f9ca6b", "metadata": {}, "outputs": [], "source": [ "role_name = account_id+\"measure-impact-recommendations\"\n", "assume_role_policy_document = {\n", " \"Version\": \"2012-10-17\",\n", " \"Statement\": [\n", " {\n", " \"Sid\": \"\",\n", " \"Effect\": \"Allow\",\n", " \"Principal\": {\n", " \"Service\": [\n", " \"personalize.amazonaws.com\"\n", " ]\n", " },\n", " \"Action\": \"sts:AssumeRole\"\n", " }\n", " ]\n", "}\n", "\n", "# Create or retrieve the role:\n", "try:\n", " create_role_response = iam.create_role(\n", " RoleName = role_name,\n", " AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)\n", " );\n", " role_arn = create_role_response[\"Role\"][\"Arn\"]\n", " \n", "except iam.exceptions.EntityAlreadyExistsException as e:\n", " print('Warning: role already exists: {}\\n'.format(e))\n", " role_arn = iam.get_role(\n", " RoleName = role_name\n", " )[\"Role\"][\"Arn\"];\n", "\n", "print('IAM Role: {}\\n'.format(role_arn))" ] }, { "cell_type": "markdown", "id": "2d9b3675", "metadata": {}, "source": [ "## Attach the policy if it is not previously attached" ] }, { "cell_type": "code", "execution_count": null, "id": "cfb11bf3", "metadata": {}, "outputs": [], "source": [ "policy_arn = \"arn:aws:iam::aws:policy/AmazonS3FullAccess\" #Allows access to any S3 bucket, limit as required\n", "\n", "if (policy_arn in [ x['PolicyArn'] for x in iam.list_attached_role_policies( RoleName = role_name)['AttachedPolicies']]):\n", " print ('The policy {} is already attached to this role.'.format(policy_arn))\n", "else:\n", " print (\"Attaching the role_policy\")\n", " attach_response = 
iam.attach_role_policy(\n", " RoleName = role_name,\n", " PolicyArn = \"arn:aws:iam::aws:policy/AmazonS3FullAccess\"\n", " );\n", " print (\"30s pause to allow role to be fully consistent.\")\n", " time.sleep(30)\n", " print('Done.')\n", " \n", " \n", "access_cloudwatch_role_policy_document = {\n", " \"Version\": \"2012-10-17\",\n", " \"Statement\": [\n", " {\n", " \"Sid\": \"\",\n", " \"Effect\": \"Allow\",\n", " \"Action\": [\n", " \"cloudwatch:PutMetricData\"\n", " ],\n", " \"Resource\":\"*\"\n", " }\n", " ]\n", "}\n", "\n", "print (\"Attaching the inline policy that allows access to CloudWatch\")\n", "response = iam.put_role_policy(\n", " RoleName = role_name,\n", " PolicyName='AccessCloudWatch',\n", " PolicyDocument=json.dumps(access_cloudwatch_role_policy_document)\n", ")\n", "\n", "print (response)" ] }, { "cell_type": "markdown", "id": "0765d343", "metadata": {}, "source": [ "## Specify an S3 Bucket for your data\n", "\n", "Amazon Personalize will need an S3 bucket to act as the source of your data. The code below will create a bucket with a unique `bucket_name`. We will also use this bucket to export data.\n", "\n", "The Amazon S3 bucket needs to be in the same region as the Amazon Personalize resources. " ] }, { "cell_type": "code", "execution_count": null, "id": "0e27deee", "metadata": {}, "outputs": [], "source": [ "s3 = boto3.Session().client(\n", " service_name='s3', region_name=region)" ] }, { "cell_type": "code", "execution_count": null, "id": "4ff5e864", "metadata": {}, "outputs": [], "source": [ "bucket_name = account_id + \"-\" + region + \"-measure-impact-recommendations\"\n", "print('bucket_name:', bucket_name)\n", "\n", "try: \n", " if region == \"us-east-1\":\n", " s3.create_bucket(Bucket=bucket_name)\n", " else:\n", " s3.create_bucket(\n", " Bucket = bucket_name,\n", " CreateBucketConfiguration={'LocationConstraint': region}\n", " )\n", "except s3.exceptions.BucketAlreadyOwnedByYou:\n", " print(\"Bucket already exists. 
Using bucket\", bucket_name)" ] }, { "cell_type": "markdown", "id": "13a1d43c", "metadata": {}, "source": [ "### Update the bucket policy" ] }, { "cell_type": "code", "execution_count": null, "id": "9083eac3", "metadata": {}, "outputs": [], "source": [ "policy = {\n", " \"Version\": \"2012-10-17\",\n", " \"Id\": \"PersonalizeS3BucketAccessPolicy\",\n", " \"Statement\": [\n", " {\n", " \"Sid\": \"PersonalizeS3BucketAccessPolicy\",\n", " \"Effect\": \"Allow\",\n", " \"Principal\": {\n", " \"Service\": [\n", " \"personalize.amazonaws.com\",\n", " ]\n", " },\n", " \"Action\": [\n", " \"s3:GetObject\",\n", " \"s3:PutObject\",\n", " \"s3:ListBucket\"\n", " ],\n", " \"Resource\": [\n", " \"arn:aws:s3:::{}\".format(bucket_name),\n", " \"arn:aws:s3:::{}/*\".format(bucket_name)\n", " ]\n", " }\n", " ]\n", "}\n", "\n", "bucket_current_policy = None\n", "\n", "try:\n", " bucket_current_policy = s3.get_bucket_policy(Bucket=bucket_name)['Policy']\n", " \n", "except s3.exceptions.from_code('NoSuchBucketPolicy') as e: \n", " print(\"There is no current Bucket Policy for bucket \" + bucket_name)\n", " \n", "except Exception as e: \n", " raise(e)\n", "\n", "if (bucket_current_policy and policy == json.loads(bucket_current_policy)):\n", " print (\"The policy is already associated with the S3 Bucket.\")\n", "else:\n", " print (\"Adding the policy to the bucket.\")\n", " print(s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy)))" ] }, { "cell_type": "markdown", "id": "9b430601", "metadata": {}, "source": [ "## Download, Prepare, and Upload Training Data\n", "\n", "We generated the synthetic data based on the code in the [Retail Demo Store project](https://github.com/aws-samples/retail-demo-store). Follow the link to learn more about the data and potential uses.\n", "\n", "First we need to download the data (training data). In this tutorial we'll use the Purchase history from a retail store dataset. 
\n", "\n", "### Download and Explore the Interactions Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "6ec851d2", "metadata": {}, "outputs": [], "source": [ "!aws s3 cp s3://retail-demo-store-us-east-1/csvs/interactions.csv ." ] }, { "cell_type": "markdown", "id": "ae49d831", "metadata": {}, "source": [ "The dataset has been successfully downloaded as interactions.csv\n", "\n", "Let's learn more about the dataset by viewing its charateristics:" ] }, { "cell_type": "code", "execution_count": null, "id": "475cb0a3", "metadata": { "scrolled": true }, "outputs": [], "source": [ "interaction_data = pd.read_csv('./interactions.csv')\n", "interaction_data" ] }, { "cell_type": "code", "execution_count": null, "id": "711a3292", "metadata": {}, "outputs": [], "source": [ "interaction_data.info()" ] }, { "cell_type": "markdown", "id": "4feef60f", "metadata": {}, "source": [ "The ECOMMERCE recommenders require you to provide specific EVENT_TYPE values in order to understand the context of an interaction. Let's look at what event types are currently in our dataset:" ] }, { "cell_type": "code", "execution_count": null, "id": "383fa167", "metadata": {}, "outputs": [], "source": [ "interaction_data.EVENT_TYPE.value_counts()" ] }, { "cell_type": "markdown", "id": "66032111", "metadata": {}, "source": [ "From the cells above, we've learned that our data has has 5 columns, 675004 rows and the headers are: ITEM_ID, USER_ID, EVENT_TYPE, TIMESTAMP and DISCOUNT.\n", "\n", "The rows look like what we expect, we have:\n", "\n", "* item_id of the product hte user interacted with\n", "* user_id of the user who did the interaction\n", "* event_type that defines the interaction. 
The event type can be View, AddToCart, StartCheckout, or Purchase\n", "* timestamp in Unix format\n", "* whether a discount was applied\n", "\n", "To be compatible with an Amazon Personalize interactions schema, this dataset's column headings must match the Amazon Personalize default column names (read about column names [here](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html)).\n" ] }, { "cell_type": "markdown", "id": "0bee3c96", "metadata": {}, "source": [ "### Prepare the Interactions Data\n" ] }, { "cell_type": "markdown", "id": "7d7b067f", "metadata": {}, "source": [ "#### Drop Columns\n", "\n", "Some columns in this dataset, such as *DISCOUNT*, would not add value to our model, so we will drop them from this dataset." ] }, { "cell_type": "code", "execution_count": null, "id": "65efe415", "metadata": { "scrolled": true }, "outputs": [], "source": [ "interaction_data.drop(columns=['DISCOUNT'], inplace = True)\n", "\n", "interaction_data.head()" ] }, { "cell_type": "markdown", "id": "9033359c", "metadata": {}, "source": [ "We can see that 'View' and 'Purchase' events are present, so we can proceed. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "2c8ad8ac", "metadata": {}, "outputs": [], "source": [ "interaction_data.to_csv(interactions_file_path)" ] }, { "cell_type": "markdown", "id": "7e13d9db", "metadata": {}, "source": [ "In the cell below, we will write our cleaned data to a file named 'cleaned_interactions_training_data.csv'" ] }, { "cell_type": "markdown", "id": "da108332", "metadata": {}, "source": [ "### Upload Interactions data to S3\n", "Now that our training data is ready for Amazon Personalize,the next step is to upload it to the s3 bucket created earlier" ] }, { "cell_type": "code", "execution_count": null, "id": "f2416342", "metadata": {}, "outputs": [], "source": [ "boto3.Session().resource('s3').Bucket(bucket_name).Object(interactions_file_path).upload_file(interactions_file_path)\n", "interactions_s3DataPath = \"s3://\"+bucket_name+\"/\"+interactions_file_path" ] }, { "cell_type": "markdown", "id": "2fd3b13d", "metadata": {}, "source": [ "### Download and Explore the Items Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "89324993", "metadata": {}, "outputs": [], "source": [ "!aws s3 cp s3://retail-demo-store-us-east-1/csvs/items.csv ." ] }, { "cell_type": "markdown", "id": "3bd1fa09", "metadata": {}, "source": [ "The dataset has been successfully downloaded as items.csv\n", "\n", "Lets learn more about the dataset by viewing its charateristics" ] }, { "cell_type": "code", "execution_count": null, "id": "9588db31", "metadata": {}, "outputs": [], "source": [ "items_df = pd.read_csv('./items.csv')\n", "items_df" ] }, { "cell_type": "code", "execution_count": null, "id": "d25b6c88", "metadata": {}, "outputs": [], "source": [ "items_df.info()" ] }, { "cell_type": "markdown", "id": "4fda300e", "metadata": {}, "source": [ "Let's explore the kinds of items included in the dataset." 
] }, { "cell_type": "code", "execution_count": null, "id": "3c3cf7bc", "metadata": {}, "outputs": [], "source": [ "items_df.CATEGORY_L1.unique()" ] }, { "cell_type": "code", "execution_count": null, "id": "57a631e4", "metadata": {}, "outputs": [], "source": [ "items_df.CATEGORY_L2.unique()" ] }, { "cell_type": "markdown", "id": "30d55069", "metadata": {}, "source": [ "### Drop Columns\n", "\n", "Some columns in this dataset could add value to our model but are not relevant for this example. For simplicity, we will drop them from this dataset. Columns such as *product_decription*." ] }, { "cell_type": "code", "execution_count": null, "id": "f79effaa", "metadata": {}, "outputs": [], "source": [ "items_df.drop(columns=['PRODUCT_DESCRIPTION'], inplace = True)\n", "items_df.head()" ] }, { "cell_type": "markdown", "id": "ef827e5b", "metadata": {}, "source": [ "Write our cleaned data to a .csv file " ] }, { "cell_type": "code", "execution_count": null, "id": "6dcf134f", "metadata": {}, "outputs": [], "source": [ "items_df.to_csv(items_file_path)" ] }, { "cell_type": "markdown", "id": "2c220f1d", "metadata": {}, "source": [ "#### Add a margin column\n", "\n", "We will add an additional column to represent the margin. In this example we will use a randomly generated number between 0 and 10 to represent the item revenue margin. We will add this column." 
] }, { "cell_type": "code", "execution_count": null, "id": "0b10da71", "metadata": {}, "outputs": [], "source": [ "random = np.random.rand( len(items_df.index))\n", "items_df[\"MARGIN\"] = random * 10\n", "items_df.head()" ] }, { "cell_type": "markdown", "id": "9978e484", "metadata": {}, "source": [ "In the cell below, we will write our cleaned data to a file named 'cleaned_item_training_data.csv'" ] }, { "cell_type": "code", "execution_count": null, "id": "1e59f8cc", "metadata": {}, "outputs": [], "source": [ "items_df.to_csv(items_file_path)" ] }, { "cell_type": "markdown", "id": "1938fb97", "metadata": {}, "source": [ "### Upload Items data to S3\n", "Now that our training data is ready for Amazon Personalize,the next step is to upload it to the s3 bucket created earlier" ] }, { "cell_type": "code", "execution_count": null, "id": "0024b96b", "metadata": {}, "outputs": [], "source": [ "boto3.Session().resource('s3').Bucket(bucket_name).Object(items_file_path).upload_file(items_file_path)\n", "items_s3DataPath = \"s3://\"+bucket_name+\"/\"+items_file_path" ] }, { "cell_type": "markdown", "id": "cfbe117d", "metadata": {}, "source": [ "## Create the Dataset Group\n", "The largest grouping in Personalize is a Dataset Group, this will isolate your data, event trackers, solutions, recommenders, and campaigns. Grouping things together that share a common collection of data. Feel free to alter the name below if you'd like. \n", "\n", "When you create a Domain dataset group, you choose your domain. The domain you specify determines the default schemas for datasets and the use cases that are available for recommenders. 
\n", "\n", "You can find more information about creating a Domain dataset group in [the documentation](https://docs.aws.amazon.com/personalize/latest/dg/create-domain-dataset-group.html).\n", "\n", "### Create Dataset Group" ] }, { "cell_type": "code", "execution_count": null, "id": "1aeed98c", "metadata": { "scrolled": true }, "outputs": [], "source": [ "response = personalize.create_dataset_group(\n", " name='personalize_ecomemerce_ds_group',\n", " domain='ECOMMERCE'\n", ")\n", "\n", "dataset_group_arn = response['datasetGroupArn']\n", "print(json.dumps(response, indent=2))" ] }, { "cell_type": "markdown", "id": "757bcc10", "metadata": {}, "source": [ "Wait for Dataset Group to Have ACTIVE Status\n", "Before we can use the Dataset Group in any items below it must be active, execute the cell below and wait for it to show active." ] }, { "cell_type": "code", "execution_count": null, "id": "01ca00c7", "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "max_time = time.time() + 3*60*60 # 3 hours\n", "while time.time() < max_time:\n", " describe_dataset_group_response = personalize.describe_dataset_group(\n", " datasetGroupArn = dataset_group_arn\n", " )\n", " status = describe_dataset_group_response[\"datasetGroup\"][\"status\"]\n", " print(\"DatasetGroup: {}\".format(status))\n", " \n", " if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n", " break\n", " \n", " time.sleep(30)" ] }, { "cell_type": "markdown", "id": "c11b6c37", "metadata": {}, "source": [ "## Create Interactions Schema\n", "A core component of how Personalize understands your data comes from the Schema that is defined below. This configuration tells the service how to digest the data provided via your CSV file. Note the columns and types align to what was in the file you created above." 
] }, { "cell_type": "code", "execution_count": null, "id": "591cd9ae", "metadata": {}, "outputs": [], "source": [ "interactions_schema = {\n", " \"type\": \"record\",\n", " \"name\": \"Interactions\",\n", " \"namespace\": \"com.amazonaws.personalize.schema\",\n", " \"fields\": [\n", " {\n", " \"name\": \"USER_ID\",\n", " \"type\": \"string\"\n", " },\n", " {\n", " \"name\": \"ITEM_ID\",\n", " \"type\": \"string\"\n", " },\n", " {\n", " \"name\": \"TIMESTAMP\",\n", " \"type\": \"long\"\n", " },\n", " {\n", " \"name\": \"EVENT_TYPE\",\n", " \"type\": \"string\" \n", " }\n", " ],\n", " \"version\": \"1.0\"\n", "}\n", "\n", "\n", "create_schema_response = personalize.create_schema(\n", " name = \"personalize-ecommerce-interactions-schema\",\n", " domain = \"ECOMMERCE\",\n", " schema = json.dumps(interactions_schema)\n", ")\n", "\n", "interaction_schema_arn = create_schema_response['schemaArn']\n", "print(json.dumps(create_schema_response, indent=2))" ] }, { "cell_type": "markdown", "id": "e08ea8bd", "metadata": {}, "source": [ "## Create Items Schema\n", "A core component of how Personalize understands your data comes from the Schema that is defined below. This configuration tells the service how to digest the data provided via your CSV file. Note the columns and types align to what was in the file you created above." 
] }, { "cell_type": "code", "execution_count": null, "id": "05ce96f8", "metadata": {}, "outputs": [], "source": [ "items_schema = {\n", " \"type\": \"record\",\n", " \"name\": \"Items\",\n", " \"namespace\": \"com.amazonaws.personalize.schema\",\n", " \"fields\": [\n", " {\n", " \"name\": \"ITEM_ID\",\n", " \"type\": \"string\"\n", " },\n", " {\n", " \"name\": \"PRICE\",\n", " \"type\": \"float\"\n", " },\n", " {\n", " \"name\": \"CATEGORY_L1\",\n", " \"type\": [\"string\"],\n", " \"categorical\": True\n", " },\n", " {\n", " \"name\": \"CATEGORY_L2\",\n", " \"type\": [\"string\"],\n", " \"categorical\": True\n", " },\n", " {\n", " \"name\": \"MARGIN\",\n", " \"type\": \"double\" \n", " }\n", " ],\n", " \"version\": \"1.0\"\n", "}\n", "\n", "create_schema_response = personalize.create_schema(\n", " name = \"personalize-ecommerce-item_group\",\n", " domain = \"ECOMMERCE\",\n", " schema = json.dumps(items_schema)\n", ")\n", "\n", "items_schema_arn = create_schema_response['schemaArn']\n", "\n", "print(json.dumps(create_schema_response, indent=2))" ] }, { "cell_type": "markdown", "id": "0ec1637f", "metadata": {}, "source": [ "## Create Datasets\n", "After the group, the next thing to create is the datasets where your data will be uploaded to in Amazon Personalize." 
] }, { "cell_type": "markdown", "id": "173160f0", "metadata": {}, "source": [ "### Create Interactions Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "0cb3878c", "metadata": {}, "outputs": [], "source": [ "dataset_type = \"INTERACTIONS\"\n", "\n", "create_dataset_response = personalize.create_dataset(\n", " name = \"personalize_ecommerce_demo_interactions\",\n", " datasetType = dataset_type,\n", " datasetGroupArn = dataset_group_arn,\n", " schemaArn = interaction_schema_arn\n", ")\n", "\n", "interactions_dataset_arn = create_dataset_response['datasetArn']\n", "print(json.dumps(create_dataset_response, indent=2))" ] }, { "cell_type": "markdown", "id": "4b4ff59a", "metadata": {}, "source": [ "### Create Items Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "0fc2c81c", "metadata": {}, "outputs": [], "source": [ "dataset_type = \"ITEMS\"\n", "\n", "create_dataset_response = personalize.create_dataset(\n", " name = \"personalize_ecommerce_demo_items\",\n", " datasetType = dataset_type,\n", " datasetGroupArn = dataset_group_arn,\n", " schemaArn = items_schema_arn\n", ")\n", "\n", "items_dataset_arn = create_dataset_response['datasetArn']\n", "print(json.dumps(create_dataset_response, indent=2))" ] }, { "cell_type": "markdown", "id": "24982fb9", "metadata": {}, "source": [ "Let's wait until all the datasets have been created." 
] }, { "cell_type": "code", "execution_count": null, "id": "f7cff7f2", "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "max_time = time.time() + 6*60*60 # 6 hours\n", "while time.time() < max_time:\n", " describe_dataset_response = personalize.describe_dataset(\n", " datasetArn = interactions_dataset_arn\n", " )\n", " status_interaction_dataset = describe_dataset_response[\"dataset\"]['status']\n", " print(\"Interactions Dataset: {}\".format(status_interaction_dataset))\n", " \n", " if status_interaction_dataset == \"ACTIVE\":\n", " print(\"Build succeeded for {}\".format(interactions_dataset_arn))\n", " \n", " elif status_interaction_dataset == \"CREATE FAILED\":\n", " print(\"Build failed for {}\".format(interactions_dataset_arn))\n", " break\n", " \n", " if not status_interaction_dataset == \"ACTIVE\":\n", " print(\"The interaction dataset creation is still in progress\")\n", " else:\n", " print(\"The interaction dataset is ACTIVE\")\n", " \n", "\n", " describe_dataset_response = personalize.describe_dataset(\n", " datasetArn = items_dataset_arn\n", " )\n", " status_item_dataset = describe_dataset_response[\"dataset\"]['status']\n", " print(\"Items Dataset: {}\".format(status_item_dataset))\n", " \n", " if status_item_dataset == \"ACTIVE\":\n", " print(\"Build succeeded for {}\".format(items_dataset_arn))\n", " \n", " elif status_item_dataset == \"CREATE FAILED\":\n", " print(\"Build failed for {}\".format(items_dataset_arn))\n", " break\n", " if not status_item_dataset == \"ACTIVE\":\n", " print(\"The item dataset creation is still in progress\")\n", " else:\n", " print(\"The item dataset is ACTIVE\")\n", " \n", " \n", " if status_interaction_dataset == \"ACTIVE\" and status_item_dataset == \"ACTIVE\":\n", " break\n", " \n", " time.sleep(30) " ] }, { "cell_type": "markdown", "id": "bd358faa", "metadata": {}, "source": [ "## Create a metric attribution\n", "\n", "Before importing our data to Amazon Personalize, we will define the metrics 
attribution. \n", "\n", "### Important\n", "After you create a metric attribution, Amazon Personalize automatically sends metrics to CloudWatch. If you record events or import incremental bulk data, you incur a small monthly CloudWatch cost per metric. For information about CloudWatch pricing, see the [Amazon CloudWatch pricing page](https://aws.amazon.com/cloudwatch/pricing/). To stop sending metrics to CloudWatch, delete the metric attribution." ] }, { "cell_type": "code", "execution_count": null, "id": "cff159b0", "metadata": {}, "outputs": [], "source": [ "metric_attribution_name = 'my-first-metric-attribution'\n", "\n", "# If a metric attribution with this name already exists, delete it\n", "metric_attribution_list = personalize.list_metric_attributions(datasetGroupArn = dataset_group_arn)['metricAttributions']\n", "\n", "for metric in metric_attribution_list:\n", " if (metric_attribution_name == metric['name']):\n", " response = personalize.delete_metric_attribution(\n", " metricAttributionArn = metric['metricAttributionArn']\n", " ) \n", " print (response)\n", " # wait for the metric attribution to be deleted\n", " time.sleep (60) " ] }, { "cell_type": "markdown", "id": "2de9ef60", "metadata": {}, "source": [ "We will create one metric attribution with two metrics: one to count the views, and one to sum the margin each time an item is purchased. We will use the two expressions `SAMPLECOUNT()` and `SUM(DatasetType.column_name)`. You can define up to 10 metrics for the metric attribution in a dataset group."
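 ] }, { "cell_type": "markdown", "id": "cwpeek1a", "metadata": {}, "source": [ "Once the metric attribution we create below is active and interaction events start flowing, Amazon Personalize publishes the metrics to Amazon CloudWatch. The cell below is a sketch of how you could later discover what was published; the `AWS/Personalize` namespace is an assumption here, so confirm the exact namespace and dimensions in the CloudWatch console." ] }, { "cell_type": "code", "execution_count": null, "id": "cwpeek2b", "metadata": {}, "outputs": [], "source": [ "# Sketch only: list the metrics CloudWatch has received so far.\n", "# Assumption: Personalize publishes under the 'AWS/Personalize' namespace;\n", "# if nothing is listed, check the namespace in the CloudWatch console.\n", "cloudwatch = boto3.client('cloudwatch')\n", "for metric in cloudwatch.list_metrics(Namespace='AWS/Personalize')['Metrics']:\n", " print(metric['MetricName'], metric['Dimensions'])"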
] }, { "cell_type": "code", "execution_count": null, "id": "44f781c9", "metadata": {}, "outputs": [], "source": [ "metrics_list = [{ \n", " \"eventType\": \"View\",\n", " \"expression\": \"SAMPLECOUNT()\",\n", " \"metricName\": \"countViews\"\n", " }\n", " ,\n", " {\n", " \"eventType\": \"Purchase\",\n", " \"expression\": \"SUM(ITEMS.MARGIN)\",\n", " \"metricName\": \"sumMargin\"\n", " }\n", "]\n", "\n", "output_config = {\n", " \"roleArn\": role_arn, \n", " \"s3DataDestination\": {\n", " \"path\": \"s3://\"+bucket_name+\"/metrics/\"\n", " }\n", "}\n", "\n", "response = personalize.create_metric_attribution(\n", " name = metric_attribution_name,\n", " datasetGroupArn = dataset_group_arn,\n", " metricsOutputConfig = output_config,\n", " metrics = metrics_list\n", ")\n", "\n", "metric_attribution_arn = response['metricAttributionArn']\n", "\n", "print ('Metric attribution ARN: ' + metric_attribution_arn)\n", "\n", "description = personalize.describe_metric_attribution( \n", " metricAttributionArn = metric_attribution_arn)['metricAttribution']\n", "\n", "print('Name: ' + description['name'])\n", "print('ARN: ' + description['metricAttributionArn'])\n", "print('Status: ' + description['status'])" ] }, { "cell_type": "markdown", "id": "16adda22", "metadata": {}, "source": [ "Let us wait for the metric attribution to be created." 
] }, { "cell_type": "code", "execution_count": null, "id": "b219ce57", "metadata": {}, "outputs": [], "source": [ "status = None\n", "max_time = time.time() + 3*60*60 # 3 hours\n", "while time.time() < max_time:\n", " describe_metric_response = personalize.describe_metric_attribution(\n", " metricAttributionArn = metric_attribution_arn\n", " )\n", " status = describe_metric_response['metricAttribution']['status']\n", " print(\"Status: {}\".format(status))\n", "\n", " if status == \"ACTIVE\" or status == 'CREATE FAILED':\n", " break\n", "\n", " time.sleep(60)" ] }, { "cell_type": "code", "execution_count": null, "id": "84c16cd3", "metadata": {}, "outputs": [], "source": [ "personalize.list_metric_attributions()['metricAttributions']" ] }, { "cell_type": "markdown", "id": "616f86ce", "metadata": {}, "source": [ "## Import the data\n", "Earlier you created the DatasetGroup and Dataset to house your information, now you will execute an import job that will load the data from S3 into Amazon Personalize for usage building your model.\n", "### Create Interactions Dataset Import Job" ] }, { "cell_type": "code", "execution_count": null, "id": "2e902d38", "metadata": {}, "outputs": [], "source": [ "create_interactions_dataset_import_job_response = personalize.create_dataset_import_job(\n", " jobName = \"personalize_ecommerce_demo_interactions_import\",\n", " datasetArn = interactions_dataset_arn,\n", " dataSource = {\n", " \"dataLocation\": \"s3://{}/{}\".format(bucket_name, interactions_file_path)\n", " },\n", " roleArn = role_arn\n", ")\n", "\n", "dataset_interactions_import_job_arn = create_interactions_dataset_import_job_response['datasetImportJobArn']\n", "print(json.dumps(create_interactions_dataset_import_job_response, indent=2))" ] }, { "cell_type": "markdown", "id": "64d20685", "metadata": {}, "source": [ "### Create Items Dataset Import Job" ] }, { "cell_type": "code", "execution_count": null, "id": "432da810", "metadata": {}, "outputs": [], "source": [ 
"create_items_dataset_import_job_response = personalize.create_dataset_import_job(\n", " jobName = \"personalize_ecommerce_demo_items_import2\",\n", " datasetArn = items_dataset_arn,\n", " dataSource = {\n", " \"dataLocation\": \"s3://{}/{}\".format(bucket_name, items_file_path)\n", " },\n", " roleArn = role_arn\n", ")\n", "\n", "dataset_items_import_job_arn = create_items_dataset_import_job_response['datasetImportJobArn']\n", "print(json.dumps(create_items_dataset_import_job_response, indent=2))" ] }, { "cell_type": "markdown", "id": "06519930", "metadata": {}, "source": [ "### Wait for Dataset Import Jobs to Have ACTIVE Status\n", "It can take a while before the import jobs complete, please wait until you see that they are active below." ] }, { "cell_type": "code", "execution_count": null, "id": "26369f92", "metadata": {}, "outputs": [], "source": [ "max_time = time.time() + 3*60*60 # 3 hours\n", "\n", "while time.time() < max_time:\n", " describe_dataset_import_job_response = personalize.describe_dataset_import_job(\n", " datasetImportJobArn = dataset_items_import_job_arn\n", " )\n", " status = describe_dataset_import_job_response[\"datasetImportJob\"]['status']\n", " print(\"ItemsDatasetImportJob: {}\".format(status))\n", " \n", " if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n", " break\n", " \n", " time.sleep(60)\n", "\n", "while time.time() < max_time:\n", " describe_dataset_import_job_response = personalize.describe_dataset_import_job(\n", " datasetImportJobArn = dataset_interactions_import_job_arn\n", " )\n", " status = describe_dataset_import_job_response[\"datasetImportJob\"]['status']\n", " print(\"InteractionsDatasetImportJob: {}\".format(status))\n", " \n", " if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n", " break\n", " \n", " time.sleep(60)" ] }, { "cell_type": "markdown", "id": "8df7d504", "metadata": {}, "source": [ "## Choose a recommender use case\n", "\n", "Each domain has different use cases. 
When you create a recommender you create it for a specific use case, and each use case has different requirements for getting recommendations." ] }, { "cell_type": "code", "execution_count": null, "id": "dd6bdb74", "metadata": {}, "outputs": [], "source": [ "available_recipes = personalize.list_recipes(domain='ECOMMERCE') # See the list of use-case recipes available for the ECOMMERCE domain. \n", "display (available_recipes['recipes'])" ] }, { "cell_type": "markdown", "id": "2960a2b6", "metadata": {}, "source": [ "We are going to create two recommenders: \n", "\n", "* [Recommended For You](https://docs.aws.amazon.com/personalize/latest/dg/ECOMMERCE-use-cases.html#recommended-for-you-use-case). This type of recommender offers personalized recommendations for items based on a user that you specify. \n", "* [Customers who viewed X also viewed](https://docs.aws.amazon.com/personalize/latest/dg/ECOMMERCE-use-cases.html#customers-also-viewed-use-case). This type of recommender offers personalized recommendations for items that customers also viewed, based on an item that you specify. 
\n", "\n", "In both these use cases, Amazon Personalize automatically filters items the user purchased based on the userId that you specify and Purchase events.\n", "\n", "[More use cases per domain](https://docs.aws.amazon.com/personalize/latest/dg/domain-use-cases.html)" ] }, { "cell_type": "markdown", "id": "6b84caed", "metadata": {}, "source": [ "### Create the \"Recommended For You\" recommender" ] }, { "cell_type": "code", "execution_count": null, "id": "ef90d282", "metadata": {}, "outputs": [], "source": [ "create_recommender_response = personalize.create_recommender(\n", " name = 'recommended_for_you_demo',\n", " recipeArn = 'arn:aws:personalize:::recipe/aws-ecomm-recommended-for-you',\n", " datasetGroupArn = dataset_group_arn\n", ")\n", "recommended_for_you_arn = create_recommender_response[\"recommenderArn\"]\n", "print (json.dumps(create_recommender_response))" ] }, { "cell_type": "markdown", "id": "0a724c53", "metadata": {}, "source": [ "### Create the \"Customers who viewed X also viewed\" recommender" ] }, { "cell_type": "code", "execution_count": null, "id": "c7cb6df3", "metadata": {}, "outputs": [], "source": [ "create_recommender_response = personalize.create_recommender(\n", " name = 'recommended_who_viewed_x',\n", " recipeArn = 'arn:aws:personalize:::recipe/aws-ecomm-customers-who-viewed-x-also-viewed',\n", " datasetGroupArn = dataset_group_arn\n", ")\n", "who_viewed_x_arn = create_recommender_response[\"recommenderArn\"]\n", "print (json.dumps(create_recommender_response))" ] }, { "cell_type": "markdown", "id": "78af7483", "metadata": {}, "source": [ "We wait until the recomenders have finished creating and have status `ACTIVE`. 
We check the status of each recommender periodically." ] }, { "cell_type": "code", "execution_count": null, "id": "244a86c2", "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "max_time = time.time() + 10*60*60 # 10 hours\n", " \n", "while time.time() < max_time:\n", "\n", " version_response = personalize.describe_recommender(\n", " recommenderArn = recommended_for_you_arn\n", " )\n", " status = version_response[\"recommender\"][\"status\"]\n", "\n", " if status == \"ACTIVE\":\n", " print(\"Build succeeded for {}\".format(recommended_for_you_arn))\n", " \n", " elif status == \"CREATE FAILED\":\n", " print(\"Build failed for {}\".format(recommended_for_you_arn))\n", " break\n", "\n", " if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n", " break\n", " else:\n", " print('At least one recommender build is still in progress.')\n", " \n", " time.sleep(60)\n", " \n", "while time.time() < max_time:\n", "\n", " version_response = personalize.describe_recommender(\n", " recommenderArn = who_viewed_x_arn\n", " )\n", " status = version_response[\"recommender\"][\"status\"]\n", "\n", " if status == \"ACTIVE\":\n", " print(\"Build succeeded for {}\".format(who_viewed_x_arn))\n", " \n", " elif status == \"CREATE FAILED\":\n", " print(\"Build failed for {}\".format(who_viewed_x_arn))\n", " break\n", "\n", " if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n", " break\n", " else:\n", " print('At least one recommender build is still in progress.')\n", " \n", " time.sleep(60)" ] }, { "cell_type": "markdown", "id": "e1146c9e", "metadata": {}, "source": [ "## Getting recommendations with a recommender\n", "Now that the recommenders have been trained, let's have a look at the recommendations we can get for our users and items!" 
] }, { "cell_type": "code", "execution_count": null, "id": "4aaa6533", "metadata": {}, "outputs": [], "source": [ "# reading the original data in order to have a dataframe that has both item_ids \n", "# and the corresponding titles to make our recommendations easier to read.\n", "items_df = pd.read_csv('./items.csv')\n", "items_df.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "583e35f5", "metadata": {}, "outputs": [], "source": [ "def get_item_by_id(item_id, item_df):\n", " \"\"\"\n", " This takes in an item_id from a recommendation in string format,\n", " converts it to an int, and then does a lookup in a default or specified\n", " dataframe and returns the item description.\n", " \n", " A really broad try/except clause was added in case anything goes wrong.\n", " \n", " Feel free to add more debugging or filtering here to improve results if\n", " you hit an error.\n", " \"\"\"\n", " try:\n", " return items_df.loc[items_df[\"ITEM_ID\"]==str(item_id)]['PRODUCT_DESCRIPTION'].values[0]\n", " except:\n", " print (item_id)\n", " return \"Error obtaining item description\"" ] }, { "cell_type": "code", "execution_count": null, "id": "848b1cc0", "metadata": {}, "outputs": [], "source": [ "def get_category_by_id(item_id, item_df):\n", " \"\"\"\n", " This takes in an item_id from a recommendation in string format,\n", " converts it to an int, and then does a lookup in a default or specified\n", " dataframe and returns the item category.\n", " \n", " A really broad try/except clause was added in case anything goes wrong.\n", " \"\"\"\n", " \n", " try:\n", " return items_df.loc[items_df[\"ITEM_ID\"]==str(item_id)]['CATEGORY_L2'].values[0]\n", " except:\n", " print (item_id)\n", " return \"Error obtaining item category\"\n", " " ] }, { "cell_type": "markdown", "id": "13cb379f", "metadata": {}, "source": [ "Let us get some recommendations from the recommender returning \"Recommended for you\":" ] }, { "cell_type": "code", "execution_count": null, "id": 
"9a47b1a3", "metadata": {}, "outputs": [], "source": [ "# First pick a user\n", "test_user_id = \"777\" \n", "\n", "# Get recommendations for the user\n", "get_recommendations_response = personalize_runtime.get_recommendations(\n", " recommenderArn = recommended_for_you_arn,\n", " userId = test_user_id,\n", " numResults = 20\n", ")\n", "\n", "# Build a new dataframe for the recommendations\n", "item_list = get_recommendations_response['itemList']\n", "recommendation_id_list = []\n", "recommendation_description_list = []\n", "recommendation_category_list = []\n", "\n", "for item in item_list:\n", " description = get_item_by_id(item['itemId'], items_df)\n", " recommendation_description_list.append(description)\n", " recommendation_id_list.append(item['itemId'])\n", " recommendation_category_list.append(get_category_by_id(item['itemId'], items_df))\n", "\n", "user_recommendations_df = pd.DataFrame(recommendation_id_list, columns = [\"ID\"])\n", "user_recommendations_df[\"description\"] = recommendation_description_list\n", "user_recommendations_df[\"category level 2\"] = recommendation_category_list\n", "\n", "pd.options.display.max_rows =20\n", "display(user_recommendations_df)" ] }, { "cell_type": "markdown", "id": "b426bce0", "metadata": {}, "source": [ "### \"Customers who viewed X also viewed\" recommender\n", "\"Customers who viewed X also viewed\" recommender requires an item and a user as input, and it will return customers also viewed based on an item that you specify.\n", "\n", "The cells below will handle getting recommendations from the \"Customers who viewed X also viewed\" Recommender and rendering the results. Let's see what the recommendations are for an item.\n", "\n", "We will be using the `recommenderArn`, the `itemId`, the `userId`, as well as the number or results we want, numResults.\n", "\n", "#### Select a User\n", "We'll just pick a random user for simplicity (there are about 6,000 users with user IDs assigned in sequential order). 
Feel free to change the `user_id` below and execute the following cells with a different user to get a sense for how the recommendations change." ] }, { "cell_type": "code", "execution_count": null, "id": "5967b173", "metadata": {}, "outputs": [], "source": [ "user_id = 555" ] }, { "cell_type": "markdown", "id": "17b5a722", "metadata": {}, "source": [ "We'll just pick a random product for simplicity. Feel free to change the `product_id` below and execute the following cells with a different product to get a sense for how the recommendations change." ] }, { "cell_type": "code", "execution_count": null, "id": "8ca34f26", "metadata": {}, "outputs": [], "source": [ "product_id = items_df.sample(1)[\"ITEM_ID\"].values[0]\n", "\n", "print ('productId: ', product_id)\n", "\n", "get_recommendations_response = personalize_runtime.get_recommendations(\n", " recommenderArn = who_viewed_x_arn,\n", " itemId = str(product_id),\n", " userId = str(user_id),\n", " numResults = 40\n", ")\n", "item_list = get_recommendations_response['itemList']\n", "\n", "df = pd.DataFrame()\n", "df['Item'] = [ itm['itemId'] for itm in item_list ]\n", "df['Name'] = [ get_item_by_id( itm['itemId'], items_df) for itm in item_list ]\n", "df['Category'] = [ get_category_by_id ( itm['itemId'], items_df) for itm in item_list ]\n", "display (df)" ] }, { "cell_type": "markdown", "id": "d417d16f", "metadata": {}, "source": [ "This is your first list of recommendations! This list is fine, but it would be better to see the recommendations for similar items rendered in a nice dataframe. Let's create a helper function to achieve this." 
] }, { "cell_type": "code", "execution_count": null, "id": "a5b368ea", "metadata": {}, "outputs": [], "source": [ "# Update DF rendering\n", "pd.set_option('display.max_rows', 30)\n", "\n", "def get_new_recommendations_df_viewed_x(recommendations_df, item_id, user_id):\n", " # Get the item name\n", " original_item_name = get_item_by_id(item_id, items_df)\n", " # Get the recommendations\n", " get_recommendations_response = personalize_runtime.get_recommendations(\n", " recommenderArn = who_viewed_x_arn,\n", " itemId = str(item_id),\n", " userId = str(user_id),\n", " )\n", " # Build a new dataframe of recommendations\n", " item_list = get_recommendations_response['itemList']\n", " recommendation_list = []\n", " for item in item_list:\n", " item_name = get_item_by_id(item['itemId'], items_df)\n", " recommendation_list.append(item_name)\n", " new_rec_df = pd.DataFrame(recommendation_list, columns = [original_item_name])\n", " # Add this dataframe to the old one\n", " recommendations_df = pd.concat([recommendations_df, new_rec_df], axis=1)\n", " return recommendations_df" ] }, { "cell_type": "markdown", "id": "09bdff59", "metadata": {}, "source": [ "Now, let's test the helper function with several different items. Let's sample some data from our dataset to test our \"Customers who viewed X also viewed\" Recommender. Grab 5 random items from our dataframe." 
] }, { "cell_type": "code", "execution_count": null, "id": "39fc739c", "metadata": {}, "outputs": [], "source": [ "samples = items_df.sample(5)\n", "samples" ] }, { "cell_type": "code", "execution_count": null, "id": "0b0dd514", "metadata": {}, "outputs": [], "source": [ "viewed_x_also_viewed_recommendations_df = pd.DataFrame()\n", "items = samples.ITEM_ID.tolist()\n", "\n", "for item in items:\n", " viewed_x_also_viewed_recommendations_df = get_new_recommendations_df_viewed_x(viewed_x_also_viewed_recommendations_df, item, user_id)\n", "\n", "viewed_x_also_viewed_recommendations_df.head(10)" ] }, { "cell_type": "markdown", "id": "e48e6b3c", "metadata": {}, "source": [ "## Sending Interaction Data to Amazon Personalize\n", "\n", "For the purposes of this demo, we will simulate sending data to Amazon personalize.\n", "\n", "### Create Personalize Event Tracker\n", "Let's start by creating an event tracker for our dataset group." ] }, { "cell_type": "code", "execution_count": null, "id": "ad3cdf26", "metadata": {}, "outputs": [], "source": [ "event_tracker_response = personalize.create_event_tracker(\n", " datasetGroupArn=dataset_group_arn,\n", " name='amazon-personalize-event-tracker'\n", ")\n", "\n", "event_tracker_arn = event_tracker_response['eventTrackerArn']\n", "event_tracking_id = event_tracker_response['trackingId']\n", "\n", "print('Event Tracker ARN: ' + event_tracker_arn)\n", "print('Event Tracking ID: ' + event_tracking_id)" ] }, { "cell_type": "markdown", "id": "27e174ef", "metadata": {}, "source": [ "### Wait for Event Tracker Status to Become ACTIVE\n", "The event tracker should take a minute or so to become active." 
] }, { "cell_type": "code", "execution_count": null, "id": "650d8624", "metadata": {}, "outputs": [], "source": [ "status = None\n", "max_time = time.time() + 60*60 # 1 hours\n", "while time.time() < max_time:\n", " describe_event_tracker_response = personalize.describe_event_tracker(\n", " eventTrackerArn = event_tracker_arn\n", " )\n", " status = describe_event_tracker_response[\"eventTracker\"][\"status\"]\n", " print(\"EventTracker: {}\".format(status))\n", " \n", " if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n", " break\n", " \n", " time.sleep(15)" ] }, { "cell_type": "markdown", "id": "500ecbfc", "metadata": {}, "source": [ "### Send Events and attribute them to the \"recommended for you\" recommender" ] }, { "cell_type": "markdown", "id": "7c8699dd", "metadata": {}, "source": [ "Let us generate some events to track. We will create events based on the recommendations by the \"recommended for you\" recommender for several users. \n", "\n", "We will generate events starting now and backdate them decreasing in 2 minute intervals with a random allocation of 'View' and 'Purchase' (80% and 20% respectively)." 
] }, { "cell_type": "code", "execution_count": null, "id": "c0d06ff7", "metadata": {}, "outputs": [], "source": [ "test_users = [555, 100, 120, 444, 70, 82]\n", "\n", "today = datetime.datetime.now()\n", "minutes_in_interval = 2\n", "interval_multiplier = 1\n", "\n", "possible_interactions = ['View', 'Purchase']\n", "\n", "for user_id in test_users:\n", " get_recommendations_response = personalize_runtime.get_recommendations(\n", " recommenderArn = recommended_for_you_arn,\n", " userId = str(user_id),\n", " numResults = 30,\n", " )\n", " item_list = get_recommendations_response['itemList']\n", "\n", " df = pd.DataFrame()\n", " df['Item'] = [ itm['itemId'] for itm in item_list ]\n", " df['Name'] = [ get_item_by_id( itm['itemId'], items_df) for itm in item_list ]\n", " df['Category'] = [ get_category_by_id ( itm['itemId'], items_df) for itm in item_list ]\n", "\n", " for itm in item_list:\n", "\n", " time_to_convert = today - datetime.timedelta(minutes = minutes_in_interval * interval_multiplier)\n", " timestamp = int(time.mktime(time_to_convert.timetuple()))\n", "\n", " event = {\n", " 'eventId': str(uuid.uuid4()),\n", " 'eventType': np.random.choice(possible_interactions, p = [0.8, 0.2]),\n", " 'itemId': itm['itemId'],\n", " 'metricAttribution': {\"eventAttributionSource\": \"recommended_for_you\" },\n", " 'sentAt': timestamp\n", " }\n", "\n", " response = personalize_events.put_events(\n", " trackingId=event_tracking_id,\n", " userId = str(user_id),\n", " sessionId = str(uuid.uuid4()),\n", " eventList = [event]\n", " )\n", " interval_multiplier += 1\n" ] }, { "cell_type": "markdown", "id": "7dd02c31", "metadata": {}, "source": [ "### Send Events and attribute them to the \"Customers who viewed X also viewed\" recommender\n", "We will do the same with the \"Customers who viewed X also viewed\" recommender, and select 6 random products. 
We will create events to track based on the recommendations by the \"Customers who viewed X also viewed\" recommender for several users. \n", "\n", "We will generate events starting now and backdate them decreasing in 2 minute intervals with a random allocation of 'View' and 'Purchase' (80% and 20% respectively)." ] }, { "cell_type": "code", "execution_count": null, "id": "10494f9c", "metadata": {}, "outputs": [], "source": [ "test_products = [\n", " '9dbf7c1a-3936-42ee-8c36-a28e94a35265',\n", " '1d9ab420-01b5-4d5d-b7b9-43c10f4eac84',\n", " 'f5be9f67-8def-405d-af7e-2abf6876277f',\n", " 'e5e842c6-9d8e-4940-a245-a0f02a5552ad',\n", " '08b05a0d-8f0d-412a-a2b1-97aa6c08a03d',\n", " '447bc9e4-c176-4665-892e-2a0f47b3a582']" ] }, { "cell_type": "code", "execution_count": null, "id": "270ef6fc", "metadata": {}, "outputs": [], "source": [ "# we reset the interval multiplier\n", "interval_multiplier = 1\n", "\n", "for product_id in test_products:\n", " get_recommendations_response = personalize_runtime.get_recommendations(\n", " recommenderArn = who_viewed_x_arn,\n", " itemId = str(product_id),\n", " userId = str(user_id),\n", " numResults = 30\n", " )\n", " item_list = get_recommendations_response['itemList']\n", "\n", " df = pd.DataFrame()\n", " df['Item'] = [ itm['itemId'] for itm in item_list ]\n", " df['Name'] = [ get_item_by_id( itm['itemId'], items_df) for itm in item_list ]\n", " df['Category'] = [ get_category_by_id ( itm['itemId'], items_df) for itm in item_list ]\n", "\n", "\n", " for itm in item_list:\n", " time_to_convert = today - datetime.timedelta(minutes = minutes_in_interval * interval_multiplier)\n", " timestamp = int(time.mktime(time_to_convert.timetuple()))\n", "\n", " event = {\n", " 'eventId': str(uuid.uuid4()),\n", " 'eventType': np.random.choice(possible_interactions, p = [0.8, 0.2]),\n", " 'itemId': itm['itemId'],\n", " 'metricAttribution': {\"eventAttributionSource\": \"viewed_X_also_viewed\" },\n", " 'sentAt': timestamp\n", " }\n", "\n", " res = 
personalize_events.put_events(\n", " trackingId=event_tracking_id,\n", " userId = str(user_id),\n", " sessionId = str(uuid.uuid4()),\n", " eventList = [event]\n", " )\n", " \n", " interval_multiplier += 1" ] }, { "cell_type": "markdown", "id": "083fb534", "metadata": {}, "source": [ "## Inspect data from Amazon CloudWatch\n", "\n", "Now that we have sent events to Amazon Personalize using the PutEvents operation, we can inspect the business metrics by querying data from Amazon CloudWatch.\n", "\n", "You can also see this data using the [Amazon CloudWatch console](https://aws.amazon.com/console/) under the 'AWS/Personalize' namespace.\n", "\n", "
\n", "Note: Data in CloudWatch can take some time to update, please wait at least 15 minutes to view the metrics.\n", "
\n" ] }, { "cell_type": "code", "execution_count": null, "id": "9af9e449", "metadata": {}, "outputs": [], "source": [ "# initialize the cloudwatch client\n", "cloud_watch = boto3.Session().client(\n", " service_name='cloudwatch', region_name=region)" ] }, { "cell_type": "markdown", "id": "d67e1b0d", "metadata": {}, "source": [ "Let us see the metrics we created in Amazon CloudWatch:" ] }, { "cell_type": "code", "execution_count": null, "id": "888d3f7a", "metadata": {}, "outputs": [], "source": [ "# List metrics through the pagination interface\n", "paginator = cloud_watch.get_paginator('list_metrics')\n", "\n", "for response in paginator.paginate(Namespace='AWS/Personalize'):\n", " r1 = response['Metrics']\n", " for met in r1:\n", " if (met['MetricName'] in ['SUMMARGIN', 'COUNTVIEWS']):\n", " print (met)" ] }, { "cell_type": "markdown", "id": "c2ee07ba", "metadata": {}, "source": [ "We can see one metric of each type `COUNTVIEWS` and `SUMMARGIN` for each of our scenarios `RECOMMENDED_FOR_YOU` and `VIEWED_X_ALSO_VIEWED`.\n", "\n", "Next we will query claudwatch for the latest metrics. We will compare the metrics for the two scenarios." 
] }, { "cell_type": "code", "execution_count": null, "id": "4148d785", "metadata": {}, "outputs": [], "source": [ "# Define the start and end_times in Unix format this is the timeframe we will query\n", "start_time = int(time.mktime((today + datetime.timedelta(hours = -5)).timetuple()))\n", "end_time = int(time.mktime(today.timetuple()))\n" ] }, { "cell_type": "code", "execution_count": null, "id": "ed80ef9e", "metadata": {}, "outputs": [], "source": [ "# Getting metrics data from CloudWatch\n", "recommended_for_you_sum_response = cloud_watch.get_metric_data(\n", " StartTime=start_time,\n", " EndTime=end_time,\n", " MetricDataQueries= [{\n", " \"Id\": \"q1\",\n", " \"MetricStat\": {\n", " \"Metric\": {\n", " \"Namespace\": 'AWS/Personalize',\n", " \"MetricName\": \"SUMMARGIN\",\n", " \"Dimensions\": [\n", " {\n", " 'Name': 'EventAttributionSource',\n", " 'Value': 'RECOMMENDED_FOR_YOU'\n", " },\n", " {\n", " 'Name': 'DatasetGroupArn', \n", " 'Value': dataset_group_arn\n", " }\n", " ]\n", " },\n", " \"Period\": 900, #minimum value is 900s = 15min\n", " \"Stat\":\"Sum\",\n", " },\n", " }\n", " ]\n", " ) \n", "\n", "print(recommended_for_you_sum_response)\n", "\n", "viewed_also_viewed_sum_response = cloud_watch.get_metric_data(\n", " StartTime=start_time,\n", " EndTime=end_time,\n", " MetricDataQueries= [{\n", " \"Id\": \"q1\",\n", " \"MetricStat\": {\n", " \"Metric\": {\n", " \"Namespace\": 'AWS/Personalize',\n", " \"MetricName\": \"SUMMARGIN\",\n", " \"Dimensions\": [\n", " {\n", " 'Name': 'EventAttributionSource',\n", " 'Value': 'VIEWED_X_ALSO_VIEWED'\n", " },\n", " {\n", " 'Name': 'DatasetGroupArn', \n", " 'Value': dataset_group_arn\n", " }\n", " ]\n", " },\n", " \"Period\": 900, #minimum value is 900s = 15min\n", " \"Stat\":\"Sum\",\n", " },\n", " }\n", " ]\n", " )\n", "print(viewed_also_viewed_sum_response)" ] }, { "cell_type": "code", "execution_count": null, "id": "c0939fa2", "metadata": {}, "outputs": [], "source": [ "# Getting data from the \n", 
"recommended_for_you_count_response = cloud_watch.get_metric_data(\n", " StartTime=start_time,\n", " EndTime=end_time,\n", " MetricDataQueries= [{\n", " \"Id\": \"q1\",\n", " \"MetricStat\": {\n", " \"Metric\": {\n", " \"Namespace\": 'AWS/Personalize',\n", " \"MetricName\": \"COUNTVIEWS\",\n", " \"Dimensions\": [\n", " {\n", " 'Name': 'EventAttributionSource',\n", " 'Value': 'RECOMMENDED_FOR_YOU'\n", " },\n", " {\n", " 'Name': 'DatasetGroupArn', \n", " 'Value': dataset_group_arn\n", " }\n", " ]\n", " },\n", " \"Period\": 900, #minimum value is 900s = 15min\n", " \"Stat\":\"Sum\",\n", " },\n", " }\n", " ]\n", " )\n", "print(recommended_for_you_count_response)\n", "\n", "viewed_also_viewed_count_response = cloud_watch.get_metric_data(\n", " StartTime=start_time,\n", " EndTime=end_time,\n", " MetricDataQueries= [{\n", " \"Id\": \"q1\",\n", " \"MetricStat\": {\n", " \"Metric\": {\n", " \"Namespace\": 'AWS/Personalize',\n", " \"MetricName\": \"COUNTVIEWS\",\n", " \"Dimensions\": [\n", " {\n", " 'Name': 'EventAttributionSource',\n", " 'Value': 'VIEWED_X_ALSO_VIEWED'\n", " },\n", " {\n", " 'Name': 'DatasetGroupArn', \n", " 'Value': dataset_group_arn\n", " }\n", " ]\n", " },\n", " \"Period\": 900, #minimum value is 900s = 15min\n", " \"Stat\":\"Sum\",\n", " },\n", " }\n", " ]\n", " )\n", "\n", "print(viewed_also_viewed_count_response)" ] }, { "cell_type": "code", "execution_count": null, "id": "1fb6b180", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import matplotlib.dates as md\n", "\n", "plt.xticks( rotation=25 )\n", "ax=plt.gca()\n", "\n", "xfmt = md.DateFormatter('%Y-%m-%d %H:%M:%S')\n", "ax.xaxis.set_major_formatter(xfmt)\n", "\n", "plt.plot (\n", " recommended_for_you_sum_response['MetricDataResults'][0]['Timestamps'], \n", " recommended_for_you_sum_response['MetricDataResults'][0]['Values']\n", ")\n", "plt.plot (\n", " viewed_also_viewed_sum_response['MetricDataResults'][0]['Timestamps'], \n", " 
viewed_also_viewed_sum_response['MetricDataResults'][0]['Values']\n", ")\n", "plt.legend(['recommended_for_you', 'viewed_x_also_viewed'])" ] }, { "cell_type": "code", "execution_count": null, "id": "51571523", "metadata": {}, "outputs": [], "source": [ "plt.xticks( rotation=25 )\n", "ax=plt.gca()\n", "\n", "xfmt = md.DateFormatter('%Y-%m-%d %H:%M:%S')\n", "ax.xaxis.set_major_formatter(xfmt)\n", "\n", "plt.plot (\n", " recommended_for_you_count_response['MetricDataResults'][0]['Timestamps'], \n", " recommended_for_you_count_response['MetricDataResults'][0]['Values']\n", ")\n", "plt.plot (\n", " viewed_also_viewed_count_response['MetricDataResults'][0]['Timestamps'], \n", " viewed_also_viewed_count_response['MetricDataResults'][0]['Values']\n", ")\n", "plt.legend(['recommended_for_you', 'viewed_x_also_viewed'])" ] }, { "cell_type": "markdown", "id": "c230a2a4", "metadata": {}, "source": [ "In these plots we can compare how the margin (from the purchases made by customers) and the number of views differ based on the scenario the customers were interacting with. \n", "They can also help you understand which recommendation rails are more effective." ] }, { "cell_type": "markdown", "id": "610a4725", "metadata": {}, "source": [ "## Review\n", "Using the code above, you have successfully trained deep learning models to generate item recommendations based on prior user behavior. You have created recommenders for foundational use cases, and you have used metric attributions to export data so you can evaluate your models against business metrics.\n", "\n", "Going forward, you can adapt this code to create other recommenders and metric attributions.\n", "\n", "If you are done with this sample, make sure to follow the steps in the next section to clean up the resources created in this notebook." ] }, { "cell_type": "markdown", "id": "aa077309", "metadata": {}, "source": [ "## Cleanup Resources \n", "This section contains instructions on how to clean up the resources created in this notebook." 
] }, { "cell_type": "markdown", "id": "231035cc", "metadata": {}, "source": [ "### Save resource information for cleanup:\n", "There are a few values you will need for the next notebook, execute the cell below to store them so they can be used in the `Clean_Up_Resources.ipynb` notebook.\n", "\n", "This will overwite any data stored for those variables and set them to the values specified in this notebook. " ] }, { "cell_type": "code", "execution_count": null, "id": "c480f708", "metadata": {}, "outputs": [], "source": [ "# store for cleanup\n", "%store dataset_group_arn\n", "%store role_name\n", "%store region" ] }, { "cell_type": "markdown", "id": "31a9eb53", "metadata": {}, "source": [ "### Run the cleanup notebook\n", "\n", "Continue to [Clean_Up_Resources.ipynb](Clean_Up_Resources.ipynb) clean up resources." ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" }, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 5 }