{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Amazon Personalize AWS User Personalization + Contextual Recommendations Example"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Introduction \n",
"\n",
"For the most part, the algorithms in Amazon Personalize (called recipes) look to solve different tasks, explained here:\n",
"\n",
"1. **User Personalization** - Recommends items based on previous user interactions with items.\n",
"1. **Personalized-Ranking** - Takes a collection of items and then orders them in probable order of interest using an HRNN-like approach.\n",
"1. **SIMS (Similar Items)** - Given one item, recommends other items also interacted with by users.\n",
"\n",
"No matter the use case, the algorithms all share a base of learning on user-item-interaction data which is defined by 3 core attributes:\n",
"\n",
"1. **UserID** - The user who interacted\n",
"1. **ItemID** - The item the user interacted with\n",
"1. **Timestamp** - The time at which the interaction occurred\n",
"\n",
"\n",
"## Choose a dataset or data source \n",
"[Back to top](#top)\n",
"\n",
"As we mentioned, the user-item-iteraction data is key for getting started with the service. This means we need to look for use cases that generate that kind of data, a few common examples are:\n",
"\n",
"1. Video-on-demand applications\n",
"1. E-commerce platforms\n",
"1. Social media aggregators / platforms\n",
"\n",
"There are a few guidelines for scoping a problem suitable for Personalize. We recommend the values below as a starting point, although the [official limits](https://docs.aws.amazon.com/personalize/latest/dg/limits.html) lie a little lower.\n",
"\n",
"* Authenticated users\n",
"* At least 50 unique users\n",
"* At least 100 unique items\n",
"* At least 2 dozen interactions for each user \n",
"\n",
"Most of the time this is easily attainable, and if you are low in one category, you can often make up for it by having a larger number in another category.\n",
"\n",
"Generally speaking your data will not arrive in a perfect form for Personalize, and will take some modification to be structured correctly. This notebook looks to guide you through all of that. \n",
"\n",
"To begin with, we are going to use an airlines review dataset. A scraped dataset created from all user reviews found on Skytrax (www.airlinequality.com). The data can be found at https://github.com/quankiquanki/skytrax-reviews-dataset "
]
},
{
"cell_type": "code",
"execution_count": 274,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd, numpy as np\n",
"import io\n",
"import scipy.sparse as ss\n",
"import json\n",
"import time\n",
"import datetime\n",
"import os\n",
"import sagemaker.amazon.common as smac\n",
"import boto3\n",
"import uuid\n",
"from botocore.exceptions import ClientError"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import and Explore your dataset"
]
},
{
"cell_type": "code",
"execution_count": 275,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2020-06-18 16:33:44-- https://raw.githubusercontent.com/quankiquanki/skytrax-reviews-dataset/master/data/airline.csv\n",
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.64.133\n",
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.64.133|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 34752262 (33M) [text/plain]\n",
"Saving to: ‘airline.csv’\n",
"\n",
"airline.csv 100%[===================>] 33.14M 113MB/s in 0.3s \n",
"\n",
"2020-06-18 16:33:45 (113 MB/s) - ‘airline.csv’ saved [34752262/34752262]\n",
"\n"
]
}
],
"source": [
"data_dir = \"airlines_data\"\n",
"!mkdir $data_dir\n",
"!cd $data_dir && wget https://raw.githubusercontent.com/quankiquanki/skytrax-reviews-dataset/master/data/airline.csv\n"
]
},
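{
"cell_type": "markdown",
"metadata": {},
"source": [
"If `wget` is not available in your environment, the cell below is a minimal standard-library alternative that downloads the same file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from urllib.request import urlretrieve\n",
"\n",
"# Standard-library equivalent of the wget call above\n",
"urlretrieve('https://raw.githubusercontent.com/quankiquanki/skytrax-reviews-dataset/master/data/airline.csv',\n",
"            data_dir + '/airline.csv')"
]
},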
{
"cell_type": "code",
"execution_count": 276,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" airline_name | \n",
" link | \n",
" title | \n",
" author | \n",
" author_country | \n",
" date | \n",
" content | \n",
" aircraft | \n",
" type_traveller | \n",
" cabin_flown | \n",
" route | \n",
" overall_rating | \n",
" seat_comfort_rating | \n",
" cabin_staff_rating | \n",
" food_beverages_rating | \n",
" inflight_entertainment_rating | \n",
" ground_service_rating | \n",
" wifi_connectivity_rating | \n",
" value_money_rating | \n",
" recommended | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" adria-airways | \n",
" /airline-reviews/adria-airways | \n",
" Adria Airways customer review | \n",
" D Ito | \n",
" Germany | \n",
" 2015-04-10 | \n",
" Outbound flight FRA/PRN A319. 2 hours 10 min f... | \n",
" NaN | \n",
" NaN | \n",
" Economy | \n",
" NaN | \n",
" 7.0 | \n",
" 4.0 | \n",
" 4.0 | \n",
" 4.0 | \n",
" 0.0 | \n",
" NaN | \n",
" NaN | \n",
" 4.0 | \n",
" 1 | \n",
"
\n",
" \n",
" 1 | \n",
" adria-airways | \n",
" /airline-reviews/adria-airways | \n",
" Adria Airways customer review | \n",
" Ron Kuhlmann | \n",
" United States | \n",
" 2015-01-05 | \n",
" Two short hops ZRH-LJU and LJU-VIE. Very fast ... | \n",
" NaN | \n",
" NaN | \n",
" Business Class | \n",
" NaN | \n",
" 10.0 | \n",
" 4.0 | \n",
" 5.0 | \n",
" 4.0 | \n",
" 1.0 | \n",
" NaN | \n",
" NaN | \n",
" 5.0 | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" adria-airways | \n",
" /airline-reviews/adria-airways | \n",
" Adria Airways customer review | \n",
" E Albin | \n",
" Switzerland | \n",
" 2014-09-14 | \n",
" Flew Zurich-Ljubljana on JP365 newish CRJ900. ... | \n",
" NaN | \n",
" NaN | \n",
" Economy | \n",
" NaN | \n",
" 9.0 | \n",
" 5.0 | \n",
" 5.0 | \n",
" 4.0 | \n",
" 0.0 | \n",
" NaN | \n",
" NaN | \n",
" 5.0 | \n",
" 1 | \n",
"
\n",
" \n",
" 3 | \n",
" adria-airways | \n",
" /airline-reviews/adria-airways | \n",
" Adria Airways customer review | \n",
" Tercon Bojan | \n",
" Singapore | \n",
" 2014-09-06 | \n",
" Adria serves this 100 min flight from Ljubljan... | \n",
" NaN | \n",
" NaN | \n",
" Business Class | \n",
" NaN | \n",
" 8.0 | \n",
" 4.0 | \n",
" 4.0 | \n",
" 3.0 | \n",
" 1.0 | \n",
" NaN | \n",
" NaN | \n",
" 4.0 | \n",
" 1 | \n",
"
\n",
" \n",
" 4 | \n",
" adria-airways | \n",
" /airline-reviews/adria-airways | \n",
" Adria Airways customer review | \n",
" L James | \n",
" Poland | \n",
" 2014-06-16 | \n",
" WAW-SKJ Economy. No free snacks or drinks on t... | \n",
" NaN | \n",
" NaN | \n",
" Economy | \n",
" NaN | \n",
" 4.0 | \n",
" 4.0 | \n",
" 2.0 | \n",
" 1.0 | \n",
" 2.0 | \n",
" NaN | \n",
" NaN | \n",
" 2.0 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" airline_name link \\\n",
"0 adria-airways /airline-reviews/adria-airways \n",
"1 adria-airways /airline-reviews/adria-airways \n",
"2 adria-airways /airline-reviews/adria-airways \n",
"3 adria-airways /airline-reviews/adria-airways \n",
"4 adria-airways /airline-reviews/adria-airways \n",
"\n",
" title author author_country date \\\n",
"0 Adria Airways customer review D Ito Germany 2015-04-10 \n",
"1 Adria Airways customer review Ron Kuhlmann United States 2015-01-05 \n",
"2 Adria Airways customer review E Albin Switzerland 2014-09-14 \n",
"3 Adria Airways customer review Tercon Bojan Singapore 2014-09-06 \n",
"4 Adria Airways customer review L James Poland 2014-06-16 \n",
"\n",
" content aircraft type_traveller \\\n",
"0 Outbound flight FRA/PRN A319. 2 hours 10 min f... NaN NaN \n",
"1 Two short hops ZRH-LJU and LJU-VIE. Very fast ... NaN NaN \n",
"2 Flew Zurich-Ljubljana on JP365 newish CRJ900. ... NaN NaN \n",
"3 Adria serves this 100 min flight from Ljubljan... NaN NaN \n",
"4 WAW-SKJ Economy. No free snacks or drinks on t... NaN NaN \n",
"\n",
" cabin_flown route overall_rating seat_comfort_rating \\\n",
"0 Economy NaN 7.0 4.0 \n",
"1 Business Class NaN 10.0 4.0 \n",
"2 Economy NaN 9.0 5.0 \n",
"3 Business Class NaN 8.0 4.0 \n",
"4 Economy NaN 4.0 4.0 \n",
"\n",
" cabin_staff_rating food_beverages_rating inflight_entertainment_rating \\\n",
"0 4.0 4.0 0.0 \n",
"1 5.0 4.0 1.0 \n",
"2 5.0 4.0 0.0 \n",
"3 4.0 3.0 1.0 \n",
"4 2.0 1.0 2.0 \n",
"\n",
" ground_service_rating wifi_connectivity_rating value_money_rating \\\n",
"0 NaN NaN 4.0 \n",
"1 NaN NaN 5.0 \n",
"2 NaN NaN 5.0 \n",
"3 NaN NaN 4.0 \n",
"4 NaN NaN 2.0 \n",
"\n",
" recommended \n",
"0 1 \n",
"1 1 \n",
"2 1 \n",
"3 1 \n",
"4 0 "
]
},
"execution_count": 276,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"airline_df = pd.read_csv(data_dir + '/airline.csv')\n",
"airline_df.head()"
]
},
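{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check against the scoping guidelines from the introduction, the sketch below counts unique users, unique items, and the average number of interactions per user in the raw data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Compare the raw dataset against the scoping guidelines above\n",
"print('Unique users:', airline_df['author'].nunique())\n",
"print('Unique items:', airline_df['airline_name'].nunique())\n",
"print('Interactions per user (mean):', round(airline_df.groupby('author').size().mean(), 2))"
]
},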
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see here the dataset has a lot of columns we can use to create the required data sets in Amazon Personalize.\n",
"\n",
"The first thing we are going to do is make 2 copies of the dataset"
]
},
{
"cell_type": "code",
"execution_count": 277,
"metadata": {},
"outputs": [],
"source": [
"a_interactions_df = airline_df.copy()\n",
"a_users_df = airline_df.copy()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building the Interactions Data set\n",
"\n",
"Let's build the interactions dataset. By following the these steps:\n",
"\n",
"- Drop the columns we are not interested in\n",
"- Create a new column to account for Event Type\n",
"- Rename the columns to a more standard naming convention for you Amazon Personalize import job\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 278,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" ITEM_ID | \n",
" USER_ID | \n",
" TIMESTAMP | \n",
" CABIN_TYPE | \n",
" EVENT_VALUE | \n",
" EVENT_TYPE | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" adria-airways | \n",
" DIto | \n",
" 2015-04-10 | \n",
" Economy | \n",
" 7.0 | \n",
" RATING | \n",
"
\n",
" \n",
" 1 | \n",
" adria-airways | \n",
" RonKuhlmann | \n",
" 2015-01-05 | \n",
" Business Class | \n",
" 10.0 | \n",
" RATING | \n",
"
\n",
" \n",
" 2 | \n",
" adria-airways | \n",
" EAlbin | \n",
" 2014-09-14 | \n",
" Economy | \n",
" 9.0 | \n",
" RATING | \n",
"
\n",
" \n",
" 3 | \n",
" adria-airways | \n",
" TerconBojan | \n",
" 2014-09-06 | \n",
" Business Class | \n",
" 8.0 | \n",
" RATING | \n",
"
\n",
" \n",
" 4 | \n",
" adria-airways | \n",
" LJames | \n",
" 2014-06-16 | \n",
" Economy | \n",
" 4.0 | \n",
" RATING | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" ITEM_ID USER_ID TIMESTAMP CABIN_TYPE EVENT_VALUE \\\n",
"0 adria-airways DIto 2015-04-10 Economy 7.0 \n",
"1 adria-airways RonKuhlmann 2015-01-05 Business Class 10.0 \n",
"2 adria-airways EAlbin 2014-09-14 Economy 9.0 \n",
"3 adria-airways TerconBojan 2014-09-06 Business Class 8.0 \n",
"4 adria-airways LJames 2014-06-16 Economy 4.0 \n",
"\n",
" EVENT_TYPE \n",
"0 RATING \n",
"1 RATING \n",
"2 RATING \n",
"3 RATING \n",
"4 RATING "
]
},
"execution_count": 278,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Keeping only 5 columns\n",
"a_interactions_df = a_interactions_df[['airline_name', 'author', 'date', 'cabin_flown', 'overall_rating']]\n",
"# Creating an additional column for Event Type\n",
"a_interactions_df['EVENT_TYPE']='RATING'\n",
"# Making sure the author name is unique without spaces\n",
"a_interactions_df['author'] = a_interactions_df['author'].str.replace(\" \",\"\")\n",
"# Rename the columns to a more Amazon Personalize standar notation\n",
"a_interactions_df.rename(columns = {'airline_name':'ITEM_ID', 'author':'USER_ID',\n",
" 'date':'TIMESTAMP', 'cabin_flown': 'CABIN_TYPE', 'overall_rating': 'EVENT_VALUE'}, inplace = True) \n",
"a_interactions_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Amazon Personalize supports **contextual recommendations**, through which you can improve relevance of recommendations by generating them within a context, for instance device type, location, time of day, etc. Contextual information is also useful in personalization for new/unidentified users even when the past interactions of these users are not known.\n",
"\n",
"In our case we are going to use **Cabin Type** as a context to recommend which airline is the best fit for our user. Let's explore which values we are going to be able to pass as our context when getting recommendations\n"
]
},
{
"cell_type": "code",
"execution_count": 279,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['Economy', 'Business Class', nan, 'Premium Economy', 'First Class'],\n",
" dtype=object)"
]
},
"execution_count": 279,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a_interactions_df.CABIN_TYPE.unique()"
]
},
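{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is also worth checking how the interactions are distributed across these context values; a quick sketch using `value_counts`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Distribution of interactions per cabin type, including missing values\n",
"a_interactions_df.CABIN_TYPE.value_counts(dropna=False)"
]
},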
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see our current **Timestamp** value in the dataset is a string. Amazon Personalize requires the timestamp value as Unix type. Let's take a random timestamp value and convert it to Unix type"
]
},
{
"cell_type": "code",
"execution_count": 280,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Date: 2004-03-18\n",
"Unix Time: 1079568000.0\n"
]
}
],
"source": [
"# Get a random value from the timestamp column\n",
"arb_time_stamp = a_interactions_df.iloc[50]['TIMESTAMP']\n",
"# Transform this string to date time\n",
"date_time_obj = datetime.datetime.strptime(arb_time_stamp, '%Y-%m-%d')\n",
"print('Date:', date_time_obj.date())\n",
"# Get the date of this object\n",
"d = date_time_obj.date()\n",
"# Transform the date object to Unix time\n",
"unixtime = time.mktime(d.timetuple())\n",
"print('Unix Time: ', unixtime)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we are going to do the same transformation to all of our values in the timestamp column"
]
},
{
"cell_type": "code",
"execution_count": 281,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" ITEM_ID | \n",
" USER_ID | \n",
" TIMESTAMP | \n",
" CABIN_TYPE | \n",
" EVENT_VALUE | \n",
" EVENT_TYPE | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" adria-airways | \n",
" DIto | \n",
" 1.428624e+09 | \n",
" Economy | \n",
" 7.0 | \n",
" RATING | \n",
"
\n",
" \n",
" 1 | \n",
" adria-airways | \n",
" RonKuhlmann | \n",
" 1.420416e+09 | \n",
" Business Class | \n",
" 10.0 | \n",
" RATING | \n",
"
\n",
" \n",
" 2 | \n",
" adria-airways | \n",
" EAlbin | \n",
" 1.410653e+09 | \n",
" Economy | \n",
" 9.0 | \n",
" RATING | \n",
"
\n",
" \n",
" 3 | \n",
" adria-airways | \n",
" TerconBojan | \n",
" 1.409962e+09 | \n",
" Business Class | \n",
" 8.0 | \n",
" RATING | \n",
"
\n",
" \n",
" 4 | \n",
" adria-airways | \n",
" LJames | \n",
" 1.402877e+09 | \n",
" Economy | \n",
" 4.0 | \n",
" RATING | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" ITEM_ID USER_ID TIMESTAMP CABIN_TYPE EVENT_VALUE \\\n",
"0 adria-airways DIto 1.428624e+09 Economy 7.0 \n",
"1 adria-airways RonKuhlmann 1.420416e+09 Business Class 10.0 \n",
"2 adria-airways EAlbin 1.410653e+09 Economy 9.0 \n",
"3 adria-airways TerconBojan 1.409962e+09 Business Class 8.0 \n",
"4 adria-airways LJames 1.402877e+09 Economy 4.0 \n",
"\n",
" EVENT_TYPE \n",
"0 RATING \n",
"1 RATING \n",
"2 RATING \n",
"3 RATING \n",
"4 RATING "
]
},
"execution_count": 281,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Define a function\n",
"def convert_to_unix(string_date):\n",
" date_time_obj = datetime.datetime.strptime(string_date, '%Y-%m-%d')\n",
" d = date_time_obj.date()\n",
" return time.mktime(d.timetuple())\n",
"\n",
"# Apply this function across the Timestamp column\n",
"a_interactions_df['TIMESTAMP'] = a_interactions_df['TIMESTAMP'].apply(convert_to_unix)\n",
"a_interactions_df.head(5)"
]
},
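{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside, the same conversion can be done without a Python-level loop. The sketch below recomputes the values from the original `date` column with vectorized pandas operations, assuming the `date` column has no missing values; note that `pd.to_datetime` treats the dates as UTC, while `time.mktime` above uses the notebook's local timezone, so the two results can differ by a few hours."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Vectorized alternative: parse the date strings, then convert nanoseconds to seconds\n",
"vectorized_ts = pd.to_datetime(airline_df['date'], format='%Y-%m-%d').astype('int64') // 10**9\n",
"vectorized_ts.head()"
]
},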
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's take a look at some of our dataset properties"
]
},
{
"cell_type": "code",
"execution_count": 282,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" TIMESTAMP | \n",
" EVENT_VALUE | \n",
"
\n",
" \n",
" \n",
" \n",
" count | \n",
" 4.139600e+04 | \n",
" 36861.000000 | \n",
"
\n",
" \n",
" mean | \n",
" 1.373950e+09 | \n",
" 6.039527 | \n",
"
\n",
" \n",
" std | \n",
" 5.771909e+07 | \n",
" 3.214680 | \n",
"
\n",
" \n",
" min | \n",
" 0.000000e+00 | \n",
" 1.000000 | \n",
"
\n",
" \n",
" 25% | \n",
" 1.350864e+09 | \n",
" 3.000000 | \n",
"
\n",
" \n",
" 50% | \n",
" 1.389658e+09 | \n",
" 7.000000 | \n",
"
\n",
" \n",
" 75% | \n",
" 1.412122e+09 | \n",
" 9.000000 | \n",
"
\n",
" \n",
" max | \n",
" 1.438474e+09 | \n",
" 10.000000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" TIMESTAMP EVENT_VALUE\n",
"count 4.139600e+04 36861.000000\n",
"mean 1.373950e+09 6.039527\n",
"std 5.771909e+07 3.214680\n",
"min 0.000000e+00 1.000000\n",
"25% 1.350864e+09 3.000000\n",
"50% 1.389658e+09 7.000000\n",
"75% 1.412122e+09 9.000000\n",
"max 1.438474e+09 10.000000"
]
},
"execution_count": 282,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a_interactions_df.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Are there any Null values??"
]
},
{
"cell_type": "code",
"execution_count": 283,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 283,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a_interactions_df.isnull().values.any()"
]
},
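{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before dropping rows, it can help to see where the nulls actually are; a quick per-column count:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Count missing values per column to see what dropna() will remove\n",
"a_interactions_df.isnull().sum()"
]
},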
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's drop those Null values and make sure there aren't any"
]
},
{
"cell_type": "code",
"execution_count": 284,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 284,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a_interactions_df = a_interactions_df.dropna()\n",
"a_interactions_df.isnull().values.any()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have our data ready for Amazon Personalize, let's save it into a file locally"
]
},
{
"cell_type": "code",
"execution_count": 285,
"metadata": {},
"outputs": [],
"source": [
"interactions_filename = \"a_interactions.csv\"\n",
"a_interactions_df.to_csv((data_dir + \"/\"+interactions_filename), index=False, float_format='%.0f')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building the Users Data set\n",
"\n",
"Let's build the users dataset. By following the these steps:\n",
"\n",
"- Drop the columns we are not interested in\n",
"- Create a new column to account for Nationality as user metadata\n",
"- Rename the columns to a more standard naming convention for you Amazon Personalize import job\n"
]
},
{
"cell_type": "code",
"execution_count": 286,
"metadata": {},
"outputs": [],
"source": [
"# Copy the complete airlines data set\n",
"a_users_df = airline_df.copy()\n",
"# Select only interested columns\n",
"a_users_df = a_users_df[['author', 'author_country']]\n",
"# Clean up the authors string\n",
"a_users_df['author'] = a_users_df['author'].str.replace(\" \",\"\")\n",
"# Rename the columns\n",
"a_users_df.rename(columns = { 'author':'USER_ID', 'author_country':'NATIONALITY'}, inplace = True) \n",
"# Drop any null values\n",
"a_users_df = a_users_df.dropna()\n",
"# Save your file locally\n",
"users_filename = \"a_users.csv\"\n",
"a_users_df.to_csv((data_dir +\"/\"+users_filename), index=False)"
]
},
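{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the source data has one row per review, so an author with several reviews appears several times in this file. If you prefer a single row per user, the sketch below de-duplicates on `USER_ID` (keeping the first country seen) and re-saves the file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: keep a single row per USER_ID before importing into Amazon Personalize\n",
"a_users_df = a_users_df.drop_duplicates(subset='USER_ID', keep='first')\n",
"a_users_df.to_csv(data_dir + '/' + users_filename, index=False)"
]
},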
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure an S3 bucket and an IAM role \n",
"[Back to top](#top)\n",
"\n",
"So far, we have downloaded, manipulated, and saved the data onto the Amazon EBS instance attached to instance running this Jupyter notebook. However, Amazon Personalize will need an S3 bucket to act as the source of your data, as well as IAM roles for accessing that bucket. Let's set all of that up.\n",
"\n",
"Use the metadata stored on the instance underlying this Amazon SageMaker notebook, to determine the region it is operating in. If you are using a Jupyter notebook outside of Amazon SageMaker, simply define the region as a string below. The Amazon S3 bucket needs to be in the same region as the Amazon Personalize resources we have been creating so far."
]
},
{
"cell_type": "code",
"execution_count": 288,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"us-east-1\n"
]
}
],
"source": [
"with open('/opt/ml/metadata/resource-metadata.json') as notebook_info:\n",
" data = json.load(notebook_info)\n",
" resource_arn = data['ResourceArn']\n",
" region = resource_arn.split(':')[3]\n",
"print(region)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Amazon S3 bucket names are globally unique. To create a unique bucket name, the code below will append the string `personalizepoc` to your AWS account number. Then it creates a bucket with this name in the region discovered in the previous cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s3 = boto3.client('s3')\n",
"account_id = boto3.client('sts').get_caller_identity().get('Account')\n",
"suffix = str(np.random.uniform())[4:9]\n",
"bucket_name = \"personalize-user-personalization-example\" + suffix\n",
"print(bucket_name)\n",
"if region != \"us-east-1\":\n",
" s3.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={'LocationConstraint': region})\n",
"else:\n",
" s3.create_bucket(Bucket=bucket_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Upload data to S3\n",
"\n",
"Now that your Amazon S3 bucket has been created, upload the CSV file of our user-item-interaction data. "
]
},
{
"cell_type": "code",
"execution_count": 294,
"metadata": {},
"outputs": [],
"source": [
"interactions_filename = data_dir + '/a_interactions.csv'\n",
"boto3.Session().resource('s3').Bucket(bucket_name).Object(interactions_filename).upload_file(interactions_filename)"
]
},
{
"cell_type": "code",
"execution_count": 295,
"metadata": {},
"outputs": [],
"source": [
"user_metadata_file = data_dir + '/a_users.csv'\n",
"boto3.Session().resource('s3').Bucket(bucket_name).Object(user_metadata_file).upload_file(user_metadata_file)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create dataset groups and the interactions dataset \n",
"[Back to top](#top)\n",
"\n",
"The highest level of isolation and abstraction with Amazon Personalize is a *dataset group*. Information stored within one of these dataset groups has no impact on any other dataset group or models created from one - they are completely isolated. This allows you to run many experiments and is part of how we keep your models private and fully trained only on your data. \n",
"\n",
"Before importing the data prepared earlier, there needs to be a dataset group and a dataset added to it that handles the interactions.\n",
"\n",
"Dataset groups can house the following types of information:\n",
"\n",
"* User-item-interactions\n",
"* Event streams (real-time interactions)\n",
"* User metadata\n",
"* Item metadata\n",
"\n",
"Before we create the dataset group and the dataset for our interaction data, let's validate that your environment can communicate successfully with Amazon Personalize."
]
},
{
"cell_type": "code",
"execution_count": 293,
"metadata": {},
"outputs": [],
"source": [
"personalize = boto3.client(service_name='personalize')\n",
"personalize_runtime = boto3.client(service_name='personalize-runtime')\n",
"personalize_events = boto3.client(service_name='personalize-events')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a Dataset Group\n",
"\n",
"The following cell will create a new dataset group with the name *airlines-dataset-group* + a suffix"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"datasetGroupArn\": \"arn:aws:personalize:us-east-1:144386903708:dataset-group/airlines-dataset-group-55035\",\n",
" \"ResponseMetadata\": {\n",
" \"RequestId\": \"a8bb75fb-f15b-45da-997e-08eb14d7733a\",\n",
" \"HTTPStatusCode\": 200,\n",
" \"HTTPHeaders\": {\n",
" \"content-type\": \"application/x-amz-json-1.1\",\n",
" \"date\": \"Mon, 15 Jun 2020 21:14:12 GMT\",\n",
" \"x-amzn-requestid\": \"a8bb75fb-f15b-45da-997e-08eb14d7733a\",\n",
" \"content-length\": \"107\",\n",
" \"connection\": \"keep-alive\"\n",
" },\n",
" \"RetryAttempts\": 0\n",
" }\n",
"}\n"
]
}
],
"source": [
"dataset_group_name = \"airlines-dataset-group-\" + suffix\n",
"\n",
"create_dataset_group_response = personalize.create_dataset_group(\n",
" name = dataset_group_name\n",
")\n",
"\n",
"dataset_group_arn = create_dataset_group_response['datasetGroupArn']\n",
"print(json.dumps(create_dataset_group_response, indent=2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we can use the dataset group, it must be active. This can take a minute or two. Execute the cell below and wait for it to show the ACTIVE status. It checks the status of the dataset group every second, up to a maximum of 3 hours."
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"DatasetGroup: CREATE PENDING\n",
"DatasetGroup: ACTIVE\n"
]
}
],
"source": [
"status = None\n",
"max_time = time.time() + 3*60*60 # 3 hours\n",
"while time.time() < max_time:\n",
" describe_dataset_group_response = personalize.describe_dataset_group(\n",
" datasetGroupArn = dataset_group_arn\n",
" )\n",
" status = describe_dataset_group_response[\"datasetGroup\"][\"status\"]\n",
" print(\"DatasetGroup: {}\".format(status))\n",
" \n",
" if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
" break\n",
" \n",
" time.sleep(20)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that you have a dataset group, you can create a dataset for the interaction data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create datasets\n",
"\n",
"### Interactions Dataset\n",
"\n",
"First, define a schema to tell Amazon Personalize what type of dataset you are uploading. There are several reserved and mandatory keywords required in the schema, based on the type of dataset. More detailed information can be found in the [documentation](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html).\n",
"\n",
"Here, you will create a schema for interactions data, which needs the `USER_ID`, `ITEM_ID`, `TIMESTAMP`, `CABIN_TYPE`, `EVENT_TYPE`, `EVENT_VALUE`, and `TIMESTAMP` fields. These must be defined in the same order in the schema as they appear in the dataset."
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [],
"source": [
"schema_name=\"airlines-interaction-schema-\"+suffix"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"schemaArn\": \"arn:aws:personalize:us-east-1:144386903708:schema/airlines-interaction-schema-55035\",\n",
" \"ResponseMetadata\": {\n",
" \"RequestId\": \"4e045a61-d479-485c-93ff-6072076ccaa9\",\n",
" \"HTTPStatusCode\": 200,\n",
" \"HTTPHeaders\": {\n",
" \"content-type\": \"application/x-amz-json-1.1\",\n",
" \"date\": \"Mon, 15 Jun 2020 21:10:34 GMT\",\n",
" \"x-amzn-requestid\": \"4e045a61-d479-485c-93ff-6072076ccaa9\",\n",
" \"content-length\": \"99\",\n",
" \"connection\": \"keep-alive\"\n",
" },\n",
" \"RetryAttempts\": 0\n",
" }\n",
"}\n"
]
}
],
"source": [
"schema = {\n",
" \"type\": \"record\",\n",
" \"name\": \"Interactions\",\n",
" \"namespace\": \"com.amazonaws.personalize.schema\",\n",
" \"fields\": [\n",
" {\n",
" \"name\": \"ITEM_ID\",\n",
" \"type\": \"string\"\n",
" },\n",
" {\n",
" \"name\": \"USER_ID\",\n",
" \"type\": \"string\"\n",
" },\n",
" {\n",
" \"name\": \"TIMESTAMP\",\n",
" \"type\": \"long\"\n",
" },\n",
" {\n",
" \"name\":\"CABIN_TYPE\",\n",
" \"type\": \"string\",\n",
" \"categorical\": True\n",
" },\n",
" {\n",
" \"name\": \"EVENT_TYPE\",\n",
" \"type\": \"string\"\n",
" },\n",
" {\n",
" \"name\": \"EVENT_VALUE\",\n",
" \"type\": \"float\"\n",
" }\n",
" ],\n",
" \"version\": \"1.0\"\n",
"}\n",
"\n",
"create_schema_response = personalize.create_schema(\n",
" name = schema_name,\n",
" schema = json.dumps(schema)\n",
")\n",
"\n",
"schema_arn = create_schema_response['schemaArn']\n",
"print(json.dumps(create_schema_response, indent=2))"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"datasetArn\": \"arn:aws:personalize:us-east-1:144386903708:dataset/airlines-dataset-group-55035/INTERACTIONS\",\n",
" \"ResponseMetadata\": {\n",
" \"RequestId\": \"968e6cac-310a-4889-8243-e86ef90696ed\",\n",
" \"HTTPStatusCode\": 200,\n",
" \"HTTPHeaders\": {\n",
" \"content-type\": \"application/x-amz-json-1.1\",\n",
" \"date\": \"Mon, 15 Jun 2020 21:15:14 GMT\",\n",
" \"x-amzn-requestid\": \"968e6cac-310a-4889-8243-e86ef90696ed\",\n",
" \"content-length\": \"109\",\n",
" \"connection\": \"keep-alive\"\n",
" },\n",
" \"RetryAttempts\": 0\n",
" }\n",
"}\n"
]
}
],
"source": [
"dataset_type = \"INTERACTIONS\"\n",
"create_dataset_response = personalize.create_dataset(\n",
" datasetType = dataset_type,\n",
" datasetGroupArn = dataset_group_arn,\n",
" schemaArn = schema_arn,\n",
" name = \"airlines-dataset-interactions-\" + suffix\n",
")\n",
"\n",
"interactions_dataset_arn = create_dataset_response['datasetArn']\n",
"print(json.dumps(create_dataset_response, indent=2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Users Dataset\n",
"\n",
"Here, you will create a schema for the users data, which needs the `USER_ID`, and `NATIONALITY` fields. These must be defined in the same order in the schema as they appear in the dataset.\n"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [],
"source": [
"metadata_schema_name=\"airlines-users-schema-\"+suffix"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"schemaArn\": \"arn:aws:personalize:us-east-1:144386903708:schema/airlines-users-schema-55035\",\n",
" \"ResponseMetadata\": {\n",
" \"RequestId\": \"17844e2f-860d-484a-bb39-ab4a10e7b9fd\",\n",
" \"HTTPStatusCode\": 200,\n",
" \"HTTPHeaders\": {\n",
" \"content-type\": \"application/x-amz-json-1.1\",\n",
" \"date\": \"Mon, 15 Jun 2020 21:13:50 GMT\",\n",
" \"x-amzn-requestid\": \"17844e2f-860d-484a-bb39-ab4a10e7b9fd\",\n",
" \"content-length\": \"93\",\n",
" \"connection\": \"keep-alive\"\n",
" },\n",
" \"RetryAttempts\": 0\n",
" }\n",
"}\n"
]
}
],
"source": [
"metadata_schema = {\n",
" \"type\": \"record\",\n",
" \"name\": \"Users\",\n",
" \"namespace\": \"com.amazonaws.personalize.schema\",\n",
" \"fields\": [\n",
" {\n",
" \"name\": \"USER_ID\",\n",
" \"type\": \"string\"\n",
" },\n",
" {\n",
" \"name\": \"NATIONALITY\",\n",
" \"type\": \"string\",\n",
" \"categorical\": True\n",
" }\n",
" ],\n",
" \"version\": \"1.0\"\n",
"}\n",
"\n",
"create_metadata_schema_response = personalize.create_schema(\n",
" name = metadata_schema_name,\n",
" schema = json.dumps(metadata_schema)\n",
")\n",
"\n",
"metadata_schema_arn = create_metadata_schema_response['schemaArn']\n",
"print(json.dumps(create_metadata_schema_response, indent=2))\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dataset_type = \"USERS\"\n",
"create_metadata_dataset_response = personalize.create_dataset(\n",
" datasetType = dataset_type,\n",
" datasetGroupArn = dataset_group_arn,\n",
" schemaArn = metadata_schema_arn,\n",
" name = \"airlines-metadata-dataset-users-\" + suffix\n",
")\n",
"\n",
"metadata_dataset_arn = create_metadata_dataset_response['datasetArn']\n",
"print(json.dumps(create_metadata_dataset_response, indent=2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set the S3 bucket policy\n",
"Amazon Personalize needs to be able to read the contents of your S3 bucket. So add a bucket policy which allows that."
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [],
"source": [
"s3 = boto3.client(\"s3\")\n",
"\n",
"policy = {\n",
" \"Version\": \"2012-10-17\",\n",
" \"Id\": \"PersonalizeS3BucketAccessPolicy\",\n",
" \"Statement\": [\n",
" {\n",
" \"Sid\": \"PersonalizeS3BucketAccessPolicy\",\n",
" \"Effect\": \"Allow\",\n",
" \"Principal\": {\n",
" \"Service\": \"personalize.amazonaws.com\"\n",
" },\n",
" \"Action\": [\n",
" \"s3:GetObject\",\n",
" \"s3:ListBucket\"\n",
" ],\n",
" \"Resource\": [\n",
" \"arn:aws:s3:::{}\".format(bucket_name),\n",
" \"arn:aws:s3:::{}/*\".format(bucket_name)\n",
" ]\n",
" }\n",
" ]\n",
"}\n",
"\n",
"s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy));"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an IAM role\n",
"\n",
"Amazon Personalize needs the ability to assume roles in AWS in order to have the permissions to execute certain tasks. Let's create an IAM role and attach the required policies to it. The code below attaches very permissive policies; please use more restrictive policies for any production application."
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arn:aws:iam::144386903708:role/PersonalizeS3Role-55035\n"
]
}
],
"source": [
"iam = boto3.client(\"iam\")\n",
"\n",
"role_name = \"PersonalizeS3Role-\"+suffix\n",
"assume_role_policy_document = {\n",
" \"Version\": \"2012-10-17\",\n",
" \"Statement\": [\n",
" {\n",
" \"Effect\": \"Allow\",\n",
" \"Principal\": {\n",
" \"Service\": \"personalize.amazonaws.com\"\n",
" },\n",
" \"Action\": \"sts:AssumeRole\"\n",
" }\n",
" ]\n",
"}\n",
"try:\n",
" create_role_response = iam.create_role(\n",
" RoleName = role_name,\n",
" AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)\n",
" );\n",
"\n",
" iam.attach_role_policy(\n",
" RoleName = role_name,\n",
" PolicyArn = \"arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess\"\n",
" );\n",
"\n",
" role_arn = create_role_response[\"Role\"][\"Arn\"]\n",
"except ClientError as e:\n",
" if e.response['Error']['Code'] == 'EntityAlreadyExists':\n",
" role_arn = iam.get_role(RoleName=role_name)['Role']['Arn']\n",
" else:\n",
" raise\n",
" \n",
"# sometimes need to wait a bit for the role to be created\n",
"time.sleep(45)\n",
"print(role_arn)"
]
},
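{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an alternative to the fixed `time.sleep(45)`, boto3 provides an IAM waiter that blocks until the role is visible; a sketch (IAM propagation to other services can still lag by a few seconds after the waiter returns):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Block until IAM reports that the role exists\n",
"iam.get_waiter('role_exists').wait(RoleName=role_name)"
]
},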
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create your Dataset import jobs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import the interactions data \n",
"\n",
"Earlier you created the dataset group and dataset to house your information, so now you will execute an import job that will load the data from the S3 bucket into the Amazon Personalize dataset. "
]
},
{
"cell_type": "code",
"execution_count": 109,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"datasetImportJobArn\": \"arn:aws:personalize:us-east-1:144386903708:dataset-import-job/airlines-dataset-import-job-14078\",\n",
" \"ResponseMetadata\": {\n",
" \"RequestId\": \"d118cf11-b568-4767-99d5-30a15871981a\",\n",
" \"HTTPStatusCode\": 200,\n",
" \"HTTPHeaders\": {\n",
" \"content-type\": \"application/x-amz-json-1.1\",\n",
" \"date\": \"Mon, 15 Jun 2020 21:46:38 GMT\",\n",
" \"x-amzn-requestid\": \"d118cf11-b568-4767-99d5-30a15871981a\",\n",
" \"content-length\": \"121\",\n",
" \"connection\": \"keep-alive\"\n",
" },\n",
" \"RetryAttempts\": 0\n",
" }\n",
"}\n"
]
}
],
"source": [
"create_dataset_import_job_response = personalize.create_dataset_import_job(\n",
" jobName = \"airlines-dataset-import-job-\"+suffix,\n",
" datasetArn = interactions_dataset_arn,\n",
" dataSource = {\n",
" \"dataLocation\": \"s3://{}/{}\".format(bucket_name, interactions_filename)\n",
" },\n",
" roleArn = role_arn\n",
")\n",
"\n",
"dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']\n",
"print(json.dumps(create_dataset_import_job_response, indent=2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import the Users data \n",
"\n",
"Earlier you created the dataset group and dataset to house your information, so now you will execute an import job that will load the data from the S3 bucket into the Amazon Personalize dataset. "
]
},
{
"cell_type": "code",
"execution_count": 110,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"datasetImportJobArn\": \"arn:aws:personalize:us-east-1:144386903708:dataset-import-job/airlines-users-metadata-dataset-import-job-14078\",\n",
" \"ResponseMetadata\": {\n",
" \"RequestId\": \"679d9401-568d-45e0-ba8c-df8f1574228d\",\n",
" \"HTTPStatusCode\": 200,\n",
" \"HTTPHeaders\": {\n",
" \"content-type\": \"application/x-amz-json-1.1\",\n",
" \"date\": \"Mon, 15 Jun 2020 21:46:40 GMT\",\n",
" \"x-amzn-requestid\": \"679d9401-568d-45e0-ba8c-df8f1574228d\",\n",
" \"content-length\": \"136\",\n",
" \"connection\": \"keep-alive\"\n",
" },\n",
" \"RetryAttempts\": 0\n",
" }\n",
"}\n"
]
}
],
"source": [
"create_metadata_dataset_import_job_response = personalize.create_dataset_import_job(\n",
" jobName = \"airlines-users-metadata-dataset-import-job-\"+suffix,\n",
" datasetArn = metadata_dataset_arn,\n",
" dataSource = {\n",
" \"dataLocation\": \"s3://{}/{}\".format(bucket_name, user_metadata_file)\n",
" },\n",
" roleArn = role_arn\n",
")\n",
"\n",
"metadata_dataset_import_job_arn = create_metadata_dataset_import_job_response['datasetImportJobArn']\n",
"print(json.dumps(create_metadata_dataset_import_job_response, indent=2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Wait for the Dataset Import Jobs to have ACTIVE Status"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we can use the dataset, the import job must be active. Execute the cell below and wait for it to show the ACTIVE status. It checks the status of the import job every second, up to a maximum of 3 hours.\n",
"\n",
"Importing the data can take some time, depending on the size of the dataset. In this demo, the data import job has been pre executed."
]
},
{
"cell_type": "code",
"execution_count": 111,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"DatasetImportJob: CREATE PENDING\n",
"DatasetImportJob: CREATE IN_PROGRESS\n",
"DatasetImportJob: CREATE IN_PROGRESS\n",
"DatasetImportJob: CREATE IN_PROGRESS\n",
"DatasetImportJob: ACTIVE\n"
]
}
],
"source": [
"status = None\n",
"max_time = time.time() + 3*60*60 # 3 hours\n",
"while time.time() < max_time:\n",
" describe_dataset_import_job_response = personalize.describe_dataset_import_job(\n",
" datasetImportJobArn = dataset_import_job_arn\n",
" )\n",
" \n",
" dataset_import_job = describe_dataset_import_job_response[\"datasetImportJob\"]\n",
" if \"latestDatasetImportJobRun\" not in dataset_import_job:\n",
" status = dataset_import_job[\"status\"]\n",
" print(\"DatasetImportJob: {}\".format(status))\n",
" else:\n",
" status = dataset_import_job[\"latestDatasetImportJobRun\"][\"status\"]\n",
" print(\"LatestDatasetImportJobRun: {}\".format(status))\n",
" \n",
" if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
" break\n",
" \n",
" time.sleep(60)"
]
},
{
"cell_type": "code",
"execution_count": 112,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"DatasetImportJob: CREATE IN_PROGRESS\n",
"DatasetImportJob: CREATE IN_PROGRESS\n",
"DatasetImportJob: CREATE IN_PROGRESS\n",
"DatasetImportJob: CREATE IN_PROGRESS\n",
"DatasetImportJob: CREATE IN_PROGRESS\n",
"DatasetImportJob: CREATE IN_PROGRESS\n",
"DatasetImportJob: CREATE IN_PROGRESS\n",
"DatasetImportJob: ACTIVE\n"
]
}
],
"source": [
"status = None\n",
"max_time = time.time() + 3*60*60 # 3 hours\n",
"while time.time() < max_time:\n",
" describe_dataset_import_job_response = personalize.describe_dataset_import_job(\n",
" datasetImportJobArn = metadata_dataset_import_job_arn\n",
" )\n",
" \n",
" dataset_import_job = describe_dataset_import_job_response[\"datasetImportJob\"]\n",
" if \"latestDatasetImportJobRun\" not in dataset_import_job:\n",
" status = dataset_import_job[\"status\"]\n",
" print(\"DatasetImportJob: {}\".format(status))\n",
" else:\n",
" status = dataset_import_job[\"latestDatasetImportJobRun\"][\"status\"]\n",
" print(\"LatestDatasetImportJobRun: {}\".format(status))\n",
" \n",
" if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
" break\n",
" \n",
" time.sleep(60)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When the dataset import is active, you are ready to start building models with the AWS User Personalization recipe."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create solutions \n",
"[Back to top](#top)\n",
"\n",
"In this notebook, we will create a solution with the following recipe:\n",
"\n",
"1. aws-user-personalization\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In Amazon Personalize, a specific variation of an algorithm is called a recipe. Different recipes are suitable for different situations. A trained model is called a solution, and each solution can have many versions that relate to a given volume of data when the model was trained.\n",
"\n",
"To start, we will list all the recipes that are supported. This will allow you to select one and use that to build your model."
]
},
{
"cell_type": "code",
"execution_count": 113,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arn:aws:personalize:::recipe/aws-hrnn\n",
"arn:aws:personalize:::recipe/aws-hrnn-coldstart\n",
"arn:aws:personalize:::recipe/aws-hrnn-metadata\n",
"arn:aws:personalize:::recipe/aws-personalized-ranking\n",
"arn:aws:personalize:::recipe/aws-popularity-count\n",
"arn:aws:personalize:::recipe/aws-sims\n",
"arn:aws:personalize:::recipe/aws-user-personalization\n"
]
}
],
"source": [
"recipe_list = personalize.list_recipes()\n",
"for recipe in recipe_list['recipes']:\n",
" print(recipe['recipeArn'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The output is just a JSON representation of all of the algorithms mentioned in the introduction.\n",
"\n",
"Next we will select specific recipes and build models with them.\n",
"\n",
"### AWS User Personalization\n",
"\n",
"AWS User Personalization is one of the more advanced recommendation models that you can use and it allows for real-time updates of recommendations based on user behavior. It also tends to outperform other approaches, like collaborative filtering. This recipe takes the longest to train, so let's start with this recipe first.\n",
"\n",
"For our use case, using the Airlines reviews data, we can use the AWS User Personalization to recommend airlines to a user based on the user's previous artist tagging behavior. Remember, we used the tagging data to represent positive interactions between a user and an artist.\n",
"\n",
"First, select the recipe by finding the ARN in the list of recipes above."
]
},
{
"cell_type": "code",
"execution_count": 114,
"metadata": {},
"outputs": [],
"source": [
"recipe_arn = \"arn:aws:personalize:::recipe/aws-user-personalization\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create the solution\n",
"\n",
"First you create a solution using the recipe. Although you provide the dataset ARN in this step, the model is not yet trained. See this as an identifier instead of a trained model.\n",
"\n",
"Note that we have HPO activated here. This is a good idea when \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"create_solution_response = personalize.create_solution(\n",
" name = \"airlines-user-personalization-solution-HPO-\"+suffix,\n",
" datasetGroupArn = dataset_group_arn,\n",
" recipeArn = recipe_arn,\n",
" performHPO=True\n",
")\n",
"\n",
"solution_arn = create_solution_response['solutionArn']\n",
"print(json.dumps(create_solution_response, indent=2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create the solution version\n",
"\n",
"Once you have a solution, you need to create a version in order to complete the model training. The training can take a while to complete, upwards of 25 minutes, and an average of 40 minutes for this recipe with our dataset. Normally, we would use a while loop to poll until the task is completed. However the task would block other cells from executing, and the goal here is to create many models and deploy them quickly. So we will set up the while loop for all of the solutions further down in the notebook. There, you will also find instructions for viewing the progress in the AWS console."
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"solutionVersionArn\": \"arn:aws:personalize:us-east-1:144386903708:solution/airlines-hrnn-metadata-solution-HPO-14078/54a6c563\",\n",
" \"ResponseMetadata\": {\n",
" \"RequestId\": \"148a0fbd-5465-4619-ac90-449c0ef23b73\",\n",
" \"HTTPStatusCode\": 200,\n",
" \"HTTPHeaders\": {\n",
" \"content-type\": \"application/x-amz-json-1.1\",\n",
" \"date\": \"Mon, 15 Jun 2020 22:01:32 GMT\",\n",
" \"x-amzn-requestid\": \"148a0fbd-5465-4619-ac90-449c0ef23b73\",\n",
" \"content-length\": \"127\",\n",
" \"connection\": \"keep-alive\"\n",
" },\n",
" \"RetryAttempts\": 0\n",
" }\n",
"}\n"
]
}
],
"source": [
"create_solution_version_response = personalize.create_solution_version(\n",
" solutionArn = solution_arn\n",
")\n",
"\n",
"solution_version_arn = create_solution_version_response['solutionVersionArn']\n",
"print(json.dumps(create_solution_version_response, indent=2))"
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"SolutionVersion: CREATE PENDING\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: CREATE IN_PROGRESS\n",
"SolutionVersion: ACTIVE\n"
]
}
],
"source": [
"status = None\n",
"max_time = time.time() + 3*60*60 # 3 hours\n",
"while time.time() < max_time:\n",
" describe_solution_version_response = personalize.describe_solution_version(\n",
" solutionVersionArn = solution_version_arn\n",
" )\n",
" status = describe_solution_version_response[\"solutionVersion\"][\"status\"]\n",
" print(\"SolutionVersion: {}\".format(status))\n",
" \n",
" if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
" break\n",
" \n",
" time.sleep(60)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluate solution versions \n",
"[Back to top](#top)\n",
"\n",
"It should not take more than an hour to train all the solutions from this notebook. While training is in progress, we recommend taking the time to read up on the various algorithms (recipes) and their behavior in detail. This is also a good time to consider alternatives to how the data was fed into the system and what kind of results you expect to see.\n",
"\n",
"When the solutions finish creating, the next step is to obtain the evaluation metrics. Personalize calculates these metrics based on a subset of the training data. The image below illustrates how Personalize splits the data. Given 10 users, with 10 interactions each (a circle represents an interaction), the interactions are ordered from oldest to newest based on the timestamp. Personalize uses all of the interaction data from 90% of the users (blue circles) to train the solution version, and the remaining 10% for evaluation. For each of the users in the remaining 10%, 90% of their interaction data (green circles) is used as input for the call to the trained model. The remaining 10% of their data (orange circle) is compared to the output produced by the model and used to calculate the evaluation metrics.\n",
"\n",
"\n",
"\n",
"We recommend reading [the documentation](https://docs.aws.amazon.com/personalize/latest/dg/working-with-training-metrics.html) to understand the metrics, but we have also copied parts of the documentation below for convenience.\n",
"\n",
"You need to understand the following terms regarding evaluation in Personalize:\n",
"\n",
"* *Relevant recommendation* refers to a recommendation that matches a value in the testing data for the particular user.\n",
"* *Rank* refers to the position of a recommended item in the list of recommendations. Position 1 (the top of the list) is presumed to be the most relevant to the user.\n",
"* *Query* refers to the internal equivalent of a GetRecommendations call.\n",
"\n",
"The metrics produced by Personalize are:\n",
"- **coverage**: The proportion of unique recommended items from all queries out of the total number of unique items in the training data (includes both the Items and Interactions datasets).\n",
"- **mean_reciprocal_rank_at_25**: The [mean of the reciprocal ranks](https://en.wikipedia.org/wiki/Mean_reciprocal_rank) of the first relevant recommendation out of the top 25 recommendations over all queries. This metric is appropriate if you're interested in the single highest ranked recommendation.\n",
"- **normalized_discounted_cumulative_gain_at_K**: Discounted gain assumes that recommendations lower on a list of recommendations are less relevant than higher recommendations. Therefore, each recommendation is discounted (given a lower weight) by a factor dependent on its position. To produce the [cumulative discounted gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) (DCG) at K, each relevant discounted recommendation in the top K recommendations is summed together. The normalized discounted cumulative gain (NDCG) is the DCG divided by the ideal DCG such that NDCG is between 0 - 1. (The ideal DCG is where the top K recommendations are sorted by relevance.) Amazon Personalize uses a weighting factor of 1/log(1 + position), where the top of the list is position 1. This metric rewards relevant items that appear near the top of the list, because the top of a list usually draws more attention.\n",
"- **precision_at_K**: The number of relevant recommendations out of the top K recommendations divided by K. This metric rewards precise recommendation of the relevant items.\n",
"\n",
"Let's take a look at the evaluation metrics for each of the solutions produced in this notebook. *Please note, your results might differ from the results described in the text of this notebook, due to the quality of the LastFM dataset.* \n",
"\n",
"### AWS User Personalizatioin metrics\n",
"\n",
"First, retrieve the evaluation metrics for the AWS User Personalization solution version."
]
},
{
"cell_type": "code",
"execution_count": 120,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"solutionVersionArn\": \"arn:aws:personalize:us-east-1:144386903708:solution/airlines-hrnn-metadata-solution-HPO-14078/54a6c563\",\n",
" \"metrics\": {\n",
" \"coverage\": 0.4046,\n",
" \"mean_reciprocal_rank_at_25\": 0.2035,\n",
" \"normalized_discounted_cumulative_gain_at_10\": 0.2909,\n",
" \"normalized_discounted_cumulative_gain_at_25\": 0.3174,\n",
" \"normalized_discounted_cumulative_gain_at_5\": 0.2418,\n",
" \"precision_at_10\": 0.0444,\n",
" \"precision_at_25\": 0.022,\n",
" \"precision_at_5\": 0.0605\n",
" },\n",
" \"ResponseMetadata\": {\n",
" \"RequestId\": \"156a6e70-152b-4940-8b17-048252419fd0\",\n",
" \"HTTPStatusCode\": 200,\n",
" \"HTTPHeaders\": {\n",
" \"content-type\": \"application/x-amz-json-1.1\",\n",
" \"date\": \"Mon, 15 Jun 2020 23:02:24 GMT\",\n",
" \"x-amzn-requestid\": \"156a6e70-152b-4940-8b17-048252419fd0\",\n",
" \"content-length\": \"424\",\n",
" \"connection\": \"keep-alive\"\n",
" },\n",
" \"RetryAttempts\": 0\n",
" }\n",
"}\n"
]
}
],
"source": [
"get_solution_metrics_response = personalize.get_solution_metrics(\n",
" solutionVersionArn = solution_version_arn\n",
")\n",
"\n",
"print(json.dumps(get_solution_metrics_response, indent=2))\n"
]
},
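{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the metrics easier to read (and to compare across solution versions if you train more than one), a small sketch that tabulates the `metrics` dictionary with pandas:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Tabulate the evaluation metrics for easier comparison\n",
"metrics_df = pd.DataFrame.from_dict(get_solution_metrics_response['metrics'],\n",
"                                    orient='index', columns=['value'])\n",
"metrics_df"
]
},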
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create a campaign from the solution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create campaigns \n",
"\n",
"A campaign is a hosted solution version; an endpoint which you can query for recommendations. Pricing is set by estimating throughput capacity (requests from users for personalization per second). When deploying a campaign, you set a minimum throughput per second (TPS) value. This service, like many within AWS, will automatically scale based on demand, but if latency is critical, you may want to provision ahead for larger demand. For this POC and demo, all minimum throughput thresholds are set to 1. For more information, see the [pricing page](https://aws.amazon.com/personalize/pricing/).\n",
"\n",
"Let's start deploying the campaigns."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### AWS User Personalization\n",
"\n",
"Deploy a campaign for your AWS User Personalization solution version. It can take around 10 minutes to deploy a campaign. Normally, we would use a while loop to poll until the task is completed. However the task would block other cells from executing, and the goal here is to create multiple campaigns. So we will set up the while loop for all of the campaigns further down in the notebook. There, you will also find instructions for viewing the progress in the AWS console."
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"campaignArn\": \"arn:aws:personalize:us-east-1:144386903708:campaign/airlines-metadata-campaign-14078\",\n",
" \"ResponseMetadata\": {\n",
" \"RequestId\": \"4c882630-86aa-4c5d-accd-75225f9804a4\",\n",
" \"HTTPStatusCode\": 200,\n",
" \"HTTPHeaders\": {\n",
" \"content-type\": \"application/x-amz-json-1.1\",\n",
" \"date\": \"Mon, 15 Jun 2020 23:25:37 GMT\",\n",
" \"x-amzn-requestid\": \"4c882630-86aa-4c5d-accd-75225f9804a4\",\n",
" \"content-length\": \"102\",\n",
" \"connection\": \"keep-alive\"\n",
" },\n",
" \"RetryAttempts\": 0\n",
" }\n",
"}\n"
]
}
],
"source": [
"create_campaign_response = personalize.create_campaign(\n",
" name = \"airlines-metadata-campaign-\"+suffix,\n",
" solutionVersionArn = solution_version_arn,\n",
" minProvisionedTPS = 2, \n",
")\n",
"\n",
"campaign_arn = create_campaign_response['campaignArn']\n",
"print(json.dumps(create_campaign_response, indent=2))"
]
},
{
"cell_type": "code",
"execution_count": 123,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Campaign: CREATE PENDING\n",
"Campaign: CREATE IN_PROGRESS\n",
"Campaign: CREATE IN_PROGRESS\n",
"Campaign: CREATE IN_PROGRESS\n",
"Campaign: CREATE IN_PROGRESS\n",
"Campaign: CREATE IN_PROGRESS\n",
"Campaign: CREATE IN_PROGRESS\n",
"Campaign: CREATE IN_PROGRESS\n",
"Campaign: ACTIVE\n"
]
}
],
"source": [
"status = None\n",
"max_time = time.time() + 3*60*60 # 3 hours\n",
"while time.time() < max_time:\n",
" describe_campaign_response = personalize.describe_campaign(\n",
" campaignArn = campaign_arn\n",
" )\n",
" status = describe_campaign_response[\"campaign\"][\"status\"]\n",
" print(\"Campaign: {}\".format(status))\n",
" \n",
" if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
" break\n",
" \n",
" time.sleep(60)"
]
},
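{
"cell_type": "markdown",
"metadata": {},
"source": [
"As noted earlier, the campaign autoscales above its provisioned floor, and `minProvisionedTPS` can be adjusted later without recreating the campaign. A minimal sketch; the new TPS value of 1 is an arbitrary example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Lower the provisioned throughput floor on the existing campaign.\n",
"# The value 1 is an arbitrary example; choose it based on expected traffic.\n",
"update_campaign_response = personalize.update_campaign(\n",
"    campaignArn = campaign_arn,\n",
"    minProvisionedTPS = 1\n",
")\n",
"print(update_campaign_response['campaignArn'])"
]
},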
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### AWS User Personalization\n",
"\n",
"AWS User Personalization is one of the more advanced algorithms provided by Amazon Personalize. It supports personalization of the items for a specific user based on their past behavior and can intake real time events in order to alter recommendations for a user without retraining. \n",
"\n",
"Since the AWS User Personalization algorithm relies on having a sampling of users, let's load the data we need for that and select 3 random users."
]
},
{
"cell_type": "code",
"execution_count": 262,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" USER_ID | \n",
" NATIONALITY | \n",
"
\n",
" \n",
" \n",
" \n",
" 3918 | \n",
" JHartley | \n",
" United Kingdom | \n",
"
\n",
" \n",
" 27477 | \n",
" DDriscoll | \n",
" United Kingdom | \n",
"
\n",
" \n",
" 19563 | \n",
" BrianElliott | \n",
" United Kingdom | \n",
"
\n",
" \n",
" 22989 | \n",
" AHornbuckle | \n",
" Australia | \n",
"
\n",
" \n",
" 35724 | \n",
" CMoon | \n",
" United Kingdom | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" USER_ID NATIONALITY\n",
"3918 JHartley United Kingdom\n",
"27477 DDriscoll United Kingdom\n",
"19563 BrianElliott United Kingdom\n",
"22989 AHornbuckle Australia\n",
"35724 CMoon United Kingdom"
]
},
"execution_count": 262,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"users_df = pd.read_csv(data_dir + '/a_users.csv')\n",
"# Render some sample data\n",
"users_df.sample(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we render the recommendations for our 3 random users from above. After that, we will explore real-time interactions before moving on to Personalized Ranking.\n",
"\n",
"Again, we create a helper function to render the results in a nice dataframe.\n",
"\n",
"#### API call results"
]
},
{
"cell_type": "code",
"execution_count": 165,
"metadata": {},
"outputs": [],
"source": [
"# Update DF rendering\n",
"pd.set_option('display.max_rows', 30)\n",
"\n",
"def get_new_recommendations_df_users(recommendations_df, user_id):\n",
" \n",
"# Context Recommendations\n",
" context_options = ['None','Economy', 'Business Class','Premium Economy', 'First Class']\n",
" \n",
" for context in context_options:\n",
" # Get the recommendations\n",
" if context=='none':\n",
" get_recommendations_response = personalize_runtime.get_recommendations(\n",
" campaignArn = campaign_arn,\n",
" userId = str(user_id),\n",
" )\n",
" else:\n",
" get_recommendations_response = personalize_runtime.get_recommendations(\n",
" campaignArn = campaign_arn,\n",
" userId = str(user_id),\n",
" context = {\n",
" 'CABIN_TYPE': context\n",
" }\n",
" )\n",
" # Build a new dataframe of recommendations\n",
" item_list = get_recommendations_response['itemList']\n",
" recommendation_list = []\n",
" for item in item_list:\n",
" recommendation_list.append(item['itemId'])\n",
" # print(recommendation_list)\n",
" new_rec_DF = pd.DataFrame(recommendation_list, columns = [context])\n",
" # Add this dataframe to the old one\n",
" recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)\n",
" return recommendations_df"
]
},
{
"cell_type": "code",
"execution_count": 263,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" USER_ID NATIONALITY\n",
"37013 RDow United Kingdom\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" None | \n",
" Economy | \n",
" Business Class | \n",
" Premium Economy | \n",
" First Class | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" british-airways | \n",
" thomson-airways | \n",
" british-airways | \n",
" thomson-airways | \n",
" united-airlines | \n",
"
\n",
" \n",
" 1 | \n",
" united-airlines | \n",
" thomas-cook-airlines | \n",
" virgin-atlantic-airways | \n",
" virgin-atlantic-airways | \n",
" british-airways | \n",
"
\n",
" \n",
" 2 | \n",
" thomson-airways | \n",
" united-airlines | \n",
" turkish-airlines | \n",
" air-new-zealand | \n",
" american-airlines | \n",
"
\n",
" \n",
" 3 | \n",
" virgin-atlantic-airways | \n",
" easyjet | \n",
" united-airlines | \n",
" british-airways | \n",
" china-southern-airlines | \n",
"
\n",
" \n",
" 4 | \n",
" air-new-zealand | \n",
" monarch-airlines | \n",
" china-southern-airlines | \n",
" united-airlines | \n",
" delta-air-lines | \n",
"
\n",
" \n",
" 5 | \n",
" turkish-airlines | \n",
" virgin-atlantic-airways | \n",
" qatar-airways | \n",
" thomas-cook-airlines | \n",
" lufthansa | \n",
"
\n",
" \n",
" 6 | \n",
" china-southern-airlines | \n",
" british-airways | \n",
" emirates | \n",
" turkish-airlines | \n",
" alaska-airlines | \n",
"
\n",
" \n",
" 7 | \n",
" thomas-cook-airlines | \n",
" jet2-com | \n",
" air-france | \n",
" monarch-airlines | \n",
" virgin-atlantic-airways | \n",
"
\n",
" \n",
" 8 | \n",
" lufthansa | \n",
" lufthansa | \n",
" american-airlines | \n",
" china-southern-airlines | \n",
" us-airways | \n",
"
\n",
" \n",
" 9 | \n",
" american-airlines | \n",
" american-airlines | \n",
" lufthansa | \n",
" eva-air | \n",
" thomson-airways | \n",
"
\n",
" \n",
" 10 | \n",
" delta-air-lines | \n",
" flybe | \n",
" etihad-airways | \n",
" cathay-pacific-airways | \n",
" air-new-zealand | \n",
"
\n",
" \n",
" 11 | \n",
" air-france | \n",
" norwegian | \n",
" air-new-zealand | \n",
" air-france | \n",
" turkish-airlines | \n",
"
\n",
" \n",
" 12 | \n",
" monarch-airlines | \n",
" turkish-airlines | \n",
" klm-royal-dutch-airlines | \n",
" lufthansa | \n",
" emirates | \n",
"
\n",
" \n",
" 13 | \n",
" klm-royal-dutch-airlines | \n",
" ryanair | \n",
" cathay-pacific-airways | \n",
" klm-royal-dutch-airlines | \n",
" etihad-airways | \n",
"
\n",
" \n",
" 14 | \n",
" us-airways | \n",
" aer-lingus | \n",
" thai-airways | \n",
" air-transat | \n",
" virgin-america | \n",
"
\n",
" \n",
" 15 | \n",
" eva-air | \n",
" china-southern-airlines | \n",
" eva-air | \n",
" delta-air-lines | \n",
" cathay-pacific-airways | \n",
"
\n",
" \n",
" 16 | \n",
" cathay-pacific-airways | \n",
" air-new-zealand | \n",
" asiana-airlines | \n",
" sas-scandinavian-airlines | \n",
" qantas-airways | \n",
"
\n",
" \n",
" 17 | \n",
" air-transat | \n",
" tap-portugal | \n",
" air-canada | \n",
" american-airlines | \n",
" qatar-airways | \n",
"
\n",
" \n",
" 18 | \n",
" sas-scandinavian-airlines | \n",
" delta-air-lines | \n",
" air-transat | \n",
" icelandair | \n",
" thai-airways | \n",
"
\n",
" \n",
" 19 | \n",
" icelandair | \n",
" klm-royal-dutch-airlines | \n",
" delta-air-lines | \n",
" vietnam-airlines | \n",
" asiana-airlines | \n",
"
\n",
" \n",
" 20 | \n",
" easyjet | \n",
" us-airways | \n",
" vietnam-airlines | \n",
" norwegian | \n",
" thomas-cook-airlines | \n",
"
\n",
" \n",
" 21 | \n",
" norwegian | \n",
" air-transat | \n",
" thomson-airways | \n",
" us-airways | \n",
" aer-lingus | \n",
"
\n",
" \n",
" 22 | \n",
" vietnam-airlines | \n",
" icelandair | \n",
" icelandair | \n",
" air-canada-rouge | \n",
" hawaiian-airlines | \n",
"
\n",
" \n",
" 23 | \n",
" etihad-airways | \n",
" air-france | \n",
" tap-portugal | \n",
" flybe | \n",
" air-canada | \n",
"
\n",
" \n",
" 24 | \n",
" air-canada | \n",
" cityjet | \n",
" aer-lingus | \n",
" qantas-airways | \n",
" air-france | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" None Economy \\\n",
"0 british-airways thomson-airways \n",
"1 united-airlines thomas-cook-airlines \n",
"2 thomson-airways united-airlines \n",
"3 virgin-atlantic-airways easyjet \n",
"4 air-new-zealand monarch-airlines \n",
"5 turkish-airlines virgin-atlantic-airways \n",
"6 china-southern-airlines british-airways \n",
"7 thomas-cook-airlines jet2-com \n",
"8 lufthansa lufthansa \n",
"9 american-airlines american-airlines \n",
"10 delta-air-lines flybe \n",
"11 air-france norwegian \n",
"12 monarch-airlines turkish-airlines \n",
"13 klm-royal-dutch-airlines ryanair \n",
"14 us-airways aer-lingus \n",
"15 eva-air china-southern-airlines \n",
"16 cathay-pacific-airways air-new-zealand \n",
"17 air-transat tap-portugal \n",
"18 sas-scandinavian-airlines delta-air-lines \n",
"19 icelandair klm-royal-dutch-airlines \n",
"20 easyjet us-airways \n",
"21 norwegian air-transat \n",
"22 vietnam-airlines icelandair \n",
"23 etihad-airways air-france \n",
"24 air-canada cityjet \n",
"\n",
" Business Class Premium Economy \\\n",
"0 british-airways thomson-airways \n",
"1 virgin-atlantic-airways virgin-atlantic-airways \n",
"2 turkish-airlines air-new-zealand \n",
"3 united-airlines british-airways \n",
"4 china-southern-airlines united-airlines \n",
"5 qatar-airways thomas-cook-airlines \n",
"6 emirates turkish-airlines \n",
"7 air-france monarch-airlines \n",
"8 american-airlines china-southern-airlines \n",
"9 lufthansa eva-air \n",
"10 etihad-airways cathay-pacific-airways \n",
"11 air-new-zealand air-france \n",
"12 klm-royal-dutch-airlines lufthansa \n",
"13 cathay-pacific-airways klm-royal-dutch-airlines \n",
"14 thai-airways air-transat \n",
"15 eva-air delta-air-lines \n",
"16 asiana-airlines sas-scandinavian-airlines \n",
"17 air-canada american-airlines \n",
"18 air-transat icelandair \n",
"19 delta-air-lines vietnam-airlines \n",
"20 vietnam-airlines norwegian \n",
"21 thomson-airways us-airways \n",
"22 icelandair air-canada-rouge \n",
"23 tap-portugal flybe \n",
"24 aer-lingus qantas-airways \n",
"\n",
" First Class \n",
"0 united-airlines \n",
"1 british-airways \n",
"2 american-airlines \n",
"3 china-southern-airlines \n",
"4 delta-air-lines \n",
"5 lufthansa \n",
"6 alaska-airlines \n",
"7 virgin-atlantic-airways \n",
"8 us-airways \n",
"9 thomson-airways \n",
"10 air-new-zealand \n",
"11 turkish-airlines \n",
"12 emirates \n",
"13 etihad-airways \n",
"14 virgin-america \n",
"15 cathay-pacific-airways \n",
"16 qantas-airways \n",
"17 qatar-airways \n",
"18 thai-airways \n",
"19 asiana-airlines \n",
"20 thomas-cook-airlines \n",
"21 aer-lingus \n",
"22 hawaiian-airlines \n",
"23 air-canada \n",
"24 air-france "
]
},
"execution_count": 263,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"recommendations_df_users = pd.DataFrame()\n",
"users = users_df.sample()\n",
"print(users)\n",
"users= users['USER_ID'].tolist()\n",
"for user in users:\n",
" recommendations_df_users = get_new_recommendations_df_users(recommendations_df_users, user)\n",
"\n",
"recommendations_df_users"
]
},
{
"cell_type": "code",
"execution_count": 264,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" USER_ID NATIONALITY\n",
"26198 CJeff Singapore\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" None | \n",
" Economy | \n",
" Business Class | \n",
" Premium Economy | \n",
" First Class | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" philippine-airlines | \n",
" philippine-airlines | \n",
" cathay-pacific-airways | \n",
" cathay-pacific-airways | \n",
" singapore-airlines | \n",
"
\n",
" \n",
" 1 | \n",
" cathay-pacific-airways | \n",
" singapore-airlines | \n",
" philippine-airlines | \n",
" air-france | \n",
" thai-airways | \n",
"
\n",
" \n",
" 2 | \n",
" singapore-airlines | \n",
" tigerair | \n",
" malaysia-airlines | \n",
" eva-air | \n",
" philippine-airlines | \n",
"
\n",
" \n",
" 3 | \n",
" ana-all-nippon-airways | \n",
" jetstar-asia | \n",
" srilankan-airlines | \n",
" air-new-zealand | \n",
" delta-air-lines | \n",
"
\n",
" \n",
" 4 | \n",
" thai-airways | \n",
" cathay-pacific-airways | \n",
" thai-airways | \n",
" philippine-airlines | \n",
" emirates | \n",
"
\n",
" \n",
" 5 | \n",
" air-india | \n",
" airasia | \n",
" singapore-airlines | \n",
" klm-royal-dutch-airlines | \n",
" ana-all-nippon-airways | \n",
"
\n",
" \n",
" 6 | \n",
" china-eastern-airlines | \n",
" cebu-pacific | \n",
" air-india | \n",
" airasia-x | \n",
" china-southern-airlines | \n",
"
\n",
" \n",
" 7 | \n",
" dragonair | \n",
" scoot | \n",
" emirates | \n",
" qantas-airways | \n",
" alaska-airlines | \n",
"
\n",
" \n",
" 8 | \n",
" srilankan-airlines | \n",
" ana-all-nippon-airways | \n",
" ana-all-nippon-airways | \n",
" china-southern-airlines | \n",
" american-airlines | \n",
"
\n",
" \n",
" 9 | \n",
" air-france | \n",
" dragonair | \n",
" asiana-airlines | \n",
" united-airlines | \n",
" china-eastern-airlines | \n",
"
\n",
" \n",
" 10 | \n",
" jetstar-airways | \n",
" jetstar-airways | \n",
" qatar-airways | \n",
" japan-airlines | \n",
" united-airlines | \n",
"
\n",
" \n",
" 11 | \n",
" malaysia-airlines | \n",
" air-india | \n",
" china-eastern-airlines | \n",
" turkish-airlines | \n",
" swiss-international-air-lines | \n",
"
\n",
" \n",
" 12 | \n",
" japan-airlines | \n",
" china-eastern-airlines | \n",
" eva-air | \n",
" virgin-australia | \n",
" cathay-pacific-airways | \n",
"
\n",
" \n",
" 13 | \n",
" air-new-zealand | \n",
" malaysia-airlines | \n",
" dragonair | \n",
" dragonair | \n",
" asiana-airlines | \n",
"
\n",
" \n",
" 14 | \n",
" klm-royal-dutch-airlines | \n",
" bangkok-airways | \n",
" swiss-international-air-lines | \n",
" thai-airways | \n",
" garuda-indonesia | \n",
"
\n",
" \n",
" 15 | \n",
" china-southern-airlines | \n",
" thai-airways | \n",
" qantas-airways | \n",
" ana-all-nippon-airways | \n",
" air-india | \n",
"
\n",
" \n",
" 16 | \n",
" asiana-airlines | \n",
" silkair | \n",
" korean-air | \n",
" singapore-airlines | \n",
" hawaiian-airlines | \n",
"
\n",
" \n",
" 17 | \n",
" eva-air | \n",
" srilankan-airlines | \n",
" china-southern-airlines | \n",
" vietnam-airlines | \n",
" malaysia-airlines | \n",
"
\n",
" \n",
" 18 | \n",
" scoot | \n",
" airasia-x | \n",
" garuda-indonesia | \n",
" delta-air-lines | \n",
" lufthansa | \n",
"
\n",
" \n",
" 19 | \n",
" delta-air-lines | \n",
" jet-airways | \n",
" air-france | \n",
" virgin-atlantic-airways | \n",
" virgin-america | \n",
"
\n",
" \n",
" 20 | \n",
" jetstar-asia | \n",
" korean-air | \n",
" japan-airlines | \n",
" air-india | \n",
" jetstar-airways | \n",
"
\n",
" \n",
" 21 | \n",
" qantas-airways | \n",
" south-african-airways | \n",
" south-african-airways | \n",
" scoot | \n",
" qantas-airways | \n",
"
\n",
" \n",
" 22 | \n",
" airasia-x | \n",
" hong-kong-airlines | \n",
" royal-brunei-airlines | \n",
" hong-kong-airlines | \n",
" tigerair | \n",
"
\n",
" \n",
" 23 | \n",
" korean-air | \n",
" japan-airlines | \n",
" finnair | \n",
" china-eastern-airlines | \n",
" qatar-airways | \n",
"
\n",
" \n",
" 24 | \n",
" emirates | \n",
" malindo-air | \n",
" china-airlines | \n",
" air-china | \n",
" air-china | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" None Economy \\\n",
"0 philippine-airlines philippine-airlines \n",
"1 cathay-pacific-airways singapore-airlines \n",
"2 singapore-airlines tigerair \n",
"3 ana-all-nippon-airways jetstar-asia \n",
"4 thai-airways cathay-pacific-airways \n",
"5 air-india airasia \n",
"6 china-eastern-airlines cebu-pacific \n",
"7 dragonair scoot \n",
"8 srilankan-airlines ana-all-nippon-airways \n",
"9 air-france dragonair \n",
"10 jetstar-airways jetstar-airways \n",
"11 malaysia-airlines air-india \n",
"12 japan-airlines china-eastern-airlines \n",
"13 air-new-zealand malaysia-airlines \n",
"14 klm-royal-dutch-airlines bangkok-airways \n",
"15 china-southern-airlines thai-airways \n",
"16 asiana-airlines silkair \n",
"17 eva-air srilankan-airlines \n",
"18 scoot airasia-x \n",
"19 delta-air-lines jet-airways \n",
"20 jetstar-asia korean-air \n",
"21 qantas-airways south-african-airways \n",
"22 airasia-x hong-kong-airlines \n",
"23 korean-air japan-airlines \n",
"24 emirates malindo-air \n",
"\n",
" Business Class Premium Economy \\\n",
"0 cathay-pacific-airways cathay-pacific-airways \n",
"1 philippine-airlines air-france \n",
"2 malaysia-airlines eva-air \n",
"3 srilankan-airlines air-new-zealand \n",
"4 thai-airways philippine-airlines \n",
"5 singapore-airlines klm-royal-dutch-airlines \n",
"6 air-india airasia-x \n",
"7 emirates qantas-airways \n",
"8 ana-all-nippon-airways china-southern-airlines \n",
"9 asiana-airlines united-airlines \n",
"10 qatar-airways japan-airlines \n",
"11 china-eastern-airlines turkish-airlines \n",
"12 eva-air virgin-australia \n",
"13 dragonair dragonair \n",
"14 swiss-international-air-lines thai-airways \n",
"15 qantas-airways ana-all-nippon-airways \n",
"16 korean-air singapore-airlines \n",
"17 china-southern-airlines vietnam-airlines \n",
"18 garuda-indonesia delta-air-lines \n",
"19 air-france virgin-atlantic-airways \n",
"20 japan-airlines air-india \n",
"21 south-african-airways scoot \n",
"22 royal-brunei-airlines hong-kong-airlines \n",
"23 finnair china-eastern-airlines \n",
"24 china-airlines air-china \n",
"\n",
" First Class \n",
"0 singapore-airlines \n",
"1 thai-airways \n",
"2 philippine-airlines \n",
"3 delta-air-lines \n",
"4 emirates \n",
"5 ana-all-nippon-airways \n",
"6 china-southern-airlines \n",
"7 alaska-airlines \n",
"8 american-airlines \n",
"9 china-eastern-airlines \n",
"10 united-airlines \n",
"11 swiss-international-air-lines \n",
"12 cathay-pacific-airways \n",
"13 asiana-airlines \n",
"14 garuda-indonesia \n",
"15 air-india \n",
"16 hawaiian-airlines \n",
"17 malaysia-airlines \n",
"18 lufthansa \n",
"19 virgin-america \n",
"20 jetstar-airways \n",
"21 qantas-airways \n",
"22 tigerair \n",
"23 qatar-airways \n",
"24 air-china "
]
},
"execution_count": 264,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"recommendations_df_users = pd.DataFrame()\n",
"users = users_df.sample()\n",
"print(users)\n",
"users= users['USER_ID'].tolist()\n",
"for user in users:\n",
" recommendations_df_users = get_new_recommendations_df_users(recommendations_df_users, user)\n",
"\n",
"recommendations_df_users"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we clearly see that the recommendations for each user are different. If you were to need a cache for these results, you could start by running the API calls through all your users and store the results, or you could use a batch export, which will be covered later in this notebook.\n",
"\n",
"The next topic is real-time events. Personalize has the ability to listen to events from your application in order to update the recommendations shown to the user. This is especially useful in media workloads, like video-on-demand, where a customer's intent may differ based on if they are watching with their children or on their own.\n",
"\n",
"Additionally the events that are recorded via this system are stored until a delete call from you is issued, and they are used as historical data alongside the other interaction data you provided when you train your next models.\n",
"\n",
"#### Real time events\n",
"\n",
"Start by creating an event tracker that is attached to the campaign."
]
},
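{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A minimal caching sketch (illustrative only): loop over user IDs and\n",
"# store each user's recommendations in a dict keyed by user ID. The\n",
"# 10-user cap is an arbitrary choice to keep this loop quick; for a full\n",
"# user base, a batch inference job is the better fit.\n",
"recommendation_cache = {}\n",
"for cache_user in users_df['USER_ID'].tolist()[:10]:\n",
"    response = personalize_runtime.get_recommendations(\n",
"        campaignArn = campaign_arn,\n",
"        userId = str(cache_user)\n",
"    )\n",
"    recommendation_cache[str(cache_user)] = [item['itemId'] for item in response['itemList']]\n",
"len(recommendation_cache)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next topic is real-time events. Personalize can listen to events from your application in order to update the recommendations shown to the user. This is especially useful in media workloads, like video-on-demand, where a customer's intent may differ based on whether they are watching with their children or on their own.\n",
"\n",
"Additionally, the events recorded via this system are stored until you issue a delete call, and they are used as historical data alongside the other interaction data you provided when you train your next models.\n",
"\n",
"#### Real time events\n",
"\n",
"Start by creating an event tracker that is attached to the dataset group."
]
},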
{
"cell_type": "code",
"execution_count": 150,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arn:aws:personalize:us-east-1:144386903708:event-tracker/d2e7ccdc\n",
"820029aa-b00c-4eff-9e6f-60830bb68508\n"
]
}
],
"source": [
"response = personalize.create_event_tracker(\n",
" name='AirlinesEventsTracker',\n",
" datasetGroupArn=dataset_group_arn\n",
")\n",
"print(response['eventTrackerArn'])\n",
"print(response['trackingId'])\n",
"TRACKING_ID = response['trackingId']\n",
"event_tracker_arn = response['eventTrackerArn']"
]
},
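{
"cell_type": "markdown",
"metadata": {},
"source": [
"Event trackers are created asynchronously, so before sending events it is worth confirming the tracker is ready. A minimal status check, reusing `event_tracker_arn` from above; the polling pattern mirrors the campaign loop, and trackers are usually ready within moments:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Poll the event tracker status until it is ACTIVE\n",
"max_time = time.time() + 10*60 # 10 minutes\n",
"while time.time() < max_time:\n",
"    describe_event_tracker_response = personalize.describe_event_tracker(\n",
"        eventTrackerArn = event_tracker_arn\n",
"    )\n",
"    status = describe_event_tracker_response['eventTracker']['status']\n",
"    print('EventTracker: {}'.format(status))\n",
"    if status == 'ACTIVE' or status == 'CREATE FAILED':\n",
"        break\n",
"    time.sleep(15)"
]
},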
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will create some code that simulates a user interacting with a particular item. After running this code, you will get recommendations that differ from the results above.\n",
"\n",
"We start by creating some methods for the simulation of real time events."
]
},
{
"cell_type": "code",
"execution_count": 200,
"metadata": {},
"outputs": [],
"source": [
"session_dict = {}\n",
"\n",
"def send_user_rating(USER_ID, ITEM_ID):\n",
" \"\"\"\n",
" Simulates a click as an envent\n",
" to send an event to Amazon Personalize's Event Tracker\n",
" \"\"\"\n",
" # Configure Session\n",
" try:\n",
" session_ID = session_dict[str(USER_ID)]\n",
" except:\n",
" session_dict[str(USER_ID)] = str(uuid.uuid1())\n",
" session_ID = session_dict[str(USER_ID)]\n",
" \n",
" # Configure Properties:\n",
" event = {\n",
" \"itemId\": str(ITEM_ID),\n",
" \"eventValue\": 10,\n",
" \"cabinType\": \"Economy\"\n",
" }\n",
" event_json = json.dumps(event)\n",
" \n",
" # Make Call\n",
" personalize_events.put_events(\n",
" trackingId = TRACKING_ID,\n",
" userId= str(USER_ID),\n",
" sessionId = session_ID,\n",
" eventList = [{\n",
" 'sentAt': int(time.time()),\n",
" 'eventType': 'RATING',\n",
" 'properties': event_json\n",
" }]\n",
" )\n",
"\n",
"def get_new_recommendations_df_users_real_time(recommendations_df, user_id, item_id):\n",
" # Interact with the airline\n",
" # Sending a rating of 10 in Economy class for the airline with that user\n",
" send_user_rating(USER_ID=user_id, ITEM_ID=item_id)\n",
" \n",
" \n",
" # Context Recommendations\n",
" get_recommendations_response = personalize_runtime.get_recommendations(\n",
" campaignArn = campaign_arn,\n",
" userId = str(user_id),\n",
" context = {\n",
" 'CABIN_TYPE': 'Economy'\n",
" }\n",
" )\n",
" # Build a new dataframe of recommendations\n",
" item_list = get_recommendations_response['itemList']\n",
" recommendation_list = []\n",
" for item in item_list:\n",
" recommendation_list.append(item['itemId'])\n",
" new_rec_DF = pd.DataFrame(recommendation_list, columns = [item_id+'|Economy'])\n",
" recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)\n",
" return recommendations_df\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At this point, we haven't generated any real-time events yet; we have only set up the code. To compare the recommendations before and after the real-time events, let's pick one user and generate the original recommendations for them."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Recommendations before using the Event Tracker"
]
},
{
"cell_type": "code",
"execution_count": 265,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" USER_ID NATIONALITY\n",
"6313 ACrociani Italy\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" None | \n",
" Economy | \n",
" Business Class | \n",
" Premium Economy | \n",
" First Class | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" british-airways | \n",
" alitalia | \n",
" iberia | \n",
" british-airways | \n",
" american-airlines | \n",
"
\n",
" \n",
" 1 | \n",
" alitalia | \n",
" brussels-airlines | \n",
" british-airways | \n",
" virgin-atlantic-airways | \n",
" british-airways | \n",
"
\n",
" \n",
" 2 | \n",
" iberia | \n",
" ryanair | \n",
" qatar-airways | \n",
" united-airlines | \n",
" delta-air-lines | \n",
"
\n",
" \n",
" 3 | \n",
" brussels-airlines | \n",
" easyjet | \n",
" alitalia | \n",
" brussels-airlines | \n",
" united-airlines | \n",
"
\n",
" \n",
" 4 | \n",
" qatar-airways | \n",
" iberia | \n",
" brussels-airlines | \n",
" turkish-airlines | \n",
" lufthansa | \n",
"
\n",
" \n",
" 5 | \n",
" lufthansa | \n",
" aer-lingus | \n",
" emirates | \n",
" alitalia | \n",
" emirates | \n",
"
\n",
" \n",
" 6 | \n",
" american-airlines | \n",
" aegean-airlines | \n",
" lufthansa | \n",
" air-france | \n",
" qatar-airways | \n",
"
\n",
" \n",
" 7 | \n",
" united-airlines | \n",
" lufthansa | \n",
" turkish-airlines | \n",
" lufthansa | \n",
" swiss-international-air-lines | \n",
"
\n",
" \n",
" 8 | \n",
" icelandair | \n",
" qatar-airways | \n",
" swiss-international-air-lines | \n",
" icelandair | \n",
" us-airways | \n",
"
\n",
" \n",
" 9 | \n",
" delta-air-lines | \n",
" tap-portugal | \n",
" oman-air | \n",
" delta-air-lines | \n",
" iberia | \n",
"
\n",
" \n",
" 10 | \n",
" turkish-airlines | \n",
" air-berlin | \n",
" air-europa | \n",
" iberia | \n",
" air-berlin | \n",
"
\n",
" \n",
" 11 | \n",
" aer-lingus | \n",
" germanwings | \n",
" egyptair | \n",
" klm-royal-dutch-airlines | \n",
" alitalia | \n",
"
\n",
" \n",
" 12 | \n",
" air-berlin | \n",
" vueling-airlines | \n",
" american-airlines | \n",
" thomson-airways | \n",
" icelandair | \n",
"
\n",
" \n",
" 13 | \n",
" emirates | \n",
" british-airways | \n",
" tap-portugal | \n",
" sas-scandinavian-airlines | \n",
" aer-lingus | \n",
"
\n",
" \n",
" 14 | \n",
" air-france | \n",
" icelandair | \n",
" icelandair | \n",
" american-airlines | \n",
" turkish-airlines | \n",
"
\n",
" \n",
" 15 | \n",
" virgin-atlantic-airways | \n",
" air-europa | \n",
" air-berlin | \n",
" norwegian | \n",
" brussels-airlines | \n",
"
\n",
" \n",
" 16 | \n",
" tap-portugal | \n",
" egyptair | \n",
" austrian-airlines | \n",
" us-airways | \n",
" austrian-airlines | \n",
"
\n",
" \n",
" 17 | \n",
" air-europa | \n",
" austrian-airlines | \n",
" united-airlines | \n",
" air-canada-rouge | \n",
" alaska-airlines | \n",
"
\n",
" \n",
" 18 | \n",
" swiss-international-air-lines | \n",
" wizz-air | \n",
" finnair | \n",
" thomas-cook-airlines | \n",
" ana-all-nippon-airways | \n",
"
\n",
" \n",
" 19 | \n",
" us-airways | \n",
" ethiopian-airlines | \n",
" virgin-atlantic-airways | \n",
" qatar-airways | \n",
" air-france | \n",
"
\n",
" \n",
" 20 | \n",
" ryanair | \n",
" air-india | \n",
" aegean-airlines | \n",
" openskies | \n",
" sas-scandinavian-airlines | \n",
"
\n",
" \n",
" 21 | \n",
" easyjet | \n",
" american-airlines | \n",
" etihad-airways | \n",
" easyjet | \n",
" etihad-airways | \n",
"
\n",
" \n",
" 22 | \n",
" austrian-airlines | \n",
" meridiana | \n",
" south-african-airways | \n",
" cathay-pacific-airways | \n",
" air-canada | \n",
"
\n",
" \n",
" 23 | \n",
" aegean-airlines | \n",
" jet-airways | \n",
" avianca | \n",
" vueling-airlines | \n",
" norwegian | \n",
"
\n",
" \n",
" 24 | \n",
" egyptair | \n",
" ukraine-international-airlines | \n",
" air-india | \n",
" egyptair | \n",
" finnair | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" None Economy \\\n",
"0 british-airways alitalia \n",
"1 alitalia brussels-airlines \n",
"2 iberia ryanair \n",
"3 brussels-airlines easyjet \n",
"4 qatar-airways iberia \n",
"5 lufthansa aer-lingus \n",
"6 american-airlines aegean-airlines \n",
"7 united-airlines lufthansa \n",
"8 icelandair qatar-airways \n",
"9 delta-air-lines tap-portugal \n",
"10 turkish-airlines air-berlin \n",
"11 aer-lingus germanwings \n",
"12 air-berlin vueling-airlines \n",
"13 emirates british-airways \n",
"14 air-france icelandair \n",
"15 virgin-atlantic-airways air-europa \n",
"16 tap-portugal egyptair \n",
"17 air-europa austrian-airlines \n",
"18 swiss-international-air-lines wizz-air \n",
"19 us-airways ethiopian-airlines \n",
"20 ryanair air-india \n",
"21 easyjet american-airlines \n",
"22 austrian-airlines meridiana \n",
"23 aegean-airlines jet-airways \n",
"24 egyptair ukraine-international-airlines \n",
"\n",
" Business Class Premium Economy \\\n",
"0 iberia british-airways \n",
"1 british-airways virgin-atlantic-airways \n",
"2 qatar-airways united-airlines \n",
"3 alitalia brussels-airlines \n",
"4 brussels-airlines turkish-airlines \n",
"5 emirates alitalia \n",
"6 lufthansa air-france \n",
"7 turkish-airlines lufthansa \n",
"8 swiss-international-air-lines icelandair \n",
"9 oman-air delta-air-lines \n",
"10 air-europa iberia \n",
"11 egyptair klm-royal-dutch-airlines \n",
"12 american-airlines thomson-airways \n",
"13 tap-portugal sas-scandinavian-airlines \n",
"14 icelandair american-airlines \n",
"15 air-berlin norwegian \n",
"16 austrian-airlines us-airways \n",
"17 united-airlines air-canada-rouge \n",
"18 finnair thomas-cook-airlines \n",
"19 virgin-atlantic-airways qatar-airways \n",
"20 aegean-airlines openskies \n",
"21 etihad-airways easyjet \n",
"22 south-african-airways cathay-pacific-airways \n",
"23 avianca vueling-airlines \n",
"24 air-india egyptair \n",
"\n",
" First Class \n",
"0 american-airlines \n",
"1 british-airways \n",
"2 delta-air-lines \n",
"3 united-airlines \n",
"4 lufthansa \n",
"5 emirates \n",
"6 qatar-airways \n",
"7 swiss-international-air-lines \n",
"8 us-airways \n",
"9 iberia \n",
"10 air-berlin \n",
"11 alitalia \n",
"12 icelandair \n",
"13 aer-lingus \n",
"14 turkish-airlines \n",
"15 brussels-airlines \n",
"16 austrian-airlines \n",
"17 alaska-airlines \n",
"18 ana-all-nippon-airways \n",
"19 air-france \n",
"20 sas-scandinavian-airlines \n",
"21 etihad-airways \n",
"22 air-canada \n",
"23 norwegian \n",
"24 finnair "
]
},
"execution_count": 265,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"recommendations_df_users = pd.DataFrame()\n",
"users = users_df.sample()\n",
"print(users)\n",
"users= users['USER_ID'].tolist()\n",
"for user in users:\n",
" recommendations_df_users = get_new_recommendations_df_users(recommendations_df_users, user)\n",
"user_id = users[0]\n",
"recommendations_df_users"
]
},
{
"cell_type": "code",
"execution_count": 266,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'ACrociani'"
]
},
"execution_count": 266,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ok, so now we have a list of recommendations for this user before we have applied any real-time events. Now let's pick 3 random artists which we will simulate our user interacting with, and then see how this changes the recommendations."
]
},
{
"cell_type": "code",
"execution_count": 267,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['thai-airways', 'austrian-airlines', 'united-airlines']"
]
},
"execution_count": 267,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Next generate 3 random Airlines\n",
"airlines = a_interactions_df.sample(3)['ITEM_ID'].tolist()\n",
"airlines"
]
},
{
"cell_type": "code",
"execution_count": 268,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ACrociani\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" thai-airways|Economy | \n",
" austrian-airlines|Economy | \n",
" united-airlines|Economy | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" alitalia | \n",
" ryanair | \n",
" brussels-airlines | \n",
"
\n",
" \n",
" 1 | \n",
" brussels-airlines | \n",
" brussels-airlines | \n",
" lufthansa | \n",
"
\n",
" \n",
" 2 | \n",
" ryanair | \n",
" easyjet | \n",
" turkish-airlines | \n",
"
\n",
" \n",
" 3 | \n",
" easyjet | \n",
" tap-portugal | \n",
" austrian-airlines | \n",
"
\n",
" \n",
" 4 | \n",
" iberia | \n",
" aegean-airlines | \n",
" tap-portugal | \n",
"
\n",
" \n",
" 5 | \n",
" aer-lingus | \n",
" turkish-airlines | \n",
" alitalia | \n",
"
\n",
" \n",
" 6 | \n",
" aegean-airlines | \n",
" lufthansa | \n",
" germanwings | \n",
"
\n",
" \n",
" 7 | \n",
" lufthansa | \n",
" alitalia | \n",
" aegean-airlines | \n",
"
\n",
" \n",
" 8 | \n",
" qatar-airways | \n",
" iberia | \n",
" swiss-international-air-lines | \n",
"
\n",
" \n",
" 9 | \n",
" tap-portugal | \n",
" air-india | \n",
" air-berlin | \n",
"
\n",
" \n",
" 10 | \n",
" air-berlin | \n",
" qatar-airways | \n",
" easyjet | \n",
"
\n",
" \n",
" 11 | \n",
" germanwings | \n",
" swiss-international-air-lines | \n",
" ryanair | \n",
"
\n",
" \n",
" 12 | \n",
" vueling-airlines | \n",
" egyptair | \n",
" egyptair | \n",
"
\n",
" \n",
" 13 | \n",
" british-airways | \n",
" austrian-airlines | \n",
" iberia | \n",
"
\n",
" \n",
" 14 | \n",
" icelandair | \n",
" air-berlin | \n",
" icelandair | \n",
"
\n",
" \n",
" 15 | \n",
" air-europa | \n",
" emirates | \n",
" ethiopian-airlines | \n",
"
\n",
" \n",
" 16 | \n",
" egyptair | \n",
" germanwings | \n",
" air-india | \n",
"
\n",
" \n",
" 17 | \n",
" austrian-airlines | \n",
" aeroflot-russian-airlines | \n",
" qatar-airways | \n",
"
\n",
" \n",
" 18 | \n",
" wizz-air | \n",
" ethiopian-airlines | \n",
" air-europa | \n",
"
\n",
" \n",
" 19 | \n",
" ethiopian-airlines | \n",
" vueling-airlines | \n",
" vueling-airlines | \n",
"
\n",
" \n",
" 20 | \n",
" air-india | \n",
" aer-lingus | \n",
" aeroflot-russian-airlines | \n",
"
\n",
" \n",
" 21 | \n",
" american-airlines | \n",
" british-airways | \n",
" united-airlines | \n",
"
\n",
" \n",
" 22 | \n",
" meridiana | \n",
" tam-airlines | \n",
" norwegian | \n",
"
\n",
" \n",
" 23 | \n",
" jet-airways | \n",
" american-airlines | \n",
" american-airlines | \n",
"
\n",
" \n",
" 24 | \n",
" ukraine-international-airlines | \n",
" bangkok-airways | \n",
" tam-airlines | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" thai-airways|Economy austrian-airlines|Economy \\\n",
"0 alitalia ryanair \n",
"1 brussels-airlines brussels-airlines \n",
"2 ryanair easyjet \n",
"3 easyjet tap-portugal \n",
"4 iberia aegean-airlines \n",
"5 aer-lingus turkish-airlines \n",
"6 aegean-airlines lufthansa \n",
"7 lufthansa alitalia \n",
"8 qatar-airways iberia \n",
"9 tap-portugal air-india \n",
"10 air-berlin qatar-airways \n",
"11 germanwings swiss-international-air-lines \n",
"12 vueling-airlines egyptair \n",
"13 british-airways austrian-airlines \n",
"14 icelandair air-berlin \n",
"15 air-europa emirates \n",
"16 egyptair germanwings \n",
"17 austrian-airlines aeroflot-russian-airlines \n",
"18 wizz-air ethiopian-airlines \n",
"19 ethiopian-airlines vueling-airlines \n",
"20 air-india aer-lingus \n",
"21 american-airlines british-airways \n",
"22 meridiana tam-airlines \n",
"23 jet-airways american-airlines \n",
"24 ukraine-international-airlines bangkok-airways \n",
"\n",
" united-airlines|Economy \n",
"0 brussels-airlines \n",
"1 lufthansa \n",
"2 turkish-airlines \n",
"3 austrian-airlines \n",
"4 tap-portugal \n",
"5 alitalia \n",
"6 germanwings \n",
"7 aegean-airlines \n",
"8 swiss-international-air-lines \n",
"9 air-berlin \n",
"10 easyjet \n",
"11 ryanair \n",
"12 egyptair \n",
"13 iberia \n",
"14 icelandair \n",
"15 ethiopian-airlines \n",
"16 air-india \n",
"17 qatar-airways \n",
"18 air-europa \n",
"19 vueling-airlines \n",
"20 aeroflot-russian-airlines \n",
"21 united-airlines \n",
"22 norwegian \n",
"23 american-airlines \n",
"24 tam-airlines "
]
},
"execution_count": 268,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_recommendations_df = pd.DataFrame()\n",
"# Note this will take about 15 seconds to complete due to the sleeps\n",
"for airline in airlines:\n",
" user_recommendations_df = get_new_recommendations_df_users_real_time(user_recommendations_df, user_id, airline)\n",
" time.sleep(5)\n",
"print(user_id)\n",
"user_recommendations_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the cell above, the first column after the index is the user's default recommendations from the AWS User Personalization model, and each column after that has a header of the airlines that they interacted with via a real time event, and the recommendations after this event occurred. \n",
"\n",
"The behavior may not shift very much; this is due to the relatively limited nature of this dataset. If you wanted to better understand this, try simulating rating random airlines with random ratings, and you should see a more pronounced impact."
]
},
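{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal sketch of that experiment, reusing the campaign, event tracker, and `user_id` from above. The choice of 10 events, the 1-10 rating range, and the 2-second pause are arbitrary illustration values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"# Send a burst of random ratings for random airlines as real-time events\n",
"for airline in a_interactions_df.sample(10)['ITEM_ID'].tolist():\n",
"    event = {\n",
"        'itemId': str(airline),\n",
"        'eventValue': random.randint(1, 10), # random rating instead of a fixed 10\n",
"        'cabinType': 'Economy'\n",
"    }\n",
"    personalize_events.put_events(\n",
"        trackingId = TRACKING_ID,\n",
"        userId = str(user_id),\n",
"        sessionId = session_dict[str(user_id)],\n",
"        eventList = [{\n",
"            'sentAt': int(time.time()),\n",
"            'eventType': 'RATING',\n",
"            'properties': json.dumps(event)\n",
"        }]\n",
"    )\n",
"    time.sleep(2)\n",
"\n",
"# Fetch fresh recommendations to compare against the table above\n",
"get_recommendations_response = personalize_runtime.get_recommendations(\n",
"    campaignArn = campaign_arn,\n",
"    userId = str(user_id),\n",
"    context = {'CABIN_TYPE': 'Economy'}\n",
")\n",
"pd.DataFrame([item['itemId'] for item in get_recommendations_response['itemList']],\n",
"             columns = ['After random ratings'])"
]
},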
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_mxnet_p36",
"language": "python",
"name": "conda_mxnet_p36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}