{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Amazon Personalize AWS User Personalization + Contextual Recommendations Example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "## Introduction <a class=\"anchor\" id=\"intro\"></a>\n",
    "\n",
    "For the most part, the algorithms in Amazon Personalize (called recipes) look to solve different tasks, explained here:\n",
    "\n",
    "1. **User Personalization** - Recommends items based on previous user interactions with items.\n",
    "1. **Personalized-Ranking** - Takes a collection of items and then orders them in probable order of interest using an HRNN-like approach.\n",
    "1. **SIMS (Similar Items)** - Given one item, recommends other items also interacted with by users.\n",
    "\n",
    "No matter the use case, the algorithms all share a base of learning on user-item-interaction data which is defined by 3 core attributes:\n",
    "\n",
    "1. **UserID** - The user who interacted\n",
    "1. **ItemID** - The item the user interacted with\n",
    "1. **Timestamp** - The time at which the interaction occurred\n",
    "\n",
    "\n",
    "## Choose a dataset or data source <a class=\"anchor\" id=\"source\"></a>\n",
    "[Back to top](#top)\n",
    "\n",
    "As we mentioned, the user-item-iteraction data is key for getting started with the service. This means we need to look for use cases that generate that kind of data, a few common examples are:\n",
    "\n",
    "1. Video-on-demand applications\n",
    "1. E-commerce platforms\n",
    "1. Social media aggregators / platforms\n",
    "\n",
    "There are a few guidelines for scoping a problem suitable for Personalize. We recommend the values below as a starting point, although the [official limits](https://docs.aws.amazon.com/personalize/latest/dg/limits.html) lie a little lower.\n",
    "\n",
    "* Authenticated users\n",
    "* At least 50 unique users\n",
    "* At least 100 unique items\n",
    "* At least 2 dozen interactions for each user \n",
    "\n",
    "Most of the time this is easily attainable, and if you are low in one category, you can often make up for it by having a larger number in another category.\n",
    "\n",
    "Generally speaking your data will not arrive in a perfect form for Personalize, and will take some modification to be structured correctly. This notebook looks to guide you through all of that. \n",
    "\n",
    "To begin with, we are going to use an airlines review dataset. A scraped dataset created from all user reviews found on Skytrax (www.airlinequality.com). The data can be found at https://github.com/quankiquanki/skytrax-reviews-dataset "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 274,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd, numpy as np\n",
    "import io\n",
    "import scipy.sparse as ss\n",
    "import json\n",
    "import time\n",
    "import datetime\n",
    "import os\n",
    "import sagemaker.amazon.common as smac\n",
    "import boto3\n",
    "import uuid\n",
    "from botocore.exceptions import ClientError"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import and Explore your dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 275,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2020-06-18 16:33:44--  https://raw.githubusercontent.com/quankiquanki/skytrax-reviews-dataset/master/data/airline.csv\n",
      "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.64.133\n",
      "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.64.133|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 34752262 (33M) [text/plain]\n",
      "Saving to: ‘airline.csv’\n",
      "\n",
      "airline.csv         100%[===================>]  33.14M   113MB/s    in 0.3s    \n",
      "\n",
      "2020-06-18 16:33:45 (113 MB/s) - ‘airline.csv’ saved [34752262/34752262]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "data_dir = \"airlines_data\"\n",
    "!mkdir $data_dir\n",
    "!cd $data_dir && wget https://raw.githubusercontent.com/quankiquanki/skytrax-reviews-dataset/master/data/airline.csv\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 276,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>airline_name</th>\n",
       "      <th>link</th>\n",
       "      <th>title</th>\n",
       "      <th>author</th>\n",
       "      <th>author_country</th>\n",
       "      <th>date</th>\n",
       "      <th>content</th>\n",
       "      <th>aircraft</th>\n",
       "      <th>type_traveller</th>\n",
       "      <th>cabin_flown</th>\n",
       "      <th>route</th>\n",
       "      <th>overall_rating</th>\n",
       "      <th>seat_comfort_rating</th>\n",
       "      <th>cabin_staff_rating</th>\n",
       "      <th>food_beverages_rating</th>\n",
       "      <th>inflight_entertainment_rating</th>\n",
       "      <th>ground_service_rating</th>\n",
       "      <th>wifi_connectivity_rating</th>\n",
       "      <th>value_money_rating</th>\n",
       "      <th>recommended</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>adria-airways</td>\n",
       "      <td>/airline-reviews/adria-airways</td>\n",
       "      <td>Adria Airways customer review</td>\n",
       "      <td>D Ito</td>\n",
       "      <td>Germany</td>\n",
       "      <td>2015-04-10</td>\n",
       "      <td>Outbound flight FRA/PRN A319. 2 hours 10 min f...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Economy</td>\n",
       "      <td>NaN</td>\n",
       "      <td>7.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>adria-airways</td>\n",
       "      <td>/airline-reviews/adria-airways</td>\n",
       "      <td>Adria Airways customer review</td>\n",
       "      <td>Ron Kuhlmann</td>\n",
       "      <td>United States</td>\n",
       "      <td>2015-01-05</td>\n",
       "      <td>Two short hops ZRH-LJU and LJU-VIE. Very fast ...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Business Class</td>\n",
       "      <td>NaN</td>\n",
       "      <td>10.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>5.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>adria-airways</td>\n",
       "      <td>/airline-reviews/adria-airways</td>\n",
       "      <td>Adria Airways customer review</td>\n",
       "      <td>E Albin</td>\n",
       "      <td>Switzerland</td>\n",
       "      <td>2014-09-14</td>\n",
       "      <td>Flew Zurich-Ljubljana on JP365 newish CRJ900. ...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Economy</td>\n",
       "      <td>NaN</td>\n",
       "      <td>9.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>5.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>adria-airways</td>\n",
       "      <td>/airline-reviews/adria-airways</td>\n",
       "      <td>Adria Airways customer review</td>\n",
       "      <td>Tercon Bojan</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>2014-09-06</td>\n",
       "      <td>Adria serves this 100 min flight from Ljubljan...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Business Class</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>adria-airways</td>\n",
       "      <td>/airline-reviews/adria-airways</td>\n",
       "      <td>Adria Airways customer review</td>\n",
       "      <td>L James</td>\n",
       "      <td>Poland</td>\n",
       "      <td>2014-06-16</td>\n",
       "      <td>WAW-SKJ Economy. No free snacks or drinks on t...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Economy</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    airline_name                            link  \\\n",
       "0  adria-airways  /airline-reviews/adria-airways   \n",
       "1  adria-airways  /airline-reviews/adria-airways   \n",
       "2  adria-airways  /airline-reviews/adria-airways   \n",
       "3  adria-airways  /airline-reviews/adria-airways   \n",
       "4  adria-airways  /airline-reviews/adria-airways   \n",
       "\n",
       "                           title        author author_country        date  \\\n",
       "0  Adria Airways customer review         D Ito        Germany  2015-04-10   \n",
       "1  Adria Airways customer review  Ron Kuhlmann  United States  2015-01-05   \n",
       "2  Adria Airways customer review       E Albin    Switzerland  2014-09-14   \n",
       "3  Adria Airways customer review  Tercon Bojan      Singapore  2014-09-06   \n",
       "4  Adria Airways customer review       L James         Poland  2014-06-16   \n",
       "\n",
       "                                             content aircraft type_traveller  \\\n",
       "0  Outbound flight FRA/PRN A319. 2 hours 10 min f...      NaN            NaN   \n",
       "1  Two short hops ZRH-LJU and LJU-VIE. Very fast ...      NaN            NaN   \n",
       "2  Flew Zurich-Ljubljana on JP365 newish CRJ900. ...      NaN            NaN   \n",
       "3  Adria serves this 100 min flight from Ljubljan...      NaN            NaN   \n",
       "4  WAW-SKJ Economy. No free snacks or drinks on t...      NaN            NaN   \n",
       "\n",
       "      cabin_flown route  overall_rating  seat_comfort_rating  \\\n",
       "0         Economy   NaN             7.0                  4.0   \n",
       "1  Business Class   NaN            10.0                  4.0   \n",
       "2         Economy   NaN             9.0                  5.0   \n",
       "3  Business Class   NaN             8.0                  4.0   \n",
       "4         Economy   NaN             4.0                  4.0   \n",
       "\n",
       "   cabin_staff_rating  food_beverages_rating  inflight_entertainment_rating  \\\n",
       "0                 4.0                    4.0                            0.0   \n",
       "1                 5.0                    4.0                            1.0   \n",
       "2                 5.0                    4.0                            0.0   \n",
       "3                 4.0                    3.0                            1.0   \n",
       "4                 2.0                    1.0                            2.0   \n",
       "\n",
       "   ground_service_rating  wifi_connectivity_rating  value_money_rating  \\\n",
       "0                    NaN                       NaN                 4.0   \n",
       "1                    NaN                       NaN                 5.0   \n",
       "2                    NaN                       NaN                 5.0   \n",
       "3                    NaN                       NaN                 4.0   \n",
       "4                    NaN                       NaN                 2.0   \n",
       "\n",
       "   recommended  \n",
       "0            1  \n",
       "1            1  \n",
       "2            1  \n",
       "3            1  \n",
       "4            0  "
      ]
     },
     "execution_count": 276,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airline_df = pd.read_csv(data_dir + '/airline.csv')\n",
    "airline_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we can see here the dataset has a lot of columns we can use to create the required data sets in Amazon Personalize.\n",
    "\n",
    "The first thing we are going to do is make 2 copies of the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 277,
   "metadata": {},
   "outputs": [],
   "source": [
    "a_interactions_df = airline_df.copy()\n",
    "a_users_df = airline_df.copy()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building the Interactions Data set\n",
    "\n",
    "Let's build the interactions dataset. By following the these steps:\n",
    "\n",
    "- Drop the columns we are not interested in\n",
    "- Create a new column to account for Event Type\n",
    "- Rename the columns to a more standard naming convention for you Amazon Personalize import job\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 278,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ITEM_ID</th>\n",
       "      <th>USER_ID</th>\n",
       "      <th>TIMESTAMP</th>\n",
       "      <th>CABIN_TYPE</th>\n",
       "      <th>EVENT_VALUE</th>\n",
       "      <th>EVENT_TYPE</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>adria-airways</td>\n",
       "      <td>DIto</td>\n",
       "      <td>2015-04-10</td>\n",
       "      <td>Economy</td>\n",
       "      <td>7.0</td>\n",
       "      <td>RATING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>adria-airways</td>\n",
       "      <td>RonKuhlmann</td>\n",
       "      <td>2015-01-05</td>\n",
       "      <td>Business Class</td>\n",
       "      <td>10.0</td>\n",
       "      <td>RATING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>adria-airways</td>\n",
       "      <td>EAlbin</td>\n",
       "      <td>2014-09-14</td>\n",
       "      <td>Economy</td>\n",
       "      <td>9.0</td>\n",
       "      <td>RATING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>adria-airways</td>\n",
       "      <td>TerconBojan</td>\n",
       "      <td>2014-09-06</td>\n",
       "      <td>Business Class</td>\n",
       "      <td>8.0</td>\n",
       "      <td>RATING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>adria-airways</td>\n",
       "      <td>LJames</td>\n",
       "      <td>2014-06-16</td>\n",
       "      <td>Economy</td>\n",
       "      <td>4.0</td>\n",
       "      <td>RATING</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         ITEM_ID      USER_ID   TIMESTAMP      CABIN_TYPE  EVENT_VALUE  \\\n",
       "0  adria-airways         DIto  2015-04-10         Economy          7.0   \n",
       "1  adria-airways  RonKuhlmann  2015-01-05  Business Class         10.0   \n",
       "2  adria-airways       EAlbin  2014-09-14         Economy          9.0   \n",
       "3  adria-airways  TerconBojan  2014-09-06  Business Class          8.0   \n",
       "4  adria-airways       LJames  2014-06-16         Economy          4.0   \n",
       "\n",
       "  EVENT_TYPE  \n",
       "0     RATING  \n",
       "1     RATING  \n",
       "2     RATING  \n",
       "3     RATING  \n",
       "4     RATING  "
      ]
     },
     "execution_count": 278,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Keeping only 5 columns\n",
    "a_interactions_df = a_interactions_df[['airline_name', 'author', 'date', 'cabin_flown', 'overall_rating']]\n",
    "# Creating an additional column for Event Type\n",
    "a_interactions_df['EVENT_TYPE']='RATING'\n",
    "# Making sure the author name is unique without spaces\n",
    "a_interactions_df['author'] = a_interactions_df['author'].str.replace(\" \",\"\")\n",
    "# Rename the columns to a more Amazon Personalize standar notation\n",
    "a_interactions_df.rename(columns = {'airline_name':'ITEM_ID', 'author':'USER_ID',\n",
    "                              'date':'TIMESTAMP', 'cabin_flown': 'CABIN_TYPE', 'overall_rating': 'EVENT_VALUE'}, inplace = True) \n",
    "a_interactions_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Amazon Personalize supports **contextual recommendations**, through which you can improve relevance of recommendations by generating them within a context, for instance device type, location, time of day, etc. Contextual information is also useful in personalization for new/unidentified users even when the past interactions of these users are not known.\n",
    "\n",
    "In our case we are going to use **Cabin Type** as a context to recommend which airline is the best fit for our user. Let's explore which values we are going to be able to pass as our context when getting recommendations\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 279,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['Economy', 'Business Class', nan, 'Premium Economy', 'First Class'],\n",
       "      dtype=object)"
      ]
     },
     "execution_count": 279,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a_interactions_df.CABIN_TYPE.unique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we can see our current **Timestamp** value in the dataset is a string. Amazon Personalize requires the timestamp value as Unix type. Let's take a random timestamp value and convert it to Unix type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 280,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Date: 2004-03-18\n",
      "Unix Time:  1079568000.0\n"
     ]
    }
   ],
   "source": [
    "# Get a random value from the timestamp column\n",
    "arb_time_stamp = a_interactions_df.iloc[50]['TIMESTAMP']\n",
    "# Transform this string to date time\n",
    "date_time_obj = datetime.datetime.strptime(arb_time_stamp, '%Y-%m-%d')\n",
    "print('Date:', date_time_obj.date())\n",
    "# Get the date of this object\n",
    "d = date_time_obj.date()\n",
    "# Transform the date object to Unix time\n",
    "unixtime = time.mktime(d.timetuple())\n",
    "print('Unix Time: ', unixtime)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we are going to do the same transformation to all of our values in the timestamp column"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 281,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ITEM_ID</th>\n",
       "      <th>USER_ID</th>\n",
       "      <th>TIMESTAMP</th>\n",
       "      <th>CABIN_TYPE</th>\n",
       "      <th>EVENT_VALUE</th>\n",
       "      <th>EVENT_TYPE</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>adria-airways</td>\n",
       "      <td>DIto</td>\n",
       "      <td>1.428624e+09</td>\n",
       "      <td>Economy</td>\n",
       "      <td>7.0</td>\n",
       "      <td>RATING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>adria-airways</td>\n",
       "      <td>RonKuhlmann</td>\n",
       "      <td>1.420416e+09</td>\n",
       "      <td>Business Class</td>\n",
       "      <td>10.0</td>\n",
       "      <td>RATING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>adria-airways</td>\n",
       "      <td>EAlbin</td>\n",
       "      <td>1.410653e+09</td>\n",
       "      <td>Economy</td>\n",
       "      <td>9.0</td>\n",
       "      <td>RATING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>adria-airways</td>\n",
       "      <td>TerconBojan</td>\n",
       "      <td>1.409962e+09</td>\n",
       "      <td>Business Class</td>\n",
       "      <td>8.0</td>\n",
       "      <td>RATING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>adria-airways</td>\n",
       "      <td>LJames</td>\n",
       "      <td>1.402877e+09</td>\n",
       "      <td>Economy</td>\n",
       "      <td>4.0</td>\n",
       "      <td>RATING</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         ITEM_ID      USER_ID     TIMESTAMP      CABIN_TYPE  EVENT_VALUE  \\\n",
       "0  adria-airways         DIto  1.428624e+09         Economy          7.0   \n",
       "1  adria-airways  RonKuhlmann  1.420416e+09  Business Class         10.0   \n",
       "2  adria-airways       EAlbin  1.410653e+09         Economy          9.0   \n",
       "3  adria-airways  TerconBojan  1.409962e+09  Business Class          8.0   \n",
       "4  adria-airways       LJames  1.402877e+09         Economy          4.0   \n",
       "\n",
       "  EVENT_TYPE  \n",
       "0     RATING  \n",
       "1     RATING  \n",
       "2     RATING  \n",
       "3     RATING  \n",
       "4     RATING  "
      ]
     },
     "execution_count": 281,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Define a function\n",
    "def convert_to_unix(string_date):\n",
    "    date_time_obj = datetime.datetime.strptime(string_date, '%Y-%m-%d')\n",
    "    d = date_time_obj.date()\n",
    "    return time.mktime(d.timetuple())\n",
    "\n",
    "# Apply this function across the Timestamp column\n",
    "a_interactions_df['TIMESTAMP'] = a_interactions_df['TIMESTAMP'].apply(convert_to_unix)\n",
    "a_interactions_df.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's take a look at some of our dataset properties"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 282,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>TIMESTAMP</th>\n",
       "      <th>EVENT_VALUE</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>4.139600e+04</td>\n",
       "      <td>36861.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>1.373950e+09</td>\n",
       "      <td>6.039527</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>5.771909e+07</td>\n",
       "      <td>3.214680</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>1.350864e+09</td>\n",
       "      <td>3.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>1.389658e+09</td>\n",
       "      <td>7.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>1.412122e+09</td>\n",
       "      <td>9.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>1.438474e+09</td>\n",
       "      <td>10.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          TIMESTAMP   EVENT_VALUE\n",
       "count  4.139600e+04  36861.000000\n",
       "mean   1.373950e+09      6.039527\n",
       "std    5.771909e+07      3.214680\n",
       "min    0.000000e+00      1.000000\n",
       "25%    1.350864e+09      3.000000\n",
       "50%    1.389658e+09      7.000000\n",
       "75%    1.412122e+09      9.000000\n",
       "max    1.438474e+09     10.000000"
      ]
     },
     "execution_count": 282,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a_interactions_df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Are there any Null values??"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 283,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 283,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a_interactions_df.isnull().values.any()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's drop those Null values and make sure there aren't any"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 284,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 284,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a_interactions_df = a_interactions_df.dropna()\n",
    "a_interactions_df.isnull().values.any()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have our data ready for Amazon Personalize, let's save it into a file locally"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 285,
   "metadata": {},
   "outputs": [],
   "source": [
    "interactions_filename = \"a_interactions.csv\"\n",
    "a_interactions_df.to_csv((data_dir + \"/\"+interactions_filename), index=False, float_format='%.0f')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building the Users Data set\n",
    "\n",
    "Let's build the users dataset. By following the these steps:\n",
    "\n",
    "- Drop the columns we are not interested in\n",
    "- Create a new column to account for Nationality as user metadata\n",
    "- Rename the columns to a more standard naming convention for you Amazon Personalize import job\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 286,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copy the complete airlines data set\n",
    "a_users_df = airline_df.copy()\n",
    "# Select only interested columns\n",
    "a_users_df = a_users_df[['author', 'author_country']]\n",
    "# Clean up the authors string\n",
    "a_users_df['author'] = a_users_df['author'].str.replace(\" \",\"\")\n",
    "# Rename the columns\n",
    "a_users_df.rename(columns = { 'author':'USER_ID', 'author_country':'NATIONALITY'}, inplace = True) \n",
    "# Drop any null values\n",
    "a_users_df = a_users_df.dropna()\n",
    "# Save your file locally\n",
    "users_filename = \"a_users.csv\"\n",
    "a_users_df.to_csv((data_dir +\"/\"+users_filename), index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configure an S3 bucket and an IAM  role <a class=\"anchor\" id=\"bucket_role\"></a>\n",
    "[Back to top](#top)\n",
    "\n",
    "So far, we have downloaded, manipulated, and saved the data onto the Amazon EBS instance attached to instance running this Jupyter notebook. However, Amazon Personalize will need an S3 bucket to act as the source of your data, as well as IAM roles for accessing that bucket. Let's set all of that up.\n",
    "\n",
    "Use the metadata stored on the instance underlying this Amazon SageMaker notebook, to determine the region it is operating in. If you are using a Jupyter notebook outside of Amazon SageMaker, simply define the region as a string below. The Amazon S3 bucket needs to be in the same region as the Amazon Personalize resources we have been creating so far."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 288,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "us-east-1\n"
     ]
    }
   ],
   "source": [
    "with open('/opt/ml/metadata/resource-metadata.json') as notebook_info:\n",
    "    data = json.load(notebook_info)\n",
    "    resource_arn = data['ResourceArn']\n",
    "    region = resource_arn.split(':')[3]\n",
    "print(region)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Amazon S3 bucket names are globally unique. To create a unique bucket name, the code below will append the string `personalizepoc` to your AWS account number. Then it creates a bucket with this name in the region discovered in the previous cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s3 = boto3.client('s3')\n",
    "account_id = boto3.client('sts').get_caller_identity().get('Account')\n",
    "suffix = str(np.random.uniform())[4:9]\n",
    "bucket_name = \"personalize-user-personalization-example\" + suffix\n",
    "print(bucket_name)\n",
    "if region != \"us-east-1\":\n",
    "    s3.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={'LocationConstraint': region})\n",
    "else:\n",
    "    s3.create_bucket(Bucket=bucket_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Upload data to S3\n",
    "\n",
    "Now that your Amazon S3 bucket has been created, upload the CSV file of our user-item-interaction data. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 294,
   "metadata": {},
   "outputs": [],
   "source": [
    "interactions_filename = data_dir + '/a_interactions.csv'\n",
    "boto3.Session().resource('s3').Bucket(bucket_name).Object(interactions_filename).upload_file(interactions_filename)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 295,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_metadata_file = data_dir + '/a_users.csv'\n",
    "boto3.Session().resource('s3').Bucket(bucket_name).Object(user_metadata_file).upload_file(user_metadata_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create dataset groups and the interactions dataset <a class=\"anchor\" id=\"group_dataset\"></a>\n",
    "[Back to top](#top)\n",
    "\n",
    "The highest level of isolation and abstraction with Amazon Personalize is a *dataset group*. Information stored within one of these dataset groups has no impact on any other dataset group or models created from one - they are completely isolated. This allows you to run many experiments and is part of how we keep your models private and fully trained only on your data. \n",
    "\n",
    "Before importing the data prepared earlier, there needs to be a dataset group and a dataset added to it that handles the interactions.\n",
    "\n",
    "Dataset groups can house the following types of information:\n",
    "\n",
    "* User-item-interactions\n",
    "* Event streams (real-time interactions)\n",
    "* User metadata\n",
    "* Item metadata\n",
    "\n",
    "Before we create the dataset group and the dataset for our interaction data, let's validate that your environment can communicate successfully with Amazon Personalize."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 293,
   "metadata": {},
   "outputs": [],
   "source": [
    "personalize = boto3.client(service_name='personalize')\n",
    "personalize_runtime = boto3.client(service_name='personalize-runtime')\n",
    "personalize_events = boto3.client(service_name='personalize-events')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a Dataset Group\n",
    "\n",
    "The following cell will create a new dataset group with the name *airlines-dataset-group* + a suffix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"datasetGroupArn\": \"arn:aws:personalize:us-east-1:144386903708:dataset-group/airlines-dataset-group-55035\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"a8bb75fb-f15b-45da-997e-08eb14d7733a\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 15 Jun 2020 21:14:12 GMT\",\n",
      "      \"x-amzn-requestid\": \"a8bb75fb-f15b-45da-997e-08eb14d7733a\",\n",
      "      \"content-length\": \"107\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "dataset_group_name = \"airlines-dataset-group-\" + suffix\n",
    "\n",
    "create_dataset_group_response = personalize.create_dataset_group(\n",
    "    name = dataset_group_name\n",
    ")\n",
    "\n",
    "dataset_group_arn = create_dataset_group_response['datasetGroupArn']\n",
    "print(json.dumps(create_dataset_group_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before we can use the dataset group, it must be active. This can take a minute or two. Execute the cell below and wait for it to show the ACTIVE status. It checks the status of the dataset group every second, up to a maximum of 3 hours."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "DatasetGroup: CREATE PENDING\n",
      "DatasetGroup: ACTIVE\n"
     ]
    }
   ],
   "source": [
    "status = None\n",
    "max_time = time.time() + 3*60*60 # 3 hours\n",
    "while time.time() < max_time:\n",
    "    describe_dataset_group_response = personalize.describe_dataset_group(\n",
    "        datasetGroupArn = dataset_group_arn\n",
    "    )\n",
    "    status = describe_dataset_group_response[\"datasetGroup\"][\"status\"]\n",
    "    print(\"DatasetGroup: {}\".format(status))\n",
    "    \n",
    "    if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
    "        break\n",
    "        \n",
    "    time.sleep(20)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that you have a dataset group, you can create a dataset for the interaction data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create datasets\n",
    "\n",
    "### Interactions Dataset\n",
    "\n",
    "First, define a schema to tell Amazon Personalize what type of dataset you are uploading. There are several reserved and mandatory keywords required in the schema, based on the type of dataset. More detailed information can be found in the [documentation](https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html).\n",
    "\n",
    "Here, you will create a schema for interactions data, which needs the `USER_ID`, `ITEM_ID`, `TIMESTAMP`, `CABIN_TYPE`, `EVENT_TYPE`, `EVENT_VALUE`, and `TIMESTAMP` fields. These must be defined in the same order in the schema as they appear in the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [],
   "source": [
    "schema_name=\"airlines-interaction-schema-\"+suffix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"schemaArn\": \"arn:aws:personalize:us-east-1:144386903708:schema/airlines-interaction-schema-55035\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"4e045a61-d479-485c-93ff-6072076ccaa9\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 15 Jun 2020 21:10:34 GMT\",\n",
      "      \"x-amzn-requestid\": \"4e045a61-d479-485c-93ff-6072076ccaa9\",\n",
      "      \"content-length\": \"99\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "schema = {\n",
    "    \"type\": \"record\",\n",
    "    \"name\": \"Interactions\",\n",
    "    \"namespace\": \"com.amazonaws.personalize.schema\",\n",
    "    \"fields\": [\n",
    "        {\n",
    "            \"name\": \"ITEM_ID\",\n",
    "            \"type\": \"string\"\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"USER_ID\",\n",
    "            \"type\": \"string\"\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"TIMESTAMP\",\n",
    "            \"type\": \"long\"\n",
    "        },\n",
    "        {\n",
    "            \"name\":\"CABIN_TYPE\",\n",
    "            \"type\": \"string\",\n",
    "            \"categorical\": True\n",
    "        },\n",
    "        {\n",
    "          \"name\": \"EVENT_TYPE\",\n",
    "          \"type\": \"string\"\n",
    "        },\n",
    "        {\n",
    "          \"name\": \"EVENT_VALUE\",\n",
    "          \"type\": \"float\"\n",
    "        }\n",
    "    ],\n",
    "    \"version\": \"1.0\"\n",
    "}\n",
    "\n",
    "create_schema_response = personalize.create_schema(\n",
    "    name = schema_name,\n",
    "    schema = json.dumps(schema)\n",
    ")\n",
    "\n",
    "schema_arn = create_schema_response['schemaArn']\n",
    "print(json.dumps(create_schema_response, indent=2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"datasetArn\": \"arn:aws:personalize:us-east-1:144386903708:dataset/airlines-dataset-group-55035/INTERACTIONS\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"968e6cac-310a-4889-8243-e86ef90696ed\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 15 Jun 2020 21:15:14 GMT\",\n",
      "      \"x-amzn-requestid\": \"968e6cac-310a-4889-8243-e86ef90696ed\",\n",
      "      \"content-length\": \"109\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "dataset_type = \"INTERACTIONS\"\n",
    "create_dataset_response = personalize.create_dataset(\n",
    "    datasetType = dataset_type,\n",
    "    datasetGroupArn = dataset_group_arn,\n",
    "    schemaArn = schema_arn,\n",
    "    name = \"airlines-dataset-interactions-\" + suffix\n",
    ")\n",
    "\n",
    "interactions_dataset_arn = create_dataset_response['datasetArn']\n",
    "print(json.dumps(create_dataset_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Users Dataset\n",
    "\n",
    "Here, you will create a schema for the users data, which needs the `USER_ID`, and `NATIONALITY` fields. These must be defined in the same order in the schema as they appear in the dataset.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [],
   "source": [
    "metadata_schema_name=\"airlines-users-schema-\"+suffix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"schemaArn\": \"arn:aws:personalize:us-east-1:144386903708:schema/airlines-users-schema-55035\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"17844e2f-860d-484a-bb39-ab4a10e7b9fd\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 15 Jun 2020 21:13:50 GMT\",\n",
      "      \"x-amzn-requestid\": \"17844e2f-860d-484a-bb39-ab4a10e7b9fd\",\n",
      "      \"content-length\": \"93\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "metadata_schema = {\n",
    "    \"type\": \"record\",\n",
    "    \"name\": \"Users\",\n",
    "    \"namespace\": \"com.amazonaws.personalize.schema\",\n",
    "    \"fields\": [\n",
    "        {\n",
    "            \"name\": \"USER_ID\",\n",
    "            \"type\": \"string\"\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"NATIONALITY\",\n",
    "            \"type\": \"string\",\n",
    "            \"categorical\": True\n",
    "        }\n",
    "    ],\n",
    "    \"version\": \"1.0\"\n",
    "}\n",
    "\n",
    "create_metadata_schema_response = personalize.create_schema(\n",
    "    name = metadata_schema_name,\n",
    "    schema = json.dumps(metadata_schema)\n",
    ")\n",
    "\n",
    "metadata_schema_arn = create_metadata_schema_response['schemaArn']\n",
    "print(json.dumps(create_metadata_schema_response, indent=2))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset_type = \"USERS\"\n",
    "create_metadata_dataset_response = personalize.create_dataset(\n",
    "    datasetType = dataset_type,\n",
    "    datasetGroupArn = dataset_group_arn,\n",
    "    schemaArn = metadata_schema_arn,\n",
    "    name = \"airlines-metadata-dataset-users-\" + suffix\n",
    ")\n",
    "\n",
    "metadata_dataset_arn = create_metadata_dataset_response['datasetArn']\n",
    "print(json.dumps(create_metadata_dataset_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configure an S3 bucket and an IAM  role <a class=\"anchor\" id=\"bucket_role\"></a>\n",
    "\n",
    "So far, we have downloaded, manipulated, and saved the data onto the Amazon EBS instance attached to instance running this Jupyter notebook. However, Amazon Personalize will need an S3 bucket to act as the source of your data, as well as IAM roles for accessing that bucket. Let's set all of that up.\n",
    "\n",
    "Use the metadata stored on the instance underlying this Amazon SageMaker notebook, to determine the region it is operating in. If you are using a Jupyter notebook outside of Amazon SageMaker, simply define the region as a string below. The Amazon S3 bucket needs to be in the same region as the Amazon Personalize resources we have been creating so far."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set the S3 bucket policy\n",
    "Amazon Personalize needs to be able to read the contents of your S3 bucket. So add a bucket policy which allows that."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [],
   "source": [
    "s3 = boto3.client(\"s3\")\n",
    "\n",
    "policy = {\n",
    "    \"Version\": \"2012-10-17\",\n",
    "    \"Id\": \"PersonalizeS3BucketAccessPolicy\",\n",
    "    \"Statement\": [\n",
    "        {\n",
    "            \"Sid\": \"PersonalizeS3BucketAccessPolicy\",\n",
    "            \"Effect\": \"Allow\",\n",
    "            \"Principal\": {\n",
    "                \"Service\": \"personalize.amazonaws.com\"\n",
    "            },\n",
    "            \"Action\": [\n",
    "                \"s3:GetObject\",\n",
    "                \"s3:ListBucket\"\n",
    "            ],\n",
    "            \"Resource\": [\n",
    "                \"arn:aws:s3:::{}\".format(bucket_name),\n",
    "                \"arn:aws:s3:::{}/*\".format(bucket_name)\n",
    "            ]\n",
    "        }\n",
    "    ]\n",
    "}\n",
    "\n",
    "s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create an IAM role\n",
    "\n",
    "Amazon Personalize needs the ability to assume roles in AWS in order to have the permissions to execute certain tasks. Let's create an IAM role and attach the required policies to it. The code below attaches very permissive policies; please use more restrictive policies for any production application."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "arn:aws:iam::144386903708:role/PersonalizeS3Role-55035\n"
     ]
    }
   ],
   "source": [
    "iam = boto3.client(\"iam\")\n",
    "\n",
    "role_name = \"PersonalizeS3Role-\"+suffix\n",
    "assume_role_policy_document = {\n",
    "    \"Version\": \"2012-10-17\",\n",
    "    \"Statement\": [\n",
    "        {\n",
    "          \"Effect\": \"Allow\",\n",
    "          \"Principal\": {\n",
    "            \"Service\": \"personalize.amazonaws.com\"\n",
    "          },\n",
    "          \"Action\": \"sts:AssumeRole\"\n",
    "        }\n",
    "    ]\n",
    "}\n",
    "try:\n",
    "    create_role_response = iam.create_role(\n",
    "        RoleName = role_name,\n",
    "        AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)\n",
    "    );\n",
    "\n",
    "    iam.attach_role_policy(\n",
    "        RoleName = role_name,\n",
    "        PolicyArn = \"arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess\"\n",
    "    );\n",
    "\n",
    "    role_arn = create_role_response[\"Role\"][\"Arn\"]\n",
    "except ClientError as e:\n",
    "    if e.response['Error']['Code'] == 'EntityAlreadyExists':\n",
    "        role_arn = iam.get_role(RoleName=role_name)['Role']['Arn']\n",
    "    else:\n",
    "        raise\n",
    "        \n",
    "# sometimes need to wait a bit for the role to be created\n",
    "time.sleep(45)\n",
    "print(role_arn)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create your Dataset import jobs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import the interactions data <a class=\"anchor\" id=\"import\"></a>\n",
    "\n",
    "Earlier you created the dataset group and dataset to house your information, so now you will execute an import job that will load the data from the S3 bucket into the Amazon Personalize dataset. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"datasetImportJobArn\": \"arn:aws:personalize:us-east-1:144386903708:dataset-import-job/airlines-dataset-import-job-14078\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"d118cf11-b568-4767-99d5-30a15871981a\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 15 Jun 2020 21:46:38 GMT\",\n",
      "      \"x-amzn-requestid\": \"d118cf11-b568-4767-99d5-30a15871981a\",\n",
      "      \"content-length\": \"121\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "create_dataset_import_job_response = personalize.create_dataset_import_job(\n",
    "    jobName = \"airlines-dataset-import-job-\"+suffix,\n",
    "    datasetArn = interactions_dataset_arn,\n",
    "    dataSource = {\n",
    "        \"dataLocation\": \"s3://{}/{}\".format(bucket_name, interactions_filename)\n",
    "    },\n",
    "    roleArn = role_arn\n",
    ")\n",
    "\n",
    "dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']\n",
    "print(json.dumps(create_dataset_import_job_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import the Users data <a class=\"anchor\" id=\"import\"></a>\n",
    "\n",
    "Earlier you created the dataset group and dataset to house your information, so now you will execute an import job that will load the data from the S3 bucket into the Amazon Personalize dataset. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"datasetImportJobArn\": \"arn:aws:personalize:us-east-1:144386903708:dataset-import-job/airlines-users-metadata-dataset-import-job-14078\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"679d9401-568d-45e0-ba8c-df8f1574228d\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 15 Jun 2020 21:46:40 GMT\",\n",
      "      \"x-amzn-requestid\": \"679d9401-568d-45e0-ba8c-df8f1574228d\",\n",
      "      \"content-length\": \"136\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "create_metadata_dataset_import_job_response = personalize.create_dataset_import_job(\n",
    "    jobName = \"airlines-users-metadata-dataset-import-job-\"+suffix,\n",
    "    datasetArn = metadata_dataset_arn,\n",
    "    dataSource = {\n",
    "        \"dataLocation\": \"s3://{}/{}\".format(bucket_name, user_metadata_file)\n",
    "    },\n",
    "    roleArn = role_arn\n",
    ")\n",
    "\n",
    "metadata_dataset_import_job_arn = create_metadata_dataset_import_job_response['datasetImportJobArn']\n",
    "print(json.dumps(create_metadata_dataset_import_job_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Wait for the Dataset Import Jobs to have ACTIVE Status"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before we can use the dataset, the import job must be active. Execute the cell below and wait for it to show the ACTIVE status. It checks the status of the import job every second, up to a maximum of 3 hours.\n",
    "\n",
    "Importing the data can take some time, depending on the size of the dataset. In this demo, the data import job has been pre executed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "DatasetImportJob: CREATE PENDING\n",
      "DatasetImportJob: CREATE IN_PROGRESS\n",
      "DatasetImportJob: CREATE IN_PROGRESS\n",
      "DatasetImportJob: CREATE IN_PROGRESS\n",
      "DatasetImportJob: ACTIVE\n"
     ]
    }
   ],
   "source": [
    "status = None\n",
    "max_time = time.time() + 3*60*60 # 3 hours\n",
    "while time.time() < max_time:\n",
    "    describe_dataset_import_job_response = personalize.describe_dataset_import_job(\n",
    "        datasetImportJobArn = dataset_import_job_arn\n",
    "    )\n",
    "    \n",
    "    dataset_import_job = describe_dataset_import_job_response[\"datasetImportJob\"]\n",
    "    if \"latestDatasetImportJobRun\" not in dataset_import_job:\n",
    "        status = dataset_import_job[\"status\"]\n",
    "        print(\"DatasetImportJob: {}\".format(status))\n",
    "    else:\n",
    "        status = dataset_import_job[\"latestDatasetImportJobRun\"][\"status\"]\n",
    "        print(\"LatestDatasetImportJobRun: {}\".format(status))\n",
    "    \n",
    "    if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
    "        break\n",
    "        \n",
    "    time.sleep(60)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "DatasetImportJob: CREATE IN_PROGRESS\n",
      "DatasetImportJob: CREATE IN_PROGRESS\n",
      "DatasetImportJob: CREATE IN_PROGRESS\n",
      "DatasetImportJob: CREATE IN_PROGRESS\n",
      "DatasetImportJob: CREATE IN_PROGRESS\n",
      "DatasetImportJob: CREATE IN_PROGRESS\n",
      "DatasetImportJob: CREATE IN_PROGRESS\n",
      "DatasetImportJob: ACTIVE\n"
     ]
    }
   ],
   "source": [
    "status = None\n",
    "max_time = time.time() + 3*60*60 # 3 hours\n",
    "while time.time() < max_time:\n",
    "    describe_dataset_import_job_response = personalize.describe_dataset_import_job(\n",
    "        datasetImportJobArn = metadata_dataset_import_job_arn\n",
    "    )\n",
    "    \n",
    "    dataset_import_job = describe_dataset_import_job_response[\"datasetImportJob\"]\n",
    "    if \"latestDatasetImportJobRun\" not in dataset_import_job:\n",
    "        status = dataset_import_job[\"status\"]\n",
    "        print(\"DatasetImportJob: {}\".format(status))\n",
    "    else:\n",
    "        status = dataset_import_job[\"latestDatasetImportJobRun\"][\"status\"]\n",
    "        print(\"LatestDatasetImportJobRun: {}\".format(status))\n",
    "    \n",
    "    if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
    "        break\n",
    "        \n",
    "    time.sleep(60)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When the dataset import is active, you are ready to start building models with the AWS User Personalization recipe."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create solutions <a class=\"anchor\" id=\"solutions\"></a>\n",
    "[Back to top](#top)\n",
    "\n",
    "In this notebook, we will create a solution with the following recipe:\n",
    "\n",
    "1. aws-user-personalization\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In Amazon Personalize, a specific variation of an algorithm is called a recipe. Different recipes are suitable for different situations. A trained model is called a solution, and each solution can have many versions that relate to a given volume of data when the model was trained.\n",
    "\n",
    "To start, we will list all the recipes that are supported. This will allow you to select one and use that to build your model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "arn:aws:personalize:::recipe/aws-hrnn\n",
      "arn:aws:personalize:::recipe/aws-hrnn-coldstart\n",
      "arn:aws:personalize:::recipe/aws-hrnn-metadata\n",
      "arn:aws:personalize:::recipe/aws-personalized-ranking\n",
      "arn:aws:personalize:::recipe/aws-popularity-count\n",
      "arn:aws:personalize:::recipe/aws-sims\n",
      "arn:aws:personalize:::recipe/aws-user-personalization\n"
     ]
    }
   ],
   "source": [
    "recipe_list = personalize.list_recipes()\n",
    "for recipe in recipe_list['recipes']:\n",
    "    print(recipe['recipeArn'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output is just a JSON representation of all of the algorithms mentioned in the introduction.\n",
    "\n",
    "Next we will select specific recipes and build models with them.\n",
    "\n",
    "### AWS User Personalization\n",
    "\n",
    "AWS User Personalization is one of the more advanced recommendation models that you can use and it allows for real-time updates of recommendations based on user behavior. It also tends to outperform other approaches, like collaborative filtering. This recipe takes the longest to train, so let's start with this recipe first.\n",
    "\n",
    "For our use case, using the Airlines reviews data, we can use the AWS User Personalization to recommend airlines to a user based on the user's previous artist tagging behavior. Remember, we used the tagging data to represent positive interactions between a user and an artist.\n",
    "\n",
    "First, select the recipe by finding the ARN in the list of recipes above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "metadata": {},
   "outputs": [],
   "source": [
    "recipe_arn = \"arn:aws:personalize:::recipe/aws-user-personalization\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create the solution\n",
    "\n",
    "First you create a solution using the recipe. Although you provide the dataset ARN in this step, the model is not yet trained. See this as an identifier instead of a trained model.\n",
    "\n",
    "Note that we have HPO activated here. This is a good idea when \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "create_solution_response = personalize.create_solution(\n",
    "    name = \"airlines-user-personalization-solution-HPO-\"+suffix,\n",
    "    datasetGroupArn = dataset_group_arn,\n",
    "    recipeArn = recipe_arn,\n",
    "    performHPO=True\n",
    ")\n",
    "\n",
    "solution_arn = create_solution_response['solutionArn']\n",
    "print(json.dumps(create_solution_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create the solution version\n",
    "\n",
    "Once you have a solution, you need to create a version in order to complete the model training. The training can take a while to complete, upwards of 25 minutes, and an average of 40 minutes for this recipe with our dataset. Normally, we would use a while loop to poll until the task is completed. However the task would block other cells from executing, and the goal here is to create many models and deploy them quickly. So we will set up the while loop for all of the solutions further down in the notebook. There, you will also find instructions for viewing the progress in the AWS console."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"solutionVersionArn\": \"arn:aws:personalize:us-east-1:144386903708:solution/airlines-hrnn-metadata-solution-HPO-14078/54a6c563\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"148a0fbd-5465-4619-ac90-449c0ef23b73\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 15 Jun 2020 22:01:32 GMT\",\n",
      "      \"x-amzn-requestid\": \"148a0fbd-5465-4619-ac90-449c0ef23b73\",\n",
      "      \"content-length\": \"127\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "create_solution_version_response = personalize.create_solution_version(\n",
    "    solutionArn = solution_arn\n",
    ")\n",
    "\n",
    "solution_version_arn = create_solution_version_response['solutionVersionArn']\n",
    "print(json.dumps(create_solution_version_response, indent=2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SolutionVersion: CREATE PENDING\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: CREATE IN_PROGRESS\n",
      "SolutionVersion: ACTIVE\n"
     ]
    }
   ],
   "source": [
    "status = None\n",
    "max_time = time.time() + 3*60*60 # 3 hours\n",
    "while time.time() < max_time:\n",
    "    describe_solution_version_response = personalize.describe_solution_version(\n",
    "        solutionVersionArn = solution_version_arn\n",
    "    )\n",
    "    status = describe_solution_version_response[\"solutionVersion\"][\"status\"]\n",
    "    print(\"SolutionVersion: {}\".format(status))\n",
    "    \n",
    "    if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
    "        break\n",
    "        \n",
    "    time.sleep(60)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluate solution versions <a class=\"anchor\" id=\"eval\"></a>\n",
    "[Back to top](#top)\n",
    "\n",
    "It should not take more than an hour to train all the solutions from this notebook. While training is in progress, we recommend taking the time to read up on the various algorithms (recipes) and their behavior in detail. This is also a good time to consider alternatives to how the data was fed into the system and what kind of results you expect to see.\n",
    "\n",
    "When the solutions finish creating, the next step is to obtain the evaluation metrics. Personalize calculates these metrics based on a subset of the training data. The image below illustrates how Personalize splits the data. Given 10 users, with 10 interactions each (a circle represents an interaction), the interactions are ordered from oldest to newest based on the timestamp. Personalize uses all of the interaction data from 90% of the users (blue circles) to train the solution version, and the remaining 10% for evaluation. For each of the users in the remaining 10%, 90% of their interaction data (green circles) is used as input for the call to the trained model. The remaining 10% of their data (orange circle) is compared to the output produced by the model and used to calculate the evaluation metrics.\n",
    "\n",
    "![personalize metrics](static/imgs/personalize_metrics.png)\n",
    "\n",
    "We recommend reading [the documentation](https://docs.aws.amazon.com/personalize/latest/dg/working-with-training-metrics.html) to understand the metrics, but we have also copied parts of the documentation below for convenience.\n",
    "\n",
    "You need to understand the following terms regarding evaluation in Personalize:\n",
    "\n",
    "* *Relevant recommendation* refers to a recommendation that matches a value in the testing data for the particular user.\n",
    "* *Rank* refers to the position of a recommended item in the list of recommendations. Position 1 (the top of the list) is presumed to be the most relevant to the user.\n",
    "* *Query* refers to the internal equivalent of a GetRecommendations call.\n",
    "\n",
    "The metrics produced by Personalize are:\n",
    "- **coverage**: The proportion of unique recommended items from all queries out of the total number of unique items in the training data (includes both the Items and Interactions datasets).\n",
    "- **mean_reciprocal_rank_at_25**: The [mean of the reciprocal ranks](https://en.wikipedia.org/wiki/Mean_reciprocal_rank) of the first relevant recommendation out of the top 25 recommendations over all queries. This metric is appropriate if you're interested in the single highest ranked recommendation.\n",
    "- **normalized_discounted_cumulative_gain_at_K**: Discounted gain assumes that recommendations lower on a list of recommendations are less relevant than higher recommendations. Therefore, each recommendation is discounted (given a lower weight) by a factor dependent on its position. To produce the [cumulative discounted gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) (DCG) at K, each relevant discounted recommendation in the top K recommendations is summed together. The normalized discounted cumulative gain (NDCG) is the DCG divided by the ideal DCG such that NDCG is between 0 - 1. (The ideal DCG is where the top K recommendations are sorted by relevance.) Amazon Personalize uses a weighting factor of 1/log(1 + position), where the top of the list is position 1. This metric rewards relevant items that appear near the top of the list, because the top of a list usually draws more attention.\n",
    "- **precision_at_K**: The number of relevant recommendations out of the top K recommendations divided by K. This metric rewards precise recommendation of the relevant items.\n",
    "\n",
    "Let's take a look at the evaluation metrics for each of the solutions produced in this notebook. *Please note, your results might differ from the results described in the text of this notebook, due to the quality of the LastFM dataset.* \n",
    "\n",
    "### AWS User Personalizatioin metrics\n",
    "\n",
    "First, retrieve the evaluation metrics for the AWS User Personalization solution version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"solutionVersionArn\": \"arn:aws:personalize:us-east-1:144386903708:solution/airlines-hrnn-metadata-solution-HPO-14078/54a6c563\",\n",
      "  \"metrics\": {\n",
      "    \"coverage\": 0.4046,\n",
      "    \"mean_reciprocal_rank_at_25\": 0.2035,\n",
      "    \"normalized_discounted_cumulative_gain_at_10\": 0.2909,\n",
      "    \"normalized_discounted_cumulative_gain_at_25\": 0.3174,\n",
      "    \"normalized_discounted_cumulative_gain_at_5\": 0.2418,\n",
      "    \"precision_at_10\": 0.0444,\n",
      "    \"precision_at_25\": 0.022,\n",
      "    \"precision_at_5\": 0.0605\n",
      "  },\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"156a6e70-152b-4940-8b17-048252419fd0\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 15 Jun 2020 23:02:24 GMT\",\n",
      "      \"x-amzn-requestid\": \"156a6e70-152b-4940-8b17-048252419fd0\",\n",
      "      \"content-length\": \"424\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "get_solution_metrics_response = personalize.get_solution_metrics(\n",
    "    solutionVersionArn = solution_version_arn\n",
    ")\n",
    "\n",
    "print(json.dumps(get_solution_metrics_response, indent=2))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create a campaign from the solution"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create campaigns <a class=\"anchor\" id=\"create\"></a>\n",
    "\n",
    "A campaign is a hosted solution version; an endpoint which you can query for recommendations. Pricing is set by estimating throughput capacity (requests from users for personalization per second). When deploying a campaign, you set a minimum throughput per second (TPS) value. This service, like many within AWS, will automatically scale based on demand, but if latency is critical, you may want to provision ahead for larger demand. For this POC and demo, all minimum throughput thresholds are set to 1. For more information, see the [pricing page](https://aws.amazon.com/personalize/pricing/).\n",
    "\n",
    "Let's start deploying the campaigns."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### AWS User Personalization\n",
    "\n",
    "Deploy a campaign for your AWS User Personalization solution version. It can take around 10 minutes to deploy a campaign. Normally, we would use a while loop to poll until the task is completed. However the task would block other cells from executing, and the goal here is to create multiple campaigns. So we will set up the while loop for all of the campaigns further down in the notebook. There, you will also find instructions for viewing the progress in the AWS console."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"campaignArn\": \"arn:aws:personalize:us-east-1:144386903708:campaign/airlines-metadata-campaign-14078\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"4c882630-86aa-4c5d-accd-75225f9804a4\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 15 Jun 2020 23:25:37 GMT\",\n",
      "      \"x-amzn-requestid\": \"4c882630-86aa-4c5d-accd-75225f9804a4\",\n",
      "      \"content-length\": \"102\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "create_campaign_response = personalize.create_campaign(\n",
    "    name = \"airlines-metadata-campaign-\"+suffix,\n",
    "    solutionVersionArn = solution_version_arn,\n",
    "    minProvisionedTPS = 2,    \n",
    ")\n",
    "\n",
    "campaign_arn = create_campaign_response['campaignArn']\n",
    "print(json.dumps(create_campaign_response, indent=2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Campaign: CREATE PENDING\n",
      "Campaign: CREATE IN_PROGRESS\n",
      "Campaign: CREATE IN_PROGRESS\n",
      "Campaign: CREATE IN_PROGRESS\n",
      "Campaign: CREATE IN_PROGRESS\n",
      "Campaign: CREATE IN_PROGRESS\n",
      "Campaign: CREATE IN_PROGRESS\n",
      "Campaign: CREATE IN_PROGRESS\n",
      "Campaign: ACTIVE\n"
     ]
    }
   ],
   "source": [
    "status = None\n",
    "max_time = time.time() + 3*60*60 # 3 hours\n",
    "while time.time() < max_time:\n",
    "    describe_campaign_response = personalize.describe_campaign(\n",
    "        campaignArn = campaign_arn\n",
    "    )\n",
    "    status = describe_campaign_response[\"campaign\"][\"status\"]\n",
    "    print(\"Campaign: {}\".format(status))\n",
    "    \n",
    "    if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
    "        break\n",
    "        \n",
    "    time.sleep(60)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### AWS User Personalization\n",
    "\n",
    "AWS User Personalization is one of the more advanced algorithms provided by Amazon Personalize. It supports personalization of the items for a specific user based on their past behavior and can intake real time events in order to alter recommendations for a user without retraining. \n",
    "\n",
    "Since the AWS User Personalization algorithm relies on having a sampling of users, let's load the data we need for that and select 3 random users."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 262,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>USER_ID</th>\n",
       "      <th>NATIONALITY</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3918</th>\n",
       "      <td>JHartley</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27477</th>\n",
       "      <td>DDriscoll</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19563</th>\n",
       "      <td>BrianElliott</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22989</th>\n",
       "      <td>AHornbuckle</td>\n",
       "      <td>Australia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35724</th>\n",
       "      <td>CMoon</td>\n",
       "      <td>United Kingdom</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            USER_ID     NATIONALITY\n",
       "3918       JHartley  United Kingdom\n",
       "27477     DDriscoll  United Kingdom\n",
       "19563  BrianElliott  United Kingdom\n",
       "22989   AHornbuckle       Australia\n",
       "35724         CMoon  United Kingdom"
      ]
     },
     "execution_count": 262,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users_df = pd.read_csv(data_dir + '/a_users.csv')\n",
    "# Render some sample data\n",
    "users_df.sample(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we render the recommendations for our 3 random users from above. After that, we will explore real-time interactions before moving on to Personalized Ranking.\n",
    "\n",
    "Again, we create a helper function to render the results in a nice dataframe.\n",
    "\n",
    "#### API call results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 165,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Update DF rendering\n",
    "pd.set_option('display.max_rows', 30)\n",
    "\n",
    "def get_new_recommendations_df_users(recommendations_df, user_id):\n",
    "    \n",
    "#   Context Recommendations\n",
    "    context_options = ['None','Economy', 'Business Class','Premium Economy', 'First Class']\n",
    "    \n",
    "    for context in context_options:\n",
    "        # Get the recommendations\n",
    "        if context=='none':\n",
    "            get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "                campaignArn = campaign_arn,\n",
    "                userId = str(user_id),\n",
    "            )\n",
    "        else:\n",
    "            get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "                campaignArn = campaign_arn,\n",
    "                userId = str(user_id),\n",
    "                context = {\n",
    "                  'CABIN_TYPE': context\n",
    "                }\n",
    "            )\n",
    "        # Build a new dataframe of recommendations\n",
    "        item_list = get_recommendations_response['itemList']\n",
    "        recommendation_list = []\n",
    "        for item in item_list:\n",
    "            recommendation_list.append(item['itemId'])\n",
    "    #     print(recommendation_list)\n",
    "        new_rec_DF = pd.DataFrame(recommendation_list, columns = [context])\n",
    "        # Add this dataframe to the old one\n",
    "        recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)\n",
    "    return recommendations_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 263,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "      USER_ID     NATIONALITY\n",
      "37013    RDow  United Kingdom\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>None</th>\n",
       "      <th>Economy</th>\n",
       "      <th>Business Class</th>\n",
       "      <th>Premium Economy</th>\n",
       "      <th>First Class</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>british-airways</td>\n",
       "      <td>thomson-airways</td>\n",
       "      <td>british-airways</td>\n",
       "      <td>thomson-airways</td>\n",
       "      <td>united-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>united-airlines</td>\n",
       "      <td>thomas-cook-airlines</td>\n",
       "      <td>virgin-atlantic-airways</td>\n",
       "      <td>virgin-atlantic-airways</td>\n",
       "      <td>british-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>thomson-airways</td>\n",
       "      <td>united-airlines</td>\n",
       "      <td>turkish-airlines</td>\n",
       "      <td>air-new-zealand</td>\n",
       "      <td>american-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>virgin-atlantic-airways</td>\n",
       "      <td>easyjet</td>\n",
       "      <td>united-airlines</td>\n",
       "      <td>british-airways</td>\n",
       "      <td>china-southern-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>air-new-zealand</td>\n",
       "      <td>monarch-airlines</td>\n",
       "      <td>china-southern-airlines</td>\n",
       "      <td>united-airlines</td>\n",
       "      <td>delta-air-lines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>turkish-airlines</td>\n",
       "      <td>virgin-atlantic-airways</td>\n",
       "      <td>qatar-airways</td>\n",
       "      <td>thomas-cook-airlines</td>\n",
       "      <td>lufthansa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>china-southern-airlines</td>\n",
       "      <td>british-airways</td>\n",
       "      <td>emirates</td>\n",
       "      <td>turkish-airlines</td>\n",
       "      <td>alaska-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>thomas-cook-airlines</td>\n",
       "      <td>jet2-com</td>\n",
       "      <td>air-france</td>\n",
       "      <td>monarch-airlines</td>\n",
       "      <td>virgin-atlantic-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>lufthansa</td>\n",
       "      <td>lufthansa</td>\n",
       "      <td>american-airlines</td>\n",
       "      <td>china-southern-airlines</td>\n",
       "      <td>us-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>american-airlines</td>\n",
       "      <td>american-airlines</td>\n",
       "      <td>lufthansa</td>\n",
       "      <td>eva-air</td>\n",
       "      <td>thomson-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>delta-air-lines</td>\n",
       "      <td>flybe</td>\n",
       "      <td>etihad-airways</td>\n",
       "      <td>cathay-pacific-airways</td>\n",
       "      <td>air-new-zealand</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>air-france</td>\n",
       "      <td>norwegian</td>\n",
       "      <td>air-new-zealand</td>\n",
       "      <td>air-france</td>\n",
       "      <td>turkish-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>monarch-airlines</td>\n",
       "      <td>turkish-airlines</td>\n",
       "      <td>klm-royal-dutch-airlines</td>\n",
       "      <td>lufthansa</td>\n",
       "      <td>emirates</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>klm-royal-dutch-airlines</td>\n",
       "      <td>ryanair</td>\n",
       "      <td>cathay-pacific-airways</td>\n",
       "      <td>klm-royal-dutch-airlines</td>\n",
       "      <td>etihad-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>us-airways</td>\n",
       "      <td>aer-lingus</td>\n",
       "      <td>thai-airways</td>\n",
       "      <td>air-transat</td>\n",
       "      <td>virgin-america</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>eva-air</td>\n",
       "      <td>china-southern-airlines</td>\n",
       "      <td>eva-air</td>\n",
       "      <td>delta-air-lines</td>\n",
       "      <td>cathay-pacific-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>cathay-pacific-airways</td>\n",
       "      <td>air-new-zealand</td>\n",
       "      <td>asiana-airlines</td>\n",
       "      <td>sas-scandinavian-airlines</td>\n",
       "      <td>qantas-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>air-transat</td>\n",
       "      <td>tap-portugal</td>\n",
       "      <td>air-canada</td>\n",
       "      <td>american-airlines</td>\n",
       "      <td>qatar-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>sas-scandinavian-airlines</td>\n",
       "      <td>delta-air-lines</td>\n",
       "      <td>air-transat</td>\n",
       "      <td>icelandair</td>\n",
       "      <td>thai-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>icelandair</td>\n",
       "      <td>klm-royal-dutch-airlines</td>\n",
       "      <td>delta-air-lines</td>\n",
       "      <td>vietnam-airlines</td>\n",
       "      <td>asiana-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>easyjet</td>\n",
       "      <td>us-airways</td>\n",
       "      <td>vietnam-airlines</td>\n",
       "      <td>norwegian</td>\n",
       "      <td>thomas-cook-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>norwegian</td>\n",
       "      <td>air-transat</td>\n",
       "      <td>thomson-airways</td>\n",
       "      <td>us-airways</td>\n",
       "      <td>aer-lingus</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>vietnam-airlines</td>\n",
       "      <td>icelandair</td>\n",
       "      <td>icelandair</td>\n",
       "      <td>air-canada-rouge</td>\n",
       "      <td>hawaiian-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>etihad-airways</td>\n",
       "      <td>air-france</td>\n",
       "      <td>tap-portugal</td>\n",
       "      <td>flybe</td>\n",
       "      <td>air-canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>air-canada</td>\n",
       "      <td>cityjet</td>\n",
       "      <td>aer-lingus</td>\n",
       "      <td>qantas-airways</td>\n",
       "      <td>air-france</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         None                   Economy  \\\n",
       "0             british-airways           thomson-airways   \n",
       "1             united-airlines      thomas-cook-airlines   \n",
       "2             thomson-airways           united-airlines   \n",
       "3     virgin-atlantic-airways                   easyjet   \n",
       "4             air-new-zealand          monarch-airlines   \n",
       "5            turkish-airlines   virgin-atlantic-airways   \n",
       "6     china-southern-airlines           british-airways   \n",
       "7        thomas-cook-airlines                  jet2-com   \n",
       "8                   lufthansa                 lufthansa   \n",
       "9           american-airlines         american-airlines   \n",
       "10            delta-air-lines                     flybe   \n",
       "11                 air-france                 norwegian   \n",
       "12           monarch-airlines          turkish-airlines   \n",
       "13   klm-royal-dutch-airlines                   ryanair   \n",
       "14                 us-airways                aer-lingus   \n",
       "15                    eva-air   china-southern-airlines   \n",
       "16     cathay-pacific-airways           air-new-zealand   \n",
       "17                air-transat              tap-portugal   \n",
       "18  sas-scandinavian-airlines           delta-air-lines   \n",
       "19                 icelandair  klm-royal-dutch-airlines   \n",
       "20                    easyjet                us-airways   \n",
       "21                  norwegian               air-transat   \n",
       "22           vietnam-airlines                icelandair   \n",
       "23             etihad-airways                air-france   \n",
       "24                 air-canada                   cityjet   \n",
       "\n",
       "              Business Class            Premium Economy  \\\n",
       "0            british-airways            thomson-airways   \n",
       "1    virgin-atlantic-airways    virgin-atlantic-airways   \n",
       "2           turkish-airlines            air-new-zealand   \n",
       "3            united-airlines            british-airways   \n",
       "4    china-southern-airlines            united-airlines   \n",
       "5              qatar-airways       thomas-cook-airlines   \n",
       "6                   emirates           turkish-airlines   \n",
       "7                 air-france           monarch-airlines   \n",
       "8          american-airlines    china-southern-airlines   \n",
       "9                  lufthansa                    eva-air   \n",
       "10            etihad-airways     cathay-pacific-airways   \n",
       "11           air-new-zealand                 air-france   \n",
       "12  klm-royal-dutch-airlines                  lufthansa   \n",
       "13    cathay-pacific-airways   klm-royal-dutch-airlines   \n",
       "14              thai-airways                air-transat   \n",
       "15                   eva-air            delta-air-lines   \n",
       "16           asiana-airlines  sas-scandinavian-airlines   \n",
       "17                air-canada          american-airlines   \n",
       "18               air-transat                 icelandair   \n",
       "19           delta-air-lines           vietnam-airlines   \n",
       "20          vietnam-airlines                  norwegian   \n",
       "21           thomson-airways                 us-airways   \n",
       "22                icelandair           air-canada-rouge   \n",
       "23              tap-portugal                      flybe   \n",
       "24                aer-lingus             qantas-airways   \n",
       "\n",
       "                First Class  \n",
       "0           united-airlines  \n",
       "1           british-airways  \n",
       "2         american-airlines  \n",
       "3   china-southern-airlines  \n",
       "4           delta-air-lines  \n",
       "5                 lufthansa  \n",
       "6           alaska-airlines  \n",
       "7   virgin-atlantic-airways  \n",
       "8                us-airways  \n",
       "9           thomson-airways  \n",
       "10          air-new-zealand  \n",
       "11         turkish-airlines  \n",
       "12                 emirates  \n",
       "13           etihad-airways  \n",
       "14           virgin-america  \n",
       "15   cathay-pacific-airways  \n",
       "16           qantas-airways  \n",
       "17            qatar-airways  \n",
       "18             thai-airways  \n",
       "19          asiana-airlines  \n",
       "20     thomas-cook-airlines  \n",
       "21               aer-lingus  \n",
       "22        hawaiian-airlines  \n",
       "23               air-canada  \n",
       "24               air-france  "
      ]
     },
     "execution_count": 263,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "recommendations_df_users = pd.DataFrame()\n",
    "users = users_df.sample()\n",
    "print(users)\n",
    "users= users['USER_ID'].tolist()\n",
    "for user in users:\n",
    "    recommendations_df_users = get_new_recommendations_df_users(recommendations_df_users, user)\n",
    "\n",
    "recommendations_df_users"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 264,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "      USER_ID NATIONALITY\n",
      "26198   CJeff   Singapore\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>None</th>\n",
       "      <th>Economy</th>\n",
       "      <th>Business Class</th>\n",
       "      <th>Premium Economy</th>\n",
       "      <th>First Class</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>philippine-airlines</td>\n",
       "      <td>philippine-airlines</td>\n",
       "      <td>cathay-pacific-airways</td>\n",
       "      <td>cathay-pacific-airways</td>\n",
       "      <td>singapore-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>cathay-pacific-airways</td>\n",
       "      <td>singapore-airlines</td>\n",
       "      <td>philippine-airlines</td>\n",
       "      <td>air-france</td>\n",
       "      <td>thai-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>singapore-airlines</td>\n",
       "      <td>tigerair</td>\n",
       "      <td>malaysia-airlines</td>\n",
       "      <td>eva-air</td>\n",
       "      <td>philippine-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>ana-all-nippon-airways</td>\n",
       "      <td>jetstar-asia</td>\n",
       "      <td>srilankan-airlines</td>\n",
       "      <td>air-new-zealand</td>\n",
       "      <td>delta-air-lines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>thai-airways</td>\n",
       "      <td>cathay-pacific-airways</td>\n",
       "      <td>thai-airways</td>\n",
       "      <td>philippine-airlines</td>\n",
       "      <td>emirates</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>air-india</td>\n",
       "      <td>airasia</td>\n",
       "      <td>singapore-airlines</td>\n",
       "      <td>klm-royal-dutch-airlines</td>\n",
       "      <td>ana-all-nippon-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>china-eastern-airlines</td>\n",
       "      <td>cebu-pacific</td>\n",
       "      <td>air-india</td>\n",
       "      <td>airasia-x</td>\n",
       "      <td>china-southern-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>dragonair</td>\n",
       "      <td>scoot</td>\n",
       "      <td>emirates</td>\n",
       "      <td>qantas-airways</td>\n",
       "      <td>alaska-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>srilankan-airlines</td>\n",
       "      <td>ana-all-nippon-airways</td>\n",
       "      <td>ana-all-nippon-airways</td>\n",
       "      <td>china-southern-airlines</td>\n",
       "      <td>american-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>air-france</td>\n",
       "      <td>dragonair</td>\n",
       "      <td>asiana-airlines</td>\n",
       "      <td>united-airlines</td>\n",
       "      <td>china-eastern-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>jetstar-airways</td>\n",
       "      <td>jetstar-airways</td>\n",
       "      <td>qatar-airways</td>\n",
       "      <td>japan-airlines</td>\n",
       "      <td>united-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>malaysia-airlines</td>\n",
       "      <td>air-india</td>\n",
       "      <td>china-eastern-airlines</td>\n",
       "      <td>turkish-airlines</td>\n",
       "      <td>swiss-international-air-lines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>japan-airlines</td>\n",
       "      <td>china-eastern-airlines</td>\n",
       "      <td>eva-air</td>\n",
       "      <td>virgin-australia</td>\n",
       "      <td>cathay-pacific-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>air-new-zealand</td>\n",
       "      <td>malaysia-airlines</td>\n",
       "      <td>dragonair</td>\n",
       "      <td>dragonair</td>\n",
       "      <td>asiana-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>klm-royal-dutch-airlines</td>\n",
       "      <td>bangkok-airways</td>\n",
       "      <td>swiss-international-air-lines</td>\n",
       "      <td>thai-airways</td>\n",
       "      <td>garuda-indonesia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>china-southern-airlines</td>\n",
       "      <td>thai-airways</td>\n",
       "      <td>qantas-airways</td>\n",
       "      <td>ana-all-nippon-airways</td>\n",
       "      <td>air-india</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>asiana-airlines</td>\n",
       "      <td>silkair</td>\n",
       "      <td>korean-air</td>\n",
       "      <td>singapore-airlines</td>\n",
       "      <td>hawaiian-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>eva-air</td>\n",
       "      <td>srilankan-airlines</td>\n",
       "      <td>china-southern-airlines</td>\n",
       "      <td>vietnam-airlines</td>\n",
       "      <td>malaysia-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>scoot</td>\n",
       "      <td>airasia-x</td>\n",
       "      <td>garuda-indonesia</td>\n",
       "      <td>delta-air-lines</td>\n",
       "      <td>lufthansa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>delta-air-lines</td>\n",
       "      <td>jet-airways</td>\n",
       "      <td>air-france</td>\n",
       "      <td>virgin-atlantic-airways</td>\n",
       "      <td>virgin-america</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>jetstar-asia</td>\n",
       "      <td>korean-air</td>\n",
       "      <td>japan-airlines</td>\n",
       "      <td>air-india</td>\n",
       "      <td>jetstar-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>qantas-airways</td>\n",
       "      <td>south-african-airways</td>\n",
       "      <td>south-african-airways</td>\n",
       "      <td>scoot</td>\n",
       "      <td>qantas-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>airasia-x</td>\n",
       "      <td>hong-kong-airlines</td>\n",
       "      <td>royal-brunei-airlines</td>\n",
       "      <td>hong-kong-airlines</td>\n",
       "      <td>tigerair</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>korean-air</td>\n",
       "      <td>japan-airlines</td>\n",
       "      <td>finnair</td>\n",
       "      <td>china-eastern-airlines</td>\n",
       "      <td>qatar-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>emirates</td>\n",
       "      <td>malindo-air</td>\n",
       "      <td>china-airlines</td>\n",
       "      <td>air-china</td>\n",
       "      <td>air-china</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        None                 Economy  \\\n",
       "0        philippine-airlines     philippine-airlines   \n",
       "1     cathay-pacific-airways      singapore-airlines   \n",
       "2         singapore-airlines                tigerair   \n",
       "3     ana-all-nippon-airways            jetstar-asia   \n",
       "4               thai-airways  cathay-pacific-airways   \n",
       "5                  air-india                 airasia   \n",
       "6     china-eastern-airlines            cebu-pacific   \n",
       "7                  dragonair                   scoot   \n",
       "8         srilankan-airlines  ana-all-nippon-airways   \n",
       "9                 air-france               dragonair   \n",
       "10           jetstar-airways         jetstar-airways   \n",
       "11         malaysia-airlines               air-india   \n",
       "12            japan-airlines  china-eastern-airlines   \n",
       "13           air-new-zealand       malaysia-airlines   \n",
       "14  klm-royal-dutch-airlines         bangkok-airways   \n",
       "15   china-southern-airlines            thai-airways   \n",
       "16           asiana-airlines                 silkair   \n",
       "17                   eva-air      srilankan-airlines   \n",
       "18                     scoot               airasia-x   \n",
       "19           delta-air-lines             jet-airways   \n",
       "20              jetstar-asia              korean-air   \n",
       "21            qantas-airways   south-african-airways   \n",
       "22                 airasia-x      hong-kong-airlines   \n",
       "23                korean-air          japan-airlines   \n",
       "24                  emirates             malindo-air   \n",
       "\n",
       "                   Business Class           Premium Economy  \\\n",
       "0          cathay-pacific-airways    cathay-pacific-airways   \n",
       "1             philippine-airlines                air-france   \n",
       "2               malaysia-airlines                   eva-air   \n",
       "3              srilankan-airlines           air-new-zealand   \n",
       "4                    thai-airways       philippine-airlines   \n",
       "5              singapore-airlines  klm-royal-dutch-airlines   \n",
       "6                       air-india                 airasia-x   \n",
       "7                        emirates            qantas-airways   \n",
       "8          ana-all-nippon-airways   china-southern-airlines   \n",
       "9                 asiana-airlines           united-airlines   \n",
       "10                  qatar-airways            japan-airlines   \n",
       "11         china-eastern-airlines          turkish-airlines   \n",
       "12                        eva-air          virgin-australia   \n",
       "13                      dragonair                 dragonair   \n",
       "14  swiss-international-air-lines              thai-airways   \n",
       "15                 qantas-airways    ana-all-nippon-airways   \n",
       "16                     korean-air        singapore-airlines   \n",
       "17        china-southern-airlines          vietnam-airlines   \n",
       "18               garuda-indonesia           delta-air-lines   \n",
       "19                     air-france   virgin-atlantic-airways   \n",
       "20                 japan-airlines                 air-india   \n",
       "21          south-african-airways                     scoot   \n",
       "22          royal-brunei-airlines        hong-kong-airlines   \n",
       "23                        finnair    china-eastern-airlines   \n",
       "24                 china-airlines                 air-china   \n",
       "\n",
       "                      First Class  \n",
       "0              singapore-airlines  \n",
       "1                    thai-airways  \n",
       "2             philippine-airlines  \n",
       "3                 delta-air-lines  \n",
       "4                        emirates  \n",
       "5          ana-all-nippon-airways  \n",
       "6         china-southern-airlines  \n",
       "7                 alaska-airlines  \n",
       "8               american-airlines  \n",
       "9          china-eastern-airlines  \n",
       "10                united-airlines  \n",
       "11  swiss-international-air-lines  \n",
       "12         cathay-pacific-airways  \n",
       "13                asiana-airlines  \n",
       "14               garuda-indonesia  \n",
       "15                      air-india  \n",
       "16              hawaiian-airlines  \n",
       "17              malaysia-airlines  \n",
       "18                      lufthansa  \n",
       "19                 virgin-america  \n",
       "20                jetstar-airways  \n",
       "21                 qantas-airways  \n",
       "22                       tigerair  \n",
       "23                  qatar-airways  \n",
       "24                      air-china  "
      ]
     },
     "execution_count": 264,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "recommendations_df_users = pd.DataFrame()\n",
    "users = users_df.sample()\n",
    "print(users)\n",
    "users= users['USER_ID'].tolist()\n",
    "for user in users:\n",
    "    recommendations_df_users = get_new_recommendations_df_users(recommendations_df_users, user)\n",
    "\n",
    "recommendations_df_users"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we clearly see that the recommendations for each user are different. If you were to need a cache for these results, you could start by running the API calls through all your users and store the results, or you could use a batch export, which will be covered later in this notebook.\n",
    "\n",
    "The next topic is real-time events. Personalize has the ability to listen to events from your application in order to update the recommendations shown to the user. This is especially useful in media workloads, like video-on-demand, where a customer's intent may differ based on if they are watching with their children or on their own.\n",
    "\n",
    "Additionally the events that are recorded via this system are stored until a delete call from you is issued, and they are used as historical data alongside the other interaction data you provided when you train your next models.\n",
    "\n",
    "#### Real time events\n",
    "\n",
    "Start by creating an event tracker that is attached to the campaign."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 150,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "arn:aws:personalize:us-east-1:144386903708:event-tracker/d2e7ccdc\n",
      "820029aa-b00c-4eff-9e6f-60830bb68508\n"
     ]
    }
   ],
   "source": [
    "response = personalize.create_event_tracker(\n",
    "    name='AirlinesEventsTracker',\n",
    "    datasetGroupArn=dataset_group_arn\n",
    ")\n",
    "print(response['eventTrackerArn'])\n",
    "print(response['trackingId'])\n",
    "TRACKING_ID = response['trackingId']\n",
    "event_tracker_arn = response['eventTrackerArn']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will create some code that simulates a user interacting with a particular item. After running this code, you will get recommendations that differ from the results above.\n",
    "\n",
    "We start by creating some methods for the simulation of real time events."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 200,
   "metadata": {},
   "outputs": [],
   "source": [
    "session_dict = {}\n",
    "\n",
    "def send_user_rating(USER_ID, ITEM_ID):\n",
    "    \"\"\"\n",
    "    Simulates a click as an envent\n",
    "    to send an event to Amazon Personalize's Event Tracker\n",
    "    \"\"\"\n",
    "    # Configure Session\n",
    "    try:\n",
    "        session_ID = session_dict[str(USER_ID)]\n",
    "    except:\n",
    "        session_dict[str(USER_ID)] = str(uuid.uuid1())\n",
    "        session_ID = session_dict[str(USER_ID)]\n",
    "        \n",
    "    # Configure Properties:\n",
    "    event = {\n",
    "        \"itemId\": str(ITEM_ID),\n",
    "        \"eventValue\": 10,\n",
    "        \"cabinType\": \"Economy\"\n",
    "    }\n",
    "    event_json = json.dumps(event)\n",
    "        \n",
    "    # Make Call\n",
    "    personalize_events.put_events(\n",
    "        trackingId = TRACKING_ID,\n",
    "        userId= str(USER_ID),\n",
    "        sessionId = session_ID,\n",
    "        eventList = [{\n",
    "            'sentAt': int(time.time()),\n",
    "            'eventType': 'RATING',\n",
    "            'properties': event_json\n",
    "            }]\n",
    "    )\n",
    "\n",
    "def get_new_recommendations_df_users_real_time(recommendations_df, user_id, item_id):\n",
    "    # Interact with the airline\n",
    "    # Sending a rating of 10 in Economy class for the airline with that user\n",
    "    send_user_rating(USER_ID=user_id, ITEM_ID=item_id)\n",
    "    \n",
    "    \n",
    "    #   Context Recommendations\n",
    "    get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "        campaignArn = campaign_arn,\n",
    "        userId = str(user_id),\n",
    "        context = {\n",
    "          'CABIN_TYPE': 'Economy'\n",
    "        }\n",
    "    )\n",
    "    # Build a new dataframe of recommendations\n",
    "    item_list = get_recommendations_response['itemList']\n",
    "    recommendation_list = []\n",
    "    for item in item_list:\n",
    "        recommendation_list.append(item['itemId'])\n",
    "    new_rec_DF = pd.DataFrame(recommendation_list, columns = [item_id+'|Economy'])\n",
    "    recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)\n",
    "    return recommendations_df\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At this point, we haven't generated any real-time events yet; we have only set up the code. To compare the recommendations before and after the real-time events, let's pick one user and generate the original recommendations for them."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Recommendations before using the Event Tracker"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 265,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "        USER_ID NATIONALITY\n",
      "6313  ACrociani       Italy\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>None</th>\n",
       "      <th>Economy</th>\n",
       "      <th>Business Class</th>\n",
       "      <th>Premium Economy</th>\n",
       "      <th>First Class</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>british-airways</td>\n",
       "      <td>alitalia</td>\n",
       "      <td>iberia</td>\n",
       "      <td>british-airways</td>\n",
       "      <td>american-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>alitalia</td>\n",
       "      <td>brussels-airlines</td>\n",
       "      <td>british-airways</td>\n",
       "      <td>virgin-atlantic-airways</td>\n",
       "      <td>british-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>iberia</td>\n",
       "      <td>ryanair</td>\n",
       "      <td>qatar-airways</td>\n",
       "      <td>united-airlines</td>\n",
       "      <td>delta-air-lines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>brussels-airlines</td>\n",
       "      <td>easyjet</td>\n",
       "      <td>alitalia</td>\n",
       "      <td>brussels-airlines</td>\n",
       "      <td>united-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>qatar-airways</td>\n",
       "      <td>iberia</td>\n",
       "      <td>brussels-airlines</td>\n",
       "      <td>turkish-airlines</td>\n",
       "      <td>lufthansa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>lufthansa</td>\n",
       "      <td>aer-lingus</td>\n",
       "      <td>emirates</td>\n",
       "      <td>alitalia</td>\n",
       "      <td>emirates</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>american-airlines</td>\n",
       "      <td>aegean-airlines</td>\n",
       "      <td>lufthansa</td>\n",
       "      <td>air-france</td>\n",
       "      <td>qatar-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>united-airlines</td>\n",
       "      <td>lufthansa</td>\n",
       "      <td>turkish-airlines</td>\n",
       "      <td>lufthansa</td>\n",
       "      <td>swiss-international-air-lines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>icelandair</td>\n",
       "      <td>qatar-airways</td>\n",
       "      <td>swiss-international-air-lines</td>\n",
       "      <td>icelandair</td>\n",
       "      <td>us-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>delta-air-lines</td>\n",
       "      <td>tap-portugal</td>\n",
       "      <td>oman-air</td>\n",
       "      <td>delta-air-lines</td>\n",
       "      <td>iberia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>turkish-airlines</td>\n",
       "      <td>air-berlin</td>\n",
       "      <td>air-europa</td>\n",
       "      <td>iberia</td>\n",
       "      <td>air-berlin</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>aer-lingus</td>\n",
       "      <td>germanwings</td>\n",
       "      <td>egyptair</td>\n",
       "      <td>klm-royal-dutch-airlines</td>\n",
       "      <td>alitalia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>air-berlin</td>\n",
       "      <td>vueling-airlines</td>\n",
       "      <td>american-airlines</td>\n",
       "      <td>thomson-airways</td>\n",
       "      <td>icelandair</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>emirates</td>\n",
       "      <td>british-airways</td>\n",
       "      <td>tap-portugal</td>\n",
       "      <td>sas-scandinavian-airlines</td>\n",
       "      <td>aer-lingus</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>air-france</td>\n",
       "      <td>icelandair</td>\n",
       "      <td>icelandair</td>\n",
       "      <td>american-airlines</td>\n",
       "      <td>turkish-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>virgin-atlantic-airways</td>\n",
       "      <td>air-europa</td>\n",
       "      <td>air-berlin</td>\n",
       "      <td>norwegian</td>\n",
       "      <td>brussels-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>tap-portugal</td>\n",
       "      <td>egyptair</td>\n",
       "      <td>austrian-airlines</td>\n",
       "      <td>us-airways</td>\n",
       "      <td>austrian-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>air-europa</td>\n",
       "      <td>austrian-airlines</td>\n",
       "      <td>united-airlines</td>\n",
       "      <td>air-canada-rouge</td>\n",
       "      <td>alaska-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>swiss-international-air-lines</td>\n",
       "      <td>wizz-air</td>\n",
       "      <td>finnair</td>\n",
       "      <td>thomas-cook-airlines</td>\n",
       "      <td>ana-all-nippon-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>us-airways</td>\n",
       "      <td>ethiopian-airlines</td>\n",
       "      <td>virgin-atlantic-airways</td>\n",
       "      <td>qatar-airways</td>\n",
       "      <td>air-france</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>ryanair</td>\n",
       "      <td>air-india</td>\n",
       "      <td>aegean-airlines</td>\n",
       "      <td>openskies</td>\n",
       "      <td>sas-scandinavian-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>easyjet</td>\n",
       "      <td>american-airlines</td>\n",
       "      <td>etihad-airways</td>\n",
       "      <td>easyjet</td>\n",
       "      <td>etihad-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>austrian-airlines</td>\n",
       "      <td>meridiana</td>\n",
       "      <td>south-african-airways</td>\n",
       "      <td>cathay-pacific-airways</td>\n",
       "      <td>air-canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>aegean-airlines</td>\n",
       "      <td>jet-airways</td>\n",
       "      <td>avianca</td>\n",
       "      <td>vueling-airlines</td>\n",
       "      <td>norwegian</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>egyptair</td>\n",
       "      <td>ukraine-international-airlines</td>\n",
       "      <td>air-india</td>\n",
       "      <td>egyptair</td>\n",
       "      <td>finnair</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                             None                         Economy  \\\n",
       "0                 british-airways                        alitalia   \n",
       "1                        alitalia               brussels-airlines   \n",
       "2                          iberia                         ryanair   \n",
       "3               brussels-airlines                         easyjet   \n",
       "4                   qatar-airways                          iberia   \n",
       "5                       lufthansa                      aer-lingus   \n",
       "6               american-airlines                 aegean-airlines   \n",
       "7                 united-airlines                       lufthansa   \n",
       "8                      icelandair                   qatar-airways   \n",
       "9                 delta-air-lines                    tap-portugal   \n",
       "10               turkish-airlines                      air-berlin   \n",
       "11                     aer-lingus                     germanwings   \n",
       "12                     air-berlin                vueling-airlines   \n",
       "13                       emirates                 british-airways   \n",
       "14                     air-france                      icelandair   \n",
       "15        virgin-atlantic-airways                      air-europa   \n",
       "16                   tap-portugal                        egyptair   \n",
       "17                     air-europa               austrian-airlines   \n",
       "18  swiss-international-air-lines                        wizz-air   \n",
       "19                     us-airways              ethiopian-airlines   \n",
       "20                        ryanair                       air-india   \n",
       "21                        easyjet               american-airlines   \n",
       "22              austrian-airlines                       meridiana   \n",
       "23                aegean-airlines                     jet-airways   \n",
       "24                       egyptair  ukraine-international-airlines   \n",
       "\n",
       "                   Business Class            Premium Economy  \\\n",
       "0                          iberia            british-airways   \n",
       "1                 british-airways    virgin-atlantic-airways   \n",
       "2                   qatar-airways            united-airlines   \n",
       "3                        alitalia          brussels-airlines   \n",
       "4               brussels-airlines           turkish-airlines   \n",
       "5                        emirates                   alitalia   \n",
       "6                       lufthansa                 air-france   \n",
       "7                turkish-airlines                  lufthansa   \n",
       "8   swiss-international-air-lines                 icelandair   \n",
       "9                        oman-air            delta-air-lines   \n",
       "10                     air-europa                     iberia   \n",
       "11                       egyptair   klm-royal-dutch-airlines   \n",
       "12              american-airlines            thomson-airways   \n",
       "13                   tap-portugal  sas-scandinavian-airlines   \n",
       "14                     icelandair          american-airlines   \n",
       "15                     air-berlin                  norwegian   \n",
       "16              austrian-airlines                 us-airways   \n",
       "17                united-airlines           air-canada-rouge   \n",
       "18                        finnair       thomas-cook-airlines   \n",
       "19        virgin-atlantic-airways              qatar-airways   \n",
       "20                aegean-airlines                  openskies   \n",
       "21                 etihad-airways                    easyjet   \n",
       "22          south-african-airways     cathay-pacific-airways   \n",
       "23                        avianca           vueling-airlines   \n",
       "24                      air-india                   egyptair   \n",
       "\n",
       "                      First Class  \n",
       "0               american-airlines  \n",
       "1                 british-airways  \n",
       "2                 delta-air-lines  \n",
       "3                 united-airlines  \n",
       "4                       lufthansa  \n",
       "5                        emirates  \n",
       "6                   qatar-airways  \n",
       "7   swiss-international-air-lines  \n",
       "8                      us-airways  \n",
       "9                          iberia  \n",
       "10                     air-berlin  \n",
       "11                       alitalia  \n",
       "12                     icelandair  \n",
       "13                     aer-lingus  \n",
       "14               turkish-airlines  \n",
       "15              brussels-airlines  \n",
       "16              austrian-airlines  \n",
       "17                alaska-airlines  \n",
       "18         ana-all-nippon-airways  \n",
       "19                     air-france  \n",
       "20      sas-scandinavian-airlines  \n",
       "21                 etihad-airways  \n",
       "22                     air-canada  \n",
       "23                      norwegian  \n",
       "24                        finnair  "
      ]
     },
     "execution_count": 265,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "recommendations_df_users = pd.DataFrame()\n",
    "users = users_df.sample()\n",
    "print(users)\n",
    "users= users['USER_ID'].tolist()\n",
    "for user in users:\n",
    "    recommendations_df_users = get_new_recommendations_df_users(recommendations_df_users, user)\n",
    "user_id = users[0]\n",
    "recommendations_df_users"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 266,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'ACrociani'"
      ]
     },
     "execution_count": 266,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_id"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, so now we have a list of recommendations for this user before we have applied any real-time events. Now let's pick 3 random artists which we will simulate our user interacting with, and then see how this changes the recommendations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 267,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['thai-airways', 'austrian-airlines', 'united-airlines']"
      ]
     },
     "execution_count": 267,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Next generate 3 random Airlines\n",
    "airlines = a_interactions_df.sample(3)['ITEM_ID'].tolist()\n",
    "airlines"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 268,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ACrociani\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>thai-airways|Economy</th>\n",
       "      <th>austrian-airlines|Economy</th>\n",
       "      <th>united-airlines|Economy</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>alitalia</td>\n",
       "      <td>ryanair</td>\n",
       "      <td>brussels-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>brussels-airlines</td>\n",
       "      <td>brussels-airlines</td>\n",
       "      <td>lufthansa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>ryanair</td>\n",
       "      <td>easyjet</td>\n",
       "      <td>turkish-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>easyjet</td>\n",
       "      <td>tap-portugal</td>\n",
       "      <td>austrian-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>iberia</td>\n",
       "      <td>aegean-airlines</td>\n",
       "      <td>tap-portugal</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>aer-lingus</td>\n",
       "      <td>turkish-airlines</td>\n",
       "      <td>alitalia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>aegean-airlines</td>\n",
       "      <td>lufthansa</td>\n",
       "      <td>germanwings</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>lufthansa</td>\n",
       "      <td>alitalia</td>\n",
       "      <td>aegean-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>qatar-airways</td>\n",
       "      <td>iberia</td>\n",
       "      <td>swiss-international-air-lines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>tap-portugal</td>\n",
       "      <td>air-india</td>\n",
       "      <td>air-berlin</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>air-berlin</td>\n",
       "      <td>qatar-airways</td>\n",
       "      <td>easyjet</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>germanwings</td>\n",
       "      <td>swiss-international-air-lines</td>\n",
       "      <td>ryanair</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>vueling-airlines</td>\n",
       "      <td>egyptair</td>\n",
       "      <td>egyptair</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>british-airways</td>\n",
       "      <td>austrian-airlines</td>\n",
       "      <td>iberia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>icelandair</td>\n",
       "      <td>air-berlin</td>\n",
       "      <td>icelandair</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>air-europa</td>\n",
       "      <td>emirates</td>\n",
       "      <td>ethiopian-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>egyptair</td>\n",
       "      <td>germanwings</td>\n",
       "      <td>air-india</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>austrian-airlines</td>\n",
       "      <td>aeroflot-russian-airlines</td>\n",
       "      <td>qatar-airways</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>wizz-air</td>\n",
       "      <td>ethiopian-airlines</td>\n",
       "      <td>air-europa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>ethiopian-airlines</td>\n",
       "      <td>vueling-airlines</td>\n",
       "      <td>vueling-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>air-india</td>\n",
       "      <td>aer-lingus</td>\n",
       "      <td>aeroflot-russian-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>american-airlines</td>\n",
       "      <td>british-airways</td>\n",
       "      <td>united-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>meridiana</td>\n",
       "      <td>tam-airlines</td>\n",
       "      <td>norwegian</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>jet-airways</td>\n",
       "      <td>american-airlines</td>\n",
       "      <td>american-airlines</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>ukraine-international-airlines</td>\n",
       "      <td>bangkok-airways</td>\n",
       "      <td>tam-airlines</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              thai-airways|Economy      austrian-airlines|Economy  \\\n",
       "0                         alitalia                        ryanair   \n",
       "1                brussels-airlines              brussels-airlines   \n",
       "2                          ryanair                        easyjet   \n",
       "3                          easyjet                   tap-portugal   \n",
       "4                           iberia                aegean-airlines   \n",
       "5                       aer-lingus               turkish-airlines   \n",
       "6                  aegean-airlines                      lufthansa   \n",
       "7                        lufthansa                       alitalia   \n",
       "8                    qatar-airways                         iberia   \n",
       "9                     tap-portugal                      air-india   \n",
       "10                      air-berlin                  qatar-airways   \n",
       "11                     germanwings  swiss-international-air-lines   \n",
       "12                vueling-airlines                       egyptair   \n",
       "13                 british-airways              austrian-airlines   \n",
       "14                      icelandair                     air-berlin   \n",
       "15                      air-europa                       emirates   \n",
       "16                        egyptair                    germanwings   \n",
       "17               austrian-airlines      aeroflot-russian-airlines   \n",
       "18                        wizz-air             ethiopian-airlines   \n",
       "19              ethiopian-airlines               vueling-airlines   \n",
       "20                       air-india                     aer-lingus   \n",
       "21               american-airlines                british-airways   \n",
       "22                       meridiana                   tam-airlines   \n",
       "23                     jet-airways              american-airlines   \n",
       "24  ukraine-international-airlines                bangkok-airways   \n",
       "\n",
       "          united-airlines|Economy  \n",
       "0               brussels-airlines  \n",
       "1                       lufthansa  \n",
       "2                turkish-airlines  \n",
       "3               austrian-airlines  \n",
       "4                    tap-portugal  \n",
       "5                        alitalia  \n",
       "6                     germanwings  \n",
       "7                 aegean-airlines  \n",
       "8   swiss-international-air-lines  \n",
       "9                      air-berlin  \n",
       "10                        easyjet  \n",
       "11                        ryanair  \n",
       "12                       egyptair  \n",
       "13                         iberia  \n",
       "14                     icelandair  \n",
       "15             ethiopian-airlines  \n",
       "16                      air-india  \n",
       "17                  qatar-airways  \n",
       "18                     air-europa  \n",
       "19               vueling-airlines  \n",
       "20      aeroflot-russian-airlines  \n",
       "21                united-airlines  \n",
       "22                      norwegian  \n",
       "23              american-airlines  \n",
       "24                   tam-airlines  "
      ]
     },
     "execution_count": 268,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_recommendations_df = pd.DataFrame()\n",
    "# Note this will take about 15 seconds to complete due to the sleeps\n",
    "for airline in airlines:\n",
    "    user_recommendations_df = get_new_recommendations_df_users_real_time(user_recommendations_df, user_id, airline)\n",
    "    time.sleep(5)\n",
    "print(user_id)\n",
    "user_recommendations_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the cell above, the first column after the index is the user's default recommendations from the AWS User Personalization model, and each column after that has a header of the airlines that they interacted with via a real time event, and the recommendations after this event occurred. \n",
    "\n",
    "The behavior may not shift very much; this is due to the relatively limited nature of this dataset. If you wanted to better understand this, try simulating rating random airlines with random ratings, and you should see a more pronounced impact."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_mxnet_p36",
   "language": "python",
   "name": "conda_mxnet_p36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}