{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Getting recommendations using Amazon Personalize and AWS Data Exchange\n",
    "\n",
    "Recommended Time: 90 Min"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "\n",
    "To use this notebook you need to be subscribed to "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import Dependencies and Setup Boto3 Python Clients\n",
    "\n",
    "Throughout this workshop we will need access to some common libraries and clients for connecting to AWS services. We also have to retrieve Uid from a SageMaker notebook instance tag."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import Dependencies\n",
    "\n",
    "import boto3\n",
    "import json\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "import time\n",
    "import requests\n",
    "import csv\n",
    "import sys\n",
    "import botocore\n",
    "import uuid\n",
    "\n",
    "from datetime import datetime\n",
    "from datetime import date\n",
    "from packaging import version\n",
    "from random import randint\n",
    "from botocore.exceptions import ClientError\n",
    "\n",
    "\n",
    "%matplotlib inline\n",
    "\n",
    "# Setup Clients\n",
    "\n",
    "personalize = boto3.client('personalize')\n",
    "personalize_runtime = boto3.client('personalize-runtime')\n",
    "personalize_events = boto3.client('personalize-events')\n",
    "s3 = boto3.client('s3')\n",
    "\n",
    "with open('/opt/ml/metadata/resource-metadata.json') as f:\n",
    "  data = json.load(f)\n",
    "sagemaker = boto3.client('sagemaker')\n",
    "sagemakerResponce = sagemaker.list_tags(ResourceArn=data[\"ResourceArn\"])\n",
    "for tag in sagemakerResponce[\"Tags\"]:\n",
    "    if tag['Key'] == 'Uid':\n",
    "        Uid = tag['Value']\n",
    "        break"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Implement a helper function for displaying product information in a dataframe\n",
    "\n",
    "Throughout this workshop we will need to look up product information several times. This function lets us do that without repeating the same code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def search_items_in_dataframe(item_list):\n",
    "    # Look up each item's row in products_dataset_df, which is created later in this notebook\n",
    "    frames = [\n",
    "        products_dataset_df.loc[products_dataset_df['ITEM_ID'] == int(item['itemId'])]\n",
    "        for item in item_list\n",
    "    ]\n",
    "    pd.set_option('display.max_rows', 10)\n",
    "    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0\n",
    "    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Configure Bucket and Data Output Location"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will configure some variables that store the location of our source data. Set `bucket` to the name of your own bucket, which we will create later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "bucket = \"\"     # Use your own bucket\n",
    "items_filename = \"items.csv\"                # Do Not Change\n",
    "users_filename = \"users.csv\"                # Do Not Change\n",
    "interactions_filename = \"interactions.csv\"  # Do Not Change"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get, Prepare, and Upload User, Product, and Interaction Data\n",
    "\n",
    "First we need to create a bucket to store the datasets so that Personalize can consume them.\n",
    "\n",
    "Then we download the datasets.\n",
    "\n",
    "Let's get started."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Replace '/dataset/object-key' with the key of the file copied by ADX that contains the weather dataset\n",
    "\n",
    "s3.download_file(bucket, '/dataset/object-key', 'weather_data.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Download, Explore, and Clean the Weather Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>date</th>\n",
       "      <th>avgTemp</th>\n",
       "      <th>maxTemp</th>\n",
       "      <th>minTemp</th>\n",
       "      <th>prcp</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>20180101</td>\n",
       "      <td>23.5</td>\n",
       "      <td>29</td>\n",
       "      <td>18</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Boston</td>\n",
       "      <td>20180101</td>\n",
       "      <td>6.5</td>\n",
       "      <td>13</td>\n",
       "      <td>0</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6088</th>\n",
       "      <td>New York City</td>\n",
       "      <td>20201011</td>\n",
       "      <td>63.5</td>\n",
       "      <td>69</td>\n",
       "      <td>58</td>\n",
       "      <td>0.01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6089</th>\n",
       "      <td>Seattle</td>\n",
       "      <td>20201011</td>\n",
       "      <td>54.5</td>\n",
       "      <td>59</td>\n",
       "      <td>50</td>\n",
       "      <td>0.61</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>6090 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "               city      date  avgTemp  maxTemp  minTemp  prcp\n",
       "0           Atlanta  20180101     23.5       29       18  0.00\n",
       "1            Boston  20180101      6.5       13        0  0.00\n",
       "...             ...       ...      ...      ...      ...   ...\n",
       "6088  New York City  20201011     63.5       69       58  0.01\n",
       "6089        Seattle  20201011     54.5       59       50  0.61\n",
       "\n",
       "[6090 rows x 6 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather_df = pd.read_csv('weather_data.csv')\n",
    "pd.set_option('display.max_rows', 5)\n",
    "weather_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>date</th>\n",
       "      <th>avgTemp</th>\n",
       "      <th>maxTemp</th>\n",
       "      <th>minTemp</th>\n",
       "      <th>prcp</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>20180101</td>\n",
       "      <td>23.5</td>\n",
       "      <td>29</td>\n",
       "      <td>18</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>20180102</td>\n",
       "      <td>24.5</td>\n",
       "      <td>36</td>\n",
       "      <td>13</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2026</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>20201010</td>\n",
       "      <td>71.0</td>\n",
       "      <td>77</td>\n",
       "      <td>65</td>\n",
       "      <td>4.55</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2028</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>20201011</td>\n",
       "      <td>72.5</td>\n",
       "      <td>75</td>\n",
       "      <td>70</td>\n",
       "      <td>0.05</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1015 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "         city      date  avgTemp  maxTemp  minTemp  prcp\n",
       "0     Atlanta  20180101     23.5       29       18  0.00\n",
       "2     Atlanta  20180102     24.5       36       13  0.00\n",
       "...       ...       ...      ...      ...      ...   ...\n",
       "2026  Atlanta  20201010     71.0       77       65  4.55\n",
       "2028  Atlanta  20201011     72.5       75       70  0.05\n",
       "\n",
       "[1015 rows x 6 columns]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Keep only the rows for Atlanta (this overwrites weather_df)\n",
    "weather_df = weather_df[weather_df.city == 'Atlanta']\n",
    "weather_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Helper function that maps a Unix timestamp to a temperature label using the weather dataset\n",
    "\n",
    "def find_weather_data_by_timestamp(timestamp):\n",
    "    # The weather dataset stores dates as YYYYMMDD integers\n",
    "    date_ds = int(date.fromtimestamp(timestamp).strftime('%Y%m%d'))\n",
    "    data = weather_df.loc[weather_df['date'] == date_ds]\n",
    "    # The conversion below treats avgTemp as Fahrenheit and truncates the Celsius value to an integer\n",
    "    daily_temp = int((float(data['avgTemp'].iloc[0]) - 32) * 5.0 / 9.0)\n",
    "\n",
    "    if daily_temp < 5:\n",
    "        return 'very cold'\n",
    "    elif daily_temp < 10:\n",
    "        return 'cold'\n",
    "    elif daily_temp < 15:\n",
    "        return 'slightly cold'\n",
    "    elif daily_temp < 21:\n",
    "        return 'lukewarm'\n",
    "    elif daily_temp < 28:\n",
    "        return 'hot'\n",
    "    else:\n",
    "        return 'very hot'\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'lukewarm'"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#test a sample timestamp\n",
    "find_weather_data_by_timestamp(1587846700)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Download, Explore, and Clean the Products Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>sk</th>\n",
       "      <th>name</th>\n",
       "      <th>category</th>\n",
       "      <th>type</th>\n",
       "      <th>size</th>\n",
       "      <th>price</th>\n",
       "      <th>featured</th>\n",
       "      <th>gender_affinity</th>\n",
       "      <th>sugar</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Amber Lager</td>\n",
       "      <td>beers</td>\n",
       "      <td>beer</td>\n",
       "      <td>4x355</td>\n",
       "      <td>4.99</td>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>REGULAR</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cream Stout</td>\n",
       "      <td>beers</td>\n",
       "      <td>beer</td>\n",
       "      <td>4x330</td>\n",
       "      <td>3.99</td>\n",
       "      <td>True</td>\n",
       "      <td>F</td>\n",
       "      <td>REGULAR</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>54</th>\n",
       "      <td>55</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Peach Nectar</td>\n",
       "      <td>juices</td>\n",
       "      <td>nectar</td>\n",
       "      <td>6x200</td>\n",
       "      <td>1.49</td>\n",
       "      <td>NaN</td>\n",
       "      <td>M</td>\n",
       "      <td>LOW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>55</th>\n",
       "      <td>56</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Coconut Water</td>\n",
       "      <td>other</td>\n",
       "      <td>coconut water</td>\n",
       "      <td>1x250</td>\n",
       "      <td>1.69</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0%</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>56 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    id  sk           name category           type   size  price featured  \\\n",
       "0    1 NaN    Amber Lager    beers           beer  4x355   4.99     True   \n",
       "1    2 NaN    Cream Stout    beers           beer  4x330   3.99     True   \n",
       "..  ..  ..            ...      ...            ...    ...    ...      ...   \n",
       "54  55 NaN   Peach Nectar   juices         nectar  6x200   1.49      NaN   \n",
       "55  56 NaN  Coconut Water    other  coconut water  1x250   1.69      NaN   \n",
       "\n",
       "   gender_affinity    sugar  \n",
       "0              NaN  REGULAR  \n",
       "1                F  REGULAR  \n",
       "..             ...      ...  \n",
       "54               M      LOW  \n",
       "55             NaN       0%  \n",
       "\n",
       "[56 rows x 10 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "products_df = pd.read_csv('./items-origin.csv')\n",
    "pd.set_option('display.max_rows', 5)\n",
    "products_df\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Clean the product dataset and drop columns we don't need\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Prepare Products Data\n",
    "\n",
    "When training models in Amazon Personalize, we can provide metadata about our items. For this workshop we will add each product's category, type, and size to the item dataset. The product's unique identifier is required. Then we will rename the columns in our dataset to match our schema (defined later) and those expected by Personalize. Finally, we will save our dataset as a CSV and copy it to our S3 bucket."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "products_dataset_df = products_df[['id','category','type', 'size']]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['beers', 'sparkling', 'waters', 'juices', 'isotonic', 'other',\n",
       "       'spirits', 'energy'], dtype=object)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "products_dataset_df['category'].unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "products_dataset_df = products_dataset_df.rename(columns = {'id':'ITEM_ID','category':'CATEGORY','type':'TYPE', 'size':'SIZE'}) \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ITEM_ID</th>\n",
       "      <th>CATEGORY</th>\n",
       "      <th>TYPE</th>\n",
       "      <th>SIZE</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>beers</td>\n",
       "      <td>beer</td>\n",
       "      <td>4x355</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>beers</td>\n",
       "      <td>beer</td>\n",
       "      <td>4x330</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>54</th>\n",
       "      <td>55</td>\n",
       "      <td>juices</td>\n",
       "      <td>nectar</td>\n",
       "      <td>6x200</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>55</th>\n",
       "      <td>56</td>\n",
       "      <td>other</td>\n",
       "      <td>coconut water</td>\n",
       "      <td>1x250</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>56 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    ITEM_ID CATEGORY           TYPE   SIZE\n",
       "0         1    beers           beer  4x355\n",
       "1         2    beers           beer  4x330\n",
       "..      ...      ...            ...    ...\n",
       "54       55   juices         nectar  6x200\n",
       "55       56    other  coconut water  1x250\n",
       "\n",
       "[56 rows x 4 columns]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.set_option('display.max_rows', 5)\n",
    "products_dataset_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "products_dataset_df.to_csv(items_filename, index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Download and Explore the Users Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>username</th>\n",
       "      <th>email</th>\n",
       "      <th>first_name</th>\n",
       "      <th>last_name</th>\n",
       "      <th>addresses</th>\n",
       "      <th>age</th>\n",
       "      <th>gender</th>\n",
       "      <th>persona</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>17</td>\n",
       "      <td>user17</td>\n",
       "      <td>annette.baker@example.com</td>\n",
       "      <td>Annette</td>\n",
       "      <td>Baker</td>\n",
       "      <td>[{'first_name': 'Annette', 'last_name': 'Baker...</td>\n",
       "      <td>18</td>\n",
       "      <td>F</td>\n",
       "      <td>sparkling_waters_other</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>63</td>\n",
       "      <td>user63</td>\n",
       "      <td>cheryl.duncan@example.com</td>\n",
       "      <td>Cheryl</td>\n",
       "      <td>Duncan</td>\n",
       "      <td>[{'first_name': 'Cheryl', 'last_name': 'Duncan...</td>\n",
       "      <td>18</td>\n",
       "      <td>F</td>\n",
       "      <td>waters_juices_other</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5998</th>\n",
       "      <td>203</td>\n",
       "      <td>user203</td>\n",
       "      <td>ryan.barnes@example.com</td>\n",
       "      <td>Ryan</td>\n",
       "      <td>Barnes</td>\n",
       "      <td>[{'first_name': 'Ryan', 'last_name': 'Barnes',...</td>\n",
       "      <td>81</td>\n",
       "      <td>M</td>\n",
       "      <td>waters_juices_other</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5999</th>\n",
       "      <td>4521</td>\n",
       "      <td>user4521</td>\n",
       "      <td>paul.nelson@example.com</td>\n",
       "      <td>Paul</td>\n",
       "      <td>Nelson</td>\n",
       "      <td>[{'first_name': 'Paul', 'last_name': 'Nelson',...</td>\n",
       "      <td>81</td>\n",
       "      <td>M</td>\n",
       "      <td>juices_waters_sparkling</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>6000 rows × 9 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        id  username                      email first_name last_name  \\\n",
       "0       17    user17  annette.baker@example.com    Annette     Baker   \n",
       "1       63    user63  cheryl.duncan@example.com     Cheryl    Duncan   \n",
       "...    ...       ...                        ...        ...       ...   \n",
       "5998   203   user203    ryan.barnes@example.com       Ryan    Barnes   \n",
       "5999  4521  user4521    paul.nelson@example.com       Paul    Nelson   \n",
       "\n",
       "                                              addresses  age gender  \\\n",
       "0     [{'first_name': 'Annette', 'last_name': 'Baker...   18      F   \n",
       "1     [{'first_name': 'Cheryl', 'last_name': 'Duncan...   18      F   \n",
       "...                                                 ...  ...    ...   \n",
       "5998  [{'first_name': 'Ryan', 'last_name': 'Barnes',...   81      M   \n",
       "5999  [{'first_name': 'Paul', 'last_name': 'Nelson',...   81      M   \n",
       "\n",
       "                      persona  \n",
       "0      sparkling_waters_other  \n",
       "1         waters_juices_other  \n",
       "...                       ...  \n",
       "5998      waters_juices_other  \n",
       "5999  juices_waters_sparkling  \n",
       "\n",
       "[6000 rows x 9 columns]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users_df = pd.read_csv('./users-origin.csv')\n",
    "pd.set_option('display.max_rows', 5)\n",
    "users_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Prepare Users Data\n",
    "\n",
    "Similar to the items dataset we created above, we can provide metadata on our users when training models in Amazon Personalize. For this workshop we will include each user's id and persona. As before, we will rename the columns to match our schema, save the data as a CSV, and upload it to our S3 bucket."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "users_dataset_df = users_df[['id','persona']]\n",
    "users_dataset_df = users_dataset_df.rename(columns = {'id':'USER_ID','persona':'PERSONA'}) \n",
    "users_dataset_df.head(5)\n",
    "\n",
    "users_dataset_df.to_csv(users_filename, index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>USER_ID</th>\n",
       "      <th>PERSONA</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>17</td>\n",
       "      <td>sparkling_waters_other</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>63</td>\n",
       "      <td>waters_juices_other</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>74</td>\n",
       "      <td>isotonic_spirits_waters</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>88</td>\n",
       "      <td>spirits_beers_sparkling</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>144</td>\n",
       "      <td>waters_juices_other</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   USER_ID                  PERSONA\n",
       "0       17   sparkling_waters_other\n",
       "1       63      waters_juices_other\n",
       "2       74  isotonic_spirits_waters\n",
       "3       88  spirits_beers_sparkling\n",
       "4      144      waters_juices_other"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users_dataset_df.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "products_dataset_df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create User-Items Interactions Dataset\n",
    "\n",
    "To mimic user behavior, we will generate a new dataset that represents user interactions with items. To make the interactions more realistic, we will use a predefined shopper persona for each user to generate event types for products matching that persona. Each persona is composed of three categories, listed in order of preference and separated by an underscore (\"_\").\n",
    "The upsampling process will create events for viewing products, adding products to a cart, checking out, and completing orders."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Minimum interactions to generate: 2000000\n",
      "Days back: 365\n",
      "Starting timestamp: 1529918873 (2018-06-25 09:27:53)\n",
      "Seconds increment: 15\n",
      "Generating interactions... (this may take a few minutes)\n",
      "Done\n",
      "Total interactions: 2000001\n",
      "Total product viewed: 1724140\n",
      "Total product added: 137931\n",
      "Total cart viewed: 86207\n",
      "Total checkout started: 34482\n",
      "Total order completed: 17241\n",
      "CPU times: user 45min 23s, sys: 1.44 s, total: 45min 24s\n",
      "Wall time: 45min 26s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "# Minimum number of interactions to generate\n",
    "min_interactions = 2000000\n",
    "\n",
    "# Percentages of each event type to generate\n",
    "product_added_percent = .08\n",
    "cart_viewed_percent = .05\n",
    "checkout_started_percent = .02\n",
    "order_completed_percent = .01\n",
    "\n",
    "# Count of interactions generated for each event type\n",
    "product_viewed_count = 0\n",
    "product_added_count = 0\n",
    "cart_viewed_count = 0\n",
    "checkout_started_count = 0\n",
    "order_completed_count = 0\n",
    "\n",
    "# How many days in the past (from initial date) to start generating interactions\n",
    "days_back = 365\n",
    "\n",
    "# Use a fixed start time; with days_back = 365 the generated interactions start on 2018-06-25,\n",
    "# inside the date range covered by the weather dataset\n",
    "date_time_obj = datetime.strptime('2019-06-25 09:27:53', '%Y-%m-%d %H:%M:%S')\n",
    "start_time = int(datetime.timestamp(date_time_obj))\n",
    "#start_time = int(time.time())\n",
    "\n",
    "\n",
    "next_timestamp = start_time - (days_back * 24 * 60 * 60)\n",
    "seconds_increment = int((start_time - next_timestamp) / min_interactions)\n",
    "next_update = start_time + 60\n",
    "\n",
    "assert seconds_increment > 0, \"Increase days_back or reduce min_interactions\"\n",
    "\n",
    "print('Minimum interactions to generate: {}'.format(min_interactions))\n",
    "print('Days back: {}'.format(days_back))\n",
    "print('Starting timestamp: {} ({})'.format(next_timestamp, time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(next_timestamp))))\n",
    "print('Seconds increment: {}'.format(seconds_increment))\n",
    "\n",
    "print(\"Generating interactions... (this may take a few minutes)\")\n",
    "interactions = 0\n",
    "\n",
    "subsets_cache = {}\n",
    "\n",
    "with open(interactions_filename, 'w') as outfile:\n",
    "    f = csv.writer(outfile)\n",
    "    f.writerow([\"ITEM_ID\", \"USER_ID\", \"EVENT_TYPE\", \"TIMESTAMP\", \"DAILY_TEMPERATURE\"])\n",
    "\n",
    "    while interactions < min_interactions:\n",
    "        # Progress reporting (disabled)\n",
    "        # if (time.time() > next_update):\n",
    "        #     rate = interactions / (time.time() - start_time)\n",
    "        #     to_go = (min_interactions - interactions) / rate\n",
    "        #     print('Generated {} interactions so far ({:0.2f} seconds to go)'.format(interactions, to_go))\n",
    "        #     next_update += 60\n",
    "\n",
    "        # Pick a random user\n",
    "        user = users_df.sample().iloc[0]\n",
    "\n",
    "        # Determine category affinity from user's persona\n",
    "        persona = user['persona']\n",
    "        preferred_categories = persona.split('_')\n",
    "\n",
    "        # Select category based on weighted preference of category order.\n",
    "        category = np.random.choice(preferred_categories, 1, p=[0.6, 0.25, 0.15])[0]\n",
    "        gender = user['gender']\n",
    "\n",
    "        # Check if subset data frame is already cached for category & gender\n",
    "        prods_subset_df = subsets_cache.get(category + gender)\n",
    "\n",
    "        if prods_subset_df is None:\n",
    "            # Select products from the selected category (this filter does not apply gender affinity)\n",
    "            prods_subset_df = products_df.loc[(products_df['category'] == category)]\n",
    "            # Update cache\n",
    "            subsets_cache[category + gender] = prods_subset_df\n",
    "\n",
    "        # Pick a random product from gender filtered subset\n",
    "        product = prods_subset_df.sample().iloc[0]\n",
    "\n",
    "        this_timestamp = next_timestamp + randint(0, seconds_increment)\n",
    "        daily_temp = find_weather_data_by_timestamp(this_timestamp)\n",
    "        f.writerow([product['id'],\n",
    "                    user['id'], \n",
    "                    'ProductViewed',\n",
    "                    this_timestamp,\n",
    "                    daily_temp])\n",
    "\n",
    "        next_timestamp += seconds_increment\n",
    "        product_viewed_count += 1\n",
    "        interactions += 1\n",
    "\n",
    "        if product_added_count < int(product_viewed_count * product_added_percent):\n",
    "            this_timestamp += randint(0, int(seconds_increment / 2))\n",
    "            daily_temp = find_weather_data_by_timestamp(this_timestamp)\n",
    "            f.writerow([product['id'],\n",
    "                        user['id'], \n",
    "                        'ProductAdded',\n",
    "                        this_timestamp,\n",
    "                        daily_temp])\n",
    "            interactions += 1\n",
    "            product_added_count += 1\n",
    "\n",
    "        if cart_viewed_count < int(product_viewed_count * cart_viewed_percent):\n",
    "            this_timestamp += randint(0, int(seconds_increment / 2))\n",
    "            daily_temp = find_weather_data_by_timestamp(this_timestamp)\n",
    "            f.writerow([product['id'],\n",
    "                        user['id'], \n",
    "                        'CartViewed',\n",
    "                        this_timestamp,\n",
    "                        daily_temp])\n",
    "            interactions += 1\n",
    "            cart_viewed_count += 1\n",
    "\n",
    "        if checkout_started_count < int(product_viewed_count * checkout_started_percent):\n",
    "            this_timestamp += randint(0, int(seconds_increment / 2))\n",
    "            daily_temp = find_weather_data_by_timestamp(this_timestamp)\n",
    "            f.writerow([product['id'],\n",
    "                        user['id'], \n",
    "                        'CheckoutStarted',\n",
    "                        this_timestamp,\n",
    "                        daily_temp])\n",
    "            interactions += 1\n",
    "            checkout_started_count += 1\n",
    "\n",
    "        if order_completed_count < int(product_viewed_count * order_completed_percent):\n",
    "            this_timestamp += randint(0, int(seconds_increment / 2))\n",
    "            daily_temp = find_weather_data_by_timestamp(this_timestamp)\n",
    "            f.writerow([product['id'],\n",
    "                        user['id'], \n",
    "                        'OrderCompleted',\n",
    "                        this_timestamp,\n",
    "                        daily_temp])\n",
    "            interactions += 1\n",
    "            order_completed_count += 1\n",
    "    \n",
    "print(\"Done\")\n",
    "print(\"Total interactions: \" + str(interactions))\n",
    "print(\"Total product viewed: \" + str(product_viewed_count))\n",
    "print(\"Total product added: \" + str(product_added_count))\n",
    "print(\"Total cart viewed: \" + str(cart_viewed_count))\n",
    "print(\"Total checkout started: \" + str(checkout_started_count))\n",
    "print(\"Total order completed: \" + str(order_completed_count))"
   ]
  },
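  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The generator above keeps each follow-up event type at a fixed share of product views by comparing a running count against `int(views * percent)`. The cell below is a minimal sketch of that rule for a single event type; `simulate_funnel` and its parameters are hypothetical names used only for illustration. The same rule accounts for the recorded totals above: for example, 137931 `ProductAdded` events equals `int(1724140 * 0.08)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch only: names here are hypothetical, not part of the generator above\n",
    "def simulate_funnel(n_views, added_percent=0.08):\n",
    "    # Count how many follow-up events the counter-threshold rule emits\n",
    "    views = 0\n",
    "    added = 0\n",
    "    for _ in range(n_views):\n",
    "        views += 1\n",
    "        # Emit a follow-up event only while its count lags the target share of views\n",
    "        if added < int(views * added_percent):\n",
    "            added += 1\n",
    "    return views, added\n",
    "\n",
    "views, added = simulate_funnel(1000)\n",
    "print(views, added)"
   ]
  },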
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Open and Explore the Interactions Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ITEM_ID</th>\n",
       "      <th>USER_ID</th>\n",
       "      <th>EVENT_TYPE</th>\n",
       "      <th>TIMESTAMP</th>\n",
       "      <th>DAILY_TEMPERATURE</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>50</td>\n",
       "      <td>3271</td>\n",
       "      <td>ProductViewed</td>\n",
       "      <td>1529918883</td>\n",
       "      <td>hot</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>32</td>\n",
       "      <td>2228</td>\n",
       "      <td>ProductViewed</td>\n",
       "      <td>1529918900</td>\n",
       "      <td>hot</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1999999</th>\n",
       "      <td>1</td>\n",
       "      <td>3739</td>\n",
       "      <td>ProductViewed</td>\n",
       "      <td>1555780961</td>\n",
       "      <td>cold</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000000</th>\n",
       "      <td>1</td>\n",
       "      <td>3739</td>\n",
       "      <td>CartViewed</td>\n",
       "      <td>1555780962</td>\n",
       "      <td>cold</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>2000001 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "         ITEM_ID  USER_ID     EVENT_TYPE   TIMESTAMP DAILY_TEMPERATURE\n",
       "0             50     3271  ProductViewed  1529918883               hot\n",
       "1             32     2228  ProductViewed  1529918900               hot\n",
       "...          ...      ...            ...         ...               ...\n",
       "1999999        1     3739  ProductViewed  1555780961              cold\n",
       "2000000        1     3739     CartViewed  1555780962              cold\n",
       "\n",
       "[2000001 rows x 5 columns]"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "interactions_df = pd.read_csv(interactions_filename)\n",
    "interactions_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Chart the counts of each `EVENT_TYPE` generated for the interactions dataset. We're simulating a site where visitors heavily view/browse products and to a lesser degree add products to their cart and checkout."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:xlabel='EVENT_TYPE', ylabel='count'>"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAA7YAAADcCAYAAABEfT05AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAhIUlEQVR4nO3deZhldX3n8feHbhBBlsRuXMC2EVsjGiHaARVlMcqgoyEYVIhL3NJu6MhEHUwyaPTRmHGyqIBtTwYRn9A4Lq2ttixxoRFEaQiyRRQBpdMkICCKe8N3/ji/si9FVfXt5XbVKd6v57lPnftbzv3eqt89db7n/M65qSokSZIkSeqr7aY7AEmSJEmStoSJrSRJkiSp10xsJUmSJEm9ZmIrSZIkSeo1E1tJkiRJUq+Z2EqSJEmSem3WJbZJTk1yc5Irh2z/giRXJ7kqyRmjjk+SJEmStHVltn2PbZKDgTuB06vqcRtpuwj4f8DTq+r2JHtU1c3bIk5JkiRJ0tYx687YVtVq4LbBsiT7JDkrySVJzk/yO63qz4CTq+r21tekVpIkSZJ6ZtYltpNYBryhqp4IvBk4pZU/CnhUkguSXJTkiGmLUJIkSZK0WeZOdwCjluQBwFOATyQZK75f+zkXWAQcCuwFnJ/kcVX1o20cpiRJkiRpM836xJburPSPqmr/CerWAhdV1a+B65NcQ5foXrwN45MkSZIkbYFZPxW5qn5Ml7Q+HyCd/Vr1Z4DDWvk8uqnJ101HnJIkSZKkzTPrEtsky4GvA49OsjbJK4EXAa9M8i3gKuDI1vxs4NYkVwNfAd5SVbdOR9ySJEmSpM0z677uR5IkSZJ03zLrzthKkiRJku5bTGwlSZIkSb02q+6KPG/evFq4cOF0hyFJkiRJ2souueSSH1bV/InqZlViu3DhQtasWTPdYUiSJEmStrIk35+sbmSJbZJTgecAN1fV4yaofwvd3YrH4ngMML+qbktyA/AT4C5gfVUtHlWckiRJkqR+G+U1tqcBR0xWWVXvq6r9q2p/4G3AeVV120CTw1q9Sa0kSZIkaVIjS2yrajVw20Ybdo4Flo8qFkmSJEnS7DXtd0VOshPdmd1PDRQXcE6SS5Is2Uj/JUnWJFlzyy23jDJUSZIkSdIMNO2JLfBc4IJx05APqqonAM8CXp/k4Mk6V9WyqlpcVYvnz5/wBlmSJEmSpFlsJiS2xzBuGnJVrWs/bwZWAAdMQ1ySJEmSpB6Y1q/7SbIbcAjw4oGynYHtquonbflw4J3bKqYnvuX0bfVSmmUued9LpzsESZIk6T5plF/3sxw4FJiXZC3wdmB7gKpa2podBZxTVT8d6PogYEWSsfjOqKqzRhWnJEmSJKnfRpbYVtWxQ7Q5je5rgQbLrgP2G01UkiRJkqTZZiZcYytJkiRJ0mYzsZUkSZIk9ZqJrSRJkiSp10xsJUmSJEm9ZmIrSZIkSeo1E1tJkiRJUq+Z2EqSJEmSes3EVpIkSZLUaya2kiRJkqReM7GVJEmSJPWaia0kSZIkqddMbCVJkiRJvWZiK0mSJEnqtZEltklOTXJzkisnqT80yR1JLmuPEwfqjkhyTZJrk5wwqhglSZIkSf03yjO2pwFHbKTN+VW1f3u8EyDJHOBk4FnAvsCxSfYdYZySJEmSpB4bWWJbVauB2zaj6wHAtVV1XVX9CjgTOHKrBidJkiRJmjWm+xrbJyf5VpIvJnlsK9sTuHGgzdpWJkmSJEnSvcydxte+FHh4Vd2Z5NnAZ4BFQCZoW5OtJMkSYAnAggULRhCmJEmSJGkmm7YztlX146q6sy2vArZPMo/uDO3DBpruBaybYj3LqmpxVS2eP3/+SGOWJEmSJM0805bYJnlwkrTlA1ostwIXA4uS7J1kB+AYYOV0xSlJkiRJmtlGNhU5yXLgUGBekrXA24HtAapqKXA08Nok64GfA8dUVQHrkxwHnA3MAU6tqqtGFackSZIkqd9G
lthW1bEbqT8JOGmSulXAqlHEJUmSJEmaXab7rsiSJEmSJG0RE1tJkiRJUq+Z2EqSJEmSes3EVpIkSZLUaya2kiRJkqReM7GVJEmSJPWaia0kSZIkqddMbCVJkiRJvWZiK0mSJEnqNRNbSZIkSVKvmdhKkiRJknrNxFaSJEmS1GsmtpIkSZKkXjOxlSRJkiT12sgS2ySnJrk5yZWT1L8oyeXtcWGS/QbqbkhyRZLLkqwZVYySJEmSpP4b5Rnb04Ajpqi/Hjikqh4PvAtYNq7+sKrav6oWjyg+SZIkSdIsMHdUK66q1UkWTlF/4cDTi4C9RhWLJEmSJGn2minX2L4S+OLA8wLOSXJJkiVTdUyyJMmaJGtuueWWkQYpSZIkSZp5RnbGdlhJDqNLbJ86UHxQVa1LsgdwbpJvV9XqifpX1TLaNObFixfXyAOWJEmSJM0o03rGNsnjgX8CjqyqW8fKq2pd+3kzsAI4YHoilCRJkiTNdNOW2CZZAHwaeElVfWegfOcku4wtA4cDE95ZWZIkSZKkkU1FTrIcOBSYl2Qt8HZge4CqWgqcCDwQOCUJwPp2B+QHASta2VzgjKo6a1RxSpIkSZL6bZR3RT52I/WvAl41Qfl1wH737iFJkiRJ0r3NlLsiS5IkSZK0WUxsJUmSJEm9ZmIrSZIkSeo1E1tJkiRJUq+Z2EqSJEmSes3EVpIkSZLUaya2kiRJkqReM7GVJEmSJPWaia0kSZIkqddMbCVJkiRJvWZiK0mSJEnqNRNbSZIkSVKvmdhKkiRJknptZIltklOT3Jzkyknqk+QDSa5NcnmSJwzUHZHkmlZ3wqhilCRJkiT131CJbZIvDVM2zmnAEVPUPwtY1B5LgA+19c4BTm71+wLHJtl3mDglSZIkSfc9c6eqTLIjsBMwL8lvAWlVuwIPnapvVa1OsnCKJkcCp1dVARcl2T3JQ4CFwLVVdV2L4czW9uqNvx1JkiRJ0n3NlIkt8GrgTXRJ7CVsSGx/THdWdUvsCdw48HxtK5uo/MAtfC1JkiRJ0iw1ZWJbVe8H3p/kDVX1wa382pmgrKYon3glyRK6qcwsWLBg60QmSZIkSeqNjZ2xBaCqPpjkKXTThOcOlJ++Ba+9FnjYwPO9gHXADpOUTxbbMmAZwOLFiydNgCVJkiRJs9NQiW2SjwH7AJcBd7XiArYksV0JHNeuoT0QuKOqbkpyC7Aoyd7AvwPHAH+yBa8jSZIkSZrFhkpsgcXAvu1GT0NJshw4lO7GU2uBtwPbA1TVUmAV8GzgWuBnwMtb3fokxwFnA3OAU6vqqmFfV5IkSZJ03zJsYnsl8GDgpmFXXFXHbqS+gNdPUreKLvGVJEmSJGlKwya284Crk3wT+OVYYVX94UiikiRJkiRpSMMmtu8YZRCSJEmSJG2uYe+KfN6oA5EkSZIkaXMMe1fkn7Dhu2R3oLsJ1E+ratdRBSZJkiRJ0jCGPWO7y+DzJH8EHDCKgCRJkiRJ2hTbbU6nqvoM8PStG4okSZIkSZtu2KnIzxt4uh3d99oO/Z22kiRJkiSNyrB3RX7uwPJ64AbgyK0ejSRJkiRJm2jYa2xfPupAJEmSJEnaHENdY5tkryQrktyc5D+TfCrJXqMOTpIkSZKkjRn25lEfAVYCDwX2BD7XyiRJkiRJmlbDJrbzq+ojVbW+PU4D5o8wLkmSJEmShjJsYvvDJC9OMqc9XgzcOsrAJEmSJEkaxrCJ7SuAFwD/AdwEHA14QylJkiRJ0rQbNrF9F/CnVTW/qvagS3TfsbFOSY5Ick2Sa5OcMEH9W5Jc1h5XJrkryW+3uhuSXNHq1mzCe5IkSZIk3YcM+z22j6+q28eeVNVtSX5vqg5J5gAnA88E1gIXJ1lZVVcPrOd9wPta++cCx1fVbQOrOayqfjhkjJIkSZKk+6Bhz9hul+S3xp60s6obS4oPAK6tquuq6lfAmcCRU7Q/Flg+ZDySJEmSJAHDn7H9O+DCJJ8Eiu5623dvpM+e
wI0Dz9cCB07UMMlOwBHAcQPFBZyTpIAPV9WySfouAZYALFiwYOPvRJIkSZI0qwyV2FbV6e0616cDAZ43OKV4EploVZO0fS5wwbhpyAdV1bokewDnJvl2Va2eILZlwDKAxYsXT7Z+SZIkSdIsNewZW1oiu7FkdtBa4GEDz/cC1k3S9hjGTUOuqnXt581JVtBNbb5XYitJkiRJum8b9hrbzXExsCjJ3kl2oEteV45vlGQ34BDgswNlOyfZZWwZOBy4coSxSpIkSZJ6augztpuqqtYnOQ44G5gDnFpVVyV5Tatf2poeBZxTVT8d6P4gYEWSsRjPqKqzRhWrJEmSJKm/RpbYAlTVKmDVuLKl456fBpw2ruw6YL9RxiZJkiRJmh1GORVZkiRJkqSRM7GVJEmSJPWaia0kSZIkqddMbCVJkiRJvWZiK0mSJEnqNRNbSZIkSVKvmdhKkiRJknrNxFaSJEmS1GsmtpIkSZKkXjOxlSRJkiT1momtJEmSJKnXTGwlSZIkSb1mYitJkiRJ6rWRJrZJjkhyTZJrk5wwQf2hSe5Icll7nDhsX0mSJEmSAOaOasVJ5gAnA88E1gIXJ1lZVVePa3p+VT1nM/tKkiRJku7jRnnG9gDg2qq6rqp+BZwJHLkN+kqSJEmS7kNGmdjuCdw48HxtKxvvyUm+leSLSR67iX0lSZIkSfdxI5uKDGSCshr3/FLg4VV1Z5JnA58BFg3Zt3uRZAmwBGDBggWbHawkSZIkqZ9GecZ2LfCwged7AesGG1TVj6vqzra8Ctg+ybxh+g6sY1lVLa6qxfPnz9+a8UuSJEmSemCUie3FwKIkeyfZATgGWDnYIMmDk6QtH9DiuXWYvpIkSZIkwQinIlfV+iTHAWcDc4BTq+qqJK9p9UuBo4HXJlkP/Bw4pqoKmLDvqGKVJEmSJPXXKK+xHZtevGpc2dKB5ZOAk4btK0mSJEnSeKOciixJkiRJ0siZ2EqSJEmSes3EVpIkSZLUaya2kiRJkqReM7GVJEmSJPWaia0kSZIkqddMbCVJkiRJvWZiK0mSJEnqNRNbSZIkSVKvmdhKkiRJknrNxFaSJEmS1GsmtpIkSZKkXjOxlSRJkiT1momtJEmSJKnXRprYJjkiyTVJrk1ywgT1L0pyeXtcmGS/gbobklyR5LIka0YZpyRJkiSpv+aOasVJ5gAnA88E1gIXJ1lZVVcPNLseOKSqbk/yLGAZcOBA/WFV9cNRxShJkiRJ6r9RnrE9ALi2qq6rql8BZwJHDjaoqgur6vb29CJgrxHGI0mSJEmahUaZ2O4J3DjwfG0rm8wrgS8OPC/gnCSXJFkyWackS5KsSbLmlltu2aKAJUmSJEn9M7KpyEAmKKsJGyaH0SW2Tx0oPqiq1iXZAzg3yberavW9Vli1jG4KM4sXL55w/ZIkSZKk2WuUZ2zXAg8beL4XsG58oySPB/4JOLKqbh0rr6p17efNwAq6qc2SJEmSJN3DKBPbi4FFSfZOsgNwDLBysEGSBcCngZdU1XcGyndOssvYMnA4cOUIY5UkSZIk9dTIpiJX1fokxwFnA3OAU6vqqiSvafVLgROBBwKnJAFYX1WLgQcBK1rZXOCMqjprVLFKkiRJkvprlNfYUlWrgFXjypYOLL8KeNUE/a4D9htfLkmSJEnSeKOciixJkiRJ0siZ2EqSJEmSes3EVpIkSZLUaya2kiRJkqReG+nNoyRNnx+883enOwT11IITr5juECRJkjaJZ2wlSZIkSb1mYitJkiRJ6jUTW0mSJElSr5nYSpIkSZJ6zZtHSZJmtIM+eNB0h6AeuuANF0x3CJKkbcgztpIkSZKkXjOxlSRJkiT1momtJEmSJKnXRprYJjkiyTVJrk1ywgT1SfKBVn95kicM21eSJEmSJBjhzaOSzAFOBp4JrAUuTrKyqq4eaPYsYFF7HAh8CDhwyL6SJEm9cN7Bh0x3COqpQ1afN90hSL0wyrsiHwBcW1XXASQ5EzgSGExO
jwROr6oCLkqye5KHAAuH6CtJkiRpGzrpzz833SGop477u+eOdP2jnIq8J3DjwPO1rWyYNsP0lSRJkiRppGdsM0FZDdlmmL7dCpIlwJL29M4k1wwdoTbHPOCH0x3ETJT//afTHYKG5zieytsn2gRrhnIsTyJvdBz3jGN5MnEs94jjeApv+PutspqHT1YxysR2LfCwged7AeuGbLPDEH0BqKplwLItDVbDSbKmqhZPdxzSlnAca7ZwLGu2cCxrNnAcT69RTkW+GFiUZO8kOwDHACvHtVkJvLTdHflJwB1VddOQfSVJkiRJGt0Z26pan+Q44GxgDnBqVV2V5DWtfimwCng2cC3wM+DlU/UdVaySJEmSpP4a5VRkqmoVXfI6WLZ0YLmA1w/bVzOC0741GziONVs4ljVbOJY1GziOp1G63FKSJEmSpH4a5TW2kiRJkiSNnIntDJfkriSXJbkyySeS7LQF6/pqkk2+U1uS3ZO8buD59UkePa7NPyZ5a5LXJHnp5sa4iXFt1vvR9JmJ43mg/Pgkv0iy2xR9b0gyb4LydyR58ybGceemtFd/JHlwkjOTfC/J1UlWJXnUJvT/i4Hlryb5L+Pq35TklCR/mOSErRn7FDGdluTobfFa2vomGZNLknx+K6z70K24nqcMPH90G/+XJfm3JMta+f5Jnr0Z61+Y5E82o59jfwZJsleSzyb5bhvP7283mt1Yv83dZ3hzkm+3/ZZvjXofd7L9jHFt/mKq+kn6vCzJSZsfWT+Y2M58P6+q/avqccCvgNcMViaZsw1i2B0YTATOpLtT9VgM2wFHAx+vqqVVdfo2iEn9NBPH85hj6e7IftQ2iEGzVJIAK4CvVtU+VbUv8BfAg4bp27angzstyxnY3jbHAMuramVVvXcrha5ZakvG5DZ2KPCUgecfAP6h/c94DPDBVr4/3Y1Hh5ZkLrAQ2OTEVjNHG8ufBj5TVYuARwEPAN49rt0W30MoyZx2w9tnAge0/ZaDgZnwpcKbnNjeV5jY9sv5wCPbUc2vJDkDuCLJjkk+kuSKJP+a5DCAJPdvR2gvT/Jx4P5jKxo8W5Tk6CSnteUHJVnRjkp9qx09fS+wTztq+j7uvaN1MHBDVX1/8MxVkn2SnJXkkiTnJ/mdtqG4ru3A7Z7k7iQHt/bnJ3lkkp2TnJrk4vZ+jtzY+1EvzZTxTJJ96P45/hVdgju2rgcmOafF8WEG/qEl+csk1yT5F+DRA+X3GvetfO8kX2/j+l1b/bepmeIw4NfjbpR4GfCvSb6U5NI2tse2awvTnY06BbgU+L/A/dv4/Gfgk8BzktxvrD3wUOBrg0fgk8xP8qk2vi5OclArv6Jta5Pk1rSzDUk+luQZbZv8vtbn8iSvbvVJclK6s3tfAPbYFr88jcRkY/J84AFJPpnujNQ/JwlAkicmOa9tx85O8pBW/sgk/9K2p5e2bedvJPn9tr18RJI/aMtXtP/pY2P4N2ekkixOdyZtId2BzuPb2H8a8BBg7UDMV6Q7M/dO4IWt3QuTHJDkwvZaF6bNKGufj08k+RxwDt22/2mt3/GO/V56OvCLqvoIQFXdBRwPvCLJ6wb/3pl6n+Hw9v/40tbnAa38hiQnJvka8Hy6BPJ1VfXj9np3VNVHW9upxvd72vrXJHlC+wx9L+2bYdLt96xOt39ydZKl6Q5q3kOSFyf5ZhuzH25j9r3c83/EhO1a+cuTfCfJecBBo/iDzDhV5WMGP4A728+5wGeB19Id1fwpsHer+3PgI235d4AfADsC/53uq5IAHg+sBxYPrrctHw2c1pY/DrypLc8BdqM7ynnluLiuAvZry0uB17fldwBvbstfAha15QOBL7fls4DHAs+hO0P2l8D9gOtb/XuAF7fl3YHvADtP9X589OMxg8fzXwH/k+5g3w3AHq38A8CJbfm/AgXMA54IXAHsBOxK95VlGxv3K4GXtuXXD8bsY/Y8gDfSnWUaXz4X2LUtz2tjJm083g08aaDtneP6
fgE4si2fALyvLb8MOKktnwE8tS0vAP6tLS9tY/dxbXv7f1r5d+kO5iwB/qqV3Q9YA+wNPA84t31uHgr8CDh6un+/PrbqmDwUuAPYq237vg48FdgeuBCY39q9cGDb+w3gqLa8Y9sGHgp8nu5s6yVt/O0I3Ag8qrU9fWBbfAMwry0vpjuTDAP7D+35y1t8X6RLXnavceO+Pd8VmNuWnwF8aqDdWuC3B97v5wf6OfZ79phiLP9rqxv8e0+4z0C3/V0N7Nzq/gcb/s/fALy1Le8C3D5JHBsb369ty/8AXN7WNR+4eWAs/gJ4RBtn546NsbHPB/AY4HPA9q38FDbsQwzu80zYju7A0A/a6+4AXDD4uZmtj5F+3Y+2ivsnuawtn093NP8pwDer6vpW/lTaFJ2q+naS79NNzziYbsecqro8yeVDvN7T6T4QVHck7I4kvzVBu+XAMUmuAo4EThysbEe/ngJ8oh0Ahu4fx9j7OJjuH8jfAH8GnEe30wVwOPCH2XDN4o50/yg35/1oZpmp4/kYup21u5N8mu5I7cntNZ/X+n8hye2t/dOAFVX1M4AkK9vPqcb9QcAft+WPAX87RPyaPQK8J90MlbuBPdkwFfT7VXXRFH3HZsl8tv18xQRtngHsOzDudk2yCxu2t98HPgQsSbIncFtV3ZnkcODx2XAN4W7AotZnefvcrEvy5c1505rxvllVawHatnkhXSL3OODcNp7mADe18bRnVa0AqKpftH7Q7VwvAw6vqnVJ9qM7WP2d9jofpTug94/DBlZVH0lyNnAE3X7Gq9t6x9sN+GiSRXQHH7cfqDu3qm6b5CUc+/0Tur/xZOWDf+/J9hmeBOwLXNDG7g50B3XGfHwjrwXdLK2pxvfK9vMK4AFV9RPgJ+nu47F7q/tmVV0HkGQ53b7PJwde4w/oDqJf3OK8P3DzBLFM1u5AuoNGt7TX+DjdvtSsZmI78/28qvYfLGgD96eDRVP0n+xDOVi+42bEtZxuas95wOVVNf7Dth3wo/GxN+fTTTl6KF1C/Ba6o1erW32AP66qawY7tfft91P124wbz0keT7czc+7AP7nr6BLbYV9zzFTjfqp1afa4im7WwHgvojty/sSq+nWSG9gwVn86QftBnwH+PskTgPtX1aUTtNkOeHJV/XywMMlquh2uBXSzY45q8Z0/1gR4Q1WdPa7fs3G8zhaTjUmAXw4s30W3Xxjgqqp68mDDJLtO8Ro30Y3n3wPWMfV2fD0bLoWbcntdVeuAU4FTk1xJl3CP9y7gK1V1VLopzV8dqJvqs+XY75+r2HCAGPjNuHwY3fgd//eeLAk+t6qOnaCOsXVU1Y+T/DTJI8YS0HHrmMrY5+pu7vkZu5sNudf42MY/D/DRqnrbRl5rwnZJ/miCdc56XmM7O6ym22ki3Z03FwDXjCt/HN1UjDH/meQxbU7/4M1yvkQ3PXTswvldgZ/QTaP4jar6HnAr3TUry8cHVN31CNcneX5bVwaOtH6D7qzW3e2I72XAq9mwo3U28IbkN9f6/N4E73P8+9Hssa3H87HAO6pqYXs8FNgzycPHveazgLGzvauBo9Jdw7ML8FzY6Li/gA3Xpr9os387mum+DNwvyZ+NFST5feDhdNPQfp3uuvGHT7GOXyf5zVmnqrqTbmf9VCbY3jbnAMcNvOb+re+NdNPaFrWds68Bb+ae29vXjr1ekkcl2ZlujB/TPjcPobtOU/002Zg8ZJL21wDzkzy5td0+yWPb9m1t22Emyf2y4c72P6Kb8v6eJIcC3wYWJnlkq38J3YFw6KZaPrEtDyYp99g2JzliYFw+GHgg8O/j29Gdaf33tvyyyX8N9+rn2O+fLwE7ZcO9AuYAfwecBvxsXNvJ9hkuAg4aG5tJdsrkd63/G+DksYM6SXZNsoSpx/ewDkh3743t6Kb7f22C93p0kj3aa/922y+Be/6PmKzdN4BD090rZHu6mWiznont7HAKMCfJFXRTKF5WVb+km3b2gDb9
4q3ANwf6nEB3TcyX6Y60jvlvwGFtXZcAj62qW+mmbFyZdrOdZjndNZArJonrRcArk3yL7ijbkQAtthvpNi7Q7WDtQjdlA7qjr9sDl7cjtGM32pnq/Wj22Nbj+RjuPYZXtPK/Bg5OcindtLUfALQzZh+nOyjzKTYkCTDJuG+xvD7JxXQ7YpqFqrvI6SjgmeluFnIV3bWDq4DFSdbQjZFvT7GaZXTbv38eKFsO7Ed3V/qJvLGt//IkV3PPO45/g+5eBdCN1T3ZsBP1T8DVwKVte/thujMKK+iuw72C7rO3qTttmiGmGJPrJmn/K7ozvH/btmOXseFuxS8B3ti2wxcCDx7o9590B/lOphurL6e7LOMKujNVYzev+mvg/UnOpzvLNuZzdAcMx24edThwZYvhbOAtVfUfwFfopt1fluSFwP8C/ibJBXTTpidzObA+3Y2vjsex3zsDY/n5Sb5Lt137BRPfJXjCfYY2NfdlwPJWdxHdvuxEPkQ33i5uY+Q84GftpMxk43tYX6c7OXQlcD3j9kOq6mq6+3+c0+I8l+66WRj4HzFZu6q6ie5z/nXgX+huTjjrpRsjkiRJkqRRarMa3lxVz5nmUGYdz9hKkiRJknrNM7aSJEmSpF7zjK0kSZIkqddMbCVJkiRJvWZiK0mSJEnqNRNbSZIkSVKvzZ3uACRJ6rskd7Hhu7ih+77ZHYH7VdXbBtrtDyyvqsckuQH4CRu+y3N1Vb0xyWnAM4FHVNUvk8wD1tB9R+jHWtsFwB3t8cOqesa4eH53srbAXsDzq+qK1vatwCPovlPx34BrgB2A1cDrWv+x8jF/X1Wnb9pvSZKk0TGxlSRpy/28qvYfLEjyaOCLwNsGio8Bzhh4flhV/XCC9d0FvAL40FhBS0T3b+s+Dfh8VX1yomCmapvkCOCUJAcDDwVeDSwGdgO+V1X7J5kLfBn4I+DSsfIpfwOSJE0jpyJLkjQCVXUN8KMkBw4Uv4DubO7G/CNwfEswt3ZcZwE3AS8F/gF4R1XdPq7NeuBC4JFb+/UlSRoFE1tJkrbc/ZNcNvB4YStfTneWliRPAm6tqu8O9PvKQJ/jB8p/AHwNeMmI4n0T8G5gflV9bHxlkp2AP2DD9Op9xr2/p40oLkmSNotTkSVJ2nL3morcnAlcmOTP6RLc5ePqJ5uKDPAeYCXwha0WZVNV65J8Gfj8uKp9klwGFPDZqvpikoU4FVmSNMOZ2EqSNCJVdWO7SdQhwB8DT96Evte2JPMFo4mOu9tjkAmsJKmXTGwlSRqt5XTXsn6vqtZuYt93M4IztpIkzTZeYytJ0pYbf43tewfqPgE8lolvGjV4je29vj6nqq6iuyvxdBt/je0bpzsgSZIGpaqmOwZJkiRJkjabZ2wlSZIkSb3mNbaSJPVYkt8Fxn9lzy+r6sCJ2kuSNBs5FVmSJEmS1GtORZYkSZIk9ZqJrSRJkiSp10xsJUmSJEm9ZmIrSZIkSeo1E1tJkiRJUq/9fx0f5X7ywesYAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 1152x216 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "categorical_attributes = interactions_df.select_dtypes(include = ['object'])\n",
    "\n",
    "plt.figure(figsize=(16,3))\n",
    "sns.countplot(data = categorical_attributes, x = 'EVENT_TYPE')"
   ]
  },
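  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alongside the chart, it can help to look at each event type's share of all interactions as a number. The cell below is a minimal sketch using a small toy series; `event_types` and `event_share` are hypothetical names, and in this notebook you would likely use `interactions_df['EVENT_TYPE']` in place of the toy data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Toy data for illustration; in the notebook, use interactions_df['EVENT_TYPE'] instead\n",
    "event_types = pd.Series(['ProductViewed'] * 8 + ['ProductAdded', 'CartViewed'])\n",
    "\n",
    "# normalize=True returns fractions of the total instead of raw counts\n",
    "event_share = event_types.value_counts(normalize=True)\n",
    "print(event_share)"
   ]
  },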
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Take note of the DAILY_TEMPERATURE values included in our interaction dataset, you will be using them for getting recomendations. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['hot', 'very hot', 'lukewarm', 'slightly cold', 'cold',\n",
       "       'very cold'], dtype=object)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "interactions_df['DAILY_TEMPERATURE'].unique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Chart the counts of each DAILY_TEMPERATURE generated for the interactions dataset. Check how the temperature is changing during the seasonality of the sample interactions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<AxesSubplot:xlabel='DAILY_TEMPERATURE', ylabel='count'>"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZgAAAEHCAYAAACTC1DDAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAhIUlEQVR4nO3de7hdVXnv8e/PRCFeAgkEigkYWvLYAlZo8gQU79EktWrSNtRwDhJs2vThoFVbPYV6aiw0KrVKCxQqSkxAFNJYJGoppkFEJQIbDIZwMVujkEJJZEfEC9TE9/wx38Wee7H2zto7e6yd7Pw+z7OeNee7xhhzzHV717yNpYjAzMxsuD1rpDtgZmajkxOMmZkV4QRjZmZFOMGYmVkRTjBmZlbE2JHuwN7i0EMPjalTp450N8zM9il33nnnjyJiUqvHnGDS1KlT6erqGulumJntUyT9sL/HvIvMzMyKcIIxM7MinGDMzKyIYglG0oslbajdfiLp3ZImSloraXPeT6jVOVdSt6QHJM2pxadL2piPXSRJGT9A0rUZv03S1FqdRbmMzZIWlVpPMzNrrViCiYgHIuKEiDgBmA78HLgOOAdYFxHTgHU5j6RjgYXAccBc4FJJY7K5y4AlwLS8zc34YmBHRBwDXAhckG1NBJYCJwEzgaX1RGZmZuV1ahfZLOB7EfFDYB6wMuMrgfk5PQ+4JiKeiogtQDcwU9IRwPiIWB/VyJxXNtVptLUamJVbN3OAtRHRExE7gLX0JiUzM+uATiWYhcDncvrwiHgEIO8Py/hk4KFana0Zm5zTzfE+dSJiJ/A4cMgAbfUhaYmkLkld27dvH/LKmZnZMxVPMJKeA7wF+NfdFW0RiwHiQ63TG4i4PCJmRMSMSZNaXidkZmZD1IktmN8F7oqIR3P+0dztRd5vy/hW4MhavSnAwxmf0iLep46kscBBQM8AbZmZWYd04kr+0+jdPQawBlgEfCTvr6/FPyvp48ALqQ7m3x4RuyQ9Ielk4DbgDODiprbWAwuAmyIiJN0IfKh2YH82cO5QOj/9fVcOpdqIuvOjZ4x0F8zMyiYYSc8F3gD8WS38EWCVpMXAg8CpABGxSdIq4F5gJ3B2ROzKOmcBK4BxwA15A7gCuEpSN9WWy8Jsq0fS+cAdWe68iOgpspJmZtZS0QQTET+nOuhejz1GdVZZq/LLgGUt4l3A8S3iT5IJqsVjy4Hlg++1mZkNB1/Jb2ZmRTjBmJlZEU4wZmZWhBOMmZkV4QRjZmZFOMGYmVkRTjBmZlaEE4yZmRXhBGNmZkU4wZiZWRFOMGZmVoQTjJmZFeEEY2ZmRTjBmJlZEU4wZmZWhBOMmZkV4QRjZmZFOMGYmVkRTjBmZlaEE4yZmRXhBGNmZkUUTTCSDpa0WtL9ku6T9DJJEyWtlbQ57yfUyp8rqVvSA5Lm1OLTJW3Mxy6SpIwfIOnajN8maWqtzqJcxmZJi0qup5mZPVPpLZh/Av4jIn4TeClwH3AOsC4ipgHrch5JxwILgeOAucClksZkO5cBS4BpeZub8cXAjog4BrgQuCDbmggsBU4CZgJL64nMzMzKK5ZgJI0HXgVcARAR/xMRPwbmASuz2Epgfk7PA66JiKciYgvQDcyUdAQwPiLWR0QAVzbVabS1GpiVWzdzgLUR0RMRO4C19CYlMzPrgJJbML8ObAc+Lenbkj4l6XnA4RHxCEDeH5blJwMP1epvzdjknG6O96kTETuBx4FDBmirD0lLJHVJ6tq+ffuerKuZmTUpmWDGAr8DXBYRJwI/I3eH9UMtYjFAfKh1egMRl0fEjIiYMWnSpAG6ZmZmg1UywWwFtkbEbTm/mirhPJq7vcj7bbXyR9bqTwEezviUFvE+dSSNBQ4CegZoy8zMOqRYgomI/wYekvTiDM0C7gXWAI2zuhYB1+f0GmBhnhl2NNXB/NtzN9oTkk7O4ytnNNVptLUAuCmP09wIzJY0IQ/uz86Y
mZl1yNjC7b8TuFrSc4DvA2+nSmqrJC0GHgROBYiITZJWUSWhncDZEbEr2zkLWAGMA27IG1QnEFwlqZtqy2VhttUj6Xzgjix3XkT0lFxRMzPrq2iCiYgNwIwWD83qp/wyYFmLeBdwfIv4k2SCavHYcmD5ILprZmbDyFfym5lZEU4wZmZWhBOMmZkV4QRjZmZFOMGYmVkRTjBmZlaEE4yZmRXhBGNmZkU4wZiZWRFOMGZmVoQTjJmZFeEEY2ZmRTjBmJlZEU4wZmZWhBOMmZkV4QRjZmZFOMGYmVkRTjBmZlaEE4yZmRXhBGNmZkU4wZiZWRFFE4ykH0jaKGmDpK6MTZS0VtLmvJ9QK3+upG5JD0iaU4tPz3a6JV0kSRk/QNK1Gb9N0tRanUW5jM2SFpVcTzMze6ZObMG8NiJOiIgZOX8OsC4ipgHrch5JxwILgeOAucClksZkncuAJcC0vM3N+GJgR0QcA1wIXJBtTQSWAicBM4Gl9URmZmbljcQusnnAypxeCcyvxa+JiKciYgvQDcyUdAQwPiLWR0QAVzbVabS1GpiVWzdzgLUR0RMRO4C19CYlMzPrgNIJJoCvSLpT0pKMHR4RjwDk/WEZnww8VKu7NWOTc7o53qdOROwEHgcOGaCtPiQtkdQlqWv79u1DXkkzM3umsYXbPyUiHpZ0GLBW0v0DlFWLWAwQH2qd3kDE5cDlADNmzHjG42ZmNnRFt2Ai4uG83wZcR3U85NHc7UXeb8viW4Eja9WnAA9nfEqLeJ86ksYCBwE9A7RlZmYdUizBSHqepBc0poHZwD3AGqBxVtci4PqcXgMszDPDjqY6mH977kZ7QtLJeXzljKY6jbYWADflcZobgdmSJuTB/dkZMzOzDim5i+xw4Lo8o3gs8NmI+A9JdwCrJC0GHgROBYiITZJWAfcCO4GzI2JXtnUWsAIYB9yQN4ArgKskdVNtuSzMtnoknQ/ckeXOi4iegutqZmZNiiWYiPg+8NIW8ceAWf3UWQYsaxHvAo5vEX+STFAtHlsOLB9cr83MbLj4Sn4zMyvCCcbMzIpwgjEzsyKcYMzMrAgnGDMzK8IJxszMinCCMTOzIpxgzMysCCcYMzMrwgnGzMyKcIIxM7MinGDMzKwIJxgzMyvCCcbMzIpwgjEzsyKcYMzMrAgnGDMzK8IJxszMiij2l8lmVtbXXvXqke7CoL36lq+NdBesg7wFY2ZmRTjBmJlZEcUTjKQxkr4t6Us5P1HSWkmb835Crey5krolPSBpTi0+XdLGfOwiScr4AZKuzfhtkqbW6izKZWyWtKj0epqZWV+d2IJ5F3Bfbf4cYF1ETAPW5TySjgUWAscBc4FLJY3JOpcBS4BpeZub8cXAjog4BrgQuCDbmggsBU4CZgJL64nMzMzKK5pgJE0Bfg/4VC08D1iZ0yuB+bX4NRHxVERsAbqBmZKOAMZHxPqICODKpjqNtlYDs3LrZg6wNiJ6ImIHsJbepGRmZh3QVoKRtK6dWAv/CPxf4Fe12OER8QhA3h+W8cnAQ7VyWzM2Oaeb433qRMRO4HHgkAHaal6HJZK6JHVt3769jdUxM7N2DZhgJB2Yu5sOlTQhj59MzGMdL9xN3TcB2yLizjb7ohaxGCA+1Dq9gYjLI2JGRMyYNGlSm900M7N27O46mD8D3k2VTO6k94v7J8A/76buKcBbJL0ROBAYL+kzwKOSjoiIR3L317YsvxU4slZ/CvBwxqe0iNfrbJU0FjgI6Mn4a5rq3Lyb/pqZ2TAacAsmIv4pIo4G3hsRvx4RR+ftpRFxyW7qnhsRUyJiKtXB+5si4nRgDdA4q2sRcH1OrwEW5plhR1MdzL89d6M9IenkPL5yRlOdRlsLchkB3AjMzq2uCcDsjJmZWYe0dSV/RFws6eXA1HqdiLhyCMv8CLBK0mLgQeDUbGuTpFXAvcBO4OyI2JV1zgJWAOOAG/IGcAVwlaRuqi2XhdlWj6TzgTuy3HkR0TOEvpqZ2RC1
lWAkXQX8BrABaHzpN87o2q2IuJncRRURjwGz+im3DFjWIt4FHN8i/iSZoFo8thxY3k7/zMxs+LU7FtkM4Njc/WRmZrZb7V4Hcw/wayU7YmZmo0u7WzCHAvdKuh14qhGMiLcU6ZWZme3z2k0wHyzZCTMzG33aPYvMf+JgZmaD0u5ZZE/QeyX8c4BnAz+LiPGlOmZmZvu2drdgXlCflzSfapRiMzOzloY0mnJEfAF43fB2xczMRpN2d5H9QW32WVTXxfiaGDMz61e7Z5G9uTa9E/gB1X+xmJmZtdTuMZi3l+6ImZmNLu3+4dgUSddJ2ibpUUmfz3+rNDMza6ndg/yfphoa/4VU/wz5xYyZmZm11G6CmRQRn46InXlbAfgvIM3MrF/tJpgfSTpd0pi8nQ48VrJjZma2b2v3LLI/Bi4BLqQ6PflWwAf+zayYS/7yiyPdhUF7x8fevPtC+5F2E8z5wKKI2AEgaSLwD1SJx8zM7Bna3UX2243kAtVfEgMnlumSmZmNBu1uwTxL0oSmLZh265qNmFMuPmWkuzAo33znN0e6C2bDpt0k8THgVkmrqY7B/BGwrFivzMxsn9fulfxXSuqiGuBSwB9ExL1Fe2ZmZvu0tkdTjoh7I+KSiLi4neQi6UBJt0u6W9ImSX+b8YmS1kranPcTanXOldQt6QFJc2rx6ZI25mMXSVLGD5B0bcZvkzS1VmdRLmOzpEXtrqeZmQ2PIQ3X36angNdFxEuBE4C5kk4GzgHWRcQ0YF3OI+lYYCFwHDAXuFTSmGzrMmAJMC1vczO+GNgREcdQnUJ9QbY1EVgKnET1vzVL64nMzMzKK5ZgovLTnH123oJqFOaVGV8JzM/pecA1EfFURGwBuoGZko4AxkfE+ogI4MqmOo22VgOzcutmDrA2InryxIS19CYlMzPrgJJbMORV/xuAbVRf+LcBh0fEIwB5f1gWnww8VKu+NWOTc7o53qdOROwEHgcOGaCt5v4tkdQlqWv79u17sKZmZtasaIKJiF0RcQIwhWpr5PgBiqtVEwPEh1qn3r/LI2JGRMyYNMlDq5mZDaeiCaYhIn4M3Ey1m+rR3O1F3m/LYluBI2vVpgAPZ3xKi3ifOpLGAgcBPQO0ZWZmHVIswUiaJOngnB4HvB64n2rY/8ZZXYuA63N6DbAwzww7mupg/u25G+0JSSfn8ZUzmuo02loA3JTHaW4EZkuakAf3Z2fMzMw6pOTV+EcAK/NMsGcBqyLiS5LWA6skLQYeBE4FiIhNklYB91L9LfPZEbEr2zoLWAGMA27IG8AVwFWSuqm2XBZmWz2SzgfuyHLn5fA2ZmbWIcUSTER8hxbjlUXEY8Csfuoso8UIARHRBTzj+E1EPEkmqBaPLQeWD67XZmY2XDpyDMbMzPY/TjBmZlaEE4yZmRXhBGNmZkU4wZiZWRFOMGZmVoQTjJmZFeEEY2ZmRTjBmJlZEU4wZmZWhBOMmZkV4QRjZmZFOMGYmVkRTjBmZlaEE4yZmRXhBGNmZkU4wZiZWRFOMGZmVoQTjJmZFeEEY2ZmRTjBmJlZEcUSjKQjJX1V0n2SNkl6V8YnSloraXPeT6jVOVdSt6QHJM2pxadL2piPXSRJGT9A0rUZv03S1FqdRbmMzZIWlVpPMzNrreQWzE7gLyPit4CTgbMlHQucA6yLiGnAupwnH1sIHAfMBS6VNCbbugxYAkzL29yMLwZ2RMQxwIXABdnWRGApcBIwE1haT2RmZlZesQQTEY9ExF05/QRwHzAZmAeszGIrgfk5PQ+4JiKeiogtQDcwU9IRwPiIWB8RAVzZVKfR1mpgVm7dzAHWRkRPROwA1tKblMzMrAM6cgwmd12dCNwGHB4Rj0CVhIDDsthk4KFata0Zm5zTzfE+dSJiJ/A4cMgAbTX3a4mkLkld27dv34M1NDOzZsUTjKTnA58H3h0RPxmoaItYDBAfap3eQMTlETEjImZMmjRpgK6ZmdlgFU0wkp5NlVyujoh/y/Cj
uduLvN+W8a3AkbXqU4CHMz6lRbxPHUljgYOAngHaMjOzDil5FpmAK4D7IuLjtYfWAI2zuhYB19fiC/PMsKOpDubfnrvRnpB0crZ5RlOdRlsLgJvyOM2NwGxJE/Lg/uyMmZlZh4wt2PYpwNuAjZI2ZOyvgY8AqyQtBh4ETgWIiE2SVgH3Up2BdnZE7Mp6ZwErgHHADXmDKoFdJambastlYbbVI+l84I4sd15E9BRaTzMza6FYgomIb9D6WAjArH7qLAOWtYh3Ace3iD9JJqgWjy0HlrfbXzMzG16+kt/MzIpwgjEzsyKcYMzMrAgnGDMzK8IJxszMinCCMTOzIpxgzMysCCcYMzMrwgnGzMyKcIIxM7MinGDMzKwIJxgzMyvCCcbMzIpwgjEzsyJK/h+MmZn1Y9npC0a6C4P2/s+sHlR5b8GYmVkRTjBmZlaEE4yZmRXhBGNmZkU4wZiZWRFOMGZmVkSxBCNpuaRtku6pxSZKWitpc95PqD12rqRuSQ9ImlOLT5e0MR+7SJIyfoCkazN+m6SptTqLchmbJS0qtY5mZta/klswK4C5TbFzgHURMQ1Yl/NIOhZYCByXdS6VNCbrXAYsAablrdHmYmBHRBwDXAhckG1NBJYCJwEzgaX1RGZmZp1RLMFExC1AT1N4HrAyp1cC82vxayLiqYjYAnQDMyUdAYyPiPUREcCVTXUaba0GZuXWzRxgbUT0RMQOYC3PTHRmZlZYp6/kPzwiHgGIiEckHZbxycC3auW2ZuyXOd0cb9R5KNvaKelx4JB6vEWdPiQtodo64qijjhr6Wu3DHjzvJSPdhUE56gMbR7oLZtamveUgv1rEYoD4UOv0DUZcHhEzImLGpEmT2uqomZm1p9MJ5tHc7UXeb8v4VuDIWrkpwMMZn9Ii3qeOpLHAQVS75Ppry8zMOqjTCWYN0DiraxFwfS2+MM8MO5rqYP7tuTvtCUkn5/GVM5rqNNpaANyUx2luBGZLmpAH92dnzMzMOqjYMRhJnwNeAxwqaSvVmV0fAVZJWgw8CJwKEBGbJK0C7gV2AmdHxK5s6iyqM9LGATfkDeAK4CpJ3VRbLguzrR5J5wN3ZLnzIqL5ZAMzMyusWIKJiNP6eWhWP+WXActaxLuA41vEnyQTVIvHlgPL2+6smZkNu73lIL+ZmY0yTjBmZlaEE4yZmRXhBGNmZkU4wZiZWRFOMGZmVoQTjJmZFeEEY2ZmRTjBmJlZEU4wZmZWhBOMmZkV4QRjZmZFOMGYmVkRTjBmZlaEE4yZmRXhBGNmZkU4wZiZWRFOMGZmVoQTjJmZFeEEY2ZmRTjBmJlZEaM6wUiaK+kBSd2Szhnp/piZ7U9GbYKRNAb4Z+B3gWOB0yQdO7K9MjPbf4zaBAPMBLoj4vsR8T/ANcC8Ee6Tmdl+QxEx0n0oQtICYG5E/EnOvw04KSLeUSuzBFiSsy8GHuhgFw8FftTB5XWa12/f5vXbd3V63V4UEZNaPTC2g53oNLWI9cmmEXE5cHlnutOXpK6ImDESy+4Er9++zeu379qb1m007yLbChxZm58CPDxCfTEz2++M5gRzBzBN0tGSngMsBNaMcJ/MzPYbo3YXWUTslPQO4EZgDLA8IjaNcLfqRmTXXAd5/fZtXr99116zbqP2IL+ZmY2s0byLzMzMRpATjJmZFeEEU5CkqZLuGUT5MyW9sGSfRpKknw6y/Pw9GX2hneVJ+oGkQ4e6jJEi6WZJM3L63yUd3G75pvgJkt5Ymz9T0iXD3uHe9lu+JpJW5LVrHSPpg5Le2yI+qM/taDHQaz/Yz26DE8ze5Uxgn0owqpR6H82nGuZn1CjxfEXEGyPix0OsfgLwxt0Vsr1H4c/csNonOrmPGyPpk5I2SfqKpHH5q/Fbkr4j6TpJE/LX2wzgakkbJI3rVAclXSDp/9TmPyjpL3P6fZLuyL7+bcamSrpP0qXAXcDfSLqwVv9PJX28n2Utk3R3rv/hGXuRpHW5jHWSjpL0cuAt
wEfz+fiNPVi/10j6Um3+EklnNpUZJ+k/su/Pk7Q81/vbkuZlmX+X9Ns5/W1JH8jp8yX9iaTnZ//vkrSxVq/5+XqlpPslfUrSPZKulvR6Sd+UtFnSzFq/nifpy/mc3SPprS3W7+mtMEl/k22vlfS5pl/op0q6XdJ3Jb1S1en75wFvzef4rbU2XyBpi6Rn5/z4XM6zm5Z9eL6H787byzP+F9nfeyS9u0Wfla/DvZK+DBy2+1eyPZLOyPfS3ZKuavX+alFnepZfD5w9XH0ZoI8d+cypGvD3rly3dRmbKOkL2f63Gu/ppnpHS1qf/Th/yCsaEb4VugFTgZ3ACTm/Cjgd+A7w6oydB/xjTt8MzBiBfp4IfK02fy9wFDCb6pRHUf0Y+RLwqlyvXwEnZ/nnAd8Dnp3ztwIvabGcAN6c038P/L+c/iKwKKf/GPhCTq8AFuzBev00718DfKkWvwQ4M6d/kOvzn8AZGfsQcHpOHwx8N9fxHKovn/FU11ndmGW+SjXU0FhgfMYOBbrzuWt+vhrvi5fk83onsDzLzmusf5b9Q+CTtfmDmt8ruQ6HUv1A2QCMA14AbAbeWyv/sZx+I/CfOX0mcEmt/afngU8D83N6SaN+03N8LfDunB4DHARMBzbmc/Z8YBNwYtNr8gfA2qzzQuDHe/Ja1/pzHNWQT4fm/ET6f399sPb81D+THwXu2dc/c8Ak4CHg6MZzkfcXA0tz+nXAhhav/Rp6Pw9nN163wd68BVPelojYkNN3Ar8BHBwRX8vYSqo30IiJiG8Dh0l6oaSXAjsi4kGqN/ts4NtUv5p+E5iW1X4YEd/K+j8DbgLeJOk3qd70G1ss6n+oPjBQPRdTc/plwGdz+irgFcO4eu24Hvh0RFyZ87OBcyRtoPpiPpDqw/91qtfqFcCXgedLei4wNSIeoPpS+JCk71AlrMnA4dnm089X2hIRGyPiV1RfwOui+jRvpPd5Iedfn794XxkRjw+wHq8Aro+IX0TEE1RfrHX/lvf1534gnwLentNvp0o4zV4HXAYQEbuyf68ArouIn0XET3O5r2yq9yrgc1nnYar3z3B4HbA6In6UfephN+8vSQfR9zN51TD1pV8d+sydDNwSEVuyTk/GX0GuY0TcBBySz0HdKcDncnrIz8eovdByL/JUbXoX1S/ivdFqYAHwa1QjT0P1hfnhiPhEvaCkqcDPmup/Cvhr4H5afxEB/DK/RKF6Lvp7/w33xVk76bs7+MCmx78J/K6kz2b/BPxhJo2n5S6lGcD3qX59Hwr8KdUXNsD/pvrVOD0ifinpB7VlNT9f9ffFr2rzv6L2vETEdyVNp9rq+LCkr0TEef2sZ6vx91otc6Dn/mkR8c3cNfNqYExEtHvge3f9eHoRbZYbDLXRbvPj7dQpofRnrr/12u04jQPEBsVbMJ33OLBDUuMX3duAxi+nJ6h2bYyEa6iG01lA9caHahSEP5b0fABJkyW13FceEbdRjf32v+j95dOuW3PZUH1JfyOnh+v5+CFwrKQD8pfarKbHPwA8Blya8zcC75QkAEknAkT1tw8PAX8EfItqi+a9eQ/V7qFtmVxeC7xoTzuu6qzCn0fEZ4B/AH5ngOLfAN4s6cB8zX6vjUXs7jm+kur17O9HwzrgrOzrGEnjgVuA+ZKeK+l5wO/T+xw13AIszDpHAK9to6/tWAf8kaRDsk8T6f/9BUBUJ0g8LukVtTKdUPoztx54taSjs62JGb+FXEdJrwF+FBE/aar7Tfo+Z0PiLZiRsQj4l9y98n16d0OsyPgvgJdFxC861aGI2CTpBcB/RcQjGfuKpN8C1ud37U+pjiHt6qeZVVTHm3YMcvF/DiyX9D5gO73PxzXAJyX9OdX++e8Nsl0AIuIhSauo9rNvptr90Ozd2Ye/B5YC/wh8J5PMD4A3ZbmvA7Mi4ueSvk41iGrjy/Nq4IuSuqiOhdw/lP42eQnViQ6/An5Jfpm3EhF3SFoD3E2VVLuoftAM5Kv07g78
cIvHrwb+jv5/NLwLuFzSYqr3xVkRsV7SCuD2LPOp3CVUdx3V7qyNVMe4vsYwyPfxMuBrknZRvdb9vb/q3p5lfk71JV9c6c9cRGxX9Zck/6bqrLNtwBuojj19Onfl/pzq+6jZu4DPSnoX8PmhrqOHirFho+pMrQsjYt1I92V/Jen5EfHT/PFyC7AkIu7ag/YWAPMi4m3D1kkbNnv7Z85bMLbHVF3kdztw9976Rt+PXK7q4tQDgZV7mFwupvrLcV8ns5fZVz5z3oIxM7MifJDfzMyKcIIxM7MinGDMzKwIJxgzMyvCCcZGJUm7VA3guCkH+vsLNY1AK+l6VYMb1mNPD+Gu2hDykmbn4H+Niy/HZPsvb7Hs9+djG2r92CDpz7P9/6rFNkg6WNWAnJHXkzTaOTFj9f5syTp3SXpZi/gGSbdm/ExJ2zN2v6T3NPWz0f6cnL8uy3ZLerzW3svV9LcGqg0gOtBy+lvfIbyktg9ygrHR6hcRcUJEHEd1cdkbqS6gBJ4+zfN3gIMbVzoPJCK+QnXxYiMBvBO4IyJubVF2WS77hFo/ToiIi7LIhbXYCdE71P5GoD5a8kKqiybr3pftngN8ojmet3rSuzbLnwK8X9KRtcdOo7qq/bTs9+9n2T8Bvl5r7xnr2MJAy+lvfW2Uc4KxUS8itlGNBvyOxhYI1SjFX6R3uI52vAc4V9JxwDuAvxrmrj4IHKhqCHwBc4Eb+il7C3BMuw1HxGNUozsfAdVw+VRDlJwJzJbUPD7bkDQvx/ZvTjC2X4iI71O93xvjOp1GNfzJ53K6nTYeoRpCZj3wd7XRaQfrPbXdRV9temw1cCrwcqrRdJ96Ru3Km6m2eBo+Wmvz6ubCqv4D5UCq4XKg2tLYksPv3MwwXUzZYjkw8PraKOYEY/uTxvGTw6l+/X8jIr4L7JR0fJtt/DPVyMIr9qAf9V1GzYM8rqJKMI0E2OyjqsYNW0Lv7jrou4usPjjhWyVtohrz7p8i4smMn0bvCL7XsPsku7vRdvtbDgy8vjaKOcHYfkHSr1MNGLiN6jjHBGCLqiH1p9LmbrL8/5Ziw19ExH9TDWr5BqqRgZs1Eskb2hw+/9o8DvVK4GOSfk3SGKpdhB/I9b+Y6u8KBhpV+TGq56xhIvCjgZbTRt9slHOCsVFP0iTgX6j+rS+ofq3PjYipETGV6h8Y2z0O0wkfAP4qIvobQXfQImI91R9HvQt4PdUYVkfmc/AiqhFz5w/QxM1Ufy1BJqjTqUZiHmg5tp9zgrHRalzjNGWqf5f8CvC3qv646Siq/3MBIP/x7yeSTmrRzickbc3b+haPD0X9mMSG7NPTIuLWiPjCINv8aFObz2lR5gKqYemXUA2XX/d5qv8V6c/5wDGS7qYaAr8b+Ew/ZS8A3l7bIhpwfW308mCXZmZWhLdgzMysCP8fjNkekPR+qrO+6v41IpaNRH/M9ibeRWZmZkV4F5mZmRXhBGNmZkU4wZiZWRFOMGZmVsT/B99bCBKOZHwqAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "sns.countplot(data = categorical_attributes, x = 'DAILY_TEMPERATURE')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Upload Data\n",
    "Now we will upload the data we prepared to S3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "boto3.Session().resource('s3').Bucket(bucket).Object(interactions_filename).upload_file(interactions_filename)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "boto3.Session().resource('s3').Bucket(bucket).Object(items_filename).upload_file(items_filename)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "boto3.Session().resource('s3').Bucket(bucket).Object(users_filename).upload_file(users_filename)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configure Amazon Personalize\n",
    "\n",
    "Now that we've prepared our three datasets and uploaded them to S3 we'll need to configure the Amazon Personalize service to understand our data so that it can be used to train models for generating recommendations."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create Schemas for Datasets\n",
    "\n",
    "Amazon Personalize requires a schema for each dataset so it can map the columns in our CSVs to fields for model training. Each schema is declared in JSON using the [Apache Avro](https://avro.apache.org/) format.\n",
    "\n",
    "Let's define and create schemas in Personalize for our datasets."
   ]
  },
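  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because each schema must name the columns of its CSV, it can help to compare the two before creating the schema. The following is a minimal sketch, not part of the original workshop: `missing_columns` and `example_schema` are hypothetical names introduced only for illustration, and the header list is made up.\n",
    "\n",
    "```python\n",
    "def missing_columns(schema, header):\n",
    "    # Field names declared by the Avro-style schema, in declaration order.\n",
    "    expected = [field['name'] for field in schema['fields']]\n",
    "    # Keep only the declared fields that the CSV header does not contain.\n",
    "    return [name for name in expected if name not in header]\n",
    "\n",
    "# Hypothetical schema, shaped like the ones defined in this notebook.\n",
    "example_schema = {\n",
    "    'type': 'record',\n",
    "    'name': 'Users',\n",
    "    'namespace': 'com.amazonaws.personalize.schema',\n",
    "    'fields': [\n",
    "        {'name': 'USER_ID', 'type': 'string'},\n",
    "        {'name': 'PERSONA', 'type': 'string', 'categorical': True},\n",
    "    ],\n",
    "    'version': '1.0',\n",
    "}\n",
    "\n",
    "# A made-up header that lacks the PERSONA column.\n",
    "print(missing_columns(example_schema, ['USER_ID', 'AGE']))  # prints ['PERSONA']\n",
    "```\n",
    "\n",
    "In this notebook the header could be read with pandas, for example `list(pd.read_csv(users_filename, nrows=0).columns)`, and an empty result would suggest that the schema and the CSV agree on column names."
   ]
  },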
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Items Dataset Schema"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"schemaArn\": \"arn:aws:personalize:us-east-1:903376376581:schema/adx-weather-schema-items\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"68bed50f-deca-4b3d-aaeb-4bc2f49dabea\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 20 Sep 2021 19:40:13 GMT\",\n",
      "      \"x-amzn-requestid\": \"68bed50f-deca-4b3d-aaeb-4bc2f49dabea\",\n",
      "      \"content-length\": \"90\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "items_schema = {\n",
    "    \"type\": \"record\",\n",
    "    \"name\": \"Items\",\n",
    "    \"namespace\": \"com.amazonaws.personalize.schema\",\n",
    "    \"fields\": [\n",
    "        {\n",
    "            \"name\": \"ITEM_ID\",\n",
    "            \"type\": \"string\"\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"CATEGORY\",\n",
    "            \"type\": \"string\",\n",
    "            \"categorical\": True,\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"TYPE\",\n",
    "            \"type\": \"string\",\n",
    "            \"categorical\": True,\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"SIZE\",\n",
    "            \"type\": \"string\",\n",
    "            \"categorical\": True,\n",
    "        }\n",
    "    ],\n",
    "    \"version\": \"1.0\"\n",
    "}\n",
    "\n",
    "create_schema_response = personalize.create_schema(\n",
    "    name = \"adx-weather-schema-items\",\n",
    "    schema = json.dumps(items_schema)\n",
    ")\n",
    "\n",
    "items_schema_arn = create_schema_response['schemaArn']\n",
    "print(json.dumps(create_schema_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Users Dataset Schema"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"schemaArn\": \"arn:aws:personalize:us-east-1:903376376581:schema/adx-weather-users\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"796aa762-8976-4922-a0a3-9720b34d551a\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 20 Sep 2021 19:40:16 GMT\",\n",
      "      \"x-amzn-requestid\": \"796aa762-8976-4922-a0a3-9720b34d551a\",\n",
      "      \"content-length\": \"83\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "users_schema = {\n",
    "    \"type\": \"record\",\n",
    "    \"name\": \"Users\",\n",
    "    \"namespace\": \"com.amazonaws.personalize.schema\",\n",
    "    \"fields\": [\n",
    "        {\n",
    "            \"name\": \"USER_ID\",\n",
    "            \"type\": \"string\"\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"PERSONA\",\n",
    "            \"type\": \"string\",\n",
    "            \"categorical\": True\n",
    "        }\n",
    "    ],\n",
    "    \"version\": \"1.0\"\n",
    "}\n",
    "\n",
    "create_schema_response = personalize.create_schema(\n",
    "    name = \"adx-weather-users\",\n",
    "    schema = json.dumps(users_schema)\n",
    ")\n",
    "\n",
    "users_schema_arn = create_schema_response['schemaArn']\n",
    "print(json.dumps(create_schema_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Interactions Dataset Schema"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"schemaArn\": \"arn:aws:personalize:us-east-1:903376376581:schema/adx-weather-interactions\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"8aaff0a1-1396-4c6f-a91b-af0a89d93dc1\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 20 Sep 2021 19:40:18 GMT\",\n",
      "      \"x-amzn-requestid\": \"8aaff0a1-1396-4c6f-a91b-af0a89d93dc1\",\n",
      "      \"content-length\": \"90\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "interactions_schema = {\n",
    "    \"type\": \"record\",\n",
    "    \"name\": \"Interactions\",\n",
    "    \"namespace\": \"com.amazonaws.personalize.schema\",\n",
    "    \"fields\": [\n",
    "        {\n",
    "            \"name\": \"ITEM_ID\",\n",
    "            \"type\": \"string\"\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"USER_ID\",\n",
    "            \"type\": \"string\"\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"EVENT_TYPE\",\n",
    "            \"type\": \"string\"\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"TIMESTAMP\",\n",
    "            \"type\": \"long\"\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"DAILY_TEMPERATURE\",\n",
    "            \"type\": \"string\",\n",
    "            \"categorical\": True\n",
    "        }\n",
    "    ],\n",
    "    \"version\": \"1.0\"\n",
    "}\n",
    "\n",
    "create_schema_response = personalize.create_schema(\n",
    "    name = \"adx-weather-interactions\",\n",
    "    schema = json.dumps(interactions_schema)\n",
    ")\n",
    "\n",
    "interactions_schema_arn = create_schema_response['schemaArn']\n",
    "print(json.dumps(create_schema_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create and Wait for Dataset Group\n",
    "\n",
    "Next we need to create the dataset group that will contain our three datasets."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create Dataset Group"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"datasetGroupArn\": \"arn:aws:personalize:us-east-1:903376376581:dataset-group/adx-weather-dataset\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"9c80dbde-ff24-4700-9f9b-89bec1e04012\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 20 Sep 2021 19:40:22 GMT\",\n",
      "      \"x-amzn-requestid\": \"9c80dbde-ff24-4700-9f9b-89bec1e04012\",\n",
      "      \"content-length\": \"98\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n",
      "DatasetGroupArn = arn:aws:personalize:us-east-1:903376376581:dataset-group/adx-weather-dataset\n"
     ]
    }
   ],
   "source": [
    "create_dataset_group_response = personalize.create_dataset_group(\n",
    "    name = 'adx-weather-dataset'\n",
    ")\n",
    "dataset_group_arn = create_dataset_group_response['datasetGroupArn']\n",
    "print(json.dumps(create_dataset_group_response, indent=2))\n",
    "\n",
    "print(f'DatasetGroupArn = {dataset_group_arn}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Wait for Dataset Group to Have ACTIVE Status"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "DatasetGroup: CREATE PENDING\n",
      "DatasetGroup: ACTIVE\n"
     ]
    }
   ],
   "source": [
    "status = None\n",
    "max_time = time.time() + 3*60*60 # 3 hours\n",
    "while time.time() < max_time:\n",
    "    describe_dataset_group_response = personalize.describe_dataset_group(\n",
    "        datasetGroupArn = dataset_group_arn\n",
    "    )\n",
    "    status = describe_dataset_group_response[\"datasetGroup\"][\"status\"]\n",
    "    print(\"DatasetGroup: {}\".format(status))\n",
    "    \n",
    "    if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
    "        break\n",
    "        \n",
    "    time.sleep(15)"
   ]
  },
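  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same poll-until-finished pattern comes up again for import jobs and solution versions. One way to factor it out is sketched below. `wait_for_status` is a hypothetical helper, not a boto3 or Personalize API, and the status sequence is faked so the example runs without AWS access.\n",
    "\n",
    "```python\n",
    "import time\n",
    "\n",
    "def wait_for_status(get_status, done=('ACTIVE', 'CREATE FAILED'),\n",
    "                    interval=15, timeout=3 * 60 * 60):\n",
    "    # Call get_status() until it returns a terminal status or the timeout expires.\n",
    "    deadline = time.time() + timeout\n",
    "    status = None\n",
    "    while time.time() < deadline:\n",
    "        status = get_status()\n",
    "        print('Status: {}'.format(status))\n",
    "        if status in done:\n",
    "            break\n",
    "        time.sleep(interval)\n",
    "    # Returns the last status seen, which may be non-terminal after a timeout.\n",
    "    return status\n",
    "\n",
    "# Stand-in for a describe call: yields a fixed, made-up sequence of statuses.\n",
    "fake_statuses = iter(['CREATE PENDING', 'CREATE IN_PROGRESS', 'ACTIVE'])\n",
    "final_status = wait_for_status(lambda: next(fake_statuses), interval=0)\n",
    "```\n",
    "\n",
    "For the dataset group above, `get_status` could be `lambda: personalize.describe_dataset_group(datasetGroupArn=dataset_group_arn)['datasetGroup']['status']`. Checking the returned status afterwards distinguishes `ACTIVE` from `CREATE FAILED` or a timeout."
   ]
  },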
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create Items Dataset\n",
    "\n",
    "Next we will create the datasets in Personalize for our three dataset types. Let's start with the items dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"datasetArn\": \"arn:aws:personalize:us-east-1:903376376581:dataset/adx-weather-dataset/ITEMS\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"b3a09e69-3e06-4375-bdf7-13c99f796e8f\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 20 Sep 2021 19:43:04 GMT\",\n",
      "      \"x-amzn-requestid\": \"b3a09e69-3e06-4375-bdf7-13c99f796e8f\",\n",
      "      \"content-length\": \"93\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "dataset_type = \"ITEMS\"\n",
    "create_dataset_response = personalize.create_dataset(\n",
    "    name = \"adx-weather-dataset-items\",\n",
    "    datasetType = dataset_type,\n",
    "    datasetGroupArn = dataset_group_arn,\n",
    "    schemaArn = items_schema_arn\n",
    ")\n",
    "\n",
    "items_dataset_arn = create_dataset_response['datasetArn']\n",
    "print(json.dumps(create_dataset_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create Users Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"datasetArn\": \"arn:aws:personalize:us-east-1:903376376581:dataset/adx-weather-dataset/USERS\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"1e461e3d-c895-48a2-8bb0-26952cd9e9ee\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 20 Sep 2021 19:43:07 GMT\",\n",
      "      \"x-amzn-requestid\": \"1e461e3d-c895-48a2-8bb0-26952cd9e9ee\",\n",
      "      \"content-length\": \"93\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "dataset_type = \"USERS\"\n",
    "create_dataset_response = personalize.create_dataset(\n",
    "    name = \"adx-weather-dataset-users\",\n",
    "    datasetType = dataset_type,\n",
    "    datasetGroupArn = dataset_group_arn,\n",
    "    schemaArn = users_schema_arn\n",
    ")\n",
    "\n",
    "users_dataset_arn = create_dataset_response['datasetArn']\n",
    "print(json.dumps(create_dataset_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create Interactions Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"datasetArn\": \"arn:aws:personalize:us-east-1:903376376581:dataset/adx-weather-dataset/INTERACTIONS\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"971d923f-9d16-409a-b30e-82b891264353\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 20 Sep 2021 19:43:10 GMT\",\n",
      "      \"x-amzn-requestid\": \"971d923f-9d16-409a-b30e-82b891264353\",\n",
      "      \"content-length\": \"100\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "dataset_type = \"INTERACTIONS\"\n",
    "create_dataset_response = personalize.create_dataset(\n",
    "    name = \"adx-weather-dataset-interactions\",\n",
    "    datasetType = dataset_type,\n",
    "    datasetGroupArn = dataset_group_arn,\n",
    "    schemaArn = interactions_schema_arn\n",
    ")\n",
    "\n",
    "interactions_dataset_arn = create_dataset_response['datasetArn']\n",
    "print(json.dumps(create_dataset_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import Datasets to Personalize\n",
    "\n",
    "Up to this point we have generated CSVs containing data for our users, items, and interactions and staged them in an S3 bucket. We also created schemas in Personalize that define the columns in our CSVs. Then we created a dataset group and three datasets in Personalize that will receive our data. In the following steps we will create import jobs with Personalize that will import the datasets from our S3 bucket into the service.\n",
    "\n",
    "### Setup Permissions\n",
    "\n",
    "By default, the Personalize service does not have permission to access the data we uploaded to the S3 bucket in our account. To let Personalize read our CSVs, we need to set a bucket policy and create an IAM role that the Amazon Personalize service will assume."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Attach policy to S3 bucket"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "s3 = boto3.client(\"s3\")\n",
    "\n",
    "policy = {\n",
    "    \"Version\": \"2012-10-17\",\n",
    "    \"Id\": \"PersonalizeS3BucketAccessPolicy\",\n",
    "    \"Statement\": [\n",
    "        {\n",
    "            \"Sid\": \"PersonalizeS3BucketAccessPolicy\",\n",
    "            \"Effect\": \"Allow\",\n",
    "            \"Principal\": {\n",
    "                \"Service\": \"personalize.amazonaws.com\"\n",
    "            },\n",
    "            \"Action\": [\n",
    "                \"s3:GetObject\",\n",
    "                \"s3:ListBucket\"\n",
    "            ],\n",
    "            \"Resource\": [\n",
    "                \"arn:aws:s3:::{}\".format(bucket),\n",
    "                \"arn:aws:s3:::{}/*\".format(bucket)\n",
    "            ]\n",
    "        }\n",
    "    ]\n",
    "}\n",
    "\n",
    "s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create S3 Read Only Access Role"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "iam = boto3.client(\"iam\")\n",
    "\n",
    "role_name = 'CPG'+\"-PersonalizeS3\"\n",
    "assume_role_policy_document = {\n",
    "    \"Version\": \"2012-10-17\",\n",
    "    \"Statement\": [\n",
    "        {\n",
    "          \"Effect\": \"Allow\",\n",
    "          \"Principal\": {\n",
    "            \"Service\": \"personalize.amazonaws.com\"\n",
    "          },\n",
    "          \"Action\": \"sts:AssumeRole\"\n",
    "        }\n",
    "    ]\n",
    "}\n",
    "\n",
    "create_role_response = iam.create_role(\n",
    "    RoleName = role_name,\n",
    "    AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)\n",
    ");\n",
    "\n",
    "iam.attach_role_policy(\n",
    "    RoleName = role_name,\n",
    "    PolicyArn = \"arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess\"\n",
    ");\n",
    "\n",
    "role_arn = create_role_response[\"Role\"][\"Arn\"]\n",
    "print('IAM Role: {}'.format(role_arn))\n",
    "# Pause to allow role to fully persist\n",
    "time.sleep(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create Import Jobs\n",
    "\n",
    "With the permissions in place to allow Personalize to access our CSV files, let's create three import jobs to import each file into its respective dataset. Each import job can take several minutes to complete so we'll create all three and then wait for them all to complete."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create Items Dataset Import Job"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"datasetImportJobArn\": \"arn:aws:personalize:us-east-1:903376376581:dataset-import-job/adx-weather-dataset-items-import-job\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"746d4668-8335-49a2-abed-b6238d55387f\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 20 Sep 2021 19:43:48 GMT\",\n",
      "      \"x-amzn-requestid\": \"746d4668-8335-49a2-abed-b6238d55387f\",\n",
      "      \"content-length\": \"124\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "items_create_dataset_import_job_response = personalize.create_dataset_import_job(\n",
    "    jobName = \"adx-weather-dataset-items-import-job\",\n",
    "    datasetArn = items_dataset_arn,\n",
    "    dataSource = {\n",
    "        \"dataLocation\": \"s3://{}/{}\".format(bucket, items_filename)\n",
    "    },\n",
    "    roleArn = role_arn\n",
    ")\n",
    "\n",
    "items_dataset_import_job_arn = items_create_dataset_import_job_response['datasetImportJobArn']\n",
    "print(json.dumps(items_create_dataset_import_job_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create Users Dataset Import Job"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"datasetImportJobArn\": \"arn:aws:personalize:us-east-1:903376376581:dataset-import-job/adx-weather-dataset-users-import-job\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"ba13338e-e073-4cc2-b859-8abe1ce2cd8f\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 20 Sep 2021 19:43:51 GMT\",\n",
      "      \"x-amzn-requestid\": \"ba13338e-e073-4cc2-b859-8abe1ce2cd8f\",\n",
      "      \"content-length\": \"124\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "users_create_dataset_import_job_response = personalize.create_dataset_import_job(\n",
    "    jobName = \"adx-weather-dataset-users-import-job\",\n",
    "    datasetArn = users_dataset_arn,\n",
    "    dataSource = {\n",
    "        \"dataLocation\": \"s3://{}/{}\".format(bucket, users_filename)\n",
    "    },\n",
    "    roleArn = role_arn\n",
    ")\n",
    "\n",
    "users_dataset_import_job_arn = users_create_dataset_import_job_response['datasetImportJobArn']\n",
    "print(json.dumps(users_create_dataset_import_job_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create Interactions Dataset Import Job"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"datasetImportJobArn\": \"arn:aws:personalize:us-east-1:903376376581:dataset-import-job/adx-weather-dataset-interactions-import-job\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"37c3fbeb-06c7-4a95-891b-e6a9d555597e\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 20 Sep 2021 19:43:54 GMT\",\n",
      "      \"x-amzn-requestid\": \"37c3fbeb-06c7-4a95-891b-e6a9d555597e\",\n",
      "      \"content-length\": \"131\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "interactions_create_dataset_import_job_response = personalize.create_dataset_import_job(\n",
    "    jobName = \"adx-weather-dataset-interactions-import-job\",\n",
    "    datasetArn = interactions_dataset_arn,\n",
    "    dataSource = {\n",
    "        \"dataLocation\": \"s3://{}/{}\".format(bucket, interactions_filename)\n",
    "    },\n",
    "    roleArn = role_arn\n",
    ")\n",
    "\n",
    "interactions_dataset_import_job_arn = interactions_create_dataset_import_job_response['datasetImportJobArn']\n",
    "print(json.dumps(interactions_create_dataset_import_job_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Wait for Import Jobs to Complete\n",
    "\n",
    "It will take 10-15 minutes for the import jobs to complete. While you're waiting, you can learn more about datasets and schemas here: https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html\n",
    "\n",
    "We will wait for all three jobs to finish."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Wait for All Import Jobs to Complete"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "At least one dataset import job still in progress\n",
      "At least one dataset import job still in progress\n",
      "At least one dataset import job still in progress\n",
      "At least one dataset import job still in progress\n",
      "Import job arn:aws:personalize:us-east-1:903376376581:dataset-import-job/adx-weather-dataset-interactions-import-job successfully completed\n",
      "At least one dataset import job still in progress\n",
      "At least one dataset import job still in progress\n",
      "Import job arn:aws:personalize:us-east-1:903376376581:dataset-import-job/adx-weather-dataset-users-import-job successfully completed\n",
      "Import job arn:aws:personalize:us-east-1:903376376581:dataset-import-job/adx-weather-dataset-items-import-job successfully completed\n",
      "All import jobs have ended\n",
      "CPU times: user 55.3 ms, sys: 12 ms, total: 67.3 ms\n",
      "Wall time: 6min\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "import_job_arns = [ items_dataset_import_job_arn, users_dataset_import_job_arn, interactions_dataset_import_job_arn ]\n",
    "\n",
    "max_time = time.time() + 3*60*60 # 3 hours\n",
    "while time.time() < max_time:\n",
    "    for job_arn in reversed(import_job_arns):\n",
    "        import_job_response = personalize.describe_dataset_import_job(\n",
    "            datasetImportJobArn = job_arn\n",
    "        )\n",
    "        status = import_job_response[\"datasetImportJob\"]['status']\n",
    "\n",
    "        if status == \"ACTIVE\":\n",
    "            print(f'Import job {job_arn} successfully completed')\n",
    "            import_job_arns.remove(job_arn)\n",
    "        elif status == \"CREATE FAILED\":\n",
    "            print(f'Import job {job_arn} failed')\n",
    "            # failureReason is nested under datasetImportJob, next to status\n",
    "            if import_job_response[\"datasetImportJob\"].get('failureReason'):\n",
    "                print('   Reason: ' + import_job_response[\"datasetImportJob\"]['failureReason'])\n",
    "            import_job_arns.remove(job_arn)\n",
    "\n",
    "    if len(import_job_arns) > 0:\n",
    "        print('At least one dataset import job still in progress')\n",
    "        time.sleep(60)\n",
    "    else:\n",
    "        print(\"All import jobs have ended\")\n",
    "        break"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Solutions\n",
    "\n",
    "With our three datasets imported into our dataset group, we can now turn to training models. \n",
    "When creating a solution, you provide your dataset group and the recipe for training. Let's declare the recipes that we will need for our solutions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### List Recipes\n",
    "\n",
    "First, let's list all available recipes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "list_recipes_response = personalize.list_recipes()\n",
    "list_recipes_response"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see above, there are several recipes to choose from. For product recommendations we will use only the User-Personalization recipe (`aws-user-personalization`)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Declare Personalize Recipe for Product Recommendations\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "recommend_recipe_arn = \"arn:aws:personalize:::recipe/aws-user-personalization\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create Solutions and Solution Versions\n",
    "\n",
    "With our recipe defined, we can now create our solution and its solution version."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create Product Recommendation Solution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"solutionArn\": \"arn:aws:personalize:us-east-1:903376376581:solution/adx-weather-product-personalization\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"aa23e3be-2e62-492c-8436-752e3f359326\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 20 Sep 2021 19:51:07 GMT\",\n",
      "      \"x-amzn-requestid\": \"aa23e3be-2e62-492c-8436-752e3f359326\",\n",
      "      \"content-length\": \"105\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "create_solution_response = personalize.create_solution(\n",
    "    name = \"adx-weather-product-personalization\",\n",
    "    datasetGroupArn = dataset_group_arn,\n",
    "    recipeArn = recommend_recipe_arn\n",
    ")\n",
    "\n",
    "recommend_solution_arn = create_solution_response['solutionArn']\n",
    "print(json.dumps(create_solution_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create Product Recommendation Solution Version"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"solutionVersionArn\": \"arn:aws:personalize:us-east-1:903376376581:solution/adx-weather-product-personalization/f7a06b4f\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"bd10d702-0b42-4bbf-ac27-673e778476de\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 20 Sep 2021 19:51:11 GMT\",\n",
      "      \"x-amzn-requestid\": \"bd10d702-0b42-4bbf-ac27-673e778476de\",\n",
      "      \"content-length\": \"121\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "create_solution_version_response = personalize.create_solution_version(\n",
    "    solutionArn = recommend_solution_arn\n",
    ")\n",
    "\n",
    "recommend_solution_version_arn = create_solution_version_response['solutionVersionArn']\n",
    "print(json.dumps(create_solution_version_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Wait for Solution Versions to Complete\n",
    "\n",
    "It can take 40-60 minutes for a solution version to be created. During this process a model is trained and tested with the data contained in your datasets. Training time can increase with the size of the dataset, the training parameters, and whether you use AutoML or manually select a recipe. Training runs asynchronously once the request is submitted, so below we poll until the solution version is ready.\n",
    "\n",
    "While you are waiting for this process to complete you can learn more about solutions here: https://docs.aws.amazon.com/personalize/latest/dg/training-deploying-solutions.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Wait for Product Recommendation Solution Version to Have ACTIVE Status"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CREATE PENDING\n",
      "CREATE PENDING\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "ACTIVE\n",
      "Solution version arn:aws:personalize:us-east-1:903376376581:solution/adx-weather-product-personalization/f7a06b4f successfully completed\n",
      "CPU times: user 830 ms, sys: 111 ms, total: 941 ms\n",
      "Wall time: 38min 37s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "max_time = time.time() + 3*60*60 # 3 hours\n",
    "while time.time() < max_time:\n",
    "    soln_ver_response = personalize.describe_solution_version(\n",
    "        solutionVersionArn = recommend_solution_version_arn\n",
    "    )\n",
    "    solution_version = soln_ver_response[\"solutionVersion\"]\n",
    "    status = solution_version[\"status\"]\n",
    "    print(status)\n",
    "    if status == \"ACTIVE\":\n",
    "        print(f'Solution version {recommend_solution_version_arn} successfully completed')\n",
    "        break\n",
    "    elif status == \"CREATE FAILED\":\n",
    "        print(f'Solution version {recommend_solution_version_arn} failed')\n",
    "        # failureReason is reported inside the solutionVersion object\n",
    "        if solution_version.get('failureReason'):\n",
    "            print('   Reason: ' + solution_version['failureReason'])\n",
    "        break\n",
    "    # Poll every 10 seconds until the status is terminal\n",
    "    time.sleep(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluate Offline Metrics for Solution Versions\n",
    "\n",
    "Amazon Personalize provides [offline metrics](https://docs.aws.amazon.com/personalize/latest/dg/working-with-training-metrics.html#working-with-training-metrics-metrics) that allow you to evaluate the performance of the solution version before you deploy the model in your application. Metrics can also be used to view the effects of modifying a solution's hyperparameters or to compare solutions that use the same training data but were created with different recipes.\n",
    "\n",
    "Let's retrieve the metrics for the solution versions we just created."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Product Recommendations Metrics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"solutionVersionArn\": \"arn:aws:personalize:us-east-1:903376376581:solution/adx-weather-product-personalization/f7a06b4f\",\n",
      "  \"metrics\": {\n",
      "    \"coverage\": 0.9825,\n",
      "    \"mean_reciprocal_rank_at_25\": 0.8703,\n",
      "    \"normalized_discounted_cumulative_gain_at_10\": 0.7174,\n",
      "    \"normalized_discounted_cumulative_gain_at_25\": 0.8607,\n",
      "    \"normalized_discounted_cumulative_gain_at_5\": 0.7432,\n",
      "    \"precision_at_10\": 0.652,\n",
      "    \"precision_at_25\": 0.4159,\n",
      "    \"precision_at_5\": 0.7331\n",
      "  },\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"97c3c90b-e0af-48cf-89c1-d7170fd803e8\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 20 Sep 2021 20:32:41 GMT\",\n",
      "      \"x-amzn-requestid\": \"97c3c90b-e0af-48cf-89c1-d7170fd803e8\",\n",
      "      \"content-length\": \"418\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "get_solution_metrics_response = personalize.get_solution_metrics(\n",
    "    solutionVersionArn = recommend_solution_version_arn\n",
    ")\n",
    "\n",
    "print(json.dumps(get_solution_metrics_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Campaigns\n",
    "\n",
    "Once we're satisfied with our solution versions, we need to create Campaigns for each solution version. When creating a campaign you specify the minimum transactions per second (`minProvisionedTPS`) that you expect to make against the service for this campaign. Personalize will automatically scale the inference endpoint up and down for the campaign to match demand but will never scale below `minProvisionedTPS`.\n",
    "\n",
    "Let's create a campaign for our solution version with `minProvisionedTPS` set to 1."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Create Product Recommendation Campaign"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"campaignArn\": \"arn:aws:personalize:us-east-1:903376376581:campaign/adx-weather-product-personalization\",\n",
      "  \"ResponseMetadata\": {\n",
      "    \"RequestId\": \"f68d6550-2019-4b7e-a592-340a9ec8f9b4\",\n",
      "    \"HTTPStatusCode\": 200,\n",
      "    \"HTTPHeaders\": {\n",
      "      \"content-type\": \"application/x-amz-json-1.1\",\n",
      "      \"date\": \"Mon, 20 Sep 2021 20:32:53 GMT\",\n",
      "      \"x-amzn-requestid\": \"f68d6550-2019-4b7e-a592-340a9ec8f9b4\",\n",
      "      \"content-length\": \"105\",\n",
      "      \"connection\": \"keep-alive\"\n",
      "    },\n",
      "    \"RetryAttempts\": 0\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "create_campaign_response = personalize.create_campaign(\n",
    "    name = \"adx-weather-product-personalization\",\n",
    "    solutionVersionArn = recommend_solution_version_arn,\n",
    "    minProvisionedTPS = 1\n",
    ")\n",
    "\n",
    "recommend_campaign_arn = create_campaign_response['campaignArn']\n",
    "print(json.dumps(create_campaign_response, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Wait for Product Recommendation Campaign to Have ACTIVE Status\n",
    "\n",
    "It can take 20-30 minutes for the campaign to be fully created.\n",
    "\n",
    "While you are waiting for this to complete you can learn more about campaigns here: https://docs.aws.amazon.com/personalize/latest/dg/campaigns.html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CREATE PENDING\n",
      "CREATE PENDING\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "CREATE IN_PROGRESS\n",
      "ACTIVE\n",
      "Campaign arn:aws:personalize:us-east-1:903376376581:campaign/adx-weather-product-personalization successfully completed\n",
      "CPU times: user 201 ms, sys: 23.4 ms, total: 224 ms\n",
      "Wall time: 9min 11s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "max_time = time.time() + 3*60*60 # 3 hours\n",
    "while time.time() < max_time:\n",
    "    campaign_response = personalize.describe_campaign(\n",
    "        campaignArn = recommend_campaign_arn\n",
    "    )\n",
    "    campaign = campaign_response[\"campaign\"]\n",
    "    status = campaign[\"status\"]\n",
    "    print(status)\n",
    "\n",
    "    if status == \"ACTIVE\":\n",
    "        print(f'Campaign {recommend_campaign_arn} successfully completed')\n",
    "        break\n",
    "    elif status == \"CREATE FAILED\":\n",
    "        print(f'Campaign {recommend_campaign_arn} failed')\n",
    "        # failureReason is reported inside the campaign object\n",
    "        if campaign.get('failureReason'):\n",
    "            print('   Reason: ' + campaign['failureReason'])\n",
    "        break\n",
    "    # Poll every 10 seconds until the status is terminal\n",
    "    time.sleep(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Test Campaigns\n",
    "\n",
    "Now that our campaign has been fully created, let's test it and evaluate the results."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Test Product Recommendations Campaign\n",
    "\n",
    "Let's test the recommendations made by the product recommendations campaign by selecting a user from the users dataset and requesting item recommendations for that user."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Select a User\n",
    "\n",
    "We'll just pick a random user for simplicity. Feel free to change the `user_id` below and execute the following cells with a different user to get a sense for how the recommendations change."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>username</th>\n",
       "      <th>email</th>\n",
       "      <th>first_name</th>\n",
       "      <th>last_name</th>\n",
       "      <th>addresses</th>\n",
       "      <th>age</th>\n",
       "      <th>gender</th>\n",
       "      <th>persona</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1995</th>\n",
       "      <td>170</td>\n",
       "      <td>user170</td>\n",
       "      <td>clifford.mueller@example.com</td>\n",
       "      <td>Clifford</td>\n",
       "      <td>Mueller</td>\n",
       "      <td>[{'first_name': 'Clifford', 'last_name': 'Muel...</td>\n",
       "      <td>30</td>\n",
       "      <td>M</td>\n",
       "      <td>other_waters_sparkling</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       id username                         email first_name last_name  \\\n",
       "1995  170  user170  clifford.mueller@example.com   Clifford   Mueller   \n",
       "\n",
       "                                              addresses  age gender  \\\n",
       "1995  [{'first_name': 'Clifford', 'last_name': 'Muel...   30      M   \n",
       "\n",
       "                     persona  \n",
       "1995  other_waters_sparkling  "
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# User with interactions 170\n",
    "# Cold start user 7000\n",
    "user_id = 170\n",
    "users_df.loc[users_df['id'] == user_id]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Take note of the `persona` value for the user above. We should see recommendations for products consistent with this persona since we generated historical interactions for products in the categories represented in the persona.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Get Product Recommendations for User\n",
    "\n",
    "Now let's call Amazon Personalize to get recommendations for our user from the product recommendations campaign."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[\n",
      "    {\n",
      "        \"itemId\": \"4\",\n",
      "        \"score\": 0.0610027\n",
      "    },\n",
      "    {\n",
      "        \"itemId\": \"12\",\n",
      "        \"score\": 0.0608411\n",
      "    },\n",
      "    {\n",
      "        \"itemId\": \"54\",\n",
      "        \"score\": 0.0558495\n",
      "    },\n",
      "    {\n",
      "        \"itemId\": \"40\",\n",
      "        \"score\": 0.0487231\n",
      "    },\n",
      "    {\n",
      "        \"itemId\": \"28\",\n",
      "        \"score\": 0.0450286\n",
      "    }\n",
      "]\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ITEM_ID</th>\n",
       "      <th>CATEGORY</th>\n",
       "      <th>TYPE</th>\n",
       "      <th>SIZE</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>4</td>\n",
       "      <td>waters</td>\n",
       "      <td>sparkling</td>\n",
       "      <td>6x1500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>12</td>\n",
       "      <td>waters</td>\n",
       "      <td>normal</td>\n",
       "      <td>6x1500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>54</td>\n",
       "      <td>waters</td>\n",
       "      <td>flavored</td>\n",
       "      <td>4x500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>40</td>\n",
       "      <td>waters</td>\n",
       "      <td>normal</td>\n",
       "      <td>6x1600</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>28</td>\n",
       "      <td>waters</td>\n",
       "      <td>sparkling</td>\n",
       "      <td>6x1600</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   ITEM_ID CATEGORY       TYPE    SIZE\n",
       "0        4   waters  sparkling  6x1500\n",
       "1       12   waters     normal  6x1500\n",
       "2       54   waters   flavored   4x500\n",
       "3       40   waters     normal  6x1600\n",
       "4       28   waters  sparkling  6x1600"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "    campaignArn = recommend_campaign_arn,\n",
    "    userId = str(user_id),\n",
    "    numResults = 5)\n",
    "\n",
    "item_list = get_recommendations_response['itemList']\n",
    "print(json.dumps(item_list, indent=4))\n",
    "search_items_in_dataframe(item_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Are the recommended products consistent with the persona? Note that this is a rather contrived example that uses a limited amount of generated interaction data and no model parameter tuning. The purpose is to give you hands-on experience building models and retrieving inferences from Amazon Personalize."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Contextual recommendations\n",
    "\n",
    "Now let's explore passing contextual information to the recommendation call. Context can be any attribute included in the Interactions dataset used to train the solution. In our case we included the average daily temperature in Atlanta, extracted from the WeatherTrends360 dataset: https://aws.amazon.com/marketplace/pp/prodview-wbuuretid73x4?sr=0-3&ref_=beagle&applicationId=AWSMPContessa#offers.\n",
    "\n",
    "If you want to try your own solutions, feel free to explore other datasets on AWS Marketplace.\n",
    "\n",
    "Other useful contextual information includes the device or trade channel used for the interaction, and similar metadata. More information: https://aws.amazon.com/blogs/machine-learning/increasing-the-relevance-of-your-amazon-personalize-recommendations-by-leveraging-contextual-information/\n",
    "\n",
    "Let's select a user and test the recommendations for the included temperature ranges.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>USER_ID</th>\n",
       "      <th>PERSONA</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: [USER_ID, PERSONA]\n",
       "Index: []"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_id = 10000\n",
    "users_dataset_df.loc[users_dataset_df['USER_ID'] == user_id]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[\n",
      "    {\n",
      "        \"itemId\": \"4\",\n",
      "        \"score\": 0.0749273\n",
      "    },\n",
      "    {\n",
      "        \"itemId\": \"54\",\n",
      "        \"score\": 0.0683786\n",
      "    },\n",
      "    {\n",
      "        \"itemId\": \"12\",\n",
      "        \"score\": 0.0577067\n",
      "    },\n",
      "    {\n",
      "        \"itemId\": \"28\",\n",
      "        \"score\": 0.0558818\n",
      "    },\n",
      "    {\n",
      "        \"itemId\": \"20\",\n",
      "        \"score\": 0.0487305\n",
      "    }\n",
      "]\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ITEM_ID</th>\n",
       "      <th>CATEGORY</th>\n",
       "      <th>TYPE</th>\n",
       "      <th>SIZE</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>4</td>\n",
       "      <td>waters</td>\n",
       "      <td>sparkling</td>\n",
       "      <td>6x1500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>54</td>\n",
       "      <td>waters</td>\n",
       "      <td>flavored</td>\n",
       "      <td>4x500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>12</td>\n",
       "      <td>waters</td>\n",
       "      <td>normal</td>\n",
       "      <td>6x1500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>28</td>\n",
       "      <td>waters</td>\n",
       "      <td>sparkling</td>\n",
       "      <td>6x1600</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>20</td>\n",
       "      <td>waters</td>\n",
       "      <td>flavored</td>\n",
       "      <td>6x1500</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   ITEM_ID CATEGORY       TYPE    SIZE\n",
       "0        4   waters  sparkling  6x1500\n",
       "1       54   waters   flavored   4x500\n",
       "2       12   waters     normal  6x1500\n",
       "3       28   waters  sparkling  6x1600\n",
       "4       20   waters   flavored  6x1500"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# DAILY_TEMPERATURE values: ['hot', 'very hot', 'lukewarm', 'slightly cold', 'cold', 'very cold']\n",
    "\n",
    "get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "    campaignArn = recommend_campaign_arn,\n",
    "    userId = str(user_id),\n",
    "    numResults = 5,\n",
    "    context = {\n",
    "      'DAILY_TEMPERATURE': 'slightly cold'\n",
    "    })\n",
    "\n",
    "item_list = get_recommendations_response['itemList']\n",
    "print(json.dumps(item_list, indent=4))\n",
    "search_items_in_dataframe(item_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Are the recommended items different from the previous call? Try different users, context values, and numbers of recommended items to get a feel for the behavior."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Workshop Complete\n",
    "\n",
    "Congratulations! You have completed the contextual Weather Personalization Workshop.\n",
    "\n",
    "### Cleanup\n",
    "\n",
    "You MUST run the cleanup notebook or manually clean up these resources. If using the notebook, store the ARNs of the resources to be cleaned up:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store dataset_group_arn\n",
    "%store items_dataset_arn\n",
    "%store users_dataset_arn\n",
    "%store interactions_dataset_arn\n",
    "%store role_arn\n",
    "%store users_dataset_import_job_arn\n",
    "%store interactions_dataset_import_job_arn\n",
    "%store items_dataset_import_job_arn\n",
    "%store recommend_campaign_arn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_amazonei_mxnet_p36",
   "language": "python",
   "name": "conda_amazonei_mxnet_p36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}