{ "cells": [ { "cell_type": "markdown", "id": "271cecd0", "metadata": {}, "source": [ "\n", "# Building Your Own Movie Recommender\n", "\n", "Now it's time to build your own movie recommender in your AWS account. In this Jupyter notebook,\n", "you will see how you can create recommenders that generate recommendations, just like the Magic Movie Machine!\n", "\n", "You will first download, modify, and import the movie data collected from the [MovieLens project](https://grouplens.org/datasets/movielens/) data set into Amazon Personalize. Then you will have Amazon Personalize use this data to create recommenders, one for the *Top picks for you* use case and one for the *More like X* use case. \n", "\n", "These are pre-configured use cases that generate recommendations. Behind the scenes, Amazon Personalize trains multiple use case specific models for each recommender. *Top picks for you* generates recommendations similar to the Top recommendations you saw in the Magic Movie Machine. *More like X* is a simpler version of the *Hidden gems* recommendations: It generates recommendations for similar movies, but they aren't lesser known gems.\n", "\n", "Finally, you will get movie recommendations, just like what you saw in the Magic Movie Machine!\n", "\n", "## Tutorial Cost and Duration\n", "This tutorial can take around 2 hours to complete. Depending on your AWS account, you might incur some cost. If you are using the AWS free tier, complete the tutorial in the same day that you start, and delete any resources upon completion, these costs will be mimimal to none. For a complete list of charges and prices, see [Amazon Personalize pricing](https://aws.amazon.com/personalize/pricing/).\n", "\n", "To avoid incurring unnecessary charges, make sure to run the `Clean_Up_Resources.ipynb` notebook found in this repository.\n", "# How to use the notebook\n", "\n", "The code is broken up into cells like the one below. There's a triangular Run button at the top of this page that you can click to execute each cell and move onto the next, or you can press `Shift` + `Enter` while in the cell to execute it and move onto the next one.\n", "\n", "As a cell is executing you'll notice a line to the side showcase an `*` while the cell is running or it will update to a number to indicate the last cell that completed executing after it has finished exectuting all the code within a cell.\n", "\n", "Simply follow the instructions below and execute the cells to get started with Amazon Personalize using case optimized recommenders." ] }, { "cell_type": "markdown", "id": "627fa06f", "metadata": {}, "source": [ "## Imports\n", "Python ships with a broad collection of libraries. You need to import these and additional packages like [boto3](https://aws.amazon.com/sdk-for-python/) (AWS SDK for python) and [Pandas](https://pandas.pydata.org/)/[Numpy](https://numpy.org/), which are core data science tools." 
] }, { "cell_type": "code", "execution_count": 1, "id": "7eb7ecd0", "metadata": {}, "outputs": [], "source": [ "# Imports\n", "import boto3\n", "import json\n", "import numpy as np\n", "import pandas as pd\n", "import time\n", "import datetime" ] }, { "cell_type": "markdown", "id": "bb2d3341", "metadata": {}, "source": [ "Next you should validate that your environment can communicate successfully with Amazon Personalize:" ] }, { "cell_type": "code", "execution_count": 2, "id": "2fe963a8", "metadata": {}, "outputs": [], "source": [ "# Configure the SDK to Personalize:\n", "personalize = boto3.client('personalize')\n", "personalize_runtime = boto3.client('personalize-runtime')" ] }, { "cell_type": "markdown", "id": "d024fd1a", "metadata": {}, "source": [ "## Download, Prepare, and Upload Training Data\n", "This notebook uses the interactions data collected from the [MovieLens project](https://grouplens.org/datasets/movielens/). Execute the lines below to download the latest version of the MovieLens data, import it into the notebook, and examine it quickly.\n", "\n", "### Download and Explore the Dataset" ] }, { "cell_type": "code", "execution_count": 9, "id": "b1fa6704", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2022-04-09 16:48:16-- https://files.grouplens.org/datasets/movielens/ml-latest-small.zip\n", "Resolving files.grouplens.org... 128.101.65.152\n", "Connecting to files.grouplens.org|128.101.65.152|:443... connected.\n", "HTTP request sent, awaiting response... 304 Not Modified\n", "File 'ml-latest-small.zip' not modified on server. Omitting download.\n", "\n", "Archive: ml-latest-small.zip\n", " inflating: ml-latest-small/links.csv \n", " inflating: ml-latest-small/tags.csv \n", " inflating: ml-latest-small/ratings.csv \n", " inflating: ml-latest-small/README.txt \n", " inflating: ml-latest-small/movies.csv \n" ] } ], "source": [ "!wget -N https://files.grouplens.org/datasets/movielens/ml-latest-small.zip\n", "!unzip -o ml-latest-small.zip" ] }, { "cell_type": "code", "execution_count": 5, "id": "0de5d30c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "README.txt links.csv movies.csv ratings.csv tags.csv\r\n" ] } ], "source": [ "!ls ml-latest-small" ] }, { "cell_type": "code", "execution_count": 11, "id": "3c79f5f9", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Summary\r\n", "=======\r\n", "\r\n", "This dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from [MovieLens](http://movielens.org), a movie recommendation service. It contains 100836 ratings and 3683 tag applications across 9742 movies. These data were created by 610 users between March 29, 1996 and September 24, 2018. This dataset was generated on September 26, 2018.\r\n", "\r\n", "Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided.\r\n", "\r\n", "The data are contained in the files `links.csv`, `movies.csv`, `ratings.csv` and `tags.csv`. More details about the contents and use of all these files follows.\r\n", "\r\n", "This is a *development* dataset. As such, it may change over time and is not an appropriate dataset for shared research results. 
See available *benchmark* datasets if that is your intent.\r\n", "\r\n", "This and other GroupLens data sets are publicly available for download at <http://grouplens.org/datasets/>.\r\n", "\r\n", "\r\n", "Usage License\r\n", "=============\r\n", "\r\n", "Neither the University of Minnesota nor any of the researchers involved can guarantee the correctness of the data, its suitability for any particular purpose, or the validity of results based on the use of the data set. The data set may be used for any research purposes under the following conditions:\r\n", "\r\n", "* The user may not state or imply any endorsement from the University of Minnesota or the GroupLens Research Group.\r\n", "* The user must acknowledge the use of the data set in publications resulting from the use of the data set (see below for citation information).\r\n", "* The user may redistribute the data set, including transformations, so long as it is distributed under these same license conditions.\r\n", "* The user may not use this information for any commercial or revenue-bearing purposes without first obtaining permission from a faculty member of the GroupLens Research Project at the University of Minnesota.\r\n", "* The executable software scripts are provided \"as is\" without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the quality and performance of them is with you. Should the program prove defective, you assume the cost of all necessary servicing, repair or correction.\r\n", "\r\n", "In no event shall the University of Minnesota, its affiliates or employees be liable to you for any damages arising out of the use or inability to use these programs (including but not limited to loss of data or data being rendered inaccurate).\r\n", "\r\n", "If you have any further questions or comments, please email <grouplens-info@umn.edu>\r\n", "\r\n", "\r\n", "Citation\r\n", "========\r\n", "\r\n", "To acknowledge use of the dataset in publications, please cite the following paper:\r\n", "\r\n", "> F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. <https://doi.org/10.1145/2827872>\r\n", "\r\n", "\r\n", "Further Information About GroupLens\r\n", "===================================\r\n", "\r\n", "GroupLens is a research group in the Department of Computer Science and Engineering at the University of Minnesota. Since its inception in 1992, GroupLens's research projects have explored a variety of fields including:\r\n", "\r\n", "* recommender systems\r\n", "* online communities\r\n", "* mobile and ubiquitious technologies\r\n", "* digital libraries\r\n", "* local geographic information systems\r\n", "\r\n", "GroupLens Research operates a movie recommender based on collaborative filtering, MovieLens, which is the source of these data. We encourage you to visit <http://movielens.org> to try it out! 
If you have exciting ideas for experimental work to conduct on MovieLens, send us an email at <grouplens-info@cs.umn.edu> - we are always interested in working with external collaborators.\r\n", "\r\n", "\r\n", "Content and Use of Files\r\n", "========================\r\n", "\r\n", "Formatting and Encoding\r\n", "-----------------------\r\n", "\r\n", "The dataset files are written as [comma-separated values](http://en.wikipedia.org/wiki/Comma-separated_values) files with a single header row. Columns that contain commas (`,`) are escaped using double-quotes (`\"`). These files are encoded as UTF-8. If accented characters in movie titles or tag values (e.g. Misérables, Les (1995)) display incorrectly, make sure that any program reading the data, such as a text editor, terminal, or script, is configured for UTF-8.\r\n", "\r\n", "\r\n", "User Ids\r\n", "--------\r\n", "\r\n", "MovieLens users were selected at random for inclusion. Their ids have been anonymized. User ids are consistent between `ratings.csv` and `tags.csv` (i.e., the same id refers to the same user across the two files).\r\n", "\r\n", "\r\n", "Movie Ids\r\n", "---------\r\n", "\r\n", "Only movies with at least one rating or tag are included in the dataset. These movie ids are consistent with those used on the MovieLens web site (e.g., id `1` corresponds to the URL <https://movielens.org/movies/1>). Movie ids are consistent between `ratings.csv`, `tags.csv`, `movies.csv`, and `links.csv` (i.e., the same id refers to the same movie across these four data files).\r\n", "\r\n", "\r\n", "Ratings Data File Structure (ratings.csv)\r\n", "-----------------------------------------\r\n", "\r\n", "All ratings are contained in the file `ratings.csv`. Each line of this file after the header row represents one rating of one movie by one user, and has the following format:\r\n", "\r\n", " userId,movieId,rating,timestamp\r\n", "\r\n", "The lines within this file are ordered first by userId, then, within user, by movieId.\r\n", "\r\n", "Ratings are made on a 5-star scale, with half-star increments (0.5 stars - 5.0 stars).\r\n", "\r\n", "Timestamps represent seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970.\r\n", "\r\n", "\r\n", "Tags Data File Structure (tags.csv)\r\n", "-----------------------------------\r\n", "\r\n", "All tags are contained in the file `tags.csv`. Each line of this file after the header row represents one tag applied to one movie by one user, and has the following format:\r\n", "\r\n", " userId,movieId,tag,timestamp\r\n", "\r\n", "The lines within this file are ordered first by userId, then, within user, by movieId.\r\n", "\r\n", "Tags are user-generated metadata about movies. Each tag is typically a single word or short phrase. The meaning, value, and purpose of a particular tag is determined by each user.\r\n", "\r\n", "Timestamps represent seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970.\r\n", "\r\n", "\r\n", "Movies Data File Structure (movies.csv)\r\n", "---------------------------------------\r\n", "\r\n", "Movie information is contained in the file `movies.csv`. Each line of this file after the header row represents one movie, and has the following format:\r\n", "\r\n", " movieId,title,genres\r\n", "\r\n", "Movie titles are entered manually or imported from <https://www.themoviedb.org/>, and include the year of release in parentheses. 
Errors and inconsistencies may exist in these titles.\r\n", "\r\n", "Genres are a pipe-separated list, and are selected from the following:\r\n", "\r\n", "* Action\r\n", "* Adventure\r\n", "* Animation\r\n", "* Children's\r\n", "* Comedy\r\n", "* Crime\r\n", "* Documentary\r\n", "* Drama\r\n", "* Fantasy\r\n", "* Film-Noir\r\n", "* Horror\r\n", "* Musical\r\n", "* Mystery\r\n", "* Romance\r\n", "* Sci-Fi\r\n", "* Thriller\r\n", "* War\r\n", "* Western\r\n", "* (no genres listed)\r\n", "\r\n", "\r\n", "Links Data File Structure (links.csv)\r\n", "---------------------------------------\r\n", "\r\n", "Identifiers that can be used to link to other sources of movie data are contained in the file `links.csv`. Each line of this file after the header row represents one movie, and has the following format:\r\n", "\r\n", " movieId,imdbId,tmdbId\r\n", "\r\n", "movieId is an identifier for movies used by <https://movielens.org>. E.g., the movie Toy Story has the link <https://movielens.org/movies/1>.\r\n", "\r\n", "imdbId is an identifier for movies used by <http://www.imdb.com>. E.g., the movie Toy Story has the link <http://www.imdb.com/title/tt0114709/>.\r\n", "\r\n", "tmdbId is an identifier for movies used by <https://www.themoviedb.org>. E.g., the movie Toy Story has the link <https://www.themoviedb.org/movie/862>.\r\n", "\r\n", "Use of the resources listed above is subject to the terms of each provider.\r\n", "\r\n", "\r\n", "Cross-Validation\r\n", "----------------\r\n", "\r\n", "Prior versions of the MovieLens dataset included either pre-computed cross-folds or scripts to perform this computation. We no longer bundle either of these features with the dataset, since most modern toolkits provide this as a built-in feature. If you wish to learn about standard approaches to cross-fold computation in the context of recommender systems evaluation, see [LensKit](http://lenskit.org) for tools, documentation, and open-source code examples.\r\n" ] } ], "source": [ "!pygmentize ml-latest-small/README.txt" ] }, { "cell_type": "code", "execution_count": 7, "id": "43beb2ea", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>userId</th>\n", " <th>movieId</th>\n", " <th>rating</th>\n", " <th>timestamp</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>1</td>\n", " <td>1</td>\n", " <td>4.0</td>\n", " <td>964982703</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>1</td>\n", " <td>3</td>\n", " <td>4.0</td>\n", " <td>964981247</td>\n", " </tr>\n", " <tr>\n", " <th>...</th>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " </tr>\n", " <tr>\n", " <th>100834</th>\n", " <td>610</td>\n", " <td>168252</td>\n", " <td>5.0</td>\n", " <td>1493846352</td>\n", " </tr>\n", " <tr>\n", " <th>100835</th>\n", " <td>610</td>\n", " <td>170875</td>\n", " <td>3.0</td>\n", " <td>1493846415</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>100836 rows × 4 columns</p>\n", "</div>" ], "text/plain": [ " userId movieId rating timestamp\n", "0 1 1 4.0 964982703\n", "1 1 3 4.0 964981247\n", "... ... ... ... 
...\n", "100834 610 168252 5.0 1493846352\n", "100835 610 170875 3.0 1493846415\n", "\n", "[100836 rows x 4 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "interactions_data = pd.read_csv('./ml-latest-small/ratings.csv')\n", "pd.set_option('display.max_rows', 5)\n", "interactions_data" ] }, { "cell_type": "code", "execution_count": 8, "id": "0f104ec4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 100836 entries, 0 to 100835\n", "Data columns (total 4 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 userId 100836 non-null int64 \n", " 1 movieId 100836 non-null int64 \n", " 2 rating 100836 non-null float64\n", " 3 timestamp 100836 non-null int64 \n", "dtypes: float64(1), int64(3)\n", "memory usage: 3.1 MB\n" ] } ], "source": [ "interactions_data.info()" ] }, { "cell_type": "markdown", "id": "c9f67892", "metadata": {}, "source": [ "## Prepare the Data\n", "\n", "### Interactions Data\n", "As you can see, the interactions data contains a UserID, ItemID, Rating, and Timestamp. There's a couple steps you need to complete before you can import the data. \n", "\n", "Run the code below to remove the items with low rankings, and remove the Rating column before importing the data. You also add the column EVENT_TYPE with a Watch event type for each interaction." ] }, { "cell_type": "code", "execution_count": null, "id": "fc635a59", "metadata": {}, "outputs": [], "source": [ "interactions_data = interactions_data[interactions_data['rating'] > 3] # Keep only movies rated higher than 3 out of 5.\n", "interactions_data = interactions_data[['userId', 'movieId', 'timestamp']]\n", "interactions_data.rename(columns = {'userId':'USER_ID', 'movieId':'ITEM_ID', \n", " 'timestamp':'TIMESTAMP'}, inplace = True)\n", "interactions_data['EVENT_TYPE']='watch' #Adds an EVENT_TYPE column and an event type of \"watch\" for each interaction.\n", "interactions_data.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "349cd146", "metadata": {}, "outputs": [], "source": [ "items_data = pd.read_csv('./ml-latest-small/movies.csv')\n", "items_data.head(5)" ] }, { "cell_type": "markdown", "id": "1b4bf0dd", "metadata": {}, "source": [ "### Item Metadata\n", "\n", "To create the best Magic Movie Machine possible, you'll want to also upload data about each movie. Open the item data file and take a look at the first rows." ] }, { "cell_type": "code", "execution_count": null, "id": "eaab42d6", "metadata": {}, "outputs": [], "source": [ "items_data = pd.read_csv('./ml-latest-small/movies.csv')\n", "items_data.head(5)" ] }, { "cell_type": "code", "execution_count": null, "id": "3d8a240d", "metadata": {}, "outputs": [], "source": [ "items_data = pd.read_csv('./ml-latest-small/movies.csv')\n", "items_data.head(5)\n", "items_data.info()" ] }, { "cell_type": "code", "execution_count": null, "id": "be38fa83", "metadata": {}, "outputs": [], "source": [ "items_data['year'] = items_data['title'].str.extract('.*\\((.*)\\).*',expand = False)\n", "items_data.head(5)" ] }, { "cell_type": "markdown", "id": "630aaef7", "metadata": {}, "source": [ "Because the actual creation timestamp for each movie is unknown, the following adds a modern date as the creation timestamp." 
] }, { "cell_type": "code", "execution_count": null, "id": "c40e1fd0", "metadata": {}, "outputs": [], "source": [ "ts= datetime.datetime(2022, 1, 1, 0, 0).strftime('%s')\n", "print(ts)" ] }, { "cell_type": "code", "execution_count": null, "id": "85cea8cc", "metadata": {}, "outputs": [], "source": [ "items_data[\"CREATION_TIMESTAMP\"] = ts\n", "items_data" ] }, { "cell_type": "code", "execution_count": null, "id": "8c5c3e18", "metadata": {}, "outputs": [], "source": [ "# removing the title\n", "items_data.drop(columns=\"title\", inplace = True)\n", "\n", "# renaming the columns to match schema\n", "items_data.rename(columns = { 'movieId':'ITEM_ID', 'genres':'GENRES',\n", " 'year':'YEAR'}, inplace = True)\n", "items_data" ] }, { "cell_type": "markdown", "id": "9f530cf6", "metadata": {}, "source": [ "### User Metadata\n", "\n", "For the Magic Movie Machine to provide personalization based on user metadata, you'll want to import data about users. The MovieLens dataset does not have any user metadata, so you can create a fake metadata for each user that's interacted with a movie as follows:" ] }, { "cell_type": "code", "execution_count": null, "id": "964940ab", "metadata": {}, "outputs": [], "source": [ "# get user ids from the interaction dataset\n", "\n", "user_ids = interactions_data['USER_ID'].unique()\n", "user_data = pd.DataFrame()\n", "user_data[\"USER_ID\"]=user_ids\n", "user_data" ] }, { "cell_type": "markdown", "id": "9e4deabe", "metadata": {}, "source": [ "#### Adding Metadata\n", "The current dataset does not contain additional user information. For this example, we'll randomly assign a gender to the users with equal probablity of male and female. Attributes like these can lead to more personalized recommendations." ] }, { "cell_type": "code", "execution_count": null, "id": "45271232", "metadata": {}, "outputs": [], "source": [ "possible_genders = ['female', 'male']\n", "random = np.random.choice(possible_genders, len(user_data.index), p=[0.5, 0.5])\n", "user_data[\"GENDER\"] = random\n", "user_data" ] }, { "cell_type": "markdown", "id": "b4275a8f", "metadata": {}, "source": [ "## Configure an S3 bucket and an IAM role\n", "\n", "So far, we have downloaded, manipulated, and saved the data onto the Amazon EBS instance attached to the SageMaker instance running this Jupyter notebook. However, Amazon Personalize will need an S3 bucket to act as the source of your data, as well as IAM roles for accessing that bucket. Let's set all of that up.\n", "\n", "The Amazon S3 bucket needs to be in the same region as the Amazon Personalize resources we have been creating so far. Simply define the region as a string below." 
] }, { "cell_type": "code", "execution_count": null, "id": "7841475f", "metadata": {}, "outputs": [], "source": [ "# Sets the same region as current Amazon SageMaker Notebook\n", "with open('/opt/ml/metadata/resource-metadata.json') as notebook_info:\n", " data = json.load(notebook_info)\n", " resource_arn = data['ResourceArn']\n", " region = resource_arn.split(':')[3]\n", "print('region:', region)\n", "\n", "# Or you can specify the region where your bucket and model will be domiciled this should be the same region as the Amazon Personalize resources\n", "# region = \"us-east-1\"\n" ] }, { "cell_type": "code", "execution_count": null, "id": "3f1c7eea", "metadata": {}, "outputs": [], "source": [ "s3 = boto3.client('s3')\n", "account_id = boto3.client('sts').get_caller_identity().get('Account')\n", "bucket_name = account_id + \"-\" + region + \"-\" + \"personalizemanagedvod\"\n", "print('bucket_name:', bucket_name)\n", "\n", "try:\n", " if region == \"us-east-1\":\n", " s3.create_bucket(Bucket=bucket_name)\n", " else:\n", " s3.create_bucket(\n", " Bucket=bucket_name,\n", " CreateBucketConfiguration={'LocationConstraint': region}\n", " )\n", "except s3.exceptions.BucketAlreadyOwnedByYou:\n", " print(\"Bucket already exists. Using bucket\", bucket_name)" ] }, { "cell_type": "markdown", "id": "cf4a07ed", "metadata": {}, "source": [ "### Upload data to S3\n", "Now that your Amazon S3 bucket has been created, upload the CSV file of our user-item-interaction data." ] }, { "cell_type": "code", "execution_count": null, "id": "29acd72e", "metadata": {}, "outputs": [], "source": [ "interactions_filename = \"interactions.csv\"\n", "interactions_data.to_csv(interactions_filename, index=False)\n", "boto3.Session().resource('s3').Bucket(bucket_name).Object(interactions_filename).upload_file(interactions_filename)\n", "\n", "items_filename = \"items.csv\"\n", "items_data.to_csv(items_filename, index=False)\n", "boto3.Session().resource('s3').Bucket(bucket_name).Object(items_filename).upload_file(items_filename)\n", "\n", "user_filename = \"users.csv\"\n", "user_data.to_csv(user_filename, index=False)\n", "boto3.Session().resource('s3').Bucket(bucket_name).Object(user_filename).upload_file(user_filename)" ] }, { "cell_type": "markdown", "id": "f5156a89", "metadata": {}, "source": [ "## Set the S3 bucket policy\n", "Amazon Personalize needs to be able to read the contents of your S3 bucket. So we add a bucket policy which allows that.\n", "\n", "Note: Make sure the role you are using to run the code in this notebook has the necessary permissions to modify the S3 bucket policy." ] }, { "cell_type": "code", "execution_count": null, "id": "b9f1e961", "metadata": {}, "outputs": [], "source": [ "s3 = boto3.client(\"s3\")\n", "policy = {\n", " \"Version\": \"2012-10-17\",\n", " \"Id\": \"PersonalizeS3BucketAccessPolicy\",\n", " \"Statement\": [\n", " {\n", " \"Sid\": \"PersonalizeS3BucketAccessPolicy\",\n", " \"Effect\": \"Allow\",\n", " \"Principal\": {\n", " \"Service\": \"personalize.amazonaws.com\"\n", " },\n", " \"Action\": [\n", " \"s3:GetObject\",\n", " \"s3:ListBucket\"\n", " ],\n", " \"Resource\": [\n", " \"arn:aws:s3:::{}\".format(bucket_name),\n", " \"arn:aws:s3:::{}/*\".format(bucket_name)\n", " ]\n", " }\n", " ]\n", "}\n", "\n", "s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))" ] }, { "cell_type": "markdown", "id": "4468293f", "metadata": {}, "source": [ "## Create and Wait for Dataset Group\n", "Now it's time to create your Amazon Personalize resources. 
First you'll create a Domain dataset group with the domain set as VIDEO_ON_DEMAND. A Domain dataset group is a container that stores preconfigured resources like datasets and recommenders for different business domains and use cases. Feel free to alter the name below if you'd like.\n", "\n", "### Create Dataset Group" ] }, { "cell_type": "code", "execution_count": null, "id": "4628a23f", "metadata": { "scrolled": true }, "outputs": [], "source": [ "response = personalize.create_dataset_group(\n", "    name='personalize-video-on-demand-ds-group',\n", "    domain='VIDEO_ON_DEMAND'\n", ")\n", "\n", "dataset_group_arn = response['datasetGroupArn']\n", "print(json.dumps(response, indent=2))" ] }, { "cell_type": "markdown", "id": "41711fa1", "metadata": {}, "source": [ "Wait for the Dataset Group to Have ACTIVE Status.\n", "Before we can use the dataset group in the steps below, it must be active. Execute the cell below and wait for it to show 'ACTIVE'." ] }, { "cell_type": "code", "execution_count": null, "id": "220e8d88", "metadata": {}, "outputs": [], "source": [ "max_time = time.time() + 3*60*60 # 3 hours\n", "while time.time() < max_time:\n", "    describe_dataset_group_response = personalize.describe_dataset_group(\n", "        datasetGroupArn = dataset_group_arn\n", "    )\n", "    status = describe_dataset_group_response[\"datasetGroup\"][\"status\"]\n", "    print(\"DatasetGroup: {}\".format(status))\n", "\n", "    if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n", "        break\n", "\n", "    time.sleep(60)" ] }, { "cell_type": "markdown", "id": "fb229d8b", "metadata": {}, "source": [ "## Create Interactions Schema\n", "A core component of how Personalize understands your data comes from the schema that is defined below. This configuration tells the service how to read the data provided via your CSV file. Note that the columns and types align with what was in the file you created above." ] }, { "cell_type": "code", "execution_count": null, "id": "9ab327d7", "metadata": {}, "outputs": [], "source": [ "schema = {\n", "    \"type\": \"record\",\n", "    \"name\": \"Interactions\",\n", "    \"namespace\": \"com.amazonaws.personalize.schema\",\n", "    \"fields\": [\n", "        {\n", "            \"name\": \"USER_ID\",\n", "            \"type\": \"string\"\n", "        },\n", "        {\n", "            \"name\": \"ITEM_ID\",\n", "            \"type\": \"string\"\n", "        },\n", "        {\n", "            \"name\": \"EVENT_TYPE\",\n", "            \"type\": \"string\"\n", "        },\n", "        {\n", "            \"name\": \"TIMESTAMP\",\n", "            \"type\": \"long\"\n", "        }\n", "    ],\n", "    \"version\": \"1.0\"\n", "}\n", "\n", "create_interactions_schema_response = personalize.create_schema(\n", "    name='personalize-demo-interactions-schema',\n", "    schema=json.dumps(schema),\n", "    domain='VIDEO_ON_DEMAND'\n", ")\n", "\n", "interactions_schema_arn = create_interactions_schema_response['schemaArn']\n", "print(json.dumps(create_interactions_schema_response, indent=2))" ] }, { "cell_type": "markdown", "id": "a6f84edb", "metadata": {}, "source": [ "# Create Items (movies) schema" ] }, { "cell_type": "markdown", "id": "3fa4c0c6", "metadata": {}, "source": [ "As before, we will create a schema for Items, which in this case are movies."
] }, { "cell_type": "code", "execution_count": null, "id": "55549abc", "metadata": {}, "outputs": [], "source": [ "schema = {\n", " \"type\": \"record\",\n", " \"name\": \"Items\",\n", " \"namespace\": \"com.amazonaws.personalize.schema\",\n", " \"fields\": [\n", " {\n", " \"name\": \"ITEM_ID\",\n", " \"type\": \"string\"\n", " },\n", " {\n", " \"name\": \"GENRES\",\n", " \"type\": [\n", " \"string\"\n", " ],\n", " \"categorical\": True\n", " },\n", " {\n", " \"name\": \"YEAR\",\n", " \"type\": [\n", " \"string\"\n", " ],\n", " \"categorical\": True\n", " }, \n", " {\n", " \"name\": \"CREATION_TIMESTAMP\",\n", " \"type\": \"long\"\n", " }\n", " ],\n", " \"version\": \"1.0\"\n", "}\n", "create_items_schema_response = personalize.create_schema(\n", " name='personalize-demo-items-schema',\n", " schema=json.dumps(schema),\n", " domain='VIDEO_ON_DEMAND'\n", ")\n", "\n", "items_schema_arn = create_items_schema_response['schemaArn']\n", "print(json.dumps(create_items_schema_response, indent=2))" ] }, { "cell_type": "markdown", "id": "9701807d", "metadata": {}, "source": [ "# Create Users schema" ] }, { "cell_type": "markdown", "id": "9c3bb6c1", "metadata": {}, "source": [ "Now we create a schema for user data." ] }, { "cell_type": "code", "execution_count": null, "id": "a9a544c8", "metadata": {}, "outputs": [], "source": [ "schema = {\n", " \"type\": \"record\",\n", " \"name\": \"Users\",\n", " \"namespace\": \"com.amazonaws.personalize.schema\",\n", " \"fields\": [\n", " {\n", " \"name\": \"USER_ID\",\n", " \"type\": \"string\"\n", " },\n", " {\n", " \"name\": \"GENDER\",\n", " \"type\": \"string\",\n", " \"categorical\": True\n", " }\n", " ],\n", " \"version\": \"1.0\"\n", "}\n", "create_users_schema_response = personalize.create_schema(\n", " name='personalize-demo-users-schema',\n", " schema=json.dumps(schema),\n", " domain='VIDEO_ON_DEMAND'\n", ")\n", "\n", "users_schema_arn = create_users_schema_response['schemaArn']\n", "print(json.dumps(create_users_schema_response, indent=2))" ] }, { "cell_type": "markdown", "id": "e3d72798", "metadata": {}, "source": [ "## Create Datasets\n", "The next task is to use the schemas you just created to create the actual Amazon Personalize datasets within your Domain dataset group. The datasets will store your data for training." 
] }, { "cell_type": "markdown", "id": "0433a207", "metadata": {}, "source": [ "### Create Interactions Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "ea8d7499", "metadata": {}, "outputs": [], "source": [ "dataset_type = \"INTERACTIONS\"\n", "\n", "create_dataset_response = personalize.create_dataset(\n", " name = \"personalize-demo-interactions\",\n", " datasetType = dataset_type,\n", " datasetGroupArn = dataset_group_arn,\n", " schemaArn = interactions_schema_arn\n", ")\n", "\n", "interactions_dataset_arn = create_dataset_response['datasetArn']\n", "print(json.dumps(create_dataset_response, indent=2))" ] }, { "cell_type": "markdown", "id": "8b7156ab", "metadata": {}, "source": [ "### Create Items Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "dc81f7c8", "metadata": {}, "outputs": [], "source": [ "dataset_type = \"ITEMS\"\n", "create_dataset_response = personalize.create_dataset(\n", " name = \"personalize-demo-items\",\n", " datasetType = dataset_type,\n", " datasetGroupArn = dataset_group_arn,\n", " schemaArn = items_schema_arn\n", ")\n", "\n", "items_dataset_arn = create_dataset_response['datasetArn']\n", "print(json.dumps(create_dataset_response, indent=2))" ] }, { "cell_type": "markdown", "id": "380cb08b", "metadata": {}, "source": [ "### Create Users Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "d850ba2c", "metadata": {}, "outputs": [], "source": [ "dataset_type = \"USERS\"\n", "create_dataset_response = personalize.create_dataset(\n", " name = \"personalize-demo-users\",\n", " datasetType = dataset_type,\n", " datasetGroupArn = dataset_group_arn,\n", " schemaArn = users_schema_arn\n", ")\n", "\n", "users_dataset_arn = create_dataset_response['datasetArn']\n", "print(json.dumps(create_dataset_response, indent=2))" ] }, { "cell_type": "markdown", "id": "6f43a05b", "metadata": {}, "source": [ "## Create Personalize Role\n", "Amazon Personalize needs the ability to assume Roles in AWS in order to have the permissions to execute certain tasks. The lines below grant that.\n", "\n", "Note: Make sure the role you are using to run the code in this notebook has the necessary permissions to create a role. In case you get stuck, see the [Amazon Personalize developer guide](https://docs.aws.amazon.com/personalize/latest/dg/aws-personalize-set-up-permissions.html)." 
] }, { "cell_type": "code", "execution_count": null, "id": "fd426458", "metadata": {}, "outputs": [], "source": [ "iam = boto3.client(\"iam\")\n", "\n", "role_name = \"PersonalizeRoleVODDemoRecommender\"\n", "assume_role_policy_document = {\n", " \"Version\": \"2012-10-17\",\n", " \"Statement\": [\n", " {\n", " \"Effect\": \"Allow\",\n", " \"Principal\": {\n", " \"Service\": \"personalize.amazonaws.com\"\n", " },\n", " \"Action\": \"sts:AssumeRole\"\n", " }\n", " ]\n", "}\n", "\n", "create_role_response = iam.create_role(\n", " RoleName = role_name,\n", " AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)\n", ")\n", "\n", "# AmazonPersonalizeFullAccess provides access to any S3 bucket with a name that includes \"personalize\" or \"Personalize\" \n", "# if you would like to use a bucket with a different name, please consider creating and attaching a new policy\n", "# that provides read access to your bucket or attaching the AmazonS3ReadOnlyAccess policy to the role\n", "policy_arn = \"arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess\"\n", "iam.attach_role_policy(\n", " RoleName = role_name,\n", " PolicyArn = policy_arn\n", ")\n", "\n", "# Now add S3 support\n", "iam.attach_role_policy(\n", " PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess',\n", " RoleName=role_name\n", ")\n", "time.sleep(60) # wait for a minute to allow IAM role policy attachment to propagate\n", "\n", "role_arn = create_role_response[\"Role\"][\"Arn\"]\n", "print(role_arn)\n" ] }, { "cell_type": "markdown", "id": "4fd4a8ef", "metadata": {}, "source": [ "## Import the data\n", "Earlier you created the Domain dataset group and datasets to house the Magic Movie Machine data. Now you will execute an import job that will load the data from S3 into Amazon Personalize for building your movie recommender.\n", "### Create Interactions Dataset Import Job" ] }, { "cell_type": "code", "execution_count": null, "id": "663e317e", "metadata": {}, "outputs": [], "source": [ "create_interactions_dataset_import_job_response = personalize.create_dataset_import_job(\n", " jobName = \"personalize-demo-import-interactions\",\n", " datasetArn = interactions_dataset_arn,\n", " dataSource = {\n", " \"dataLocation\": \"s3://{}/{}\".format(bucket_name, interactions_filename)\n", " },\n", " roleArn = role_arn\n", ")\n", "\n", "dataset_interactions_import_job_arn = create_interactions_dataset_import_job_response['datasetImportJobArn']\n", "print(json.dumps(create_interactions_dataset_import_job_response, indent=2))" ] }, { "cell_type": "markdown", "id": "fdec7339", "metadata": {}, "source": [ "### Create Items Dataset Import Job" ] }, { "cell_type": "code", "execution_count": null, "id": "66094ffd", "metadata": {}, "outputs": [], "source": [ "create_items_dataset_import_job_response = personalize.create_dataset_import_job(\n", " jobName = \"personalize-demo-import-items\",\n", " datasetArn = items_dataset_arn,\n", " dataSource = {\n", " \"dataLocation\": \"s3://{}/{}\".format(bucket_name, items_filename)\n", " },\n", " roleArn = role_arn\n", ")\n", "\n", "dataset_items_import_job_arn = create_items_dataset_import_job_response['datasetImportJobArn']\n", "print(json.dumps(create_items_dataset_import_job_response, indent=2))" ] }, { "cell_type": "markdown", "id": "3ed59d22", "metadata": {}, "source": [ "### Create Users Dataset Import Job" ] }, { "cell_type": "code", "execution_count": null, "id": "c1956ffa", "metadata": {}, "outputs": [], "source": [ "create_users_dataset_import_job_response = 
personalize.create_dataset_import_job(\n", "    jobName = \"personalize-demo-import-users\",\n", "    datasetArn = users_dataset_arn,\n", "    dataSource = {\n", "        \"dataLocation\": \"s3://{}/{}\".format(bucket_name, user_filename)\n", "    },\n", "    roleArn = role_arn\n", ")\n", "\n", "dataset_users_import_job_arn = create_users_dataset_import_job_response['datasetImportJobArn']\n", "print(json.dumps(create_users_dataset_import_job_response, indent=2))" ] }, { "cell_type": "markdown", "id": "21508180", "metadata": {}, "source": [ "Wait for the Dataset Import Jobs to Have ACTIVE Status.\n", "It can take a while for the import jobs to complete. Please wait until each one shows ACTIVE below." ] }, { "cell_type": "code", "execution_count": null, "id": "214aafa9", "metadata": {}, "outputs": [], "source": [ "# Poll each of the three import jobs in turn until it is ACTIVE (or failed).\n", "import_jobs = [\n", "    (\"Interactions\", dataset_interactions_import_job_arn),\n", "    (\"Items\", dataset_items_import_job_arn),\n", "    (\"Users\", dataset_users_import_job_arn),\n", "]\n", "\n", "for dataset_name, import_job_arn in import_jobs:\n", "    max_time = time.time() + 3*60*60 # 3 hours\n", "    while time.time() < max_time:\n", "        describe_dataset_import_job_response = personalize.describe_dataset_import_job(\n", "            datasetImportJobArn = import_job_arn\n", "        )\n", "        status = describe_dataset_import_job_response[\"datasetImportJob\"]['status']\n", "        print(\"{} DatasetImportJob: {}\".format(dataset_name, status))\n", "\n", "        if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n", "            break\n", "\n", "        time.sleep(60)" ] }, { "cell_type": "markdown", "id": "537228cd", "metadata": {}, "source": [ "## Choose recommender use cases\n", "\n", "Each domain has different use cases. When you create a recommender, you create it for a specific use case, and each use case has different requirements for getting recommendations.\n", "\n", "For information about all the use cases Amazon Personalize supports, see [Choosing recommender use cases](https://docs.aws.amazon.com/personalize/latest/dg/domain-use-cases.html)." ] }, { "cell_type": "code", "execution_count": null, "id": "58eb33b0", "metadata": {}, "outputs": [], "source": [ "available_recipes = personalize.list_recipes(domain='VIDEO_ON_DEMAND') # See a list of recipes for the domain.\n", "if (len(available_recipes[\"recipes\"]) == 0):\n", "    # Workaround in case the first page of 'available_recipes[\"recipes\"]' comes back empty.\n", "    available_recipes = personalize.list_recipes(domain='VIDEO_ON_DEMAND', nextToken=available_recipes[\"nextToken\"])\n", "display(available_recipes[\"recipes\"])" ] }, { "cell_type": "markdown", "id": "701ea401", "metadata": {}, "source": [ "We are going to create a recommender of the type *More like X*. 
*More like X* is a simpler version of the *Hidden gems* recommendations you saw in the Magic Movie Machine: it generates recommendations for similar movies, but they aren't lesser-known gems. With this use case, Amazon Personalize automatically filters out videos the user has watched, based on the userId specified in the `get_recommendations` call." ] }, { "cell_type": "code", "execution_count": null, "id": "eb597f45", "metadata": {}, "outputs": [], "source": [ "create_recommender_response = personalize.create_recommender(\n", "    name = 'more_like_x_demo',\n", "    recipeArn = 'arn:aws:personalize:::recipe/aws-vod-more-like-x',\n", "    datasetGroupArn = dataset_group_arn\n", ")\n", "recommender_more_like_x_arn = create_recommender_response[\"recommenderArn\"]\n", "print(json.dumps(create_recommender_response))" ] }, { "cell_type": "markdown", "id": "a066baa2", "metadata": {}, "source": [ "We are going to create a second recommender of the type *Top picks for you*. This type of recommender offers personalized streaming content recommendations for a user that you specify. With this use case, Amazon Personalize automatically filters out videos the user has watched, based on the userId that you specify and `Watch` events." ] }, { "cell_type": "code", "execution_count": null, "id": "488c92ba", "metadata": {}, "outputs": [], "source": [ "create_recommender_response = personalize.create_recommender(\n", "    name = 'top_picks_for_you_demo',\n", "    recipeArn = 'arn:aws:personalize:::recipe/aws-vod-top-picks',\n", "    datasetGroupArn = dataset_group_arn\n", ")\n", "recommender_top_picks_arn = create_recommender_response[\"recommenderArn\"]\n", "print(json.dumps(create_recommender_response))" ] }, { "cell_type": "markdown", "id": "101696c0", "metadata": {}, "source": [ "We wait until the recommenders have finished creating and have status `ACTIVE`. We check the status periodically." ] }, { "cell_type": "code", "execution_count": null, "id": "a59d734e", "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "# Poll each recommender until it is ACTIVE. Breaking on CREATE FAILED as well\n", "# prevents the loop from stalling if a build fails.\n", "max_time = time.time() + 10*60*60 # 10 hours\n", "while time.time() < max_time:\n", "\n", "    version_response = personalize.describe_recommender(\n", "        recommenderArn = recommender_more_like_x_arn\n", "    )\n", "    status = version_response[\"recommender\"][\"status\"]\n", "\n", "    if status == \"ACTIVE\":\n", "        print(\"Build succeeded for {}\".format(recommender_more_like_x_arn))\n", "        break\n", "    elif status == \"CREATE FAILED\":\n", "        print(\"Build failed for {}\".format(recommender_more_like_x_arn))\n", "        break\n", "    else:\n", "        print(\"The More Like X Recommender build is still in progress\")\n", "\n", "    time.sleep(60)\n", "\n", "while time.time() < max_time:\n", "\n", "    version_response = personalize.describe_recommender(\n", "        recommenderArn = recommender_top_picks_arn\n", "    )\n", "    status = version_response[\"recommender\"][\"status\"]\n", "\n", "    if status == \"ACTIVE\":\n", "        print(\"Build succeeded for {}\".format(recommender_top_picks_arn))\n", "        break\n", "    elif status == \"CREATE FAILED\":\n", "        print(\"Build failed for {}\".format(recommender_top_picks_arn))\n", "        break\n", "    else:\n", "        print(\"The Top Picks for You Recommender build is still in progress\")\n", "\n", "    time.sleep(60)" ] }, { "cell_type": "markdown", "id": "ff1186da", "metadata": {}, "source": [ "## Get recommendations\n", "Now that you have created your movie recommenders, let's have a look at the movie recommendations you can get for your users!"
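, "\n", "\n", "You can optionally re-check that both recommenders report `ACTIVE` before querying them (a small sanity check):" ] }, { "cell_type": "code", "execution_count": null, "id": "c0de0003", "metadata": {}, "outputs": [], "source": [ "# Optional: both recommenders should be ACTIVE before you request recommendations.\n", "for arn in [recommender_more_like_x_arn, recommender_top_picks_arn]:\n", "    r = personalize.describe_recommender(recommenderArn=arn)\n", "    print(arn, '->', r['recommender']['status'])"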
] }, { "cell_type": "code", "execution_count": null, "id": "9e2432f8", "metadata": {}, "outputs": [], "source": [ "# reading the original data in order to have a dataframe that has both movie_ids \n", "# and the corresponding titles to make out recommendations easier to read.\n", "items_df = pd.read_csv('./ml-latest-small/movies.csv')\n", "items_df.sample(10)" ] }, { "cell_type": "code", "execution_count": null, "id": "0ff4e06f", "metadata": {}, "outputs": [], "source": [ "def get_movie_by_id(movie_id, movie_df):\n", " \"\"\"\n", " This takes in an movie_id from a recommendation in string format,\n", " converts it to an int, and then does a lookup in a specified\n", " dataframe.\n", " \n", " A really broad try/except clause was added in case anything goes wrong.\n", " \n", " Feel free to add more debugging or filtering here to improve results if\n", " you hit an error.\n", " \"\"\"\n", " try:\n", " return movie_df.loc[movie_df[\"movieId\"]==int(movie_id)]['title'].values[0]\n", " except:\n", " print (movie_id)\n", " return \"Error obtaining title\"" ] }, { "cell_type": "markdown", "id": "4024e53e", "metadata": {}, "source": [ "### Get 'More Like X' movie recommendations" ] }, { "cell_type": "markdown", "id": "67719cab", "metadata": {}, "source": [ "The More like X recommender generates recommendations for similar movies. You provide an ID of a movie, and Amazon Personalize returns a list of similar movies." ] }, { "cell_type": "code", "execution_count": null, "id": "0014d9e0", "metadata": {}, "outputs": [], "source": [ "# First pick a user\n", "test_user_id = \"1\"\n", "\n", "# Select a random item\n", "test_item_id = \"81847\" #Iron Man 59315, Tangled: 81847\n", "\n", "# Get recommendations for the user for this item\n", "get_recommendations_response = personalize_runtime.get_recommendations(\n", " recommenderArn = recommender_more_like_x_arn,\n", " userId = test_user_id,\n", " itemId = test_item_id,\n", " numResults = 20\n", ")\n", "\n", "# Build a new dataframe for the recommendations\n", "item_list = get_recommendations_response['itemList']\n", "recommendation_list = []\n", "for item in item_list:\n", " movie = get_movie_by_id(item['itemId'], items_df)\n", " recommendation_list.append(movie)\n", "\n", "user_recommendations_df = pd.DataFrame(recommendation_list, columns = [get_movie_by_id(test_item_id, items_df)])\n", "\n", "pd.options.display.max_rows = 20\n", "display(user_recommendations_df)" ] }, { "cell_type": "markdown", "id": "c5b98839", "metadata": {}, "source": [ "### Get 'Top picks for you' recommendations" ] }, { "cell_type": "markdown", "id": "7ccfc709", "metadata": {}, "source": [ "Now you will get personalized recommendations for a user with the *Top picks for you* recommender. First, use the following code to add the user's metadata to our sample user. You can use this type of metadata to get insights on your users." 
] }, { "cell_type": "code", "execution_count": null, "id": "3bd89667", "metadata": {}, "outputs": [], "source": [ "users_data_df = pd.read_csv('./users.csv')\n", "\n", "def get_gender_by_id(user_id, user_df):\n", " \"\"\"\n", " This takes in a user_id and then does a lookup in a specified\n", " dataframe.\n", " \n", " A really broad try/except clause was added in case anything goes wrong.\n", " \n", " Feel free to add more debugging or filtering here to improve results if\n", " you hit an error.\n", " \"\"\"\n", " return user_df.loc[user_df[\"USER_ID\"]==int(user_id)]['GENDER'].values[0]\n", " try:\n", " return user_df.loc[user_df[\"USER_ID\"]==int(user_id)]['GENDER'].values[0]\n", " except:\n", " print (user_id)\n", " return \"Error obtaining title\"" ] }, { "cell_type": "markdown", "id": "8ca189c7", "metadata": {}, "source": [ "Next, pick a user and get movie recommendations for them." ] }, { "cell_type": "code", "execution_count": null, "id": "601135d3", "metadata": {}, "outputs": [], "source": [ "# First pick a user\n", "test_user_id = \"111\" # samples users: 55, 75, 76, 111\n", "\n", "# Get recommendations for the user\n", "get_recommendations_response = personalize_runtime.get_recommendations(\n", " recommenderArn = recommender_top_picks_arn,\n", " userId = test_user_id,\n", " numResults = 20\n", ")\n", "\n", "# Build a new dataframe for the recommendations\n", "item_list = get_recommendations_response['itemList']\n", "recommendation_list = []\n", "for item in item_list:\n", " movie = get_movie_by_id(item['itemId'], items_df)\n", " recommendation_list.append(movie)\n", "\n", "column_name = test_user_id+\" (\"+get_gender_by_id(test_user_id, users_data_df)+\")\"\n", "\n", "user_recommendations_df = pd.DataFrame(recommendation_list, columns = [column_name])\n", "\n", "pd.options.display.max_rows =20\n", "display(user_recommendations_df)" ] }, { "cell_type": "markdown", "id": "95c6521f", "metadata": {}, "source": [ "## Review\n", "Using the above code, you successfully created recommenders that work just like in the Magic Movie Machine: You trained a deep learning model to generate movie recommendations based on prior user behavior. \n", "\n", "You created two recommenders for two foundational use cases: one for the *Top picks for you* use case and one for the *More like X* use case. Going forward, you can adapt this code to create other recommenders." ] }, { "cell_type": "markdown", "id": "66a85bbd", "metadata": {}, "source": [ "## Notes for the Next Notebook:\n", "There are a few values you will need for the next notebook, execute the cell below to store them so they can be used in the `Clean_Up_Resources.ipynb` notebook.\n", "\n", "This will overwite any data stored for those variables and set them to the values specified in this notebook. " ] }, { "cell_type": "code", "execution_count": null, "id": "b77f68f5", "metadata": {}, "outputs": [], "source": [ "# store for cleanup\n", "%store dataset_group_arn\n", "%store role_name\n", "%store region" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }