{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "27cc3268-2230-48d3-b79f-6c9e51ca0b3a",
"metadata": {},
"source": [
"# Amazon Personalize - From Data Preparation to Campaign Deployment\n",
"\n",
"This notebook uses `conda_python3` as the default kernel.\n",
"
\n",
"Deploy Personalize Campaign by running cells sequentially from start to finish."
]
},
{
"cell_type": "markdown",
"id": "a222cc25",
"metadata": {},
"source": []
},
{
"attachments": {},
"cell_type": "markdown",
"id": "9c13c5b0-79c5-41da-adb6-7ae09801f2fd",
"metadata": {},
"source": [
"## 0. Setting Environment\n",
"\n",
"(Optional) Run boto3 sdk upgrade if needed.\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "1c9244fa-ffd9-4f35-acdc-e42fef9e59f5",
"metadata": {},
"source": [
"### boto3 Upgrade (Optional)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cf7d9b34-7f86-4704-b6cb-d94c48e070a1",
"metadata": {},
"outputs": [],
"source": [
"# !pip install boto3 --upgrade"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "1924cfee-75b2-476d-9643-57e66658f97e",
"metadata": {},
"source": [
"## 1. Data Preparation\n",
"\n",
"We use the dataset from the Retail Demo Store below.\n",
"- It is used by unpacking the tar archive.\n",
"\n",
"* Retail Demo Store\n",
" * https://github.com/aws-samples/retail-demo-store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c60d64e7-8e20-4f3b-8d0d-1bcee84f6ce8",
"metadata": {},
"outputs": [],
"source": [
"import tarfile\n",
"\n",
"tf = tarfile.open(\"../data/RetailDemoDataSet.tar\")\n",
"tf.extractall(\"../data\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3a1d0139-ecc3-441e-ac33-db41c31331a5",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"items = pd.read_csv('../data/items.csv')\n",
"users = pd.read_csv('../data/users.csv')\n",
"its = pd.read_csv('../data/interactions.csv')"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "29301bf4-7676-4c7f-9c4a-eb370080e356",
"metadata": {},
"source": [
"## 2. Data Preprocessing"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c27d6f3a-56a8-47b8-af11-fc7a294cee98",
"metadata": {},
"outputs": [],
"source": [
"import boto3\n",
"import json\n",
"import numpy as np\n",
"import pandas as pd\n",
"import time\n",
"from datetime import datetime\n",
"\n",
"import matplotlib.pyplot as plt"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "79adaf49-ffb2-4325-973e-ca1de224f9b5",
"metadata": {},
"source": [
"### Edit columns of ITEMS dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4944e7d6-9673-4a6f-9915-c9ab2ce8176d",
"metadata": {},
"outputs": [],
"source": [
"items.columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "77efc9af-0c84-4570-b666-264908faf406",
"metadata": {},
"outputs": [],
"source": [
"def item_data_selection(df, cols):\n",
" ldf = df[cols]\n",
" ldf = ldf.rename(columns={'id':'ITEM_ID',\n",
" 'name' : 'NAME',\n",
" 'category' :'CATEGORY_L1',\n",
" 'style' : 'STYLE',\n",
" 'description' : 'PRODUCT_DESCRIPTION',\n",
" 'price' : 'PRICE',\n",
" })\n",
" return ldf\n",
"\n",
"\n",
"item_cols = ['id', 'name', 'category', 'style', 'description','price']\n",
"items_df = item_data_selection(items, item_cols) \n",
"\n",
"items_df.head(3)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5fb2d4d3-ba76-49d8-ae57-bd3449866899",
"metadata": {},
"source": [
"### Edit columns of USERS dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "35a823d5-6917-4619-965d-6a571b7b11cc",
"metadata": {},
"outputs": [],
"source": [
"users.columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "35b83662-3f1c-4e0f-bb16-f6ac99d3ded1",
"metadata": {},
"outputs": [],
"source": [
"def user_data_selection(df, cols):\n",
" ldf = df[cols]\n",
" ldf = ldf.rename(columns={'id':'USER_ID',\n",
" 'username' : 'USER_NAME',\n",
" 'age' :'AGE',\n",
" 'gender' : 'GENDER', \n",
" })\n",
" return ldf\n",
"\n",
"user_cols = ['id', 'username', 'age', 'gender']\n",
"\n",
"users_df = user_data_selection(users, user_cols) \n",
"users_df.head(3)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e9261103-46f7-4825-87dc-b3dc57335cf7",
"metadata": {},
"source": [
"### Modify data type of ITEMS dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3daa6f0b-7118-4e0d-8ac1-0d735906457d",
"metadata": {},
"outputs": [],
"source": [
"users_df.info()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10a40f0e-2d7c-4f90-94aa-f98555fa2b29",
"metadata": {},
"outputs": [],
"source": [
"def change_data_type(df, col, target_type):\n",
" ldf = df.copy()\n",
" ldf[col] = ldf[col].astype(target_type)\n",
" \n",
" return ldf\n",
"\n",
"users_df = change_data_type(users_df, col='USER_ID', target_type='object')\n",
"users_df.info()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "4ed5349f-3872-4fed-90f1-eb6ec3cdd4e6",
"metadata": {},
"source": [
"### Edit columns of INTERACTIONS dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1e297e60-4be8-4956-808f-def6d05d8f5e",
"metadata": {},
"outputs": [],
"source": [
"its.columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a739ecab-b46b-4735-8690-57e6a5b5f461",
"metadata": {},
"outputs": [],
"source": [
"def interactions_data_selection(df, cols):\n",
" ldf = df[cols]\n",
" ldf = ldf.rename(columns={'id':'USER_ID',\n",
" 'username' : 'USER_NAME',\n",
" 'age' :'AGE',\n",
" 'gender' : 'GENDER', \n",
" })\n",
" return ldf\n",
"\n",
"interactions_cols = ['ITEM_ID', 'USER_ID', 'EVENT_TYPE', 'TIMESTAMP']\n",
"\n",
"full_interactions_df = interactions_data_selection(its, interactions_cols) \n",
"full_interactions_df.head(3)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5a749716-8999-43b7-9fad-df609d287f39",
"metadata": {},
"source": [
"### Edit EVENT_TYPE column of INTERACTIONS dataset \n",
"\n",
"Select only ProductViewd and OrderCompleted for EVENT_TYPE and change the names to `View` and `Purchase` respectively."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "748281d6-9f58-4fc3-b86b-74cfedb2238e",
"metadata": {},
"outputs": [],
"source": [
"full_interactions_df.EVENT_TYPE.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "20ca2d73-5ad7-4cf7-8fc5-e536b39b387f",
"metadata": {},
"outputs": [],
"source": [
"def filter_interactions_data(df, kinds_event_type):\n",
" ldf = df[df['EVENT_TYPE'].isin(kinds_event_type)]\n",
" ldf['EVENT_TYPE'] = ldf['EVENT_TYPE'].replace(['ProductViewed'],'View') \n",
" ldf['EVENT_TYPE'] = ldf['EVENT_TYPE'].replace(['OrderCompleted'],'Purchase') \n",
" \n",
" return ldf\n",
"\n",
"select_event_types = ['ProductViewed','OrderCompleted']\n",
"interactions_df = filter_interactions_data(full_interactions_df, select_event_types)\n",
"interactions_df"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "06125e9e-18a2-4116-b4cc-d15bda268773",
"metadata": {},
"source": [
"### Edit columns of INTERACTIONS dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "218d43d9-c058-446b-824f-03334bcd8413",
"metadata": {},
"outputs": [],
"source": [
"interactions_df.info()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b322780f-39be-4f23-9c8a-65a6592e1e62",
"metadata": {},
"outputs": [],
"source": [
"interactions_df = change_data_type(interactions_df, col='USER_ID', target_type='object')\n",
"interactions_df.info()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e49c1a85-80a5-4e4c-883d-b5823e70c238",
"metadata": {},
"source": [
"## 3. Upload the dataset to S3"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2080abc-b028-4110-a99d-f08edddf5f03",
"metadata": {},
"outputs": [],
"source": [
"import sagemaker\n",
"\n",
"bucket='' # replace with the name of your S3 bucket\n",
"bucket"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b37722d3-5b4b-4f5c-87f7-55d56f8ad2b1",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"os.makedirs('dataset', exist_ok=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0433896b-b0fd-47ff-9f34-11153720ae23",
"metadata": {},
"outputs": [],
"source": [
"items_filename = \"dataset/training_item.csv\"\n",
"users_filename = \"dataset/training_user.csv\"\n",
"its_filename = \"dataset/training_interaction.csv\"\n",
"\n",
"items_df.to_csv(items_filename,index=False)\n",
"users_df.to_csv(users_filename,index=False)\n",
"interactions_df.to_csv(its_filename,index=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5ec17f5-2d6b-441b-8159-fdd61ded6ff5",
"metadata": {},
"outputs": [],
"source": [
"#upload file for training\n",
"response_upload = boto3.Session().resource('s3').Bucket(bucket).Object(its_filename).upload_file(its_filename)\n",
"boto3.Session().resource('s3').Bucket(bucket).Object(users_filename).upload_file(users_filename)\n",
"boto3.Session().resource('s3').Bucket(bucket).Object(items_filename).upload_file(items_filename)\n",
"\n",
"s3_its_filename = \"s3://{}/{}\".format(bucket, its_filename)\n",
"s3_users_filename = \"s3://{}/{}\".format(bucket, users_filename)\n",
"s3_items_filename = \"s3://{}/{}\".format(bucket, items_filename)\n",
"\n",
"print(\"s3_train_interaction_filename: \\n\", s3_its_filename)\n",
"print(\"s3_train_users_filename: \\n\", s3_users_filename)\n",
"print(\"s3_train_items_filename: \\n\", s3_items_filename)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d81bd337-1457-4447-bef4-ad996c8b65df",
"metadata": {},
"outputs": [],
"source": [
"! aws s3 ls {s3_its_filename} --recursive\n",
"! aws s3 ls {s3_users_filename} --recursive\n",
"! aws s3 ls {s3_items_filename} --recursive"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "9444c08d-583d-466d-b383-38dbb4fb8829",
"metadata": {},
"source": [
"## 4. Personalize : Create Dataset Group"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a5abf1c4-e99a-4ff5-bbae-cf841dc689ed",
"metadata": {},
"outputs": [],
"source": [
"import boto3\n",
"import json\n",
"import time\n",
"from datetime import datetime\n",
"\n",
"# Configure the SDK to Personalize:\n",
"personalize = boto3.client('personalize')"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "65ffc635-3937-4f40-b07d-0b5487ae97e0",
"metadata": {},
"source": [
"### Creating an IAM Role to access S3 for Personalize "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0713ab0d-1285-4032-81e7-33719b0b6fa3",
"metadata": {},
"outputs": [],
"source": [
"s3 = boto3.client(\"s3\")\n",
"\n",
"policy = {\n",
" \"Version\": \"2012-10-17\",\n",
" \"Id\": \"PersonalizeS3BucketAccessPolicy\",\n",
" \"Statement\": [\n",
" {\n",
" \"Sid\": \"PersonalizeS3BucketAccessPolicy\",\n",
" \"Effect\": \"Allow\",\n",
" \"Principal\": {\n",
" \"Service\": \"personalize.amazonaws.com\"\n",
" },\n",
" \"Action\": [\n",
" \"s3:*\",\n",
" ],\n",
" \"Resource\": [\n",
" \"arn:aws:s3:::{}\".format(bucket),\n",
" \"arn:aws:s3:::{}/*\".format(bucket)\n",
" ]\n",
" }\n",
" ]\n",
"}\n",
"\n",
"s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6aa07729-59ac-4d92-9683-7f98952d4b49",
"metadata": {},
"outputs": [],
"source": [
"suffix = str(np.random.uniform())[4:9]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "597b3fa3-dc39-4eb6-95ad-c63e6ead0e1c",
"metadata": {},
"outputs": [],
"source": [
"iam = boto3.client(\"iam\")\n",
"\n",
"# Create assume_role_policy to create a role that Personalize will use\n",
"role_name = \"PersonalizeRoleDemo\" + suffix\n",
"assume_role_policy_document = {\n",
" \"Version\": \"2012-10-17\",\n",
" \"Statement\": [\n",
" {\n",
" \"Effect\": \"Allow\",\n",
" \"Principal\": {\n",
" \"Service\": \"personalize.amazonaws.com\"\n",
" },\n",
" \"Action\": \"sts:AssumeRole\"\n",
" }\n",
" ]\n",
"}\n",
"\n",
"# Create a role to be used by Personalize\n",
"create_role_response = iam.create_role(\n",
" RoleName = role_name,\n",
" AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)\n",
")\n",
"\n",
"# Add AmazonPersonalizeFullAccess permission to the role created above\n",
"policy_arn = \"arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess\"\n",
"iam.attach_role_policy(\n",
" RoleName = role_name,\n",
" PolicyArn = policy_arn\n",
")\n",
"\n",
"# Add AmazonS3FullAccess permission to the role created above\n",
"iam.attach_role_policy(\n",
" RoleName=role_name, \n",
" PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess'\n",
")\n",
"time.sleep(15) # wait for 15 seconds to allow IAM role policy attachment to propagate\n",
"\n",
"role_arn = create_role_response[\"Role\"][\"Arn\"]\n",
"print(role_arn)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c4931c49-346d-48de-8e19-3d36ba825384",
"metadata": {},
"source": [
"### Create Dataset Group"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5a584eee-641f-41c4-81bf-f2fa752eb686",
"metadata": {},
"outputs": [],
"source": [
"create_dataset_group_response = personalize.create_dataset_group(\n",
" name = \"RetailDemo-dataset-group\" + suffix\n",
")\n",
"\n",
"dataset_group_arn = create_dataset_group_response['datasetGroupArn']\n",
"dataset_group_arn"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "2d9565ba-6f68-497b-9cb8-66d7f1052c94",
"metadata": {},
"source": [
"#### Waiting for Dataset Group to become Active\n",
"Dataset Group creation usually becomes active within 30 seconds."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b933acaf-ef36-4194-b632-e75f478a7f67",
"metadata": {},
"outputs": [],
"source": [
"max_time = time.time() + 3*60*60 # 3 hours\n",
"while time.time() < max_time:\n",
" describe_dataset_group_response = personalize.describe_dataset_group(\n",
" datasetGroupArn = dataset_group_arn\n",
" )\n",
" status = describe_dataset_group_response[\"datasetGroup\"][\"status\"]\n",
" print(\"DatasetGroup: {}\".format(status))\n",
" \n",
" if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
" break\n",
" \n",
" time.sleep(15)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "353ea40e-9fa9-47a6-8723-fcc4bde2af50",
"metadata": {},
"source": [
"### Create Schema"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "03ef0a20-1083-4906-939a-04c96c2f6afc",
"metadata": {},
"source": [
"#### for INTERACTIONS"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8ca5a825-5a19-4e27-90c5-1e907f69e4ed",
"metadata": {},
"outputs": [],
"source": [
"interaction_schema_name=\"RetailDemo-interaction-schema\" + suffix\n",
"\n",
"schema = {\n",
" \"type\": \"record\",\n",
" \"name\": \"Interactions\",\n",
" \"namespace\": \"com.amazonaws.personalize.schema\",\n",
" \"fields\": [\n",
" {\n",
" \"name\": \"USER_ID\",\n",
" \"type\": \"string\"\n",
" },\n",
" {\n",
" \"name\": \"ITEM_ID\",\n",
" \"type\": \"string\"\n",
" },\n",
" { \n",
" \"name\": \"EVENT_TYPE\",\n",
" \"type\": \"string\"\n",
" }, \n",
" {\n",
" \"name\": \"TIMESTAMP\",\n",
" \"type\": \"long\"\n",
" }\n",
" ],\n",
" \"version\": \"1.0\"\n",
"}\n",
"\n",
"\n",
"create_schema_response = personalize.create_schema( \n",
" name = interaction_schema_name,\n",
" schema = json.dumps(schema)\n",
")\n",
"\n",
"interaction_schema_arn = create_schema_response['schemaArn']\n",
"print(json.dumps(create_schema_response, indent=2))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "8ffb6106-cd13-4642-8337-c6b4da998b51",
"metadata": {},
"source": [
"#### for ITEMS"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7ac3ddf9-99f5-4c6e-95ae-ce031a5b93bc",
"metadata": {},
"outputs": [],
"source": [
"item_schema_name=\"RetailDemo-item-schema\" + suffix\n",
"\n",
"schema = {\n",
" \"type\": \"record\",\n",
" \"name\": \"Items\",\n",
" \"namespace\": \"com.amazonaws.personalize.schema\",\n",
" \"fields\": [\n",
" {\n",
" \"name\": \"ITEM_ID\",\n",
" \"type\": \"string\"\n",
" },\n",
" {\n",
" \"name\": \"NAME\",\n",
" \"type\": \"string\"\n",
" },\n",
" {\n",
" \"name\": \"CATEGORY_L1\",\n",
" \"type\": [\n",
" \"string\"\n",
" ],\n",
" \"categorical\": True\n",
" },\n",
" {\n",
" \"name\": \"STYLE\",\n",
" \"type\": [\n",
" \"string\"\n",
" ],\n",
" \"categorical\": True\n",
" },\n",
" {\n",
" \"name\": \"PRODUCT_DESCRIPTION\",\n",
" \"type\": \"string\"\n",
" },\n",
" {\n",
" \"name\": \"PRICE\",\n",
" \"type\": \"float\"\n",
" }, \n",
" ],\n",
" \"version\": \"1.0\"\n",
"}\n",
"\n",
"create_metadata_schema_response = personalize.create_schema( \n",
" name = item_schema_name,\n",
" schema = json.dumps(schema)\n",
")\n",
"\n",
"item_schema_arn = create_metadata_schema_response['schemaArn']\n",
"print(json.dumps(create_metadata_schema_response, indent=2))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5df8191d-8fae-4ce9-9952-015bce72e447",
"metadata": {},
"source": [
"#### for USERS"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c9814e4-4706-4b5d-9b14-f43ac7952a14",
"metadata": {},
"outputs": [],
"source": [
"user_schema_name=\"RetailDemo-user-schema\" + suffix\n",
"\n",
"schema = {\n",
" \"type\": \"record\",\n",
" \"name\": \"Users\",\n",
" \"namespace\": \"com.amazonaws.personalize.schema\",\n",
" \"fields\": [\n",
" {\n",
" \"name\": \"USER_ID\",\n",
" \"type\": \"string\"\n",
" },\n",
" {\n",
" \"name\": \"USER_NAME\",\n",
" \"type\": \"string\"\n",
" }, \n",
" {\n",
" \"name\": \"GENDER\",\n",
" \"type\": [\n",
" \"string\"\n",
" ],\n",
" \"categorical\": True\n",
" } \n",
" ],\n",
" \"version\": \"1.0\"\n",
"}\n",
"\n",
"create_metadata_schema_response = personalize.create_schema( \n",
" name = user_schema_name,\n",
" schema = json.dumps(schema)\n",
")\n",
"\n",
"user_schema_arn = create_metadata_schema_response['schemaArn']\n",
"print(json.dumps(create_metadata_schema_response, indent=2))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "6e958b16-9470-4fff-9221-555ce32562bf",
"metadata": {},
"source": [
"## 5. Personalize : Create Dataset"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e4f1176c-b277-4624-a139-a207a7559194",
"metadata": {},
"source": [
"#### for INTERACTIONS"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "775ad313-20bc-43d9-bf81-1990f8859cdf",
"metadata": {},
"outputs": [],
"source": [
"dataset_type = \"INTERACTIONS\"\n",
"create_dataset_response = personalize.create_dataset(\n",
" name = \"RetailDemo-interaction-dataset\" + suffix,\n",
" datasetType = dataset_type,\n",
" datasetGroupArn = dataset_group_arn,\n",
" schemaArn = interaction_schema_arn\n",
")\n",
"\n",
"interaction_dataset_arn = create_dataset_response['datasetArn']\n",
"print(json.dumps(create_dataset_response, indent=2))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5432c4fe-8000-4e25-b509-9d41184da5cf",
"metadata": {},
"source": [
"#### for ITEMS"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "22cb34f5-00e0-45b8-a74f-1dfe848b86c4",
"metadata": {},
"outputs": [],
"source": [
"dataset_type = \"ITEMS\"\n",
"create_item_dataset_response = personalize.create_dataset(\n",
" name = \"RetailDemo-item-dataset\" + suffix,\n",
" datasetType = dataset_type,\n",
" datasetGroupArn = dataset_group_arn,\n",
" schemaArn = item_schema_arn,\n",
" \n",
")\n",
"\n",
"item_dataset_arn = create_item_dataset_response['datasetArn']\n",
"print(json.dumps(create_item_dataset_response, indent=2))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "1535eda0-83fb-4d92-ad3a-fce3ce3491d1",
"metadata": {},
"source": [
"#### for USERS"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a3ff752b-88c5-4603-a35f-170c3791ec2d",
"metadata": {},
"outputs": [],
"source": [
"dataset_type = \"USERS\"\n",
"create_user_dataset_response = personalize.create_dataset(\n",
" name = \"RetailDemo-user-dataset\" + suffix,\n",
" datasetType = dataset_type,\n",
" datasetGroupArn = dataset_group_arn,\n",
" schemaArn = user_schema_arn,\n",
" \n",
")\n",
"\n",
"user_dataset_arn = create_user_dataset_response['datasetArn']\n",
"print(json.dumps(create_user_dataset_response, indent=2))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5cd44969-64d0-4ac3-b663-29661ced4407",
"metadata": {},
"source": [
"#### wait for 1 minute(or less) until Dataset creation is complete"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4633e820-0be4-4cd4-96ef-c03fa2783130",
"metadata": {},
"outputs": [],
"source": [
"time.sleep(60)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "6c971d53-c099-4bfc-90fc-370c87db3d46",
"metadata": {},
"source": [
"## 6. Personalize : Import Dataset "
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "00b0da02-b195-4744-b406-b6a26fcd806b",
"metadata": {},
"source": [
"#### INTERACTIONS Dataset - Create Import Job"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0ca7c9b9-c630-4e56-97f6-f0478a0ff205",
"metadata": {},
"outputs": [],
"source": [
"create_dataset_import_job_response = personalize.create_dataset_import_job(\n",
" jobName = \"RetailDeom-interaction-dataset-import\" + suffix,\n",
" datasetArn = interaction_dataset_arn,\n",
" dataSource = {\n",
" \"dataLocation\": \"s3://{}/{}\".format(bucket, its_filename)\n",
" },\n",
" roleArn = role_arn\n",
")\n",
"\n",
"interation_dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']\n",
"print(json.dumps(create_dataset_import_job_response, indent=2))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "06e894cf-5bee-48bf-9c6a-d13e7ce7535d",
"metadata": {},
"source": [
"#### ITEMS Dataset - Create Import Job"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a79ddbdf-f8a0-453a-8005-78e27fb9d495",
"metadata": {},
"outputs": [],
"source": [
"create_item_dataset_import_job_response = personalize.create_dataset_import_job(\n",
" jobName = \"RetailDemo-item-dataset-import\" + suffix,\n",
" datasetArn = item_dataset_arn,\n",
" dataSource = {\n",
" \"dataLocation\": \"s3://{}/{}\".format(bucket, items_filename)\n",
" },\n",
" roleArn = role_arn\n",
")\n",
"\n",
"item_dataset_import_job_arn = create_item_dataset_import_job_response['datasetImportJobArn']\n",
"print(json.dumps(create_item_dataset_import_job_response, indent=2))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "cae29267-7a34-4312-a082-1e6d6faa492f",
"metadata": {},
"source": [
"#### USERS Dataset - Create Import Job"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6ffe0b02-48e9-4096-bd23-2eebcbc68714",
"metadata": {},
"outputs": [],
"source": [
"create_user_dataset_import_job_response = personalize.create_dataset_import_job(\n",
" jobName = \"RetailDemo-user-dataset-import\" + suffix,\n",
" datasetArn = user_dataset_arn,\n",
" dataSource = {\n",
" \"dataLocation\": \"s3://{}/{}\".format(bucket, users_filename)\n",
" },\n",
" roleArn = role_arn\n",
")\n",
"\n",
"user_dataset_import_job_arn = create_user_dataset_import_job_response['datasetImportJobArn']\n",
"print(json.dumps(create_user_dataset_import_job_response, indent=2))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "4ec5c03b-14da-4912-a3f5-9cdbc7d89492",
"metadata": {},
"source": [
"#### All Dataset Import tasks must be completed before proceeding with the next step.\n",
"#### Therefore, it waits until all three datasets below become ACTIVE."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "2c02ce5b-94fc-453c-9127-c7d0b17256c0",
"metadata": {},
"source": [
"#### import job status of INTERACTIONS"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ace1fbd4-cf29-49dd-b41c-6bb4edfd6ff1",
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"\n",
"status = None\n",
"max_time = time.time() + 3*60*60 # 3 hours\n",
"while time.time() < max_time:\n",
" describe_dataset_import_job_response = personalize.describe_dataset_import_job(\n",
" datasetImportJobArn = interation_dataset_import_job_arn\n",
" )\n",
" \n",
" dataset_import_job = describe_dataset_import_job_response[\"datasetImportJob\"]\n",
" if \"latestDatasetImportJobRun\" not in dataset_import_job:\n",
" status = dataset_import_job[\"status\"]\n",
" print(\"DatasetImportJob: {}\".format(status))\n",
" else:\n",
" status = dataset_import_job[\"latestDatasetImportJobRun\"][\"status\"]\n",
" print(\"LatestDatasetImportJobRun: {}\".format(status))\n",
" \n",
" if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
" break\n",
" \n",
" time.sleep(15)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e6dca136-eead-459c-a47a-41bbfd05f59b",
"metadata": {},
"source": [
"#### import job status of ITEMS"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7223fe1c-b037-42f7-b80b-4593e43e02a5",
"metadata": {},
"outputs": [],
"source": [
"status = None\n",
"max_time = time.time() + 3*60*60 # 3 hours\n",
"while time.time() < max_time:\n",
" describe_dataset_import_job_response = personalize.describe_dataset_import_job(\n",
" datasetImportJobArn = item_dataset_import_job_arn\n",
" )\n",
" \n",
" dataset_import_job = describe_dataset_import_job_response[\"datasetImportJob\"]\n",
" if \"latestDatasetImportJobRun\" not in dataset_import_job:\n",
" status = dataset_import_job[\"status\"]\n",
" print(\"DatasetImportJob: {}\".format(status))\n",
" else:\n",
" status = dataset_import_job[\"latestDatasetImportJobRun\"][\"status\"]\n",
" print(\"LatestDatasetImportJobRun: {}\".format(status))\n",
" \n",
" if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
" break\n",
" \n",
" time.sleep(15)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "ae1d60d4-106f-480d-9c43-ba6a5b7fd48f",
"metadata": {},
"source": [
"#### import job status of USERS"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "05796a65-5ed5-4da4-a2e1-c94688a037cd",
"metadata": {},
"outputs": [],
"source": [
"status = None\n",
"max_time = time.time() + 3*60*60 # 3 hours\n",
"while time.time() < max_time:\n",
" describe_dataset_import_job_response = personalize.describe_dataset_import_job(\n",
" datasetImportJobArn = user_dataset_import_job_arn\n",
" )\n",
" \n",
" dataset_import_job = describe_dataset_import_job_response[\"datasetImportJob\"]\n",
" if \"latestDatasetImportJobRun\" not in dataset_import_job:\n",
" status = dataset_import_job[\"status\"]\n",
" print(\"DatasetImportJob: {}\".format(status))\n",
" else:\n",
" status = dataset_import_job[\"latestDatasetImportJobRun\"][\"status\"]\n",
" print(\"LatestDatasetImportJobRun: {}\".format(status))\n",
" \n",
" if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
" break\n",
" \n",
" time.sleep(15)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "864e5eee-815e-44c1-a21e-ad5de48b53f9",
"metadata": {},
"source": [
"## 7. Personalize : Create Solution"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c1089201-4644-4714-8f96-1ca2a0531de6",
"metadata": {},
"source": [
"### Create Solution with \"AWS-USER-PERSONALIZATION\" recipe"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "82fbdbd2-5d27-4854-98d6-653797a0d09c",
"metadata": {},
"outputs": [],
"source": [
"# Define the solution details\n",
"solution_name = \"RetailDemo-user-personalization\"\n",
"recipe_arn = \"arn:aws:personalize:::recipe/aws-user-personalization\"\n",
"perform_hpo = False # set to true if you want to perform hyperparameter optimization\n",
"\n",
"# Create the solution\n",
"create_solution_response = personalize.create_solution(\n",
" name=solution_name,\n",
" recipeArn=recipe_arn,\n",
" performHPO=perform_hpo,\n",
" datasetGroupArn = dataset_group_arn,\n",
" solutionConfig = {\n",
" \"algorithmHyperParameters\": {\n",
" \"bptt\": \"32\",\n",
" \"hidden_dimension\": \"149\",\n",
" \"recency_mask\": \"true\"\n",
" },\n",
" \"featureTransformationParameters\": {\n",
" \"max_user_history_length_percentile\": \"0.99\",\n",
" \"min_user_history_length_percentile\": \"0.00\"\n",
" }\n",
" }\n",
")\n",
"\n",
"# Get the solution ARN\n",
"solution_arn = create_solution_response['solutionArn']\n",
"print(f'Solution ARN: {solution_arn}')"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "66c1cb5f-3d6d-401c-bc22-179fe19f1987",
"metadata": {},
"source": [
"### Create Solution Version"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7ebb511b-669e-4c9e-ac09-0b252adfbae9",
"metadata": {},
"outputs": [],
"source": [
"# Create the solution version\n",
"create_solution_version_response = personalize.create_solution_version(\n",
" solutionArn=solution_arn\n",
")\n",
"\n",
"# Get the solution version ARN\n",
"solution_version_arn = create_solution_version_response['solutionVersionArn']\n",
"print(f'Solution version ARN: {solution_version_arn}')"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d7338d57-1dd1-42e9-8896-903208c73760",
"metadata": {},
"source": [
"#### Wait until Solution Version is in ACTIVE state\n",
"It takes about 20-30 minutes.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "401f0606-251a-43e6-9349-f875bae8b3e7",
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"\n",
"max_time = time.time() + 3*60*60 # 3 hours\n",
"while time.time() < max_time:\n",
"\n",
" # status_aws_user_personalization\n",
" describe_solution_response = personalize.describe_solution_version(\n",
" solutionVersionArn = solution_version_arn\n",
" ) \n",
" status_solution = describe_solution_response['solutionVersion'][\"status\"]\n",
" print(\"status_user-personalization : {}\".format(status_solution))\n",
" \n",
" \n",
" if (status_solution == \"ACTIVE\" or status_solution == \"CREATE FAILED\") :\n",
" break\n",
" print(\"-------------------------------------->\")\n",
" time.sleep(30)\n",
"\n",
"print(\"Generating solution version is completed\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "23ed4de0-a5e8-485c-99a4-a3b653cbdf4d",
"metadata": {},
"source": [
"## 8. Personalize : Create Campaign"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a13983a-ae69-471c-93b4-d4cd57feca3b",
"metadata": {},
"outputs": [],
"source": [
"create_campaign_reponse = personalize.create_campaign(\n",
" name = 'RetailDemo-campaign' + suffix,\n",
" solutionVersionArn = solution_version_arn,\n",
" minProvisionedTPS=1\n",
")\n",
"\n",
"campaign_arn = create_campaign_reponse['campaignArn']\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "bac029c2-7dd9-4ea2-bbfc-559c7db53ed2",
"metadata": {},
"source": [
"#### Wait for Campaign creation to complete\n",
"It takes about 7 minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e7e32f36-5c48-45e7-b70c-8f90d4e8b3ec",
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"\n",
"max_time = time.time() + 3*60*60 # 3 hours\n",
"while time.time() < max_time:\n",
"\n",
" # status_aws_user_personalization\n",
" describe_campaign_response = personalize.describe_campaign(\n",
" campaignArn = campaign_arn\n",
" ) \n",
" status_campaign = describe_campaign_response['campaign'][\"status\"]\n",
" print(\"status_creating_campaign : {}\".format(status_campaign))\n",
" \n",
" \n",
" if (status_campaign == \"ACTIVE\" or status_campaign == \"CREATE FAILED\") :\n",
" break\n",
" print(\"-------------------------------------->\")\n",
" time.sleep(60)\n",
"\n",
"print(\"Creating Campaign is completed\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "682fc7e3-8cc5-41a5-a0d4-08a3ddbc4686",
"metadata": {},
"source": [
"#### save variable\n",
"Save variables needed for clean-up"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "69cf51d8-d822-4d74-b41f-e6859bc85276",
"metadata": {},
"outputs": [],
"source": [
"%store dataset_group_arn\n",
"%store interaction_schema_arn\n",
"%store item_schema_arn\n",
"%store user_schema_arn\n",
"%store interaction_dataset_arn\n",
"%store item_dataset_arn\n",
"%store user_dataset_arn\n",
"%store solution_arn\n",
"%store campaign_arn\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f07fddb0-e5e1-47b7-b0c0-85608cd1d53c",
"metadata": {},
"source": [
"# You can make an inference request with the Personalize Campaign ARN below.\n",
"In the Lambda Function, Personalize Campaign uses the Personalize Campaign ARN below."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "662a455c-c172-4340-aac8-c1be7aac5f42",
"metadata": {},
"outputs": [],
"source": [
"print(\"Personalize Campaign ARN : \", campaign_arn)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2b97d05d-6362-4ed0-a7b8-b6f7dd407d29",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_python3",
"language": "python",
"name": "conda_python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}