{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "27cc3268-2230-48d3-b79f-6c9e51ca0b3a", "metadata": {}, "source": [ "# Amazon Personalize - From Data Preparation to Campaign Deployment\n", "\n", "This notebook uses `conda_python3` as the default kernel.\n", "
\n", "Deploy a Personalize campaign by running the cells sequentially from start to finish." ] }, { "cell_type": "markdown", "id": "a222cc25", "metadata": {}, "source": [] }, { "attachments": {}, "cell_type": "markdown", "id": "9c13c5b0-79c5-41da-adb6-7ae09801f2fd", "metadata": {}, "source": [ "## 0. Setting Environment\n", "\n", "(Optional) Upgrade the boto3 SDK if needed.\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1c9244fa-ffd9-4f35-acdc-e42fef9e59f5", "metadata": {}, "source": [ "### boto3 Upgrade (Optional)" ] }, { "cell_type": "code", "execution_count": null, "id": "cf7d9b34-7f86-4704-b6cb-d94c48e070a1", "metadata": {}, "outputs": [], "source": [ "# !pip install boto3 --upgrade" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1924cfee-75b2-476d-9643-57e66658f97e", "metadata": {}, "source": [ "## 1. Data Preparation\n", "\n", "We use the dataset from the Retail Demo Store below.\n", "- Unpack the tar archive before use.\n", "\n", "* Retail Demo Store\n", "  * https://github.com/aws-samples/retail-demo-store" ] }, { "cell_type": "code", "execution_count": null, "id": "c60d64e7-8e20-4f3b-8d0d-1bcee84f6ce8", "metadata": {}, "outputs": [], "source": [ "import tarfile\n", "\n", "# Extract the archive; the context manager closes the file afterwards\n", "with tarfile.open(\"../data/RetailDemoDataSet.tar\") as tf:\n", "    tf.extractall(\"../data\")" ] }, { "cell_type": "code", "execution_count": null, "id": "3a1d0139-ecc3-441e-ac33-db41c31331a5", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "items = pd.read_csv('../data/items.csv')\n", "users = pd.read_csv('../data/users.csv')\n", "its = pd.read_csv('../data/interactions.csv')" ] }, { "attachments": {}, "cell_type": "markdown", "id": "29301bf4-7676-4c7f-9c4a-eb370080e356", "metadata": {}, "source": [ "## 2. 
Data Preprocessing" ] }, { "cell_type": "code", "execution_count": null, "id": "c27d6f3a-56a8-47b8-af11-fc7a294cee98", "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import json\n", "import numpy as np\n", "import pandas as pd\n", "import time\n", "from datetime import datetime\n", "\n", "import matplotlib.pyplot as plt" ] }, { "attachments": {}, "cell_type": "markdown", "id": "79adaf49-ffb2-4325-973e-ca1de224f9b5", "metadata": {}, "source": [ "### Edit columns of ITEMS dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "4944e7d6-9673-4a6f-9915-c9ab2ce8176d", "metadata": {}, "outputs": [], "source": [ "items.columns" ] }, { "cell_type": "code", "execution_count": null, "id": "77efc9af-0c84-4570-b666-264908faf406", "metadata": {}, "outputs": [], "source": [ "def item_data_selection(df, cols):\n", " ldf = df[cols]\n", " ldf = ldf.rename(columns={'id':'ITEM_ID',\n", " 'name' : 'NAME',\n", " 'category' :'CATEGORY_L1',\n", " 'style' : 'STYLE',\n", " 'description' : 'PRODUCT_DESCRIPTION',\n", " 'price' : 'PRICE',\n", " })\n", " return ldf\n", "\n", "\n", "item_cols = ['id', 'name', 'category', 'style', 'description','price']\n", "items_df = item_data_selection(items, item_cols) \n", "\n", "items_df.head(3)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "5fb2d4d3-ba76-49d8-ae57-bd3449866899", "metadata": {}, "source": [ "### Edit columns of USERS dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "35a823d5-6917-4619-965d-6a571b7b11cc", "metadata": {}, "outputs": [], "source": [ "users.columns" ] }, { "cell_type": "code", "execution_count": null, "id": "35b83662-3f1c-4e0f-bb16-f6ac99d3ded1", "metadata": {}, "outputs": [], "source": [ "def user_data_selection(df, cols):\n", " ldf = df[cols]\n", " ldf = ldf.rename(columns={'id':'USER_ID',\n", " 'username' : 'USER_NAME',\n", " 'age' :'AGE',\n", " 'gender' : 'GENDER', \n", " })\n", " return ldf\n", "\n", "user_cols = ['id', 'username', 'age', 'gender']\n", "\n", 
"users_df = user_data_selection(users, user_cols) \n", "users_df.head(3)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e9261103-46f7-4825-87dc-b3dc57335cf7", "metadata": {}, "source": [ "### Modify data type of ITEMS dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "3daa6f0b-7118-4e0d-8ac1-0d735906457d", "metadata": {}, "outputs": [], "source": [ "users_df.info()" ] }, { "cell_type": "code", "execution_count": null, "id": "10a40f0e-2d7c-4f90-94aa-f98555fa2b29", "metadata": {}, "outputs": [], "source": [ "def change_data_type(df, col, target_type):\n", " ldf = df.copy()\n", " ldf[col] = ldf[col].astype(target_type)\n", " \n", " return ldf\n", "\n", "users_df = change_data_type(users_df, col='USER_ID', target_type='object')\n", "users_df.info()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "4ed5349f-3872-4fed-90f1-eb6ec3cdd4e6", "metadata": {}, "source": [ "### Edit columns of INTERACTIONS dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "1e297e60-4be8-4956-808f-def6d05d8f5e", "metadata": {}, "outputs": [], "source": [ "its.columns" ] }, { "cell_type": "code", "execution_count": null, "id": "a739ecab-b46b-4735-8690-57e6a5b5f461", "metadata": {}, "outputs": [], "source": [ "def interactions_data_selection(df, cols):\n", " ldf = df[cols]\n", " ldf = ldf.rename(columns={'id':'USER_ID',\n", " 'username' : 'USER_NAME',\n", " 'age' :'AGE',\n", " 'gender' : 'GENDER', \n", " })\n", " return ldf\n", "\n", "interactions_cols = ['ITEM_ID', 'USER_ID', 'EVENT_TYPE', 'TIMESTAMP']\n", "\n", "full_interactions_df = interactions_data_selection(its, interactions_cols) \n", "full_interactions_df.head(3)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "5a749716-8999-43b7-9fad-df609d287f39", "metadata": {}, "source": [ "### Edit EVENT_TYPE column of INTERACTIONS dataset \n", "\n", "Select only ProductViewd and OrderCompleted for EVENT_TYPE and change the names to `View` and `Purchase` respectively." 
] }, { "cell_type": "code", "execution_count": null, "id": "748281d6-9f58-4fc3-b86b-74cfedb2238e", "metadata": {}, "outputs": [], "source": [ "full_interactions_df.EVENT_TYPE.value_counts()" ] }, { "cell_type": "code", "execution_count": null, "id": "20ca2d73-5ad7-4cf7-8fc5-e536b39b387f", "metadata": {}, "outputs": [], "source": [ "def filter_interactions_data(df, kinds_event_type):\n", "    # Copy the filtered rows so the assignment below does not write to a view\n", "    ldf = df[df['EVENT_TYPE'].isin(kinds_event_type)].copy()\n", "    ldf['EVENT_TYPE'] = ldf['EVENT_TYPE'].replace(\n", "        {'ProductViewed': 'View', 'OrderCompleted': 'Purchase'}\n", "    )\n", "\n", "    return ldf\n", "\n", "select_event_types = ['ProductViewed','OrderCompleted']\n", "interactions_df = filter_interactions_data(full_interactions_df, select_event_types)\n", "interactions_df" ] }, { "attachments": {}, "cell_type": "markdown", "id": "06125e9e-18a2-4116-b4cc-d15bda268773", "metadata": {}, "source": [ "### Modify data type of INTERACTIONS dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "218d43d9-c058-446b-824f-03334bcd8413", "metadata": {}, "outputs": [], "source": [ "interactions_df.info()" ] }, { "cell_type": "code", "execution_count": null, "id": "b322780f-39be-4f23-9c8a-65a6592e1e62", "metadata": {}, "outputs": [], "source": [ "interactions_df = change_data_type(interactions_df, col='USER_ID', target_type='object')\n", "interactions_df.info()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e49c1a85-80a5-4e4c-883d-b5823e70c238", "metadata": {}, "source": [ "## 3. 
Upload the dataset to S3" ] }, { "cell_type": "code", "execution_count": null, "id": "b2080abc-b028-4110-a99d-f08edddf5f03", "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "\n", "bucket='' # replace with the name of your S3 bucket\n", "bucket" ] }, { "cell_type": "code", "execution_count": null, "id": "b37722d3-5b4b-4f5c-87f7-55d56f8ad2b1", "metadata": {}, "outputs": [], "source": [ "import os\n", "os.makedirs('dataset', exist_ok=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "0433896b-b0fd-47ff-9f34-11153720ae23", "metadata": {}, "outputs": [], "source": [ "items_filename = \"dataset/training_item.csv\"\n", "users_filename = \"dataset/training_user.csv\"\n", "its_filename = \"dataset/training_interaction.csv\"\n", "\n", "items_df.to_csv(items_filename,index=False)\n", "users_df.to_csv(users_filename,index=False)\n", "interactions_df.to_csv(its_filename,index=False)" ] }, { "cell_type": "code", "execution_count": null, "id": "e5ec17f5-2d6b-441b-8159-fdd61ded6ff5", "metadata": {}, "outputs": [], "source": [ "#upload file for training\n", "response_upload = boto3.Session().resource('s3').Bucket(bucket).Object(its_filename).upload_file(its_filename)\n", "boto3.Session().resource('s3').Bucket(bucket).Object(users_filename).upload_file(users_filename)\n", "boto3.Session().resource('s3').Bucket(bucket).Object(items_filename).upload_file(items_filename)\n", "\n", "s3_its_filename = \"s3://{}/{}\".format(bucket, its_filename)\n", "s3_users_filename = \"s3://{}/{}\".format(bucket, users_filename)\n", "s3_items_filename = \"s3://{}/{}\".format(bucket, items_filename)\n", "\n", "print(\"s3_train_interaction_filename: \\n\", s3_its_filename)\n", "print(\"s3_train_users_filename: \\n\", s3_users_filename)\n", "print(\"s3_train_items_filename: \\n\", s3_items_filename)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "d81bd337-1457-4447-bef4-ad996c8b65df", "metadata": {}, "outputs": [], "source": [ "! 
aws s3 ls {s3_its_filename} --recursive\n", "! aws s3 ls {s3_users_filename} --recursive\n", "! aws s3 ls {s3_items_filename} --recursive" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9444c08d-583d-466d-b383-38dbb4fb8829", "metadata": {}, "source": [ "## 4. Personalize : Create Dataset Group" ] }, { "cell_type": "code", "execution_count": null, "id": "a5abf1c4-e99a-4ff5-bbae-cf841dc689ed", "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import json\n", "import time\n", "from datetime import datetime\n", "\n", "# Configure the SDK to Personalize:\n", "personalize = boto3.client('personalize')" ] }, { "attachments": {}, "cell_type": "markdown", "id": "65ffc635-3937-4f40-b07d-0b5487ae97e0", "metadata": {}, "source": [ "### Creating an IAM Role to access S3 for Personalize " ] }, { "cell_type": "code", "execution_count": null, "id": "0713ab0d-1285-4032-81e7-33719b0b6fa3", "metadata": {}, "outputs": [], "source": [ "s3 = boto3.client(\"s3\")\n", "\n", "policy = {\n", " \"Version\": \"2012-10-17\",\n", " \"Id\": \"PersonalizeS3BucketAccessPolicy\",\n", " \"Statement\": [\n", " {\n", " \"Sid\": \"PersonalizeS3BucketAccessPolicy\",\n", " \"Effect\": \"Allow\",\n", " \"Principal\": {\n", " \"Service\": \"personalize.amazonaws.com\"\n", " },\n", " \"Action\": [\n", " \"s3:*\",\n", " ],\n", " \"Resource\": [\n", " \"arn:aws:s3:::{}\".format(bucket),\n", " \"arn:aws:s3:::{}/*\".format(bucket)\n", " ]\n", " }\n", " ]\n", "}\n", "\n", "s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))" ] }, { "cell_type": "code", "execution_count": null, "id": "6aa07729-59ac-4d92-9683-7f98952d4b49", "metadata": {}, "outputs": [], "source": [ "suffix = str(np.random.uniform())[4:9]" ] }, { "cell_type": "code", "execution_count": null, "id": "597b3fa3-dc39-4eb6-95ad-c63e6ead0e1c", "metadata": {}, "outputs": [], "source": [ "iam = boto3.client(\"iam\")\n", "\n", "# Create assume_role_policy to create a role that Personalize will use\n", "role_name = 
\"PersonalizeRoleDemo\" + suffix\n", "assume_role_policy_document = {\n", " \"Version\": \"2012-10-17\",\n", " \"Statement\": [\n", " {\n", " \"Effect\": \"Allow\",\n", " \"Principal\": {\n", " \"Service\": \"personalize.amazonaws.com\"\n", " },\n", " \"Action\": \"sts:AssumeRole\"\n", " }\n", " ]\n", "}\n", "\n", "# Create a role to be used by Personalize\n", "create_role_response = iam.create_role(\n", " RoleName = role_name,\n", " AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)\n", ")\n", "\n", "# Add AmazonPersonalizeFullAccess permission to the role created above\n", "policy_arn = \"arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess\"\n", "iam.attach_role_policy(\n", " RoleName = role_name,\n", " PolicyArn = policy_arn\n", ")\n", "\n", "# Add AmazonS3FullAccess permission to the role created above\n", "iam.attach_role_policy(\n", " RoleName=role_name, \n", " PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess'\n", ")\n", "time.sleep(15) # wait for 15 seconds to allow IAM role policy attachment to propagate\n", "\n", "role_arn = create_role_response[\"Role\"][\"Arn\"]\n", "print(role_arn)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c4931c49-346d-48de-8e19-3d36ba825384", "metadata": {}, "source": [ "### Create Dataset Group" ] }, { "cell_type": "code", "execution_count": null, "id": "5a584eee-641f-41c4-81bf-f2fa752eb686", "metadata": {}, "outputs": [], "source": [ "create_dataset_group_response = personalize.create_dataset_group(\n", " name = \"RetailDemo-dataset-group\" + suffix\n", ")\n", "\n", "dataset_group_arn = create_dataset_group_response['datasetGroupArn']\n", "dataset_group_arn" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2d9565ba-6f68-497b-9cb8-66d7f1052c94", "metadata": {}, "source": [ "#### Waiting for Dataset Group to become Active\n", "Dataset Group creation usually becomes active within 30 seconds." 
] }, { "cell_type": "code", "execution_count": null, "id": "b933acaf-ef36-4194-b632-e75f478a7f67", "metadata": {}, "outputs": [], "source": [ "max_time = time.time() + 3*60*60 # 3 hours\n", "while time.time() < max_time:\n", "    describe_dataset_group_response = personalize.describe_dataset_group(\n", "        datasetGroupArn = dataset_group_arn\n", "    )\n", "    status = describe_dataset_group_response[\"datasetGroup\"][\"status\"]\n", "    print(\"DatasetGroup: {}\".format(status))\n", "\n", "    if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n", "        break\n", "\n", "    time.sleep(15)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "353ea40e-9fa9-47a6-8723-fcc4bde2af50", "metadata": {}, "source": [ "### Create Schema" ] }, { "attachments": {}, "cell_type": "markdown", "id": "03ef0a20-1083-4906-939a-04c96c2f6afc", "metadata": {}, "source": [ "#### for INTERACTIONS" ] }, { "cell_type": "code", "execution_count": null, "id": "8ca5a825-5a19-4e27-90c5-1e907f69e4ed", "metadata": {}, "outputs": [], "source": [ "interaction_schema_name=\"RetailDemo-interaction-schema\" + suffix\n", "\n", "schema = {\n", "    \"type\": \"record\",\n", "    \"name\": \"Interactions\",\n", "    \"namespace\": \"com.amazonaws.personalize.schema\",\n", "    \"fields\": [\n", "        {\n", "            \"name\": \"USER_ID\",\n", "            \"type\": \"string\"\n", "        },\n", "        {\n", "            \"name\": \"ITEM_ID\",\n", "            \"type\": \"string\"\n", "        },\n", "        {\n", "            \"name\": \"EVENT_TYPE\",\n", "            \"type\": \"string\"\n", "        },\n", "        {\n", "            \"name\": \"TIMESTAMP\",\n", "            \"type\": \"long\"\n", "        }\n", "    ],\n", "    \"version\": \"1.0\"\n", "}\n", "\n", "\n", "create_schema_response = personalize.create_schema(\n", "    name = interaction_schema_name,\n", "    schema = json.dumps(schema)\n", ")\n", "\n", "interaction_schema_arn = create_schema_response['schemaArn']\n", "print(json.dumps(create_schema_response, indent=2))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8ffb6106-cd13-4642-8337-c6b4da998b51", "metadata": {}, "source": [ "#### 
for ITEMS" ] }, { "cell_type": "code", "execution_count": null, "id": "7ac3ddf9-99f5-4c6e-95ae-ce031a5b93bc", "metadata": {}, "outputs": [], "source": [ "item_schema_name=\"RetailDemo-item-schema\" + suffix\n", "\n", "schema = {\n", " \"type\": \"record\",\n", " \"name\": \"Items\",\n", " \"namespace\": \"com.amazonaws.personalize.schema\",\n", " \"fields\": [\n", " {\n", " \"name\": \"ITEM_ID\",\n", " \"type\": \"string\"\n", " },\n", " {\n", " \"name\": \"NAME\",\n", " \"type\": \"string\"\n", " },\n", " {\n", " \"name\": \"CATEGORY_L1\",\n", " \"type\": [\n", " \"string\"\n", " ],\n", " \"categorical\": True\n", " },\n", " {\n", " \"name\": \"STYLE\",\n", " \"type\": [\n", " \"string\"\n", " ],\n", " \"categorical\": True\n", " },\n", " {\n", " \"name\": \"PRODUCT_DESCRIPTION\",\n", " \"type\": \"string\"\n", " },\n", " {\n", " \"name\": \"PRICE\",\n", " \"type\": \"float\"\n", " }, \n", " ],\n", " \"version\": \"1.0\"\n", "}\n", "\n", "create_metadata_schema_response = personalize.create_schema( \n", " name = item_schema_name,\n", " schema = json.dumps(schema)\n", ")\n", "\n", "item_schema_arn = create_metadata_schema_response['schemaArn']\n", "print(json.dumps(create_metadata_schema_response, indent=2))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "5df8191d-8fae-4ce9-9952-015bce72e447", "metadata": {}, "source": [ "#### for USERS" ] }, { "cell_type": "code", "execution_count": null, "id": "6c9814e4-4706-4b5d-9b14-f43ac7952a14", "metadata": {}, "outputs": [], "source": [ "user_schema_name=\"RetailDemo-user-schema\" + suffix\n", "\n", "schema = {\n", " \"type\": \"record\",\n", " \"name\": \"Users\",\n", " \"namespace\": \"com.amazonaws.personalize.schema\",\n", " \"fields\": [\n", " {\n", " \"name\": \"USER_ID\",\n", " \"type\": \"string\"\n", " },\n", " {\n", " \"name\": \"USER_NAME\",\n", " \"type\": \"string\"\n", " }, \n", " {\n", " \"name\": \"GENDER\",\n", " \"type\": [\n", " \"string\"\n", " ],\n", " \"categorical\": True\n", " } \n", " 
],\n", " \"version\": \"1.0\"\n", "}\n", "\n", "create_metadata_schema_response = personalize.create_schema( \n", " name = user_schema_name,\n", " schema = json.dumps(schema)\n", ")\n", "\n", "user_schema_arn = create_metadata_schema_response['schemaArn']\n", "print(json.dumps(create_metadata_schema_response, indent=2))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6e958b16-9470-4fff-9221-555ce32562bf", "metadata": {}, "source": [ "## 5. Personalize : Create Dataset" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e4f1176c-b277-4624-a139-a207a7559194", "metadata": {}, "source": [ "#### for INTERACTIONS" ] }, { "cell_type": "code", "execution_count": null, "id": "775ad313-20bc-43d9-bf81-1990f8859cdf", "metadata": {}, "outputs": [], "source": [ "dataset_type = \"INTERACTIONS\"\n", "create_dataset_response = personalize.create_dataset(\n", " name = \"RetailDemo-interaction-dataset\" + suffix,\n", " datasetType = dataset_type,\n", " datasetGroupArn = dataset_group_arn,\n", " schemaArn = interaction_schema_arn\n", ")\n", "\n", "interaction_dataset_arn = create_dataset_response['datasetArn']\n", "print(json.dumps(create_dataset_response, indent=2))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "5432c4fe-8000-4e25-b509-9d41184da5cf", "metadata": {}, "source": [ "#### for ITEMS" ] }, { "cell_type": "code", "execution_count": null, "id": "22cb34f5-00e0-45b8-a74f-1dfe848b86c4", "metadata": {}, "outputs": [], "source": [ "dataset_type = \"ITEMS\"\n", "create_item_dataset_response = personalize.create_dataset(\n", " name = \"RetailDemo-item-dataset\" + suffix,\n", " datasetType = dataset_type,\n", " datasetGroupArn = dataset_group_arn,\n", " schemaArn = item_schema_arn,\n", " \n", ")\n", "\n", "item_dataset_arn = create_item_dataset_response['datasetArn']\n", "print(json.dumps(create_item_dataset_response, indent=2))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1535eda0-83fb-4d92-ad3a-fce3ce3491d1", "metadata": {}, "source": [ 
"#### for USERS" ] }, { "cell_type": "code", "execution_count": null, "id": "a3ff752b-88c5-4603-a35f-170c3791ec2d", "metadata": {}, "outputs": [], "source": [ "dataset_type = \"USERS\"\n", "create_user_dataset_response = personalize.create_dataset(\n", " name = \"RetailDemo-user-dataset\" + suffix,\n", " datasetType = dataset_type,\n", " datasetGroupArn = dataset_group_arn,\n", " schemaArn = user_schema_arn,\n", " \n", ")\n", "\n", "user_dataset_arn = create_user_dataset_response['datasetArn']\n", "print(json.dumps(create_user_dataset_response, indent=2))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "5cd44969-64d0-4ac3-b663-29661ced4407", "metadata": {}, "source": [ "#### wait for 1 minute(or less) until Dataset creation is complete" ] }, { "cell_type": "code", "execution_count": null, "id": "4633e820-0be4-4cd4-96ef-c03fa2783130", "metadata": {}, "outputs": [], "source": [ "time.sleep(60)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6c971d53-c099-4bfc-90fc-370c87db3d46", "metadata": {}, "source": [ "## 6. 
Personalize : Import Dataset" ] }, { "attachments": {}, "cell_type": "markdown", "id": "00b0da02-b195-4744-b406-b6a26fcd806b", "metadata": {}, "source": [ "#### INTERACTIONS Dataset - Create Import Job" ] }, { "cell_type": "code", "execution_count": null, "id": "0ca7c9b9-c630-4e56-97f6-f0478a0ff205", "metadata": {}, "outputs": [], "source": [ "create_dataset_import_job_response = personalize.create_dataset_import_job(\n", "    jobName = \"RetailDemo-interaction-dataset-import\" + suffix,\n", "    datasetArn = interaction_dataset_arn,\n", "    dataSource = {\n", "        \"dataLocation\": \"s3://{}/{}\".format(bucket, its_filename)\n", "    },\n", "    roleArn = role_arn\n", ")\n", "\n", "interation_dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']\n", "print(json.dumps(create_dataset_import_job_response, indent=2))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "06e894cf-5bee-48bf-9c6a-d13e7ce7535d", "metadata": {}, "source": [ "#### ITEMS Dataset - Create Import Job" ] }, { "cell_type": "code", "execution_count": null, "id": "a79ddbdf-f8a0-453a-8005-78e27fb9d495", "metadata": {}, "outputs": [], "source": [ "create_item_dataset_import_job_response = personalize.create_dataset_import_job(\n", "    jobName = \"RetailDemo-item-dataset-import\" + suffix,\n", "    datasetArn = item_dataset_arn,\n", "    dataSource = {\n", "        \"dataLocation\": \"s3://{}/{}\".format(bucket, items_filename)\n", "    },\n", "    roleArn = role_arn\n", ")\n", "\n", "item_dataset_import_job_arn = create_item_dataset_import_job_response['datasetImportJobArn']\n", "print(json.dumps(create_item_dataset_import_job_response, indent=2))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "cae29267-7a34-4312-a082-1e6d6faa492f", "metadata": {}, "source": [ "#### USERS Dataset - Create Import Job" ] }, { "cell_type": "code", "execution_count": null, "id": "6ffe0b02-48e9-4096-bd23-2eebcbc68714", "metadata": {}, "outputs": [], "source": [ "create_user_dataset_import_job_response = 
personalize.create_dataset_import_job(\n", "    jobName = \"RetailDemo-user-dataset-import\" + suffix,\n", "    datasetArn = user_dataset_arn,\n", "    dataSource = {\n", "        \"dataLocation\": \"s3://{}/{}\".format(bucket, users_filename)\n", "    },\n", "    roleArn = role_arn\n", ")\n", "\n", "user_dataset_import_job_arn = create_user_dataset_import_job_response['datasetImportJobArn']\n", "print(json.dumps(create_user_dataset_import_job_response, indent=2))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "4ec5c03b-14da-4912-a3f5-9cdbc7d89492", "metadata": {}, "source": [ "#### All dataset import jobs must complete before proceeding to the next step.\n", "#### The cells below therefore wait until all three import jobs become ACTIVE." ] }, { "attachments": {}, "cell_type": "markdown", "id": "2c02ce5b-94fc-453c-9127-c7d0b17256c0", "metadata": {}, "source": [ "#### import job status of INTERACTIONS" ] }, { "cell_type": "code", "execution_count": null, "id": "ace1fbd4-cf29-49dd-b41c-6bb4edfd6ff1", "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "status = None\n", "max_time = time.time() + 3*60*60 # 3 hours\n", "while time.time() < max_time:\n", "    describe_dataset_import_job_response = personalize.describe_dataset_import_job(\n", "        datasetImportJobArn = interation_dataset_import_job_arn\n", "    )\n", "\n", "    dataset_import_job = describe_dataset_import_job_response[\"datasetImportJob\"]\n", "    if \"latestDatasetImportJobRun\" not in dataset_import_job:\n", "        status = dataset_import_job[\"status\"]\n", "        print(\"DatasetImportJob: {}\".format(status))\n", "    else:\n", "        status = dataset_import_job[\"latestDatasetImportJobRun\"][\"status\"]\n", "        print(\"LatestDatasetImportJobRun: {}\".format(status))\n", "\n", "    if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n", "        break\n", "\n", "    time.sleep(15)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e6dca136-eead-459c-a47a-41bbfd05f59b", "metadata": {}, "source": [ "#### import job status of ITEMS" ] }, 
{ "cell_type": "code", "execution_count": null, "id": "7223fe1c-b037-42f7-b80b-4593e43e02a5", "metadata": {}, "outputs": [], "source": [ "status = None\n", "max_time = time.time() + 3*60*60 # 3 hours\n", "while time.time() < max_time:\n", " describe_dataset_import_job_response = personalize.describe_dataset_import_job(\n", " datasetImportJobArn = item_dataset_import_job_arn\n", " )\n", " \n", " dataset_import_job = describe_dataset_import_job_response[\"datasetImportJob\"]\n", " if \"latestDatasetImportJobRun\" not in dataset_import_job:\n", " status = dataset_import_job[\"status\"]\n", " print(\"DatasetImportJob: {}\".format(status))\n", " else:\n", " status = dataset_import_job[\"latestDatasetImportJobRun\"][\"status\"]\n", " print(\"LatestDatasetImportJobRun: {}\".format(status))\n", " \n", " if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n", " break\n", " \n", " time.sleep(15)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ae1d60d4-106f-480d-9c43-ba6a5b7fd48f", "metadata": {}, "source": [ "#### import job status of USERS" ] }, { "cell_type": "code", "execution_count": null, "id": "05796a65-5ed5-4da4-a2e1-c94688a037cd", "metadata": {}, "outputs": [], "source": [ "status = None\n", "max_time = time.time() + 3*60*60 # 3 hours\n", "while time.time() < max_time:\n", " describe_dataset_import_job_response = personalize.describe_dataset_import_job(\n", " datasetImportJobArn = user_dataset_import_job_arn\n", " )\n", " \n", " dataset_import_job = describe_dataset_import_job_response[\"datasetImportJob\"]\n", " if \"latestDatasetImportJobRun\" not in dataset_import_job:\n", " status = dataset_import_job[\"status\"]\n", " print(\"DatasetImportJob: {}\".format(status))\n", " else:\n", " status = dataset_import_job[\"latestDatasetImportJobRun\"][\"status\"]\n", " print(\"LatestDatasetImportJobRun: {}\".format(status))\n", " \n", " if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n", " break\n", " \n", " time.sleep(15)" ] }, { "attachments": {}, 
"cell_type": "markdown", "id": "864e5eee-815e-44c1-a21e-ad5de48b53f9", "metadata": {}, "source": [ "## 7. Personalize : Create Solution" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c1089201-4644-4714-8f96-1ca2a0531de6", "metadata": {}, "source": [ "### Create Solution with \"AWS-USER-PERSONALIZATION\" recipe" ] }, { "cell_type": "code", "execution_count": null, "id": "82fbdbd2-5d27-4854-98d6-653797a0d09c", "metadata": {}, "outputs": [], "source": [ "# Define the solution details\n", "solution_name = \"RetailDemo-user-personalization\"\n", "recipe_arn = \"arn:aws:personalize:::recipe/aws-user-personalization\"\n", "perform_hpo = False # set to true if you want to perform hyperparameter optimization\n", "\n", "# Create the solution\n", "create_solution_response = personalize.create_solution(\n", " name=solution_name,\n", " recipeArn=recipe_arn,\n", " performHPO=perform_hpo,\n", " datasetGroupArn = dataset_group_arn,\n", " solutionConfig = {\n", " \"algorithmHyperParameters\": {\n", " \"bptt\": \"32\",\n", " \"hidden_dimension\": \"149\",\n", " \"recency_mask\": \"true\"\n", " },\n", " \"featureTransformationParameters\": {\n", " \"max_user_history_length_percentile\": \"0.99\",\n", " \"min_user_history_length_percentile\": \"0.00\"\n", " }\n", " }\n", ")\n", "\n", "# Get the solution ARN\n", "solution_arn = create_solution_response['solutionArn']\n", "print(f'Solution ARN: {solution_arn}')" ] }, { "attachments": {}, "cell_type": "markdown", "id": "66c1cb5f-3d6d-401c-bc22-179fe19f1987", "metadata": {}, "source": [ "### Create Solution Version" ] }, { "cell_type": "code", "execution_count": null, "id": "7ebb511b-669e-4c9e-ac09-0b252adfbae9", "metadata": {}, "outputs": [], "source": [ "# Create the solution version\n", "create_solution_version_response = personalize.create_solution_version(\n", " solutionArn=solution_arn\n", ")\n", "\n", "# Get the solution version ARN\n", "solution_version_arn = create_solution_version_response['solutionVersionArn']\n", 
"print(f'Solution version ARN: {solution_version_arn}')" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d7338d57-1dd1-42e9-8896-903208c73760", "metadata": {}, "source": [ "#### Wait until Solution Version is in ACTIVE state\n", "It takes about 20-30 minutes.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "401f0606-251a-43e6-9349-f875bae8b3e7", "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "max_time = time.time() + 3*60*60 # 3 hours\n", "while time.time() < max_time:\n", "\n", " # status_aws_user_personalization\n", " describe_solution_response = personalize.describe_solution_version(\n", " solutionVersionArn = solution_version_arn\n", " ) \n", " status_solution = describe_solution_response['solutionVersion'][\"status\"]\n", " print(\"status_user-personalization : {}\".format(status_solution))\n", " \n", " \n", " if (status_solution == \"ACTIVE\" or status_solution == \"CREATE FAILED\") :\n", " break\n", " print(\"-------------------------------------->\")\n", " time.sleep(30)\n", "\n", "print(\"Generating solution version is completed\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "23ed4de0-a5e8-485c-99a4-a3b653cbdf4d", "metadata": {}, "source": [ "## 8. Personalize : Create Campaign" ] }, { "cell_type": "code", "execution_count": null, "id": "4a13983a-ae69-471c-93b4-d4cd57feca3b", "metadata": {}, "outputs": [], "source": [ "create_campaign_reponse = personalize.create_campaign(\n", " name = 'RetailDemo-campaign' + suffix,\n", " solutionVersionArn = solution_version_arn,\n", " minProvisionedTPS=1\n", ")\n", "\n", "campaign_arn = create_campaign_reponse['campaignArn']\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "bac029c2-7dd9-4ea2-bbfc-559c7db53ed2", "metadata": {}, "source": [ "#### Wait for Campaign creation to complete\n", "It takes about 7 minutes." 
] }, { "cell_type": "code", "execution_count": null, "id": "e7e32f36-5c48-45e7-b70c-8f90d4e8b3ec", "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "max_time = time.time() + 3*60*60 # 3 hours\n", "while time.time() < max_time:\n", "\n", "    # Poll the status of the campaign\n", "    describe_campaign_response = personalize.describe_campaign(\n", "        campaignArn = campaign_arn\n", "    )\n", "    status_campaign = describe_campaign_response['campaign'][\"status\"]\n", "    print(\"status_creating_campaign : {}\".format(status_campaign))\n", "\n", "\n", "    if (status_campaign == \"ACTIVE\" or status_campaign == \"CREATE FAILED\") :\n", "        break\n", "    print(\"-------------------------------------->\")\n", "    time.sleep(60)\n", "\n", "# Report the final status; the loop also exits on CREATE FAILED or timeout\n", "print(\"Campaign creation finished with status: {}\".format(status_campaign))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "682fc7e3-8cc5-41a5-a0d4-08a3ddbc4686", "metadata": {}, "source": [ "#### Save variables\n", "Save the variables needed for clean-up." ] }, { "cell_type": "code", "execution_count": null, "id": "69cf51d8-d822-4d74-b41f-e6859bc85276", "metadata": {}, "outputs": [], "source": [ "%store dataset_group_arn\n", "%store interaction_schema_arn\n", "%store item_schema_arn\n", "%store user_schema_arn\n", "%store interaction_dataset_arn\n", "%store item_dataset_arn\n", "%store user_dataset_arn\n", "%store solution_arn\n", "%store campaign_arn\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f07fddb0-e5e1-47b7-b0c0-85608cd1d53c", "metadata": {}, "source": [ "# You can make an inference request with the Personalize Campaign ARN below.\n", "The Lambda function uses this Personalize Campaign ARN to request recommendations from the campaign."
] }, { "cell_type": "code", "execution_count": null, "id": "662a455c-c172-4340-aac8-c1be7aac5f42", "metadata": {}, "outputs": [], "source": [ "print(\"Personalize Campaign ARN : \", campaign_arn)" ] }, { "cell_type": "code", "execution_count": null, "id": "2b97d05d-6362-4ed0-a7b8-b6f7dd407d29", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }