{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Train and Host a Boke AI Keras Model on Amazon SageMaker\n",
"\n",
"Amazon SageMaker is a fully-managed service that provides developers and data scientists with the ability to build, train, and deploy machine learning (ML) models quickly. Amazon SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high-quality models. The SageMaker Python SDK makes it easy to train and deploy models in Amazon SageMaker with several different machine learning and deep learning frameworks, including TensorFlow and Keras.\n",
"\n",
"In this notebook, we train and host a Keras boke-AI model on SageMaker. The model used for this notebook is a neural network (CNN and Bidirectional-LSTM) for image captioning that was developed by Ryuichi Ishikawa (Dentsu Digital).\n",
"\n",
"
\n",
" Instance Type and Pricing: \n",
"\n",
"Before running this notebook, you should request quota increases for the number of instance in your AWS account [[How to request a quota increase (Japanese)](../../LIMIT_INCREASE.md)]. This sample notebook was tested using \n",
"- the `conda_tensorflow2_p36` kernel on `ml.m5.xlarge` (notebook instance) or `Python 3 (TensorFlow 2.3 Python 3.7 CPU Optimized)` kernel `ml.m5.large` (Studio notebook) \n",
"- with the `ml.g4dn.xlarge` processing instance type, \n",
"- `ml.g4dn.2xlarge` training instance type, \n",
"- and `ml.g4dn.xlarge` hosting instance type \n",
"\n",
"in the `us-west-2`, `us-east-1`, `us-east-2`, and `ap-northeast-1` regions. The processing and training time are approximately 7 minutes and 13 minuites, respectively, using a subset of dataset only containing 8k image/text pairs on the aforementioned hardware specifications.\n",
"\n",
"Price per hour depends on your region and instance type. You can reference prices on the [SageMaker pricing page](https://aws.amazon.com/sagemaker/pricing/). \n",
"\n",
"---\n",
"---"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install -U sagemaker\n",
"\n",
"import IPython\n",
"IPython.Application.instance().kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"First, we define a few variables that are be needed later in the example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import json\n",
"import boto3\n",
"\n",
"from sagemaker import Session\n",
"from sagemaker import get_execution_role\n",
"\n",
"sagemaker_session = Session()\n",
"default_bucket = sagemaker_session.default_bucket()\n",
"role = get_execution_role()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Bokete Bokekan dataset\n",
"\n",
"Bokete is one of the most popular commedy web site. [Bokekan dataset](../README.md) consists of 1M+ images/text pairs to 4 different classes. Here are the classes in the dataset, as well as 1 random image/text pair:\n",
"\n",
"| class | boke (image/text pair) | number of stars |\n",
"| ---- | ----: | ---- |\n",
"| blue | 98,736 | 0 |\n",
"| yellow | 955,901 | 1 - 100 |\n",
"| green | 37,342 | 101 - 1000 |\n",
"| red | 8,183 | 1001 - 10000 |\n",
"| sp | 380 | 10001+ |\n",
"| Total | 1,100,542 | boke |\n",
"\n",
"