{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Train and Host a Boke AI Keras Model on Amazon SageMaker\n", "\n", "Amazon SageMaker is a fully-managed service that provides developers and data scientists with the ability to build, train, and deploy machine learning (ML) models quickly. Amazon SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high-quality models. The SageMaker Python SDK makes it easy to train and deploy models in Amazon SageMaker with several different machine learning and deep learning frameworks, including TensorFlow and Keras.\n", "\n", "In this notebook, we train and host a Keras boke-AI model on SageMaker. The model used for this notebook is a neural network (CNN and Bidirectional-LSTM) for image captioning that was developed by Ryuichi Ishikawa (Dentsu Digital).\n", "\n", "
\n", " Instance Type and Pricing: \n", "\n", "Before running this notebook, you should request quota increases for the number of instances in your AWS account [[How to request a quota increase (Japanese)](../../LIMIT_INCREASE.md)]. This sample notebook was tested using \n", "- the `conda_tensorflow2_p36` kernel on `ml.m5.xlarge` (notebook instance) or the `Python 3 (TensorFlow 2.3 Python 3.7 CPU Optimized)` kernel on `ml.m5.large` (Studio notebook) \n", "- with the `ml.g4dn.xlarge` processing instance type, \n", "- `ml.g4dn.2xlarge` training instance type, \n", "- and `ml.g4dn.xlarge` hosting instance type \n", "\n", "in the `us-west-2`, `us-east-1`, `us-east-2`, and `ap-northeast-1` regions. Processing and training take approximately 7 minutes and 13 minutes, respectively, on the hardware listed above when using a subset of the dataset containing only 8k image/text pairs.\n", "\n", "Price per hour depends on your region and instance type. You can reference prices on the [SageMaker pricing page](https://aws.amazon.com/sagemaker/pricing/). \n", "\n", "---\n", "---" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install -U sagemaker\n", "\n", "import IPython\n", "IPython.Application.instance().kernel.do_shutdown(True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "First, we define a few variables that are needed later in the example." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import json\n", "import boto3\n", "\n", "from sagemaker import Session\n", "from sagemaker import get_execution_role\n", "\n", "sagemaker_session = Session()\n", "default_bucket = sagemaker_session.default_bucket()\n", "role = get_execution_role()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bokete Bokekan dataset\n", "\n", "Bokete is one of the most popular comedy websites. 
The [Bokekan dataset](../README.md) consists of 1M+ image/text pairs divided into 5 classes by number of stars. Here are the classes in the dataset, followed by one sample image/text pair:\n", "\n", "| class | boke (image/text pair) | number of stars |\n", "| ---- | ----: | ---- |\n", "| blue | 98,736 | 0 |\n", "| yellow | 955,901 | 1 - 100 |\n", "| green | 37,342 | 101 - 1000 |\n", "| red | 8,183 | 1001 - 10000 |\n", "| sp | 380 | 10001+ |\n", "| Total | 1,100,542 | boke |\n", "\n", "
\"bokete\"
\n", "
(Photo by IHA Central Office, licensed under the Creative Commons Attribution License 2.0)
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, you will copy the Bokekan dataset to your SageMaker default bucket. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bokekan_bucket_name = 'bokekan-{}'.format(sagemaker_session.boto_region_name)\n", "prefix = '21-6-1'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bokekan_list = ['blue_000', 'yellow_000', 'yellow_001', 'yellow_002', 'yellow_003', 'yellow_004', 'yellow_005', 'yellow_006', 'yellow_007', 'yellow_008', 'yellow_009', 'green_000', 'red_000', 'sp_000']\n", "bokekan = bokekan_list[-2] # Bokekan Red\n", "data_path = '../../data'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time \n", "s3 = boto3.client('s3')\n", "\n", "key = bokekan + '.zip'\n", "print(bokekan_bucket_name, '=>', os.path.join(default_bucket, bokekan_bucket_name, prefix, key))\n", "\n", "s3.copy_object(\n", "    CopySource=os.path.join(bokekan_bucket_name, prefix, key), \n", "    Bucket=default_bucket, \n", "    Key=os.path.join(bokekan_bucket_name, prefix, key), \n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prepare the dataset for training\n", "\n", "To use the Bokekan dataset, we first download it and convert it to NumPy arrays. This step takes around 7 minutes." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# !pygmentize script/processing/preprocessing.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.tensorflow import TensorFlowProcessor\n", "\n", "tags = [{'Key': 'Project', 'Value': 'bokete'}]\n", "\n", "tf_processor = TensorFlowProcessor(\n", "    framework_version=\"2.3\",\n", "    role=role,\n", "    instance_type=\"ml.g4dn.xlarge\", \n", "    instance_count=1,\n", "    volume_size_in_gb=100, \n", "    base_job_name='bokete-tf-keras-preprocessing', \n", "    py_version='py37', \n", "    tags=tags\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.processing import ProcessingInput, ProcessingOutput, FeatureStoreOutput\n", "\n", "# extract_destination = \"s3://{}/{}/{}/{}\".format(default_bucket, bokekan_bucket_name, prefix, \"contents\")\n", "\n", "tf_processor.run(\n", "    code=\"preprocessing.py\",\n", "    source_dir=\"script/processing\", \n", "    inputs=[\n", "        ProcessingInput(\n", "            input_name='bokekan',\n", "            source=\"s3://{}/{}/{}\".format(default_bucket, bokekan_bucket_name, prefix), \n", "            destination=\"/opt/ml/processing/input\", \n", "        ),\n", "    ],\n", "    outputs=[\n", "        ProcessingOutput(\n", "            output_name=\"bokekan_contents\", \n", "            source=\"/opt/ml/processing/output/bokekan/content\", \n", "# destination=extract_destination, \n", "            s3_upload_mode=\"Continuous\", \n", "        ),\n", "        ProcessingOutput( # files: params, word_id, id_word\n", "            output_name=\"bokekan_metadata\", \n", "            source=\"/opt/ml/processing/output/bokekan/metadata\", \n", "# feature_store_output=FeatureStoreOutput(FeatureGroupName='')\n", "        ),\n", "        ProcessingOutput( # files: X1, X2, y\n", "            output_name=\"training\", \n", "            source=\"/opt/ml/processing/output/training\", \n", "        ),\n", "    ],\n", "    arguments=[\"--bokekan\", bokekan] \n", ")" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "The above processing job uploads the training data to Amazon S3. Let's retrieve the output paths for the next step. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "preprocessing_job_description = tf_processor.jobs[-1].describe()\n", "\n", "output_config = preprocessing_job_description[\"ProcessingOutputConfig\"]\n", "for output in output_config[\"Outputs\"]:\n", "    if output[\"OutputName\"] == \"training\":\n", "        preprocessed_training_data = output[\"S3Output\"][\"S3Uri\"]\n", "    if output[\"OutputName\"] == \"bokekan_metadata\":\n", "        preprocessed_bokekan_metadata = output[\"S3Output\"][\"S3Uri\"]\n", "    if output[\"OutputName\"] == \"bokekan_contents\":\n", "        preprocessed_bokekan_contents = output[\"S3Output\"][\"S3Uri\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# retrieve parameters\n", "!aws s3 cp {preprocessed_bokekan_metadata}/param.json ./\n", "\n", "with open('./param.json') as json_file:\n", "    param = json.load(json_file)\n", "    print(param)\n", "\n", "max_word = param['max_word']\n", "vocabulary_size = param['vocabulary_size']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train the model\n", "\n", "In this tutorial, we train a deep CNN+LSTM to learn an image captioning task with the Bokekan dataset. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure metrics\n", "\n", "In addition to running the training job, Amazon SageMaker can retrieve training metrics directly from the logs and send them to CloudWatch metrics. 
Here, we define the metrics we would like to observe:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "metric_definitions = [\n", "    {'Name': 'train:loss', 'Regex': '.*loss: ([0-9\\\\.]+) - accuracy: [0-9\\\\.]+.*'},\n", "    {'Name': 'train:accuracy', 'Regex': '.*loss: [0-9\\\\.]+ - accuracy: ([0-9\\\\.]+).*'},\n", "    {'Name': 'sec/steps', 'Regex': '.* - \\\\d+s (\\\\d+)[mu]s/step - loss: [0-9\\\\.]+ - accuracy: [0-9\\\\.]+'} \n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run a baseline training job on SageMaker\n", "\n", "The SageMaker Python SDK's `sagemaker.tensorflow.TensorFlow` estimator class makes it easy for us to interact with SageMaker. Here, we create one to configure a training job. Some parameters worth noting:\n", "\n", "* `entry_point`: our training script (adapted from [this Keras example](https://github.com/keras-team/keras/blob/master/examples/cifar10_cnn.py)).\n", "* `metric_definitions`: the metrics (defined above) that we want sent to CloudWatch.\n", "* `instance_count`: the number of training instances. Here, we set it to 1 for our baseline training job.\n", "\n", "For more details about the TensorFlow estimator class, see the [API documentation](https://sagemaker.readthedocs.io/en/stable/sagemaker.tensorflow.html)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.tensorflow import TensorFlow\n", "\n", "hyperparameters = {'epochs': 10, 'batch-size': 256, 'max_word': max_word, 'vocabulary_size': vocabulary_size}\n", "tags = [{'Key': 'Project', 'Value': 'bokete'}]\n", "\n", "estimator = TensorFlow(entry_point='bokete_keras.py',\n", "                       source_dir='./script/training', \n", "                       metric_definitions=metric_definitions,\n", "                       hyperparameters=hyperparameters,\n", "                       role=role,\n", "                       framework_version='2.3',\n", "                       py_version='py37',\n", "                       instance_count=1,\n", "                       instance_type='ml.g4dn.2xlarge',\n", "                       base_job_name='bokete-tf-keras-training',\n", "                       tags=tags)\n", "\n", "estimator.fit(preprocessed_training_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once we have our estimator, we call `fit()` to start the SageMaker training job, passing the S3 location of the training data that the processing job produced. Because we pass a single S3 URI rather than a dictionary of channels, SageMaker makes the data available to the training script as the default `training` channel." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View the job training metrics\n", "\n", "We can now view the metrics from the training job directly in the SageMaker console. \n", "\n", "Log into the [SageMaker console](https://console.aws.amazon.com/sagemaker/home), choose the latest training job, and scroll down to the monitor section. Alternatively, the code below uses the region and training job name to generate a URL to CloudWatch metrics.\n", "\n", "Using CloudWatch metrics, you can change the period and configure the statistics." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from urllib import parse\n", "\n", "from IPython.display import Markdown, display\n", "\n", "region = sagemaker_session.boto_region_name\n", "cw_url = parse.urlunparse((\n", "    'https',\n", "    '{}.console.aws.amazon.com'.format(region),\n", "    '/cloudwatch/home',\n", "    '',\n", "    'region={}'.format(region),\n", "    'metricsV2:namespace=/aws/sagemaker/TrainingJobs;dimensions=TrainingJobName;search={}'.format(estimator.latest_training_job.name),\n", "))\n", "\n", "display(Markdown('CloudWatch metrics: [link]({}). After you choose a metric, '\n", "                 'change the period to 1 Minute (Graphed Metrics -> Period).'.format(cw_url)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deploy the trained model\n", "\n", "After we train our model, we can deploy it to a SageMaker Endpoint, which serves prediction requests in real-time. To do so, we simply call `deploy()` on our estimator, passing in the desired number of instances and instance type for the endpoint.\n", "\n", "Because we're using TensorFlow Serving for deployment, our training script saves the model in TensorFlow's SavedModel format. \n", "\n", "Here we host the model on a single `ml.g4dn.xlarge` instance, the hosting instance type this notebook was tested with. \n", "\n", "For more information about deploying Keras and TensorFlow models in SageMaker, see [this blog post](https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.g4dn.xlarge')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Invoke the endpoint\n", "\n", "To verify that the endpoint is in service, we generate some random data in the correct shape and get a prediction." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install matplotlib" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from tensorflow.keras.preprocessing.image import load_img, img_to_array\n", "from tensorflow import reshape, stack, convert_to_tensor" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dummy_image = np.random.randn(1,4096)\n", "dummy_text = np.random.randn(1,max_word)\n", "payload = {\"inputs\": {\"image\": dummy_image.tolist(), \"text\": dummy_text.tolist()}}\n", "predictor.predict(payload)['outputs'][0][:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's run predictions on a real image. First, we set up the preprocessing needed for inference. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras.models import Model\n", "from tensorflow.keras.applications.vgg16 import VGG16 \n", "\n", "# Drop the final classification layer and use the preceding layer's 4096-dimensional output as image features\n", "vgg_full = VGG16(weights='imagenet')\n", "VGG = Model(inputs=vgg_full.inputs, outputs=vgg_full.layers[-2].output)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# load dictionary\n", "id_word = pd.read_csv(os.path.join(preprocessed_bokekan_metadata, 'id_word_bokete_data.csv'), index_col=0).to_dict()['1']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# sample the first word from a distribution\n", "def sample(preds, temperature=1.0):\n", "    preds = np.asarray(preds).astype(\"float32\")\n", "    preds = np.log(preds) / temperature\n", "    exp_preds = np.exp(preds)\n", "    preds = exp_preds / np.sum(exp_preds)\n", "    probs = np.random.multinomial(1, preds, 1)\n", "    return np.argmax(probs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can 
choose an image you like from https://bokete.jp/boke/select, then copy and paste its image URL into the cell below. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "image_url = 'https://d2dcan0armyq93.cloudfront.net/photo/odai/600/2182ab815bdb2abae3b67e723861ac08_600.jpg'\n", "# image_url = 'https://d2dcan0armyq93.cloudfront.net/photo/odai/600/0235436e0d5d54c882f485e367f68fe6_600.jpg'\n", "# image_url = 'https://d2dcan0armyq93.cloudfront.net/photo/odai/600/db6e0eb2dd26330e1884d0812d78e783_600.jpg'\n", "\n", "# download the image into the current directory\n", "!wget {image_url}\n", "image_path = image_url.split('/')[-1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# # Please specify the `image_id` here.\n", "# image_id = 256\n", "# data = pd.read_csv(os.path.join(preprocessed_bokekan_metadata, 'boke.csv'))\n", "# image_path = os.path.join(data_path, bokekan, data.odai_photo_url[image_id][1:])\n", "# print(image_path)\n", "\n", "# Preprocess test image array\n", "img = load_img(image_path, target_size=(224, 224))\n", "x = np.array(img)/255\n", "\n", "x = VGG(reshape(x, [1, 224, 224, 3]))\n", "X_test_image = np.array(x)\n", "\n", "# Prepare empty text array\n", "x2 = np.zeros([1, max_word])\n", "x2[0, -1] = 1. 
# set {1: 'Beginning Of Sentence (BOS)'} in the last slot; generated words are appended after it\n", "X_test_text = np.array(x2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the data loaded, we can use it for predictions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "max_length = max_word\n", "for i in range(max_length):\n", "    payload = {\"inputs\": {\"image\": X_test_image.tolist(), \"text\": X_test_text.tolist()}}\n", "    outputs = np.array(predictor.predict(payload)['outputs'])\n", "\n", "    # sampling the initial word\n", "    if i == 0:\n", "        X_test_text = np.append(X_test_text, sample(outputs[0], 1.0))[1:].reshape(1,-1)\n", "\n", "    elif outputs.argmax() == 0:\n", "        break\n", "\n", "    else:\n", "        X_test_text = np.append(X_test_text, outputs.argmax())[1:].reshape(1,-1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the predictions, we map the generated word IDs back to words, join them into a caption (boke), and display it together with the image." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from operator import itemgetter\n", "boke = ''.join(itemgetter(*X_test_text[0].tolist())(id_word)).split('BOS')[-1]\n", "\n", "plt.imshow(img)\n", "print(boke) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The printed text is the caption (boke) that the model generated for the displayed image." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cleanup\n", "\n", "To avoid incurring extra charges to your AWS account, let's delete the endpoint we created:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor.delete_endpoint()" ] } ], "metadata": { "instance_type": "ml.m5.large", "kernelspec": { "display_name": "Python 3 (TensorFlow 2.3 Python 3.7 CPU Optimized)", "language": "python", "name": 
"python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:ap-northeast-1:102112518831:image/tensorflow-2.3-cpu-py37-ubuntu18.04-v1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" }, "notice": "Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.", "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } } }, "nbformat": 4, "nbformat_minor": 4 }