{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Hyperparameter Tuning with SageMaker Automatic Model Tuner\n", "\n", "In [part3](../part3/twitter_volume_forecast.ipynb) we have trained a DeepAR model. Apart from prediction_length, time freqency and number of epochs we did not specify any other hyperparameters. DeepAR has many hyperparameters and in this section we will use SageMaker automatic model tuner to find the right set for our model. Here is a short list of some hyperparameters and their default values in GluonTS DeepAR:\n", "\n", "| Hyperparameters | Value |\n", "|--------------------------|---------------------------|\n", "| epochs | 100 |\n", "| context_length | prediction_length |\n", "| batch size | 32 |\n", "| learning rate | $1e-3$ |\n", "| LSTM layers | 2 |\n", "| LSTM nodes | 40 |\n", "| likelihood | StudentTOutput() |\n", "\n", "\n", "We also need to choose a likelihood model. For example, we choose negative binomial likelihood or StudentT for count data. Other likelihood models can also readily be used as long as samples from the distribution can cheaply be obtained and the log-likelihood and its gradients with respect to the parameters can be evaluated. For example:\n", "\n", "- **Gaussian:** Use for real-valued data.\n", "- **Beta:** Use for real-valued targets between 0 and 1 inclusive.\n", "- **Negative-binomial:** Use for count data (non-negative integers).\n", "- **Student-T:** An alternative for real-valued data that works well for bursty data.\n", "- **Deterministic-L1:** A loss function that does not estimate uncertainty and only learns a point forecast.\n", "\n", "Refer to the [documentation](https://gluon-ts.mxnet.io/api/gluonts/gluonts.model.deepar.html) for a full description of the available parameters. In this notebook your will learn how to train your GluonTS model on Amazon SageMaker and to tune it with automatic model tuner." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import gluonts\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import pathlib\n", "import json\n", "import boto3\n", "import s3fs\n", "import csv\n", "import sagemaker" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Upload data to Amazon S3\n", "In order to run the model training with Amazon SageMaker, we need to upload our train and test data to Amazon S3. In the following code cell, we define SageMaker default bucket where data will be uploaded to. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sagemaker_session = sagemaker.Session()\n", "s3_bucket = sagemaker_session.default_bucket()\n", "\n", "s3_train_data_path = \"s3://{}/gluonts/train\".format(s3_bucket)\n", "s3_test_data_path = \"s3://{}/gluonts/test\".format(s3_bucket)\n", "\n", "print(\"Data will be uploaded to: \", s3_bucket)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we download the file and split it into training and test data. Afterwards we write it to a csv." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "url = \"https://raw.githubusercontent.com/numenta/NAB/master/data/realTweets/Twitter_volume_AMZN.csv\"\n", "df = pd.read_csv(filepath_or_buffer=url, header=0, index_col=0)\n", "\n", "train = df[: \"2015-04-05 00:00:00\"]\n", "train.to_csv(\"train.csv\")\n", "\n", "test = df[: \"2015-04-15 00:00:00\"]\n", "test.to_csv(\"test.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function will create a `train` and `test` folder in the S3 bucket and upload the csv files." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s3 = boto3.resource('s3')\n", "def copy_to_s3(local_file, s3_path, override=False):\n", " assert s3_path.startswith('s3://')\n", " split = s3_path.split('/')\n", " bucket = split[2]\n", " path = '/'.join(split[3:])\n", " buk = s3.Bucket(bucket)\n", " \n", " if len(list(buk.objects.filter(Prefix=path))) > 0:\n", " if not override:\n", " print('File s3://{}/{} already exists.\\nSet override to upload anyway.\\n'.format(s3_bucket, s3_path))\n", " return\n", " else:\n", " print('Overwriting existing file')\n", " with open(local_file, 'rb') as data:\n", " print('Uploading file to {}'.format(s3_path))\n", " buk.put_object(Key=path, Body=data)\n", " \n", "copy_to_s3(\"train.csv\", s3_train_data_path + \"/train.csv\")\n", "copy_to_s3(\"test.csv\", s3_test_data_path + \"/test.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a look to what we just wrote to S3. With `s3fs` we can have a look on the files in the bucket." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s3filesystem = s3fs.S3FileSystem()\n", "with s3filesystem.open(s3_train_data_path + \"/train.csv\", 'rb') as fp:\n", " print(fp.readline().decode(\"utf-8\")[:100] + \"...\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Train DeepAR model with Amazon SageMaker\n", "\n", "Since SageMaker will automatically spin up instances for us, we need to provide a role. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "from sagemaker.mxnet import MXNet\n", "\n", "sagemaker_session = sagemaker.Session()\n", "role = sagemaker.get_execution_role()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we define the MXNet estimator. An [estimator](https://sagemaker.readthedocs.io/en/stable/estimators.html) is a higher level interface to define the SageMaker training. It takes several parameters like the [training](entry_point/train.py) script, which defines our DeepAR model. We indicate the train instance type on which we want to execute our model training. Here we choose `ml.m5.xlarge` which is a CPU instance. We need to provide the role so that SageMaker can spin up the instance for us. We also indicate the framework and python version for MXNet. Afterwards we provide a dictionary of hyperparameters that will be parsed in the [training](entry_point/train.py) script to set the hyperparameters of our model. During hyperparameter tuning SageMaker will adjust the hyperparameters passed into our training job." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mxnet_estimator = MXNet(entry_point='train.py',\n", " source_dir='entry_point',\n", " role=role,\n", " train_instance_type='ml.m5.xlarge',\n", " train_instance_count=1,\n", " framework_version='1.4.1', py_version='py3',\n", " hyperparameters={\n", " 'epochs': 1, \n", " 'prediction_length':12,\n", " 'num_layers':2, \n", " 'dropout_rate': 0.2,\n", " })" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are ready to start the training job. Once we call `fit`, SageMaker will spin up an `ml.m5.xlarge` instance, download the MXNet docker image, download the train and test data from Amazon S3 and execute the `train` function from our `train.py` file. \n", "\n", "While the model is training you may want to have a look at [train.py](entry_point/train.py) file. The file follows a certain structure and has the following functions:\n", "- `train`: defines the training procedure as we defined it in [lab 3](../notebooks/twitter_volume_forecast.ipynb) So in our case it creates the ListDataset, the DeepAR estimator and performs the training. It also performs the evaluation and prints the MSE metric. This is necessary for the hyperparameter tuning later on.\n", "- `model_fn`: used for inference. Once the model is trained we can deploy it and this function will load the trained model.\n", "- `transform_fn`: used for inference. If we send requests to the endpoint, the data will by default be encoded as json string. We decode the data from json into a Pandas data frame. We then create the ListDataset and perform inference. The forecasts will be sent back as a json string." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mxnet_estimator.fit({\"train\": s3_train_data_path, \"test\": s3_test_data_path})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### SageMaker Automatic Model Tuner" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we are able to run our DeepAR model with SageMaker, we can start tuning its hyperparameter. In the following section we define the `HyperparameterTuner`, which takes the following hyperparameters:\n", "- `epochs`: number of training epochs. If this value is too large we may overfit the training data, which means the model achieves good performance on the trasining dataset but bad performance on the test dataset.\n", "- `prediction_length`: how many time units shall the model predict\n", "- `num_layers`: number of RNN layers\n", "- `dropout_rate`: dropouts help to regularize the training because they randomly switch off neurons. \n", "\n", "You can find more information about DeepAR parameters [here](https://gluon-ts.mxnet.io/api/gluonts/gluonts.model.deepar.html) \n", "\n", "Next we have to indicate the metric we want to optimize on. We have to make sure that our training job prints those metrics. [train.py](entry_point/train.py) prints the MSE value of evaluated test dataset. These printouts will appear in Cloudwatch and the automatic model tuner will then retrieve those outputs by using the regular expression indicated in `Regex`. \n", "Next we indicate the `max_jobs` and `max_parallel_jobs`. Here we will run 10 jobs in total and in each step we will start 5 parallel jobs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter \n", "\n", "tuner = HyperparameterTuner(estimator=mxnet_estimator, \n", " objective_metric_name='loss',\n", " hyperparameter_ranges={\n", " 'epochs': IntegerParameter(5,20),\n", " 'prediction_length':IntegerParameter(5,20),\n", " 'num_layers': IntegerParameter(1, 5),\n", " 'dropout_rate': ContinuousParameter(0, 0.5) },\n", " metric_definitions=[{'Name': 'loss', 'Regex': \"MSE: ([0-9\\\\.]+)\"}],\n", " max_jobs=10,\n", " max_parallel_jobs=5,\n", " objective_type='Minimize')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`tuner.fit` will start the automatic model tuner. You can go now to the SageMaker console and check the training jobs or proceed to the next cells, to get some real time results from the jobs. \n", "\n", "The search space grows exponentially with the number of hyperparameters. Assuming 5 parameters where each one has 10 discrete options we end up with $10^5$ possible combinations. Clearly we do not want to run $10^5$ jobs. Automatic model tuner will use per default Bayesian optimization which is a combination of explore and exploit. That means after each training job it will evaluate whether to jump into a new area of the search space (explore) or whether to further exploit the local search space. You can find some more information [here](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tuner.fit({'train': s3_train_data_path, \"test\": s3_test_data_path})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can track the status of the hyperparameter tuning jobs by running the following code. Get the name of your job from the sagemaker console." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tuning_job_name = tuner.latest_tuning_job.job_name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we retrieve information about the training jobs from SageMaker and we can see how many have already completed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sage_client = boto3.Session().client('sagemaker')\n", "\n", "# run this cell to check current status of hyperparameter tuning job\n", "tuning_job_result = sage_client.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName=tuning_job_name)\n", "\n", "status = tuning_job_result['HyperParameterTuningJobStatus']\n", "if status != 'Completed':\n", " print('Reminder: the tuning job has not been completed.')\n", " \n", "job_count = tuning_job_result['TrainingJobStatusCounters']['Completed']\n", "print(\"%d training jobs have completed\" % job_count)\n", " \n", "is_minimize = (tuning_job_result['HyperParameterTuningJobConfig']['HyperParameterTuningJobObjective']['Type'] != 'Maximize')\n", "objective_name = tuning_job_result['HyperParameterTuningJobConfig']['HyperParameterTuningJobObjective']['MetricName']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the following cell, we retrieve information about training jobs that have already finished. We will plot their hyperparameters versus objective metric." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "job_analytics = sagemaker.HyperparameterTuningJobAnalytics(tuning_job_name)\n", "\n", "full_df = job_analytics.dataframe()\n", "\n", "if len(full_df) > 0:\n", " df = full_df[full_df['FinalObjectiveValue'] > -float('inf')]\n", " if len(df) > 0:\n", " df = df.sort_values('FinalObjectiveValue', ascending=is_minimize)\n", " print(\"Number of training jobs with valid objective: %d\" % len(df))\n", " print({\"lowest\":min(df['FinalObjectiveValue']),\"highest\": max(df['FinalObjectiveValue'])})\n", " pd.set_option('display.max_colwidth', -1) # Don't truncate TrainingJobName \n", " else:\n", " print(\"No training jobs have reported valid results yet.\")\n", " \n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the hyperparameter tuning job has finished we will plot all results. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import bokeh\n", "import bokeh.io\n", "bokeh.io.output_notebook()\n", "from bokeh.plotting import figure, show\n", "from bokeh.models import HoverTool\n", "\n", "ranges = job_analytics.tuning_ranges\n", "figures = []\n", "\n", "class HoverHelper():\n", "\n", " def __init__(self, tuning_analytics):\n", " self.tuner = tuning_analytics\n", "\n", " def hovertool(self):\n", " tooltips = [\n", " (\"FinalObjectiveValue\", \"@FinalObjectiveValue\"),\n", " (\"TrainingJobName\", \"@TrainingJobName\"),\n", " ]\n", " for k in self.tuner.tuning_ranges.keys():\n", " tooltips.append( (k, \"@{%s}\" % k) )\n", "\n", " ht = HoverTool(tooltips=tooltips)\n", " return ht\n", "\n", " def tools(self, standard_tools='pan,crosshair,wheel_zoom,zoom_in,zoom_out,undo,reset'):\n", " return [self.hovertool(), standard_tools]\n", "\n", "hover = HoverHelper(job_analytics)\n", "\n", "for hp_name, hp_range in ranges.items():\n", " categorical_args = {}\n", " if hp_range.get('Values'):\n", " # This is marked as categorical. Check if all options are actually numbers.\n", " def is_num(x):\n", " try:\n", " float(x)\n", " return 1\n", " except:\n", " return 0 \n", " vals = hp_range['Values']\n", " if sum([is_num(x) for x in vals]) == len(vals):\n", " # Bokeh has issues plotting a \"categorical\" range that's actually numeric, so plot as numeric\n", " print(\"Hyperparameter %s is tuned as categorical, but all values are numeric\" % hp_name)\n", " else:\n", " # Set up extra options for plotting categoricals. A bit tricky when they're actually numbers.\n", " categorical_args['x_range'] = vals\n", "\n", " # Now plot it\n", " p = figure(plot_width=500, plot_height=500, \n", " title=\"Objective vs %s\" % hp_name,\n", " tools=hover.tools(),\n", " x_axis_label=hp_name, y_axis_label=objective_name,\n", " **categorical_args)\n", " p.circle(source=df, x=hp_name, y='FinalObjectiveValue')\n", " figures.append(p)\n", "show(bokeh.layouts.Column(*figures))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Running hyperparamter tuning jobs may take a while so in the meantime freel free to check out [this notebook](deepar_datails.ipynb) that gives more in depth details about DeepAR." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have found a model with good hyperparameters we can deploy it. Note: This endpoint will take approximately 5-8 minutes to launch. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "tuned_endpoint = tuner.deploy(instance_type=\"ml.m5.xlarge\", initial_instance_count=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can send some test data to the endpoint, but first we convert the Numpy arrays `test.value` and `test.index` to lists and add them to a dictionary. SageMaker will encode them as a json string when they are sent to the endpoint. Let's compare how much better our predictions are:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "input_data = {'value': test.value.tolist(), 'timestamp': test.index.tolist() }\n", "result = tuned_endpoint.predict(input_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we call `endpoint.predict()`, SageMaker will execute the `transform_fn` in the [train.py](entry_point/train.py) file. As discussed above, this function will decode the json string into a Pandas frame. Afterwards it creates the `ListDataset` and performs inference. The endpoint will then return forecasts. Let's have a look at the result" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook you have learnt how to build and train a DeepAR model with GluonTS, how to tune and deploy it with Amazon SageMaker." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Delete the endpoint\n", "Remember to delete your Amazon SageMaker endpoint once it is no longer needed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tuned_endpoint.delete_endpoint()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Challenge\n", "Now it is your turn to find even better hyperparameters for the model. Go to [documentation](https://gluon-ts.mxnet.io/api/gluonts/gluonts.model.deepar.html) and try out other hyperparameters." ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 4 }