{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# How to run time series forecasting at scale with the GluonTS toolkit on Amazon SageMaker\n", "\n", "This notebook and its training scripts accompany the blog post **How to run time series forecasting at scale with GluonTS toolkit on Amazon SageMaker**.\n", "\n", "In this notebook we take the example of forecasting energy usage and show you how to train and tune multiple time series models across algorithms and hyper-parameter combinations using the GluonTS toolkit on Amazon SageMaker. We will first show you how to set up GluonTS on SageMaker using the MXNet estimator, then train multiple models using SageMaker Experiments, and finally use SageMaker Debugger to detect suboptimal training jobs and improve training efficiency. We will walk you through the following steps:\n", "\n", "1.\t[Prepare the time series dataset](#section1)\n", "2.\t[Create the algorithm and hyper-parameters combinatorial matrix](#section2)\n", "3.\t[Set up the GluonTS training script](#section3)\n", "4. [Set up Amazon SageMaker Experiment and Trials](#section4)\n", "5.\t[Set up the MXNet Estimator](#section5)\n", "6.\t[Train and validate models](#section6)\n", "7.\t[Evaluate metrics and select a winning candidate](#section7)\n", "8.\t[Run time series forecasts](#section8)\n", "9. [Run an experiment with SageMaker Debugger enabled to auto-terminate sub-optimal training jobs](#section9)\n", "\n", "\n", "Before getting started, we first need to install a few packages:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "! pip install gluonts\n", "! pip install --upgrade sagemaker\n", "! pip install sagemaker-experiments\n", "! pip install --upgrade smdebug-rulesconfig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Prepare the time series dataset \n", "For this exercise, we use the individual household electric power consumption dataset. (Dua, D.
and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.) The usage data is aggregated hourly.\n", "\n", "https://raw.githubusercontent.com/aws-samples/amazon-forecast-samples/master/notebooks/common/data/item-demand-time.csv\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "# The CSV has no header row, so we name the columns explicitly\n", "url = \"https://raw.githubusercontent.com/aws-samples/amazon-forecast-samples/master/notebooks/common/data/item-demand-time.csv\"\n", "raw_df = pd.read_csv(url, header=None, names=[\"date\", \"usage\", \"client\"])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at the data:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<table>\n",
"  <thead>\n",
"    <tr><th></th><th>date</th><th>usage</th><th>client</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>2014-01-01 01:00:00</td><td>38.349917</td><td>client_12</td></tr>\n",
"    <tr><th>1</th><td>2014-01-01 02:00:00</td><td>33.582090</td><td>client_12</td></tr>\n",
"    <tr><th>2</th><td>2014-01-01 03:00:00</td><td>34.411277</td><td>client_12</td></tr>\n",
"    <tr><th>3</th><td>2014-01-01 04:00:00</td><td>39.800995</td><td>client_12</td></tr>\n",
"    <tr><th>4</th><td>2014-01-01 05:00:00</td><td>41.044776</td><td>client_12</td></tr>\n",
"  </tbody>\n",
"</table>" ], "text/plain": [ " date usage client\n", "0 2014-01-01 01:00:00 38.349917 client_12\n", "1 2014-01-01 02:00:00 33.582090 client_12\n", "2 2014-01-01 03:00:00 34.411277 client_12\n", "3 2014-01-01 04:00:00 39.800995 client_12\n", "4 2014-01-01 05:00:00 41.044776 client_12" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define the S3 bucket and folder locations to store the test and training data. The bucket should be in the same region as the notebook instance, training, and hosting. We will use the default SageMaker S3 bucket." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import sagemaker\n", "\n", "s3_client = boto3.client('s3')\n", "s3res = boto3.resource('s3')\n", "\n", "sess = sagemaker.Session()\n", "bucket = sess.default_bucket()\n", "\n", "pref = 'electricity-forecast-experiment/gluonts'\n", "s3_train_channel = \"s3://\" + bucket + \"/\" + pref + \"/train.csv\"\n", "s3_test_channel = \"s3://\" + bucket + \"/\" + pref + \"/test.csv\"\n", "print(s3_train_channel)\n", "print(s3_test_channel)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's divide the raw data into train and test samples and save them in their respective S3 folder locations. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# The dates are ISO-formatted strings (YYYY-MM-DD HH:MM:SS), so the bound must use the same format\n", "df_train = raw_df.query('date <= \"2014-10-31 11:00:00\"').copy()\n", "df_train.to_csv(\"train.csv\")\n", "s3_client.upload_file(\"train.csv\", bucket, pref+\"/train.csv\")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "df_test = raw_df.query('date >= \"2014-11-01 12:00:00\"').copy()\n", "df_test.to_csv(\"test.csv\")\n", "s3_client.upload_file(\"test.csv\", bucket, pref+\"/test.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.
Create the algorithm and hyper-parameters combinatorial matrix \n", "\n", "GluonTS comes with a number of pre-built probabilistic forecasting models. Instead of simply predicting a single point estimate, probabilistic forecasting assigns a probability to every outcome. Once you select a model, you have the flexibility to configure the hyper-parameters to control the learning process. \n", "\n", "SageMaker supports bringing your own model using script mode. This allows you to leverage Amazon SageMaker prebuilt containers to train your models with the same kind of training script you would use outside of SageMaker. In this example, we use SageMaker's Apache MXNet containers to wrap the GluonTS training script.\n", "\n", "In this example, we'll be training using four different models. \n", "- **DeepAR** is a supervised learning algorithm for forecasting scalar time series using recurrent neural networks (RNNs). \n", "- **SFeedFwd** (Simple Feedforward) is a supervised learning algorithm where information moves in only one direction: forward, from the input nodes, through the hidden nodes (if any), to the output nodes. \n", "- **LSTNet** (Long- and Short-term Time-series network) is a multivariate time series forecasting model that combines a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) to find short-term local dependency patterns among variables and then long-term patterns for time series trends. \n", "- **Seq2Seq** (Sequence-to-sequence learning) is a method to train models to convert sequences from one domain to sequences in another domain.\n", "\n", "All these algorithms are already part of GluonTS, and we simply leverage them to quickly iterate and experiment over different models.\n", "\n", "\n", "A trainer defines how a network is going to be trained.
Let's define a trainer object using a pandas DataFrame that holds the base list of algorithms, epochs, learning rates, and other hyper-parameter values that we want to use for our training runs. \n", "\n", "**Note:** If you want to add more algorithm/hyper-parameter combinations, please add them to the dictionary `d` that is used to build the dataframe. " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<table>\n",
"  <thead>\n",
"    <tr><th></th><th>epochs</th><th>algo</th><th>num_batches_per_epoch</th><th>learning_rate</th><th>hybridize</th><th>prediction_length</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>50</td><td>DeepAR</td><td>25</td><td>0.001</td><td>True</td><td>30</td></tr>\n",
"    <tr><th>1</th><td>50</td><td>seq2seq</td><td>50</td><td>0.001</td><td>True</td><td>60</td></tr>\n",
"  </tbody>\n",
"</table>" ], "text/plain": [ " epochs algo num_batches_per_epoch learning_rate hybridize \\\n", "0 50 DeepAR 25 0.001 True \n", "1 50 seq2seq 50 0.001 True \n", "\n", " prediction_length \n", "0 30 \n", "1 60 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "d = {'epochs': [50,50], 'algo': [\"DeepAR\", \"seq2seq\"], 'num_batches_per_epoch': [25, 50], 'learning_rate':[1e-3,1e-3], 'hybridize':[True, True]}\n", "df_hps = pd.DataFrame(data=d)\n", "df_hps['prediction_length'] = [30, 60]\n", "df_hps.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use the `itertools.product` function to expand the base set of parameters into all of their combinations, one combination per row of a new dataframe. Each row corresponds to a training job configuration that we will subsequently pass to the MXNet Estimator to run the training job.\n", "\n", "**Note:** Please check your AWS account limits before you run the cell below. The training process in the sections below will run one training job per row from this dataframe. Depending on your account limit for the maximum number of concurrent training jobs, you may get an error that the limit has been exceeded. " ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "from itertools import product\n", "\n", "# Build one row per combination of the unique values in each column\n", "hp_columns = ['epochs', 'algo', 'num_batches_per_epoch', 'learning_rate', 'hybridize', 'prediction_length']\n", "prod = product(*(df_hps[col].unique() for col in hp_columns))\n", "\n", "df_hps_combo = pd.DataFrame([list(p) for p in prod], columns=hp_columns)\n", "\n", "df_hps_combo['jobnumber'] = df_hps_combo.index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at the different combinations." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<table>\n",
"  <thead>\n",
"    <tr><th></th><th>epochs</th><th>algo</th><th>num_batches_per_epoch</th><th>learning_rate</th><th>hybridize</th><th>prediction_length</th><th>jobnumber</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>50</td><td>DeepAR</td><td>25</td><td>0.001</td><td>True</td><td>30</td><td>0</td></tr>\n",
"    <tr><th>1</th><td>50</td><td>DeepAR</td><td>25</td><td>0.001</td><td>True</td><td>60</td><td>1</td></tr>\n",
"    <tr><th>2</th><td>50</td><td>DeepAR</td><td>50</td><td>0.001</td><td>True</td><td>30</td><td>2</td></tr>\n",
"    <tr><th>3</th><td>50</td><td>DeepAR</td><td>50</td><td>0.001</td><td>True</td><td>60</td><td>3</td></tr>\n",
"    <tr><th>4</th><td>50</td><td>seq2seq</td><td>25</td><td>0.001</td><td>True</td><td>30</td><td>4</td></tr>\n",
"    <tr><th>5</th><td>50</td><td>seq2seq</td><td>25</td><td>0.001</td><td>True</td><td>60</td><td>5</td></tr>\n",
"    <tr><th>6</th><td>50</td><td>seq2seq</td><td>50</td><td>0.001</td><td>True</td><td>30</td><td>6</td></tr>\n",
"    <tr><th>7</th><td>50</td><td>seq2seq</td><td>50</td><td>0.001</td><td>True</td><td>60</td><td>7</td></tr>\n",
"  </tbody>\n",
"</table>" ], "text/plain": [ " epochs algo num_batches_per_epoch learning_rate hybridize \\\n", "0 50 DeepAR 25 0.001 True \n", "1 50 DeepAR 25 0.001 True \n", "2 50 DeepAR 50 0.001 True \n", "3 50 DeepAR 50 0.001 True \n", "4 50 seq2seq 25 0.001 True \n", "5 50 seq2seq 25 0.001 True \n", "6 50 seq2seq 50 0.001 True \n", "7 50 seq2seq 50 0.001 True \n", "\n", " prediction_length jobnumber \n", "0 30 0 \n", "1 60 1 \n", "2 30 2 \n", "3 60 3 \n", "4 30 4 \n", "5 60 5 \n", "6 30 6 \n", "7 60 7 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_hps_combo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Set up the GluonTS training script \n", "\n", "We will use a Python entry script that imports the necessary GluonTS libraries, sets up the GluonTS estimators using the model packages for the algorithms of interest, and receives our algorithm and hyper-parameter preferences from the MXNet estimator we set up in the notebook. The script uses the train and test data files we uploaded to S3 to create the corresponding GluonTS datasets for training and evaluation. After training completes, the script runs an evaluation to generate metrics and stores them using the SageMaker Debugger hook; we will later use these metrics to choose a winning model. For further analysis the metrics are also available via the SageMaker Trial Component analytics (please refer to the section on evaluating metrics for more details). The model is then serialized for storage and future retrieval. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's take a look at the training script\n", "!pygmentize blog_train_algos.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. Set up a SageMaker Experiment \n", "\n", "Before creating the training job, we first create a SageMaker Experiment that will allow us to track the different training jobs.
We use the `smexperiments` library to create the experiment:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datetime import datetime\n", "from smexperiments.experiment import Experiment\n", "\n", "sagemaker_boto_client = boto3.client(\"sagemaker\")\n", "\n", "# Name of the experiment, prefixed with the current timestamp\n", "timestep = datetime.now()\n", "timestep = timestep.strftime(\"%d-%m-%Y-%H-%M-%S\")\n", "experiment_name = timestep + \"-timeseries-models\"\n", "\n", "# Create the experiment\n", "Experiment.create(\n", "    experiment_name=experiment_name,\n", "    description=\"Timeseries models\",\n", "    sagemaker_boto_client=sagemaker_boto_client)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For each job we define a new Trial within that experiment:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from smexperiments.trial import Trial\n", "\n", "trial = Trial.create(\n", "    experiment_name=experiment_name,\n", "    sagemaker_boto_client=sagemaker_boto_client\n", ")\n", "print(trial)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we define an experiment config, a dictionary that we pass into the `fit()` method later on. This ensures that the training job we are about to start is associated with that experiment and trial." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "experiment_config = {\"ExperimentName\": experiment_name,\n", "                     \"TrialName\": trial.trial_name,\n", "                     \"TrialComponentDisplayName\": \"Training\"}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5. Create the MXNet Estimator \n", "\n", "MXNet training scripts can be run on Amazon SageMaker by creating an [MXNet estimator](https://sagemaker.readthedocs.io/en/stable/frameworks/mxnet/using_mxnet.html). The training job starts when the `fit` method of the MXNet estimator is called.
We pass in a hyperparameter dictionary that takes the inputs from the algorithm and hyper-parameters combinatorial matrix." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "from sagemaker.mxnet import MXNet\n", "\n", "mxnet_estimator = MXNet(\n", "    entry_point='blog_train_algos.py',\n", "    role=sagemaker.get_execution_role(),\n", "    instance_type='ml.m5.large',\n", "    instance_count=1,\n", "    framework_version='1.7.0',\n", "    py_version='py3',\n", "    hyperparameters={\n", "        'bucket': bucket,\n", "        'seq': trial.trial_name,\n", "        'algo': \"DeepAR\",\n", "        'freq': \"D\",\n", "        'prediction_length': 30,\n", "        'epochs': 10,\n", "        'learning_rate': 1e-3,\n", "        'hybridize': False,\n", "        'num_batches_per_epoch': 10,\n", "    })" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After specifying our estimator with all the necessary hyperparameters, we can train it using our training dataset. We start training by invoking the `fit` method of the estimator, passing the locations of the train and test data as well as the experiment configuration.\n", "\n", "The training algorithm returns a fitted model (or a Predictor in GluonTS parlance) that can be used to construct forecasts." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mxnet_estimator.fit({\"train\": s3_train_channel, \"test\": s3_test_channel},\n", "                    experiment_config=experiment_config,\n", "                    wait=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6. Set up an experiment with Debugger enabled to auto-terminate sub-optimal training jobs \n", "\n", "We set up a parameter sweep with many different configurations. Doing so may produce parameter combinations that lead to sub-optimal models. We can use SageMaker Debugger to tune our experiment at training time.
Debugger automatically captures data from the model training and provides built-in rules that check for conditions such as overfitting and vanishing gradients. We can then specify actions that terminate training jobs early when they would otherwise produce low-quality models.\n", "\n", "Some of the models in our experiment use Recurrent Neural Networks, which can suffer from the vanishing gradient problem. So we select Debugger's `tensor_variance` rule, which lets us specify a lower and an upper bound on the variance of the gradient values. We also specify the action `StopTraining`, which stops a training job once the rule triggers.\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "from sagemaker.debugger import Rule, CollectionConfig, rule_configs\n", "\n", "actions = rule_configs.ActionList(\n", "    rule_configs.StopTraining(),\n", ")\n", "\n", "rule = Rule.sagemaker(\n", "    base_config=rule_configs.tensor_variance(),\n", "    rule_parameters={\"min_threshold\": '0.00001',\n", "                     \"max_threshold\": '100000.0',\n", "                     \"collection_names\": 'custom_collection'},\n", "    actions=actions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, Debugger collects data at an interval of 500 steps. In our case, the training dataset is small and our models only train for a few minutes, so we can decrease the save interval. Here we create a custom collection in which we collect gradients at an interval of 5 steps."
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "from sagemaker.debugger import DebuggerHookConfig, CollectionConfig\n", "\n", "debugger_hook_config = DebuggerHookConfig(\n", " collection_configs=[ \n", " CollectionConfig(\n", " name=\"custom_collection\",\n", " parameters={ \"include_regex\": \"(.*gradient)(?!.*featureembedder)(.*weight)\",\n", " \"start_step\": \"10\",\n", " \"save_interval\": \"5\"})])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we create a new experiment:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datetime import datetime\n", "from smexperiments.experiment import Experiment\n", "from smexperiments.trial import Trial\n", "\n", "sagemaker_boto_client = boto3.client(\"sagemaker\")\n", "\n", "#name of experiment\n", "timestep = datetime.now()\n", "timestep = timestep.strftime(\"%d-%m-%Y-%H-%M-%S\")\n", "experiment_name = timestep + \"-timeseries-models\"\n", "\n", "#create experiment\n", "Experiment.create(\n", " experiment_name=experiment_name, \n", " description=\"Timeseries models\", \n", " sagemaker_boto_client=sagemaker_boto_client)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 7. Train and validate models \n", "\n", "In section 5 we trained one model. 
Now we iterate over all combinations of hyperparameters and algorithms, with the SageMaker Debugger rule enabled to detect problems such as vanishing gradients and terminate the affected training jobs.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "from sagemaker.mxnet import MXNet\n", "\n", "for idx, row in df_hps_combo.iterrows():\n", "\n", "    # One trial per hyper-parameter combination\n", "    trial = Trial.create(\n", "        experiment_name=experiment_name,\n", "        sagemaker_boto_client=sagemaker_boto_client\n", "    )\n", "\n", "    experiment_config = {\"ExperimentName\": experiment_name,\n", "                         \"TrialName\": trial.trial_name,\n", "                         \"TrialComponentDisplayName\": \"Training\"}\n", "\n", "    mxnet_estimator = MXNet(\n", "        entry_point='blog_train_algos.py',\n", "        role=sagemaker.get_execution_role(),\n", "        instance_type='ml.m5.large',\n", "        instance_count=1,\n", "        framework_version='1.7.0', py_version='py3',\n", "        debugger_hook_config=debugger_hook_config,\n", "        rules=[rule],\n", "        hyperparameters={\n", "            'bucket': bucket, 'seq': trial.trial_name,\n", "            'algo': row['algo'],\n", "            'freq': \"D\",\n", "            'prediction_length': row['prediction_length'],\n", "            'epochs': row['epochs'],\n", "            'learning_rate': row['learning_rate'],\n", "            'hybridize': row['hybridize'],\n", "            'num_batches_per_epoch': row['num_batches_per_epoch']\n", "        })\n", "\n", "    # Start the job without blocking so that all jobs run concurrently\n", "    mxnet_estimator.fit({\"train\": s3_train_channel, \"test\": s3_test_channel},\n", "                        experiment_config=experiment_config,\n", "                        wait=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the experiment is finished, we can determine how many seconds it ran. First we define a helper function that computes the billable seconds and the number of training jobs that were auto-terminated."
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "import time\n", "\n", "def compute_job_statistics(df):\n", "    # Sum the billable seconds of all jobs and count the jobs that were stopped\n", "    total_cost = 0\n", "    stopped = 0\n", "    for name in df['sagemaker_job_name']:\n", "        job_name = name[1:-1]  # strip the surrounding quotes\n", "        # Wait until the job has left the 'InProgress' state\n", "        while sagemaker_boto_client.describe_training_job(TrainingJobName=job_name)['TrainingJobStatus'] == 'InProgress':\n", "            print('Experiment is still in progress')\n", "            time.sleep(30)\n", "        job = sagemaker_boto_client.describe_training_job(TrainingJobName=job_name)\n", "        total_cost += job['BillableTimeInSeconds']\n", "        if job['TrainingJobStatus'] == \"Stopped\":\n", "            stopped += 1\n", "    return stopped, total_cost" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Billable seconds for overall experiment with Debugger: 3708 seconds. Number of training jobs auto-terminated: 2\n" ] } ], "source": [ "from sagemaker.analytics import ExperimentAnalytics\n", "trial_component_analytics = ExperimentAnalytics(experiment_name=experiment_name)\n", "\n", "stopped, total_cost = compute_job_statistics(trial_component_analytics.dataframe())\n", "print(\"Billable seconds for overall experiment with Debugger:\", total_cost, \"seconds. Number of training jobs auto-terminated:\", stopped)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This setup is especially useful if you run a parameter sweep with training jobs that train for hours. In our case each job trained for less than 10 minutes. It can take a few minutes until the Debugger data is uploaded and fetched by the rule's processing job, so the potential cost reduction is smaller for short training jobs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 8.
Evaluate metrics and select a winning candidate \n", "\n", "Amazon SageMaker Studio provides an experiments browser that you can use to view lists of experiments, trials, and trial components. You can choose one of these entities to view detailed information about it, or choose multiple entities for comparison. For more details please refer to [the documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/experiments-view-compare.html#experiments-view). Once the training jobs are running, we can use the experiment view in Studio (see screenshot below) or the `ExperimentAnalytics` class to track the status of our training jobs and their metrics. \n", "![](screenshot.png)\n", "\n", "\n", "In the training script we used SageMaker Debugger's function `save_scalar` to store metrics such as MAPE, MSE, and RMSE in the experiment. We can access the recorded metrics via `ExperimentAnalytics` and convert them to a pandas DataFrame.\n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<table>\n",
"  <thead>\n",
"    <tr><th></th><th>TrialComponentName</th><th>DisplayName</th><th>SourceArn</th><th>SageMaker.ImageUri</th><th>SageMaker.InstanceCount</th><th>SageMaker.InstanceType</th><th>SageMaker.VolumeSizeInGB</th><th>algo</th><th>bucket</th><th>epochs</th><th>...</th><th>scalar/MAPE_GLOBAL - Avg</th><th>scalar/MAPE_GLOBAL - StdDev</th><th>scalar/MAPE_GLOBAL - Last</th><th>scalar/MAPE_GLOBAL - Count</th><th>scalar/MSE_GLOBAL - Min</th><th>scalar/MSE_GLOBAL - Max</th><th>scalar/MSE_GLOBAL - Avg</th><th>scalar/MSE_GLOBAL - StdDev</th><th>scalar/MSE_GLOBAL - Last</th><th>scalar/MSE_GLOBAL - Count</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>mxnet-training-2021-02-12-02-45-53-493-aws-tra...</td><td>Training</td><td>arn:aws:sagemaker:us-east-1:841408598787:train...</td><td>763104351884.dkr.ecr.us-east-1.amazonaws.com/m...</td><td>1.0</td><td>ml.m5.large</td><td>30.0</td><td>\"seq2seq\"</td><td>\"sagemaker-us-east-1-841408598787\"</td><td>50.0</td><td>...</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>1</th><td>mxnet-training-2021-02-12-02-45-58-635-aws-tra...</td><td>Training</td><td>arn:aws:sagemaker:us-east-1:841408598787:train...</td><td>763104351884.dkr.ecr.us-east-1.amazonaws.com/m...</td><td>1.0</td><td>ml.m5.large</td><td>30.0</td><td>\"seq2seq\"</td><td>\"sagemaker-us-east-1-841408598787\"</td><td>50.0</td><td>...</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>2</th><td>mxnet-training-2021-02-12-02-45-54-132-aws-tra...</td><td>Training</td><td>arn:aws:sagemaker:us-east-1:841408598787:train...</td><td>763104351884.dkr.ecr.us-east-1.amazonaws.com/m...</td><td>1.0</td><td>ml.m5.large</td><td>30.0</td><td>\"seq2seq\"</td><td>\"sagemaker-us-east-1-841408598787\"</td><td>50.0</td><td>...</td><td>0.174017</td><td>0.0</td><td>0.174017</td><td>1.0</td><td>133.882625</td><td>133.882625</td><td>133.882625</td><td>0.0</td><td>133.882625</td><td>1.0</td></tr>\n",
"    <tr><th>3</th><td>mxnet-training-2021-02-12-02-45-44-916-aws-tra...</td><td>Training</td><td>arn:aws:sagemaker:us-east-1:841408598787:train...</td><td>763104351884.dkr.ecr.us-east-1.amazonaws.com/m...</td><td>1.0</td><td>ml.m5.large</td><td>30.0</td><td>\"DeepAR\"</td><td>\"sagemaker-us-east-1-841408598787\"</td><td>50.0</td><td>...</td><td>0.196716</td><td>0.0</td><td>0.196716</td><td>1.0</td><td>186.979704</td><td>186.979704</td><td>186.979704</td><td>0.0</td><td>186.979704</td><td>1.0</td></tr>\n",
"    <tr><th>4</th><td>mxnet-training-2021-02-12-02-45-50-439-aws-tra...</td><td>Training</td><td>arn:aws:sagemaker:us-east-1:841408598787:train...</td><td>763104351884.dkr.ecr.us-east-1.amazonaws.com/m...</td><td>1.0</td><td>ml.m5.large</td><td>30.0</td><td>\"DeepAR\"</td><td>\"sagemaker-us-east-1-841408598787\"</td><td>50.0</td><td>...</td><td>0.235205</td><td>0.0</td><td>0.235205</td><td>1.0</td><td>281.982975</td><td>281.982975</td><td>281.982975</td><td>0.0</td><td>281.982975</td><td>1.0</td></tr>\n",
"    <tr><th>5</th><td>mxnet-training-2021-02-12-02-45-44-334-aws-tra...</td><td>Training</td><td>arn:aws:sagemaker:us-east-1:841408598787:train...</td><td>763104351884.dkr.ecr.us-east-1.amazonaws.com/m...</td><td>1.0</td><td>ml.m5.large</td><td>30.0</td><td>\"DeepAR\"</td><td>\"sagemaker-us-east-1-841408598787\"</td><td>50.0</td><td>...</td><td>0.414463</td><td>0.0</td><td>0.414463</td><td>1.0</td><td>614.407812</td><td>614.407812</td><td>614.407812</td><td>0.0</td><td>614.407812</td><td>1.0</td></tr>\n",
"    <tr><th>6</th><td>mxnet-training-2021-02-12-02-45-51-083-aws-tra...</td><td>Training</td><td>arn:aws:sagemaker:us-east-1:841408598787:train...</td><td>763104351884.dkr.ecr.us-east-1.amazonaws.com/m...</td><td>1.0</td><td>ml.m5.large</td><td>30.0</td><td>\"seq2seq\"</td><td>\"sagemaker-us-east-1-841408598787\"</td><td>50.0</td><td>...</td><td>0.140608</td><td>0.0</td><td>0.140608</td><td>1.0</td><td>88.191292</td><td>88.191292</td><td>88.191292</td><td>0.0</td><td>88.191292</td><td>1.0</td></tr>\n",
"    <tr><th>7</th><td>mxnet-training-2021-02-12-02-45-43-580-aws-tra...</td><td>Training</td><td>arn:aws:sagemaker:us-east-1:841408598787:train...</td><td>763104351884.dkr.ecr.us-east-1.amazonaws.com/m...</td><td>1.0</td><td>ml.m5.large</td><td>30.0</td><td>\"DeepAR\"</td><td>\"sagemaker-us-east-1-841408598787\"</td><td>50.0</td><td>...</td><td>0.184922</td><td>0.0</td><td>0.184922</td><td>1.0</td><td>170.474154</td><td>170.474154</td><td>170.474154</td><td>0.0</td><td>170.474154</td><td>1.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>8 rows × 55 columns</p>
" ], "text/plain": [ " TrialComponentName DisplayName \\\n", "0 mxnet-training-2021-02-12-02-45-53-493-aws-tra... Training \n", "1 mxnet-training-2021-02-12-02-45-58-635-aws-tra... Training \n", "2 mxnet-training-2021-02-12-02-45-54-132-aws-tra... Training \n", "3 mxnet-training-2021-02-12-02-45-44-916-aws-tra... Training \n", "4 mxnet-training-2021-02-12-02-45-50-439-aws-tra... Training \n", "5 mxnet-training-2021-02-12-02-45-44-334-aws-tra... Training \n", "6 mxnet-training-2021-02-12-02-45-51-083-aws-tra... Training \n", "7 mxnet-training-2021-02-12-02-45-43-580-aws-tra... Training \n", "\n", " SourceArn \\\n", "0 arn:aws:sagemaker:us-east-1:841408598787:train... \n", "1 arn:aws:sagemaker:us-east-1:841408598787:train... \n", "2 arn:aws:sagemaker:us-east-1:841408598787:train... \n", "3 arn:aws:sagemaker:us-east-1:841408598787:train... \n", "4 arn:aws:sagemaker:us-east-1:841408598787:train... \n", "5 arn:aws:sagemaker:us-east-1:841408598787:train... \n", "6 arn:aws:sagemaker:us-east-1:841408598787:train... \n", "7 arn:aws:sagemaker:us-east-1:841408598787:train... \n", "\n", " SageMaker.ImageUri SageMaker.InstanceCount \\\n", "0 763104351884.dkr.ecr.us-east-1.amazonaws.com/m... 1.0 \n", "1 763104351884.dkr.ecr.us-east-1.amazonaws.com/m... 1.0 \n", "2 763104351884.dkr.ecr.us-east-1.amazonaws.com/m... 1.0 \n", "3 763104351884.dkr.ecr.us-east-1.amazonaws.com/m... 1.0 \n", "4 763104351884.dkr.ecr.us-east-1.amazonaws.com/m... 1.0 \n", "5 763104351884.dkr.ecr.us-east-1.amazonaws.com/m... 1.0 \n", "6 763104351884.dkr.ecr.us-east-1.amazonaws.com/m... 1.0 \n", "7 763104351884.dkr.ecr.us-east-1.amazonaws.com/m... 
1.0 \n", "\n", " SageMaker.InstanceType SageMaker.VolumeSizeInGB algo \\\n", "0 ml.m5.large 30.0 \"seq2seq\" \n", "1 ml.m5.large 30.0 \"seq2seq\" \n", "2 ml.m5.large 30.0 \"seq2seq\" \n", "3 ml.m5.large 30.0 \"DeepAR\" \n", "4 ml.m5.large 30.0 \"DeepAR\" \n", "5 ml.m5.large 30.0 \"DeepAR\" \n", "6 ml.m5.large 30.0 \"seq2seq\" \n", "7 ml.m5.large 30.0 \"DeepAR\" \n", "\n", " bucket epochs ... scalar/MAPE_GLOBAL - Avg \\\n", "0 \"sagemaker-us-east-1-841408598787\" 50.0 ... NaN \n", "1 \"sagemaker-us-east-1-841408598787\" 50.0 ... NaN \n", "2 \"sagemaker-us-east-1-841408598787\" 50.0 ... 0.174017 \n", "3 \"sagemaker-us-east-1-841408598787\" 50.0 ... 0.196716 \n", "4 \"sagemaker-us-east-1-841408598787\" 50.0 ... 0.235205 \n", "5 \"sagemaker-us-east-1-841408598787\" 50.0 ... 0.414463 \n", "6 \"sagemaker-us-east-1-841408598787\" 50.0 ... 0.140608 \n", "7 \"sagemaker-us-east-1-841408598787\" 50.0 ... 0.184922 \n", "\n", " scalar/MAPE_GLOBAL - StdDev scalar/MAPE_GLOBAL - Last \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 0.0 0.174017 \n", "3 0.0 0.196716 \n", "4 0.0 0.235205 \n", "5 0.0 0.414463 \n", "6 0.0 0.140608 \n", "7 0.0 0.184922 \n", "\n", " scalar/MAPE_GLOBAL - Count scalar/MSE_GLOBAL - Min \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 1.0 133.882625 \n", "3 1.0 186.979704 \n", "4 1.0 281.982975 \n", "5 1.0 614.407812 \n", "6 1.0 88.191292 \n", "7 1.0 170.474154 \n", "\n", " scalar/MSE_GLOBAL - Max scalar/MSE_GLOBAL - Avg scalar/MSE_GLOBAL - StdDev \\\n", "0 NaN NaN NaN \n", "1 NaN NaN NaN \n", "2 133.882625 133.882625 0.0 \n", "3 186.979704 186.979704 0.0 \n", "4 281.982975 281.982975 0.0 \n", "5 614.407812 614.407812 0.0 \n", "6 88.191292 88.191292 0.0 \n", "7 170.474154 170.474154 0.0 \n", "\n", " scalar/MSE_GLOBAL - Last scalar/MSE_GLOBAL - Count \n", "0 NaN NaN \n", "1 NaN NaN \n", "2 133.882625 1.0 \n", "3 186.979704 1.0 \n", "4 281.982975 1.0 \n", "5 614.407812 1.0 \n", "6 88.191292 1.0 \n", "7 170.474154 1.0 \n", "\n", "[8 rows x 55 columns]" ] }, 
"execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sagemaker.analytics import ExperimentAnalytics\n", "\n", "trial_component_analytics = ExperimentAnalytics(experiment_name=experiment_name)\n", "tc_df = trial_component_analytics.dataframe()\n", "tc_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look on the metrics and hyperparameter combinations:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Trials epochs learning_rate hybridize \\\n", "0 [Trial-2021-02-12-024553-wfkq] 50.0 0.001 true \n", "1 [Trial-2021-02-12-024558-iujn] 50.0 0.001 true \n", "2 [Trial-2021-02-12-024553-dsam] 50.0 0.001 true \n", "3 [Trial-2021-02-12-024544-rwcv] 50.0 0.001 true \n", "4 [Trial-2021-02-12-024550-vfjm] 50.0 0.001 true \n", "5 [Trial-2021-02-12-024544-pfow] 50.0 0.001 true \n", "6 [Trial-2021-02-12-024550-rigj] 50.0 0.001 true \n", "7 [Trial-2021-02-12-024543-envu] 50.0 0.001 true \n", "\n", " num_batches_per_epoch prediction_length scalar/MASE_GLOBAL - Min \\\n", "0 25.0 60.0 NaN \n", "1 50.0 60.0 NaN \n", "2 50.0 30.0 1.339320 \n", "3 50.0 30.0 1.453763 \n", "4 50.0 60.0 2.010920 \n", "5 25.0 60.0 3.382988 \n", "6 25.0 30.0 1.087685 \n", "7 25.0 30.0 1.409506 \n", "\n", " scalar/MSE_GLOBAL - Min scalar/RMSE_GLOBAL - Min scalar/MAPE_GLOBAL - Min \n", "0 NaN NaN NaN \n", "1 NaN NaN NaN \n", "2 133.882625 11.570766 0.174017 \n", "3 186.979704 13.674052 0.196716 \n", "4 281.982975 16.792349 0.235205 \n", "5 614.407812 24.787251 0.414463 \n", "6 88.191292 9.391022 0.140608 \n", "7 170.474154 13.056575 0.184922 " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_df = tc_df[['Trials','epochs', 'learning_rate', 'hybridize', 'num_batches_per_epoch','prediction_length','scalar/MASE_GLOBAL - Min', 'scalar/MSE_GLOBAL - Min', 'scalar/RMSE_GLOBAL - Min', 'scalar/MAPE_GLOBAL - Min']]\n", "new_df " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The smallest MAPE achieved is:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.1406077684247318" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mape_min = new_df['scalar/MAPE_GLOBAL - Min'].min()\n", "mape_min" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's select the winner model:" ] }, { "cell_type": "code", "execution_count": 21, 
"metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Trials epochs learning_rate hybridize \\\n", "6 [Trial-2021-02-12-024550-rigj] 50.0 0.001 true \n", "\n", " num_batches_per_epoch prediction_length scalar/MASE_GLOBAL - Min \\\n", "6 25.0 30.0 1.087685 \n", "\n", " scalar/MSE_GLOBAL - Min scalar/RMSE_GLOBAL - Min scalar/MAPE_GLOBAL - Min \n", "6 88.191292 9.391022 0.140608 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_winner = new_df[new_df['scalar/MAPE_GLOBAL - Min'] == mape_min]\n", "if len(df_winner)!=1: #the df_winner should only content 1 model", " df_winner=df_winner.head(1)", "df_winner" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we download the winner model:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#download the champion from S3 and store it in the winner folder\n", "import os\n", "s3 = boto3.client(\"s3\")\n", "windir = \"gluonts/blog-models/\"+str(df_winner['Trials'].item()).replace(\"['\",\"\").replace(\"']\",\"\")+\"/\"\n", "\n", "def downloadDirectoryFroms3(bucket, windir):\n", " s3_resource = boto3.resource('s3')\n", " bucket = s3_resource.Bucket(bucket) \n", " for obj in bucket.objects.filter(Prefix = windir):\n", " print(obj.key)\n", " if not os.path.exists(os.path.dirname(windir)):\n", " os.makedirs(os.path.dirname(windir))\n", " bucket.download_file(obj.key, obj.key) # save to same path\n", "\n", "\n", "downloadDirectoryFroms3(bucket, windir) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 9. Run time series forecasts \n", "After we have downloaded the model, we can run prediction on it. 
To do this, we use the GluonTS `Predictor` API, which takes the path to the model files:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:gluonts.mx.context:Using CPU\n" ] } ], "source": [ "# restore the predictor\n", "import pathlib\n", "from gluonts.model.predictor import Predictor\n", "\n", "path = pathlib.Path(windir)\n", "winning_predictor = Predictor.deserialize(path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we run predictions on the test dataset and visualize them:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2014-11-30 00:00:00\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "from gluonts.dataset.common import ListDataset\n", "plt.rcParams['figure.figsize'] = (20.0, 6.0)\n", "\n", "# run forecast\n", "startdate = '2014-11-01 01:00:00'\n", "test_pred = ListDataset(\n", " [{\"start\": startdate, \"target\": raw_df.query('date >= \"2014-11-01 01:00:00\" and client == \"client_12\"').copy()['usage'], \"item_id\": 'client_12'}],\n", " freq = \"1H\"\n", ")\n", "\n", "pred = winning_predictor.predict(test_pred)\n", "for test_entry, forecast in zip(test_pred, pred):\n", " print(forecast.start_date)\n", " plt.plot(pd.date_range(start=startdate, periods=30), pd.DataFrame.from_dict(test_entry['target'])[0][:30],color='b')\n", " plt.plot(pd.date_range(start=forecast.start_date, periods=df_winner['prediction_length'].item()), forecast.quantile(.3), color='r') #samples contain all 100 quantiles\n", " plt.plot(pd.date_range(start=forecast.start_date, periods=df_winner['prediction_length'].item()), forecast.quantile(.5), color='b') #samples contain all 100 quantiles\n", " plt.plot(pd.date_range(start=forecast.start_date, periods=df_winner['prediction_length'].item()), forecast.quantile(.7), color='k') #samples contain all 100 quantiles\n", " x=pd.date_range(start=forecast.start_date, periods=df_winner['prediction_length'].item()) #samples contain all 100 quantiles\n", " y=forecast.quantile(.1) \n", " z=forecast.quantile(.9)\n", " plt.fill_between(x,y,z,color='g', alpha=0.3)\n", "plt.xticks(rotation=30)\n", "plt.legend(['Usage'], loc = 'lower left')\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With Amazon SageMaker we saw how easy it is for every developer and data scientist to setup time series forecasting at scale using the MXNet Estimator with the GluonTS toolkit. 
Amazon SageMaker removes the undifferentiated heavy lifting from every step of our ML process, automates infrastructure management, enables us to improve the training efficiency with SageMaker Debugger, and accelerates adoption of ML workflows from months to days. Please try out the notebook (https://a2i-experiments.notebook.us-east-1.sagemaker.aws/notebooks/gluonTS/rauscn/blog-gluonts-toolkit-on-sagemaker-PR.ipynb) from our post and let us know your comments and feedback." ] } ], "metadata": { "kernelspec": { "display_name": "conda_mxnet_p36", "language": "python", "name": "conda_mxnet_p36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" } }, "nbformat": 4, "nbformat_minor": 4 }