\n",
"\n",
"Amazon SageMaker RL allows you to train your RL agents in cloud machines using docker containers. You do not have to worry about setting up your machines with the RL toolkits and deep learning frameworks. You can easily switch between many different machines setup for you, including powerful GPU machines that give a big speedup. You can also choose to use multiple machines in a cluster to further speedup training, often necessary for production level loads.\n",
"\n",
"Please note that this notebook defaults to just 10 training episodes to minimize training times. You may want to explore the agents's behavior by increasing the training episodes (i.e., hyperparameter `rl.training.stop.training_iteration`) to a larger number (e.g., 5000, etc.)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2\n",
"%config InlineBackend.figure_format = 'retina'\n",
"\n",
"import sagemaker\n",
"import boto3\n",
"import sys\n",
"import os\n",
"import glob\n",
"import re\n",
"import subprocess\n",
"from IPython.display import HTML\n",
"import time\n",
"from time import gmtime, strftime\n",
"from smnb_utils.misc import get_execution_role, wait_for_s3_object, wait_for_training_job_to_complete\n",
"from sagemaker.rl import RLEstimator, RLToolkit, RLFramework"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Prologue"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define Variables\n",
"\n",
"We define variables such as the job prefix for the training jobs *and the image path for the container (only when this is BYOC).*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# create a descriptive job name \n",
"job_name_prefix = 'rl-battery'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup S3 bucket\n",
"\n",
"Set up the linkage and authentication to the S3 bucket that you want to use for checkpoint and the metadata. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sage_session = sagemaker.session.Session()\n",
"s3_bucket = sage_session.default_bucket() \n",
"s3_output_path = 's3://{}/'.format(s3_bucket)\n",
"print(\"S3 bucket path: {}\".format(s3_output_path))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get an IAM role\n",
"\n",
"Either get the execution role when running from a SageMaker notebook `role = sagemaker.get_execution_role()` or, when running locally, set it to an IAM role with `AmazonSageMakerFullAccess` and `CloudWatchFullAccess permissions`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
" role = sagemaker.get_execution_role()\n",
"except:\n",
" role = get_execution_role()\n",
"\n",
"print(\"Using IAM role arn: {}\".format(role))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure where training happens\n",
"\n",
"You can run the RL training on a SageMaker training instance, or locally on this SageMaker notebook instance. The local mode speeds up iterative testing and debugging while using the same familiar Python SDK interface. You just need to set `local_mode = True`. Note, you can only run a single local notebook at a time.\n",
"\n",
"The next cell will run a helper script `../src/common/setup.sh` to prep your SageMaker notebook instance for `local_mode = True`. Note that local mode on non-SageMaker notebook instance requires you to install docker, docker-compose, and optionally nvidia-docker on your machine -- the steps are not covered here, but feel free to consult the documentations of those tools."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# run in local_mode on this machine, or as a SageMaker TrainingJob?\n",
"local_mode = True\n",
"\n",
"if local_mode:\n",
" instance_type = 'local'\n",
"\n",
" # Run next line only on a SageMaker notebook instance\n",
" !/bin/bash ../bin/setup.sh\n",
"else:\n",
" # If on SageMaker, pick the instance type\n",
" instance_type = \"ml.p3.2xlarge\"\n",
"print(instance_type)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Training"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define Metric\n",
"A list of dictionaries that defines the metric(s) used to evaluate the training jobs. Each dictionary contains two keys: ‘Name’ for the name of the metric, and ‘Regex’ for the regular expression used to extract the metric from the logs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"metric_definitions = [{'Name': 'episode_reward_mean',\n",
" 'Regex': 'episode_reward_mean: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n",
" {'Name': 'episode_reward_max',\n",
" 'Regex': 'episode_reward_max: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n",
" {'Name': 'episode_len_mean',\n",
" 'Regex': 'episode_len_mean: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n",
" {'Name': 'entropy',\n",
" 'Regex': 'entropy: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n",
" {'Name': 'episode_reward_min',\n",
" 'Regex': 'episode_reward_min: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n",
" {'Name': 'vf_loss',\n",
" 'Regex': 'vf_loss: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n",
" {'Name': 'policy_loss',\n",
" 'Regex': 'policy_loss: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'}, \n",
"]"
]
},
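{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, the cell below applies the first of these regular expressions to a sample log line. The sample line is illustrative only -- it mimics the `metric_name: value` format of the RLlib progress output and is not taken from a real training job."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sanity check: the sample line below is made up, but follows the\n",
"# 'metric_name: value' format that the regexes above are written against.\n",
"sample_line = \"episode_reward_mean: -12.5\"\n",
"pattern = metric_definitions[0][\"Regex\"]\n",
"match = re.search(pattern, sample_line)\n",
"print(match.group(1) if match else \"no match\")"
]
},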
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define Estimator\n",
"This Estimator executes an RLEstimator script in a managed Reinforcement Learning (RL) execution environment within a SageMaker Training Job. The managed RL environment is an Amazon-built Docker container that executes functions defined in the supplied entry_point Python script."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"train_job_max_duration_in_seconds = 60 * 60 * 24\n",
"\n",
"hyperparameters = {\n",
"\"rl.training.config.env_config.MAX_STEPS_PER_EPISODE\": 168,\n",
"\"rl.training.config.env_config.LOCAL\": True,\n",
"\"rl.training.config.env_config.FILEPATH\": \"agent_input/sample-data.csv\",\n",
"\"rl.training.config.use_pytorch\": False,\n",
"\"rl.training.stop.training_iteration\": 10,\n",
"\"rl.training.run\": \"DQN\",\n",
"}\n",
"\n",
"estimator = RLEstimator(entry_point=\"train_battery_sm.py\",\n",
" source_dir=\"../src/source_dir\",\n",
" dependencies=[\n",
" # Include sagemaker_rl module using the `dependencies` mechanism.\n",
" \"../src/sagemaker_rl\",\n",
"\n",
" # Include the module. Alternatively, specify this in source_dir/requirements.txt\n",
" # to git+https://github.com/aws-samples/sagemaker-rl-energy-storage-system\n",
" \"../src/energy_storage_system\",\n",
"\n",
" # Upload data (TODO: why not as data channel?)\n",
" \"../data/agent_input\",\n",
" ],\n",
" toolkit=RLToolkit.RAY,\n",
" toolkit_version='0.8.5',\n",
" framework=RLFramework.TENSORFLOW,\n",
" role=role,\n",
" instance_type=instance_type,\n",
" instance_count=1,\n",
" debugger_hook_config=False,\n",
" output_path=s3_output_path,\n",
" base_job_name=job_name_prefix,\n",
" metric_definitions=metric_definitions,\n",
" max_run=train_job_max_duration_in_seconds,\n",
" hyperparameters=hyperparameters,\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"estimator.fit(wait=local_mode)\n",
"job_name=estimator._current_job_name\n",
"print(\"Job name: {}\".format(job_name))",
"\n",
"#Store the job to load for evaluation\n",
"%store job_name"
]
},
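{
"cell_type": "markdown",
"metadata": {},
"source": [
"When training runs as a real SageMaker Training Job (i.e., `local_mode = False`), you can poll the job status directly. The sketch below uses the `boto3` SageMaker client imported at the top of the notebook; local-mode jobs run in Docker on this instance and are not visible to the SageMaker API, so the check is skipped in that case."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: poll the training job status via the SageMaker API.\n",
"# Local-mode jobs are not visible to the SageMaker API, so skip the check there.\n",
"if not local_mode:\n",
"    sm_client = boto3.client(\"sagemaker\")\n",
"    status = sm_client.describe_training_job(TrainingJobName=job_name)[\"TrainingJobStatus\"]\n",
"    print(\"Training job status: {}\".format(status))\n",
"else:\n",
"    print(\"Local mode: see the container logs above.\")"
]
},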
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Visualization\n",
"\n",
"RL training can take a long time. So while it's running there are a variety of ways we can track progress of the running training job. Some intermediate output gets saved to S3 during training, so we'll set up to capture that."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"s3_url = \"s3://{}/{}\".format(s3_bucket, job_name)\n",
"\n",
"intermediate_folder_key = \"{}/output/intermediate/\".format(job_name)\n",
"intermediate_url = \"s3://{}/{}training/\".format(s3_bucket, intermediate_folder_key)\n",
"\n",
"print(\"S3 job path: {}\".format(s3_url))\n",
"print(\"Intermediate folder path: {}\".format(intermediate_url))"
]
},
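{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check, the sketch below lists whatever has been written under the intermediate prefix so far using the `boto3` S3 client. Depending on the stage of training (and in local mode), there may be nothing to list yet."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: list up to 20 objects under the intermediate output prefix.\n",
"s3_client = boto3.client(\"s3\")\n",
"response = s3_client.list_objects_v2(Bucket=s3_bucket, Prefix=intermediate_folder_key, MaxKeys=20)\n",
"for obj in response.get(\"Contents\", []):\n",
"    print(obj[\"Key\"])"
]
},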
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot metrics for training job\n",
"We can see the reward metric of the training as it's running, using algorithm metrics that are recorded in CloudWatch metrics. We can plot this to see the performance of the model over time.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"from sagemaker.analytics import TrainingJobAnalytics\n",
"\n",
"if not local_mode:\n",
" wait_for_training_job_to_complete(job_name) # Wait for the job to finish\n",
" df = TrainingJobAnalytics(job_name, ['episode_reward_mean']).dataframe()\n",
" df_min = TrainingJobAnalytics(job_name, ['episode_reward_min']).dataframe()\n",
" df_max = TrainingJobAnalytics(job_name, ['episode_reward_max']).dataframe()\n",
" df['rl_reward_mean'] = df['value']\n",
" df['rl_reward_min'] = df_min['value']\n",
" df['rl_reward_max'] = df_max['value']\n",
" num_metrics = len(df)\n",
" \n",
" if num_metrics == 0:\n",
" print(\"No algorithm metrics found in CloudWatch\")\n",
"else:\n",
" print(\"Can't plot metrics in local mode.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not local_mode:\n",
" mean_reward = df['rl_reward_mean'].mean()\n",
" print(f\"Average reward: {mean_reward}\")\n",
" plt = df.plot(x='timestamp', y=['rl_reward_mean'], figsize=(20,5), fontsize=18, legend=True, style='-', color=['b','r','g'])\n",
" plt.set_ylabel('Mean reward per episode', fontsize=20)\n",
" plt.set_xlabel('Training time (s)', fontsize=20)\n",
" plt.axhline(y=mean_reward, color='r')\n",
" plt.grid()\n",
"else:\n",
" print(\"Can't generate reports in local mode.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Save Metrics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not local_mode:\n",
" df.to_csv(\"metrics.csv\", index=False)\n",
"else:\n",
" print(\"Can't save in local mode.\")"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "conda_python3",
"language": "python",
"name": "conda_python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.13"
},
"notice": "Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.",
"pycharm": {
"stem_cell": {
"cell_type": "raw",
"metadata": {
"collapsed": false
},
"source": []
}
},
"toc-autonumbering": true
},
"nbformat": 4,
"nbformat_minor": 4
}