<div style="font-size:200%;font-weight:bold">Using Amazon SageMaker RL</div>

Amazon SageMaker RL lets you train RL agents on cloud machines using Docker containers, so you do not have to set up your own machines with RL toolkits and deep learning frameworks. You can switch easily between many preconfigured instance types, including powerful GPU machines that can substantially speed up training. You can also use multiple machines in a cluster to speed up training further, which is often necessary for production-level workloads.

Please note that this notebook defaults to just 10 training iterations to keep training time short. To explore the agent's behavior, increase the hyperparameter `rl.training.stop.training_iteration` to a larger number (e.g., 5000).

In [None]:
%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = 'retina'

import sagemaker
import boto3
import sys
import os
import glob
import re
import subprocess
from IPython.display import HTML
import time
from time import gmtime, strftime
from smnb_utils.misc import get_execution_role, wait_for_s3_object, wait_for_training_job_to_complete
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework

# Prologue

## Define Variables

We define variables such as the job prefix for the training jobs *and the image path for the container (only when this is BYOC).*

In [None]:
# create a descriptive job name 
job_name_prefix = 'rl-battery'

## Setup S3 bucket

Set up the linkage and authentication to the S3 bucket that you want to use for checkpoints and metadata.

In [None]:
sage_session = sagemaker.session.Session()
s3_bucket = sage_session.default_bucket()  
s3_output_path = 's3://{}/'.format(s3_bucket)
print("S3 bucket path: {}".format(s3_output_path))

## Get an IAM role

When running from a SageMaker notebook, get the execution role with `role = sagemaker.get_execution_role()`. When running locally, set it to an IAM role with `AmazonSageMakerFullAccess` and `CloudWatchFullAccess` permissions.

In [None]:
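try:
    # Works when running on a SageMaker notebook instance
    role = sagemaker.get_execution_role()
except Exception:
    # Otherwise fall back to the helper imported from smnb_utils
    role = get_execution_role()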

print("Using IAM role arn: {}".format(role))

## Configure where training happens

You can run the RL training on a SageMaker training instance, or locally on this SageMaker notebook instance. Local mode speeds up iterative testing and debugging while using the same familiar Python SDK interface. You just need to set `local_mode = True`. Note that you can run only one local-mode notebook at a time.

The next cell runs a helper script, `../bin/setup.sh`, to prepare your SageMaker notebook instance for `local_mode = True`. Note that local mode on a machine other than a SageMaker notebook instance requires you to install docker, docker-compose, and optionally nvidia-docker yourself. Those steps are not covered here, so consult the documentation of those tools.
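
Before enabling local mode on your own machine, it can help to confirm that the tools named above are installed. The following is a minimal sketch using only the standard library. The helper name `find_local_mode_tools` is hypothetical, and the binary names are assumptions taken from the paragraph above; newer Docker installations may ship Compose as a `docker compose` plugin rather than a standalone `docker-compose` binary, which this check would not detect.

```python
import shutil

# Binary names are assumptions taken from the text above; adjust for your setup.
LOCAL_MODE_TOOLS = ("docker", "docker-compose", "nvidia-docker")


def find_local_mode_tools(tools=LOCAL_MODE_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


for tool, path in find_local_mode_tools().items():
    print(f"{tool}: {path or 'not found'}")
```

A `None` entry only means the binary is not on `PATH`; it says nothing about whether the Docker daemon is running.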

In [None]:
# run in local_mode on this machine, or as a SageMaker TrainingJob?
local_mode = True

if local_mode:
    instance_type = 'local'

    # Run next line only on a SageMaker notebook instance
    !/bin/bash ../bin/setup.sh
else:
    # If on SageMaker, pick the instance type
    instance_type = "ml.p3.2xlarge"
print(instance_type)

# Training

## Define Metric
The metric definitions are a list of dictionaries, one per metric used to evaluate the training jobs. Each dictionary contains two keys: `Name` for the name of the metric, and `Regex` for the regular expression used to extract the metric from the logs.
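
To see what such a regular expression captures, the sketch below applies the same number pattern used in this notebook to a single log line. The log line is made up for illustration, and the sketch assumes the first capture group holds the metric value.

```python
import re

# Same number pattern as the metric definitions in this notebook.
NUMBER = r"([-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)"
pattern = re.compile("episode_reward_mean: " + NUMBER)

# Hypothetical log line, invented for this example.
log_line = "episode_reward_max: 12.5 episode_reward_mean: -3.75e+01 episode_len_mean: 168.0"

match = pattern.search(log_line)
# Group 1 is the full number, including sign and exponent.
value = float(match.group(1)) if match else None
print(value)
```

Note that this pattern matches plain and scientific-notation numbers only, so a value logged as `nan` or `inf` would not be captured.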

In [None]:
metric_definitions = [{'Name': 'episode_reward_mean',
  'Regex': 'episode_reward_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_reward_max',
  'Regex': 'episode_reward_max: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_len_mean',
  'Regex': 'episode_len_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'entropy',
  'Regex': 'entropy: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_reward_min',
  'Regex': 'episode_reward_min: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'vf_loss',
  'Regex': 'vf_loss: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'policy_loss',
  'Regex': 'policy_loss: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},                                            
]

## Define Estimator
The `RLEstimator` runs your training script in a managed Reinforcement Learning (RL) execution environment within a SageMaker training job. The managed RL environment is an Amazon-built Docker container that executes the functions defined in the supplied `entry_point` Python script.
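
The hyperparameters passed to the estimator in the next cell use dotted names such as `rl.training.config.env_config.MAX_STEPS_PER_EPISODE`. The sketch below shows one way such flat keys could be unpacked into a nested configuration. It is an illustration only: the function `nest_hyperparameters` is hypothetical, and the assumption that the training launcher splits keys on dots after an `rl.training.` prefix is not confirmed by this notebook.

```python
def nest_hyperparameters(flat, prefix="rl.training."):
    """Unpack dotted keys under `prefix` into a nested dict (illustrative only)."""
    nested = {}
    for key, value in flat.items():
        if not key.startswith(prefix):
            continue  # ignore keys outside the assumed namespace
        *parents, leaf = key[len(prefix):].split(".")
        node = nested
        for part in parents:
            # Walk down, creating intermediate dicts as needed.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


flat = {
    "rl.training.config.env_config.MAX_STEPS_PER_EPISODE": 168,
    "rl.training.stop.training_iteration": 10,
    "rl.training.run": "DQN",
}
print(nest_hyperparameters(flat))
```

Read this way, `stop.training_iteration` sits under a stopping condition and `config.env_config` groups the settings handed to the environment.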

In [None]:
train_job_max_duration_in_seconds = 60 * 60 * 24

hyperparameters =  {
"rl.training.config.env_config.MAX_STEPS_PER_EPISODE": 168,
"rl.training.config.env_config.LOCAL": True,
"rl.training.config.env_config.FILEPATH": "agent_input/sample-data.csv",
"rl.training.config.use_pytorch": False,
"rl.training.stop.training_iteration": 10,
"rl.training.run": "DQN",
}

estimator = RLEstimator(entry_point="train_battery_sm.py",
                        source_dir="../src/source_dir",
                        dependencies=[
                            # Include sagemaker_rl module using the `dependencies` mechanism.
                            "../src/sagemaker_rl",

                            # Include the module. Alternatively, specify this in source_dir/requirements.txt
                            # to git+https://github.com/aws-samples/sagemaker-rl-energy-storage-system
                            "../src/energy_storage_system",

                            # Upload data (TODO: why not as data channel?)
                            "../data/agent_input",
                        ],
                        toolkit=RLToolkit.RAY,
                        toolkit_version='0.8.5',
                        framework=RLFramework.TENSORFLOW,
                        role=role,
                        instance_type=instance_type,
                        instance_count=1,
                        debugger_hook_config=False,
                        output_path=s3_output_path,
                        base_job_name=job_name_prefix,
                        metric_definitions=metric_definitions,
                        max_run=train_job_max_duration_in_seconds,
                        hyperparameters=hyperparameters,
                       )

In [None]:
estimator.fit(wait=local_mode)
job_name = estimator._current_job_name
print("Job name: {}".format(job_name))
# Store the job name to load for evaluation
%store job_name

# Visualization

RL training can take a long time, so there are several ways to track the progress of a training job while it runs. Some intermediate output is saved to S3 during training, so we set up the paths needed to capture it.

In [None]:
s3_url = "s3://{}/{}".format(s3_bucket, job_name)

intermediate_folder_key = "{}/output/intermediate/".format(job_name)
intermediate_url = "s3://{}/{}training/".format(s3_bucket, intermediate_folder_key)

print("S3 job path: {}".format(s3_url))
print("Intermediate folder path: {}".format(intermediate_url))

## Plot metrics for training job
We can see the reward metric of the training as it's running, using algorithm metrics that are recorded in CloudWatch metrics.  We can plot this to see the performance of the model over time.
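
When combining the mean, min, and max reward series, the series may not always have the same number of points. The sketch below joins them on timestamp rather than on position, using only the standard library. The function `align_by_timestamp` and all sample values are hypothetical; it assumes each series is a list of `(timestamp, value)` pairs.

```python
def align_by_timestamp(**series):
    """Join named lists of (timestamp, value) pairs on their shared timestamps."""
    if not series:
        return []
    lookups = {name: dict(points) for name, points in series.items()}
    # Keep only timestamps that are present in every series.
    shared = set.intersection(*(set(lookup) for lookup in lookups.values()))
    return [
        {"timestamp": ts, **{name: lookup[ts] for name, lookup in lookups.items()}}
        for ts in sorted(shared)
    ]


# Made-up sample values; the "low" series is missing the point at 60.0.
mean = [(0.0, 1.0), (60.0, 2.0), (120.0, 3.0)]
low = [(0.0, -1.0), (120.0, 0.5)]
high = [(0.0, 4.0), (60.0, 5.0), (120.0, 6.0)]

rows = align_by_timestamp(mean=mean, low=low, high=high)
for row in rows:
    print(row)
```

Rows are dropped when any series lacks that timestamp, which keeps each row internally consistent at the cost of discarding partial data.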


In [None]:
%matplotlib inline
from sagemaker.analytics import TrainingJobAnalytics

if not local_mode:
    wait_for_training_job_to_complete(job_name) # Wait for the job to finish
    df = TrainingJobAnalytics(job_name, ['episode_reward_mean']).dataframe()
    df_min = TrainingJobAnalytics(job_name, ['episode_reward_min']).dataframe()
    df_max = TrainingJobAnalytics(job_name, ['episode_reward_max']).dataframe()
    num_metrics = len(df)

    if num_metrics == 0:
        print("No algorithm metrics found in CloudWatch")
    else:
        # Copy the values only when metrics exist; an empty frame has no 'value' column
        df['rl_reward_mean'] = df['value']
        df['rl_reward_min'] = df_min['value']
        df['rl_reward_max'] = df_max['value']
else:
    print("Can't plot metrics in local mode.")

In [None]:
if not local_mode:
    mean_reward = df['rl_reward_mean'].mean()
    print(f"Average reward: {mean_reward}")
    # df.plot returns a matplotlib Axes, so name it `ax` rather than `plt`
    ax = df.plot(x='timestamp', y=['rl_reward_mean'], figsize=(20,5), fontsize=18, legend=True, style='-', color=['b','r','g'])
    ax.set_ylabel('Mean reward per episode', fontsize=20)
    ax.set_xlabel('Training time (s)', fontsize=20)
    ax.axhline(y=mean_reward, color='r')
    ax.grid()
else:
    print("Can't generate reports in local mode.")

## Save Metrics

In [None]:
if not local_mode:
    df.to_csv("metrics.csv", index=False)
else:
    print("Can't save in local mode.")