## Introduction

This notebook demonstrates how to train a reinforcement learning agent to solve a portfolio optimization problem while enforcing constraints on the action space. The objective is to maximize the portfolio value at the end of the episode. We consider a scenario with three asset types. The investor starts with a cash balance of 1000 USD, which is used to finance asset purchases. The action space is three-dimensional: each component of the action vector corresponds to the trade (buy, sell, or hold) that the agent executes in one asset type. We demonstrate how to use action masking to enforce the following four constraints:

* C1:	The agent cannot sell more units of any asset type than the investor currently owns. For example, if the investor holds 100 units of Asset 3 at time k, the agent cannot sell 120 units of that asset at that time.

* C2:	Asset 3 is considered highly volatile by the investor. The agent is not allowed to buy Asset 3 if the total value of their holdings in Asset 3 is above a third of their total portfolio value.

* C3:	The investor has a moderate risk preference and considers Asset 2 a conservative buy. As a result, the agent is not allowed to buy Asset 2 when the total value of Asset 2 holdings exceeds two thirds of the total portfolio value.

* C4:	The agent cannot buy any assets if the current cash balance is less than 1 USD.

We will use SageMaker RL and Ray RLlib to train our agent using action masking.
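To make the four constraints concrete, here is a minimal sketch of how they could be turned into a binary action mask. It is not the environment used by this notebook. It assumes a hypothetical discretization in which each asset's action is one of `2 * MAX_TRADE + 1` choices (sell up to `MAX_TRADE` units, hold, or buy up to `MAX_TRADE` units), and it assumes that total portfolio value includes the cash balance. The names `build_action_mask` and `MAX_TRADE` are illustrative only.

```python
import numpy as np

MAX_TRADE = 5  # hypothetical cap on units bought or sold per asset per step


def build_action_mask(holdings, prices, cash, max_trade=MAX_TRADE):
    """Return a (num_assets, 2*max_trade+1) array of 0/1 flags.

    Column j of each row stands for trading (j - max_trade) units:
    negative values sell, zero holds, positive values buy.
    """
    holdings = np.asarray(holdings, dtype=float)
    prices = np.asarray(prices, dtype=float)
    trades = np.arange(-max_trade, max_trade + 1)
    mask = np.ones((len(holdings), trades.size), dtype=np.int8)

    asset_values = holdings * prices
    total_value = cash + asset_values.sum()  # assumption: cash counts toward the total
    buys = trades > 0

    # C1: cannot sell more units than are currently held.
    for i, units in enumerate(holdings):
        mask[i, -trades > units] = 0

    # C2: no buying Asset 3 once it exceeds one third of portfolio value.
    if asset_values[2] > total_value / 3:
        mask[2, buys] = 0

    # C3: no buying Asset 2 once it exceeds two thirds of portfolio value.
    if asset_values[1] > 2 * total_value / 3:
        mask[1, buys] = 0

    # C4: no buying at all when cash is below 1 USD.
    if cash < 1.0:
        mask[:, buys] = 0

    return mask


# Example state: no Asset 1, 10 units of Asset 2, 2 units of Asset 3, almost no cash.
mask = build_action_mask(holdings=[0, 10, 2], prices=[10, 20, 50], cash=0.5)
print(mask)
```

In this example every buy is masked by C4, all sells of Asset 1 are masked by C1, and the hold action stays available for every asset. A common way to apply such a mask during training is to push the logits of masked actions toward a very large negative value before sampling, so the policy never selects them.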


## Prerequisites

### Import libraries

In [None]:
import sagemaker
import boto3
import sys
import os
import glob
import re
import subprocess
from IPython.display import HTML
import time
from time import gmtime, strftime
sys.path.append("common")
from misc import get_execution_role, wait_for_s3_object
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework
from datetime import datetime
import logging
import numpy as np

### Setup S3 bucket


In [None]:
sage_session = sagemaker.session.Session()
s3_bucket = sage_session.default_bucket()
s3_output_path = "s3://{}/".format(s3_bucket)
print("S3 bucket path: {}".format(s3_output_path))

### Define Variables 


In [None]:
# create a descriptive job name
job_name_prefix = 'rl-portfolio-trading'


### Configure training mode


In [None]:
local_mode = False
if local_mode:
    instance_type = 'local'
else:
    # If on SageMaker, pick the instance type
    instance_type = "ml.m4.16xlarge"

### Create an IAM role


In [None]:
try:
    role = sagemaker.get_execution_role()
except Exception:
    # Fall back to the helper imported from misc
    role = get_execution_role()

print("Using IAM role arn: {}".format(role))


### Define Metrics


In [None]:
metric_definitions = [{'Name': 'episode_reward_mean',
 'Regex': 'episode_reward_mean: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_reward_max',
 'Regex': 'episode_reward_max: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 {'Name': 'episode_reward_min',
 'Regex': 'episode_reward_min: ([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)'},
 ]
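As a quick sanity check on the metric regexes above, the following sketch applies the `episode_reward_mean` pattern to two sample log lines. The log lines are hypothetical and only illustrate the expected `name: value` shape; the actual training output may be formatted differently.

```python
import re

# Same pattern as in metric_definitions, written as a raw string.
pattern = r"episode_reward_mean: ([-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)"

# Hypothetical log lines, for illustration only.
sample_lines = [
    "episode_reward_mean: 1023.75",
    "episode_reward_mean: -1.5e+03",
]

# The first capture group holds the full number, including any exponent.
parsed = [float(re.search(pattern, line).group(1)) for line in sample_lines]
print(parsed)
```

The outer parentheses capture the whole number, so plain decimals and scientific notation both parse to a float.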

### Define Estimator


In [None]:
train_entry_point = "train_config.py" 
train_job_max_duration_in_seconds = 3600 * 15

cpu_or_gpu = 'gpu' if instance_type.startswith('ml.p') else 'cpu'
aws_region = boto3.Session().region_name
custom_image_name = "462105765813.dkr.ecr.%s.amazonaws.com/sagemaker-rl-ray-container:ray-1.6.0-tf-%s-py37" % (aws_region, cpu_or_gpu)

# image_uri is a SageMaker Python SDK v2 argument, so the v2 names
# instance_type, instance_count, and max_run are used alongside it.
estimator = RLEstimator(
    entry_point=train_entry_point,
    source_dir="src",
    dependencies=["common/sagemaker_rl"],
    image_uri=custom_image_name,
    role=role,
    instance_type=instance_type,
    instance_count=1,
    output_path=s3_output_path,
    base_job_name=job_name_prefix,
    metric_definitions=metric_definitions,
    max_run=train_job_max_duration_in_seconds,
    hyperparameters={},
)

In [None]:
estimator.fit(wait=local_mode)
job_name = estimator._current_job_name
print("Job name: {}".format(job_name))

### Plot metrics for training job


In [None]:
%matplotlib inline
from sagemaker.analytics import TrainingJobAnalytics

In [None]:
if not local_mode:
    df = TrainingJobAnalytics(job_name, ["episode_reward_mean"]).dataframe()
    num_metrics = len(df)

    if num_metrics == 0:
        print("No algorithm metrics found in CloudWatch")
    else:
        # Only add the column once we know the dataframe has metric rows
        df["rl_reward_mean"] = df["value"]
        # DataFrame.plot returns a matplotlib Axes object
        ax = df.plot(
            x="timestamp",
            y=["rl_reward_mean"],
            figsize=(18, 6),
            fontsize=18,
            legend=True,
            style="-",
            color=["b", "r", "g"],
        )
        # Reference line at the 1000 USD starting cash balance
        ax.plot(1000 * np.ones(int(max(df["timestamp"]))), 'r--', label='Initial Cash Balance')
        ax.grid()
        ax.set_ylabel("Mean reward per episode", fontsize=20)
        ax.set_xlabel("Training time (s)", fontsize=20)
        ax.legend(loc=4, prop={"size": 20})
else:
    print("Can't plot metrics in local mode.")