{ "cells": [ { "cell_type": "markdown", "id": "f08ee0d2", "metadata": {}, "source": [ "## Introduction\n", "\n", "This notebook demonstrates how to train a reinforcement learning agent to solve a portfolio optimization problem while enforcing constraints on the action space. The objective of the portfolio optimization problem is to maximize the final portfolio value at the end of the episode. We consider a scenario that involves three asset types. The investor starts with a 1000 USD cash balance which will be used to finance asset purchases. The action space of the problem is three dimensional. The 3-D action vector corresponds to the trades (Buy/Sell/Hold) that the agent executes in each asset type. We demonstrate how to use action masking to enforce the following four constraints\n", "\n", "* C1:\tThe agent cannot sell more units of any asset type than what they currently own. For e.g., if the investor has 100 units of Asset 3 at time k in their portfolio, then the agent cannot sell 120 counts of that asset at that time.\n", "\n", "* C2:\tAsset 3 is considered highly volatile by the investor. The agent is not allowed to buy Asset 3 if the total value of their holdings in Asset 3 is above a third of their total portfolio value.\n", "\n", "* C3:\tThe investor has a moderate risk preference and considers Asset 2 a conservative buy. As a result, the agent is not allowed to buy asset 2 when the total value of asset 2 holdings cross 2/3 of total portfolio value.\n", "\n", "* C4:\tThe agent cannot buy any assets if their current cash balance is less than 1 USD\n", "\n", "We will use SageMaker RL and Ray Rllib to train our agent using action masking\n" ] }, { "cell_type": "markdown", "id": "d09a6d70", "metadata": {}, "source": [ "## Pre-requsites\n", "\n", "### Import libraries" ] }, { "cell_type": "code", "execution_count": null, "id": "89a78f5d", "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "import boto3\n", "import sys\n", "import os\n", "import glob\n", "import re\n", "import subprocess\n", "from IPython.display import HTML\n", "import time\n", "from time import gmtime, strftime\n", "sys.path.append(\"common\")\n", "from misc import get_execution_role, wait_for_s3_object\n", "from sagemaker.rl import RLEstimator, RLToolkit, RLFramework\n", "from datetime import datetime\n", "import logging\n", "import numpy as np" ] }, { "cell_type": "markdown", "id": "32a17a5e", "metadata": {}, "source": [ "### Setup S3 bucket\n" ] }, { "cell_type": "code", "execution_count": null, "id": "f1fb10dc", "metadata": {}, "outputs": [], "source": [ "sage_session = sagemaker.session.Session()\n", "s3_bucket = sage_session.default_bucket()\n", "s3_output_path = \"s3://{}/\".format(s3_bucket)\n", "print(\"S3 bucket path: {}\".format(s3_output_path))" ] }, { "cell_type": "markdown", "id": "8ec95cc6", "metadata": {}, "source": [ "### Define Variables \n" ] }, { "cell_type": "code", "execution_count": null, "id": "a3e23473", "metadata": {}, "outputs": [], "source": [ "# create a descriptive job name\n", "job_name_prefix = 'rl-portfolio-trading'\n" ] }, { "cell_type": "markdown", "id": "0d5cf800", "metadata": {}, "source": [ "### Configure training mode\n" ] }, { "cell_type": "code", "execution_count": null, "id": "049a4ca4", "metadata": {}, "outputs": [], "source": [ "local_mode = False\n", "if local_mode:\n", " instance_type = 'local'\n", "else:\n", " # If on SageMaker, pick the instance type\n", " instance_type = \"ml.m4.16xlarge\"" ] }, { "cell_type": "markdown", "id": "84915167", "metadata": {}, "source": [ "### Create an IAM role\n" ] }, { "cell_type": "code", "execution_count": null, "id": "97b28c5a", "metadata": {}, "outputs": [], "source": [ "try:\n", " role = sagemaker.get_execution_role()\n", "except:\n", " role = get_execution_role()\n", "\n", "print(\"Using IAM role arn: {}\".format(role))\n" ] }, { "cell_type": "markdown", "id": "2c73a22c", "metadata": {}, "source": [ "### Define Metrics\n" ] }, { "cell_type": "code", "execution_count": null, "id": "794d5714", "metadata": {}, "outputs": [], "source": [ "metric_definitions = [{'Name': 'episode_reward_mean',\n", " 'Regex': 'episode_reward_mean: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n", " {'Name': 'episode_reward_max',\n", " 'Regex': 'episode_reward_max: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n", " {'Name': 'episode_reward_min',\n", " 'Regex': 'episode_reward_min: ([-+]?[0-9]*\\\\.?[0-9]+([eE][-+]?[0-9]+)?)'},\n", " ]" ] }, { "cell_type": "markdown", "id": "e3fd74ca", "metadata": {}, "source": [ "### Define Estimator\n" ] }, { "cell_type": "code", "execution_count": null, "id": "dc352e90", "metadata": {}, "outputs": [], "source": [ "train_entry_point = \"train_config.py\" \n", "train_job_max_duration_in_seconds = 3600 * 15\n", "\n", "cpu_or_gpu = 'gpu' if instance_type.startswith('ml.p') else 'cpu'\n", "aws_region = boto3.Session().region_name\n", "custom_image_name = \"462105765813.dkr.ecr.%s.amazonaws.com/sagemaker-rl-ray-container:ray-1.6.0-tf-%s-py37\" % (aws_region, cpu_or_gpu)\n", "\n", "estimator = RLEstimator(entry_point= train_entry_point,\n", " source_dir=\"src\",\n", " dependencies=[\"common/sagemaker_rl\"],\n", " image_uri=custom_image_name,\n", " role=role,\n", " train_instance_type=instance_type,\n", " train_instance_count=1,\n", " output_path=s3_output_path,\n", " base_job_name=job_name_prefix,\n", " metric_definitions=metric_definitions,\n", " train_max_run=train_job_max_duration_in_seconds,\n", " hyperparameters={}\n", " )" ] }, { "cell_type": "code", "execution_count": null, "id": "aaffa72a", "metadata": {}, "outputs": [], "source": [ "estimator.fit(wait=local_mode)\n", "job_name = estimator._current_job_name\n", "print(\"Job name: {}\".format(job_name))" ] }, { "cell_type": "markdown", "id": "f32080b9", "metadata": {}, "source": [ "### Plot metrics for training job\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a162df38", "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "from sagemaker.analytics import TrainingJobAnalytics" ] }, { "cell_type": "code", "execution_count": null, "id": "08f0727f", "metadata": {}, "outputs": [], "source": [ "if not local_mode:\n", " df = TrainingJobAnalytics(job_name, [\"episode_reward_mean\"]).dataframe()\n", " df[\"rl_reward_mean\"] = df[\"value\"]\n", " num_metrics = len(df)\n", "\n", " if num_metrics == 0:\n", " print(\"No algorithm metrics found in CloudWatch\")\n", " else:\n", " plt = df.plot(\n", " x=\"timestamp\",\n", " y=[\"rl_reward_mean\"],\n", " figsize=(18, 6),\n", " fontsize=18,\n", " legend=True,\n", " style=\"-\",\n", " color=[\"b\", \"r\", \"g\"],\n", " )\n", " plt.plot(1000*np.ones(int(max(df[\"timestamp\"]))),'r--',label='Initial Cash Balance')\n", " plt.grid()\n", " plt.set_ylabel(\"Mean reward per episode\", fontsize=20)\n", " plt.set_xlabel(\"Training time (s)\", fontsize=20)\n", " plt.legend(loc=4, prop={\"size\": 20})\n", "else:\n", " print(\"Can't plot metrics in local mode.\")" ] }, { "cell_type": "code", "execution_count": null, "id": "8be7091c", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "conda_amazonei_tensorflow2_p36", "language": "python", "name": "conda_amazonei_tensorflow2_p36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 5 }