<div style="font-size:200%;font-weight:bold">Energy Storage System</div>

This notebook demonstrates how to train an RL agent for Energy Storage System (ESS) arbitrage. The simulated energy environment is based on the paper [Arbitrage of Energy Storage in Electricity Markets with Deep Reinforcement Learning](https://arxiv.org/abs/1904.12232) and uses [this sample dataset](https://aemo.com.au/en/energy-systems/electricity/national-electricity-market-nem/data-nem/aggregated-data).

Ensure that your Python virtual environment has the required Python packages installed, as specified in `GITROOT/setup.py`.

# Global config

**<div style="color:firebrick">NOTE: for pedagogical purposes, this notebook fixes the random seed by calling
`np.random.seed(1)` in every cell that trains an agent. Please note that a single call to
`np.random.seed(1)` is NOT sufficient, because the random state advances as each cell draws random numbers, so
later cells would not start from the same state.</div>**
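
The effect of reseeding can be seen in a minimal sketch that is independent of the agents in this notebook. It draws from NumPy's global random state twice after one seed call, then reseeds and draws again; the variable names are illustrative only.

```python
import numpy as np

# Seed once, then draw twice: the second draw continues from the advanced state.
np.random.seed(1)
first = np.random.rand(3)
second = np.random.rand(3)

# Reseeding restores the starting state, so this draw repeats `first`.
np.random.seed(1)
repeat = np.random.rand(3)

print("first == repeat:", np.allclose(first, repeat))
print("first == second:", np.allclose(first, second))
```

This is why each training cell below passes `np_seed=1`: every training run then starts from the same random state.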

In [None]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
%load_ext autoreload
%autoreload 2

import numpy as np
import os
import pandas as pd
from pathlib import Path
from typing import Any, Dict, List, Optional, Union

from energy_storage_system.agents import Agent, MovingAveragePriceAgent, PriceVsCostAgent, RandomAgent
from energy_storage_system.envs import SimpleBattery
from energy_storage_system.utils import ReportIO, evaluate_episode, plot_reward, plot_analysis, train


# Pre-create GITROOT/data and its sub-directories (NOTE: GITROOT/data is not versioned).
data_dir = Path('../data')
%set_env DATA_DIR=$data_dir
!mkdir -p $DATA_DIR/agent_input $DATA_DIR/agent_output $DATA_DIR/bokeh_output $DATA_DIR/streamlit_input

env_config = {
    "MAX_STEPS_PER_EPISODE": 168,
    "LOCAL": True,  # True means to use data from local src folder instead of S3.
    "FILEPATH": data_dir / 'agent_input' / 'sample-data.csv'
}

# True means to download year 2020.
# False means to download only March 2021.
yearly_data = False

In [None]:
# Execute this cell to download the sample data to a local file called ../data/agent_input/sample-data.csv
if not yearly_data:
    !curl https://aemo.com.au/aemo/data/nem/priceanddemand/PRICE_AND_DEMAND_202103_NSW1.csv > $DATA_DIR/agent_input/sample-data.csv
else:
    # Download 12 months of year 2020 data.
    !bash ../bin/download_data.sh $DATA_DIR/price_demand_data &> /dev/null

    # Combine those 12 .csv files to ../data/agent_input/sample-data.csv
    files = (data_dir / 'price_demand_data').glob('PRICE_AND_DEMAND*.csv')
    df = pd.concat([pd.read_csv(f) for f in files], axis=0, ignore_index=True)
    df.sort_values('SETTLEMENTDATE', inplace=True)
    df.to_csv(data_dir / 'agent_input' / 'sample-data.csv', index=False)
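
The combine-and-sort step can be checked on a small scale. The sketch below is only an illustration under assumptions: it uses two tiny in-memory CSVs instead of the downloaded files, and the `RRP` price column and the sample values are made up for the example (only `SETTLEMENTDATE` appears in the cell above). It parses the timestamps before sorting and then verifies that the combined frame is ordered and free of duplicate timestamps.

```python
import io

import pandas as pd

# Two tiny in-memory "monthly files", deliberately listed out of order.
# Column RRP and all values are illustrative assumptions.
feb = io.StringIO(
    "SETTLEMENTDATE,RRP\n"
    "2020/02/01 00:30:00,40.0\n"
    "2020/02/01 01:00:00,42.5\n"
)
jan = io.StringIO(
    "SETTLEMENTDATE,RRP\n"
    "2020/01/31 23:30:00,35.0\n"
    "2020/02/01 00:00:00,38.0\n"
)

combined = pd.concat([pd.read_csv(f) for f in (feb, jan)], axis=0, ignore_index=True)

# Parse timestamps so that sorting is chronological rather than lexical.
combined["SETTLEMENTDATE"] = pd.to_datetime(combined["SETTLEMENTDATE"])
combined = combined.sort_values("SETTLEMENTDATE").reset_index(drop=True)

is_sorted = combined["SETTLEMENTDATE"].is_monotonic_increasing
has_duplicates = combined["SETTLEMENTDATE"].duplicated().any()
print("sorted:", is_sorted, "| duplicate timestamps:", has_duplicates)
```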

# Battery Environment

With the sample data ready, instantiate a new gym environment for the energy storage system. This notebook defaults to
just 10 training episodes to minimize training time. You may want to explore the agents' behavior by increasing
the number of training episodes (e.g., to 3000).

In [None]:
env = SimpleBattery(env_config)

# More episodes means a longer training time.
episodes = 10

The next cell defines a helper function `train_eval_save()` that trains, evaluates, and plots an agent, and optionally saves the outputs. This function will be used to evaluate three baseline agents:

1. a random agent
2. an agent that considers market price vs cost
3. an agent that considers the moving average of market price

In [None]:
def train_eval_save(
    env: SimpleBattery,
    agent: Agent,
    episodes: int,
    np_seed: Optional[int] = None,
    bokeh_dir: Optional[Union[str, os.PathLike]] = None,
    streamlit_csv: Optional[Union[str, os.PathLike]] = None,
) -> pd.DataFrame:
    """Helper function to train, evaluate, inline plot, and save outputs.

    Args:
        env (SimpleBattery): the energy gym environment.
        agent (Agent): a baseline agent.
        episodes (int): Number of training episodes.
        np_seed (Optional[int], optional): Random seed (None means do not fix it). Defaults to None.
        bokeh_dir (Optional[Union[str, os.PathLike]], optional): Where to save interactive .html
            reports (None means do not generate these files). Defaults to None.
        streamlit_csv (Optional[Union[str, os.PathLike]], optional): Where to save .csv files
            that the Streamlit demo can visualize (None means do not generate such files).
            Defaults to None.

    Returns:
        pd.DataFrame: exploitation results.
    """
    if np_seed is not None:
        np.random.seed(np_seed)

    # Training
    train_results = train(env, agent, episodes)
    plot_reward(train_results.rewards_list)  # Jupyter autoplots the returned fig
    print("Average rewards across training episodes:", train_results.mean_rewards)

    # Evaluation
    df_eval = evaluate_episode(agent, env)
    plot_analysis(df_eval)  # Jupyter autoplots the returned fig

    # Generate bokeh input
    if bokeh_dir is not None:
        # close_fig=True prevents Jupyter from auto-displaying the generated figures,
        # which are exactly the same as those the above calls have produced.
        ReportIO(bokeh_dir).save2(train_results.rewards_list, df_eval, close_fig=True)
    
    # Generate streamlit input
    if streamlit_csv is not None:
        df_eval.to_csv(streamlit_csv, index=False)

    return df_eval

# Train and evaluate baseline agents

## Random Agent

Train an agent that behaves randomly. This purely demonstrates how to use the `energy_storage_system` module,
hence the outputs are not saved.

**Policy evaluation and observation**: the agent's action is totally random, regardless of price and cost.

In [None]:
df_eval_random = train_eval_save(env, RandomAgent(), episodes, np_seed=1)

## Market price vs cost agent

This agent behaves as follows:

- SELL: when market price is higher than cost
- BUY: when market price is lower than cost
- HOLD: otherwise

**Policy evaluation and observation**: the agent discharges (sell: 1) when the price is higher than the cost, and charges (buy: 0) when the price is lower than the cost. The action codes are:

    CHARGE = 0
    DISCHARGE = 1
    HOLD = 2
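
The rule described above can be written as a minimal sketch. The function `price_vs_cost_action` is hypothetical and written only for illustration; it is not the implementation of `PriceVsCostAgent`, and it assumes the price and cost are directly comparable numbers.

```python
# Action codes as listed above.
CHARGE, DISCHARGE, HOLD = 0, 1, 2


def price_vs_cost_action(price: float, cost: float) -> int:
    """Hypothetical helper: pick an action by comparing market price with cost."""
    if price > cost:
        return DISCHARGE  # sell while the market pays more than the cost
    if price < cost:
        return CHARGE  # buy while the market is cheaper than the cost
    return HOLD  # price equals cost: do nothing


print([price_vs_cost_action(p, cost=50.0) for p in (60.0, 40.0, 50.0)])
```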

In [None]:
df_eval_price_vs_cost = train_eval_save(
    env,
    PriceVsCostAgent(),
    episodes,
    np_seed=1,
    # Save the output for downstream tasks (i.e., bokeh and streamlit).
    bokeh_dir=data_dir / 'agent_output' / 'agent-price-vs-cost',
    streamlit_csv=data_dir / 'streamlit_input' / 'result_price_vs_cost_agent.csv',
)

## Market price vs historical price agent

This agent behaves as follows:

- SELL: when market price is higher than the average price over the past 5 days
- BUY: when market price is lower than the average price over the past 5 days
- HOLD: otherwise

**Policy evaluation and observation**: the agent starts selling when the market price is increasing (higher than the average of the last 5 days), and buys when the market price is dropping.

    CHARGE = 0
    DISCHARGE = 1
    HOLD = 2
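
A moving-average rule of this kind can be sketched as follows. The class `MovingAverageRule` is hypothetical and is not the implementation of `MovingAveragePriceAgent`; the window is expressed as a number of past observations (an assumption, since the number of observations in 5 days depends on the data interval), and the sketch holds on the first step because no history exists yet.

```python
from collections import deque

# Action codes as listed above.
CHARGE, DISCHARGE, HOLD = 0, 1, 2


class MovingAverageRule:
    """Hypothetical rule: compare the current price with the mean of recent prices."""

    def __init__(self, window: int):
        # Keeps only the most recent `window` prices.
        self.prices = deque(maxlen=window)

    def act(self, price: float) -> int:
        if not self.prices:
            action = HOLD  # no history to compare against yet
        else:
            average = sum(self.prices) / len(self.prices)
            if price > average:
                action = DISCHARGE
            elif price < average:
                action = CHARGE
            else:
                action = HOLD
        self.prices.append(price)  # record the price after deciding
        return action


rule = MovingAverageRule(window=3)
actions = [rule.act(p) for p in [10.0, 12.0, 9.0, 11.0, 11.0]]
print(actions)  # → [2, 1, 0, 1, 1]
```

The `deque` with `maxlen` discards the oldest price automatically, so the average always covers at most the last `window` observations.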


In [None]:
df_eval_ma = train_eval_save(
    env,
    MovingAveragePriceAgent(),
    episodes,
    np_seed=1,
    # Save the output for downstream tasks (i.e., bokeh and streamlit).
    bokeh_dir=data_dir / 'agent_output' / 'agent-hist-price',
    streamlit_csv=data_dir / 'streamlit_input' / 'result_hist_price_agent.csv',
)

## SageMaker RL - DQN

The next step is to use the DQN algorithm running on SageMaker RL. Please refer to [01_battery_sim_on_sm.ipynb](01_battery_sim_on_sm.ipynb) and
[02_battery_sim_on_sm-eval.ipynb](02_battery_sim_on_sm-eval.ipynb).

# Generate interactive reports for baseline agents

The next cell uses `ipython` to generate interactive reports (i.e., `.html` files). Once the cell completes, feel free
to open and inspect the generated `.html` files. The cell runs the script through `ipython` so that the
`src/energy_storage_system` path configured in `ipython_config.py` is recognized.

To execute the script from a terminal as `python -m xxx.yyy -o bokeh_output/abcd agent_output/xyz`, please do one of the following:

1. `pip install` this repo, or
2. modify the `PYTHONPATH` environment variable accordingly.
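
Whichever option is chosen, a quick check that the package can be found by plain `python` may save a failed run. The helper below is a hypothetical sketch built on the standard library's `importlib.util.find_spec`; it only reports whether a module can be located on the current import path.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Hypothetical helper: True if `module_name` can be located on the import path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names whose parent package is missing.
        return False


# Expected to print True only after one of the two options above has been applied.
print(is_importable("energy_storage_system"))
```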

## Per-agent reports

In [None]:
# ipython takes everything before --. Anything after -- belongs to the energy_storage_system.bokeh_report CLI script.
# When using python from terminal, remove the --.
!ipython -m energy_storage_system.bokeh_report -- -o $DATA_DIR/bokeh_output/agent-price-vs-cost $DATA_DIR/agent_output/agent-price-vs-cost
!ipython -m energy_storage_system.bokeh_report -- -o $DATA_DIR/bokeh_output/agent-hist-price $DATA_DIR/agent_output/agent-hist-price

!echo "Generated Bokeh reports:"
!find $DATA_DIR/bokeh_output

## Comparing across agents

Compare how each agent maintains the energy-inventory level across time.

In [None]:
# ipython takes everything before --. Anything after -- belongs to the energy_storage_system.bokeh_energy_inventory CLI script.
# When using python from terminal, remove the --.
!ipython -m energy_storage_system.bokeh_energy_inventory -- -o $DATA_DIR/bokeh_output/energy-inventory $DATA_DIR/agent_output/

!echo "Generated Bokeh reports:"
!find $DATA_DIR/bokeh_output/energy-inventory