## Introduction
This notebook demonstrates how to test whether an action masking model works as expected. When masking works properly, the RL trainer never samples an action whose mask value is 0. Here we consider the action masking model developed for the portfolio optimization problem. The action vector has three components, each sampled from a discrete action space with 11 possible values.
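To make the idea concrete, the sketch below shows one common way a mask can force the sampling probability of an action to zero: the log of the mask is added to the logits before the softmax. This is an illustration only. The function name `masked_probabilities` and the random logits are hypothetical, and the model in `mask_model` may implement masking differently.

```python
import numpy as np

def masked_probabilities(logits, mask):
    """Softmax over logits in which entries with mask == 0 get probability 0."""
    logits = np.asarray(logits, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)
    # log(1) = 0 leaves a logit unchanged; log(0) = -inf is clipped to a
    # very large negative finite number so the arithmetic stays finite.
    with np.errstate(divide="ignore"):
        log_mask = np.maximum(np.log(mask), np.finfo(np.float32).min)
    masked_logits = logits + log_mask
    # Numerically stable softmax.
    exp = np.exp(masked_logits - masked_logits.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=11)  # hypothetical policy outputs for one component
mask = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1], dtype=np.float32)

probs = masked_probabilities(logits, mask)
samples = rng.choice(11, size=1000, p=probs)
print(probs.round(3))
print(samples.min())
```

Because the masked entries receive zero probability, none of the 1000 draws can fall on indices 0 to 4.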

### Install packages


In [None]:
!pip install ray==1.6.0
!pip install gym==0.15.3
!pip install dm_tree
!pip install lz4

### Import libraries

In [2]:
import ray
import ray.rllib.agents.ppo as ppo
from ray.tune.registry import register_env
from trading import mytradingenv
from mask_model import register_actor_mask_model
import numpy as np


### Register the action masking model 

In [3]:
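# Register the custom action masking model.
register_actor_mask_model()

# Restart Ray so the notebook starts from a clean state.
ray.shutdown()
ray.init(ignore_reinit_error=True)

# Register the trading environment under the name the trainer will use.
register_env("customtradingmodel", lambda env_config: mytradingenv(env_config))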





### Specify the trainer config to include the action masking model

In [4]:

TestEnvConfig = {
    "log_level": "WARN",
    "model": {
        # Define the custom masking model in the config
        "custom_model": "trading_mask",
    },
}




### Initialize a PPO trainer agent and the portfolio trading environment

In [5]:
agent1 = ppo.PPOTrainer(config=TestEnvConfig,env="customtradingmodel")
env = agent1.env_creator('customtradingmodel')
state=env.reset()
print(state["action_mask"])

2022-08-31 19:57:33,900	INFO logger.py:180 -- pip install 'ray[tune]' to see TensorBoard files.
2022-08-31 19:57:33,903	INFO trainer.py:714 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
2022-08-31 19:57:33,904	INFO ppo.py:159 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
2022-08-31 19:57:33,905	INFO trainer.py:728 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.


[array([0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1.], dtype=float32), array([0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1.], dtype=float32), array([0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1.], dtype=float32)]


### Update the masking values 

Here we mask every action except 8 for action[0], 5 for action[1], and 1 or 2 for action[2].

In [6]:
state["action_mask"]=[np.zeros([11],dtype=np.float32) for _ in range(3)]
state['action_mask'][0][8]=1
state['action_mask'][1][5]=1
state['action_mask'][2][1:3]=[1,1]

### Sample a new action after updating the masks

In [8]:
agent1.compute_single_action(state)


(8, 5, 1)

The sampled action (8, 5, 1) contains only unmasked values, which confirms that action masking is working as expected.
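A single sample is weak evidence, especially for action[2], where two values are allowed. A stronger check draws many actions and counts how many violate the mask. The sketch below is a minimal version of such a check. The helpers `action_respects_mask` and `count_mask_violations` are hypothetical names, and `stand_in_sampler` replaces the trained agent so the block runs without Ray.

```python
import numpy as np

def action_respects_mask(action, masks):
    """Return True if every component of the action has mask value 1."""
    return all(masks[i][a] == 1 for i, a in enumerate(action))

def count_mask_violations(sample_fn, masks, n_samples=100):
    """Call sample_fn repeatedly and count actions that hit a masked value."""
    return sum(
        not action_respects_mask(sample_fn(), masks) for _ in range(n_samples)
    )

# Same masks as in the notebook: only 8, 5, and {1, 2} are allowed.
masks = [np.zeros(11, dtype=np.float32) for _ in range(3)]
masks[0][8] = 1
masks[1][5] = 1
masks[2][1:3] = 1

rng = np.random.default_rng(0)

def stand_in_sampler():
    # Stand-in for the agent: picks uniformly among the allowed indices.
    return tuple(int(rng.choice(np.flatnonzero(m))) for m in masks)

violations = count_mask_violations(stand_in_sampler, masks, n_samples=200)
print(violations)
```

In the notebook itself, the stand-in could be swapped for `lambda: agent1.compute_single_action(state)`, with `state["action_mask"]` passed as `masks`. A count of zero over many samples supports the conclusion above.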