{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Contextual Bandits with Amazon SageMaker RL\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n",
    "\n",
    "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/reinforcement_learning|bandits_statlog_vw_customEnv|bandits_statlog_vw_customEnv.ipynb)\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "We demonstrate how you can manage your own contextual multi-armed bandit workflow on SageMaker using the built-in [Vowpal Wabbit (VW)](https://github.com/VowpalWabbit/vowpal_wabbit) container to train and deploy contextual bandit models. We show how to train models that interact with a live environment (here, a simulated client application) and how to continuously update them with efficient exploration.\n",
    "\n",
    "### Why Contextual Bandits?\n",
    "\n",
    "Wherever we look to personalize content for a user (content layout, ads, search, product recommendations, etc.), contextual bandits come in handy. Traditional personalization methods collect a training dataset, build a model and deploy it to generate recommendations. However, the training algorithm does not tell us how to collect this dataset, which matters in a production system where poor recommendations lead to lost revenue. Contextual bandit algorithms collect this data strategically by trading off between exploiting known information and exploring recommendations that may yield higher rewards. The collected data is used to update the personalization model in an online manner. Contextual bandits therefore let us train a personalization model while limiting the impact of poor recommendations.\n",
    "\n",
    "### What does this notebook contain?\n",
    "\n",
    "To implement the exploration-exploitation strategy, we need an iterative training and deployment system that: (1) recommends an action using the contextual bandit model based on user context, (2) captures the implicit feedback over time and (3) continuously trains the model with incremental interaction data. In this notebook, we show how to set up the infrastructure needed for such an iterative learning system. While the example demonstrates a bandits application, these continual learning systems are useful more generally in dynamic scenarios where models need to be continually updated to capture recent trends in the data (e.g. tracking fraud behaviors based on detection mechanisms or tracking user interests over time). \n",
    "\n",
    "In a typical supervised learning setup, the model is trained with a SageMaker training job and hosted behind a SageMaker hosting endpoint. The client application calls the endpoint for inference and receives a response. In bandits, the client application also sends back the reward (a score assigned to each recommendation generated by the model), and these rewards become part of the dataset for subsequent model training."
   ]
  },
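  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the exploration/exploitation trade-off concrete, the following is a minimal, hypothetical epsilon-greedy sketch in plain Python. It only illustrates the idea and is not necessarily the exploration strategy used inside the Vowpal Wabbit container; `epsilon_greedy` and its `scores` input (one estimated reward per arm) are assumptions for this example. The function also returns the probability of the chosen action, which is the kind of value that must be logged with each decision so that the collected data can later be used for off-policy training and evaluation.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "\n",
    "def epsilon_greedy(scores, epsilon=0.1, rng=random):\n",
    "    # scores: hypothetical estimated reward for each arm under the current model\n",
    "    num_arms = len(scores)\n",
    "    best = max(range(num_arms), key=lambda a: scores[a])\n",
    "    # Explore with probability epsilon by picking any arm uniformly at random\n",
    "    if rng.random() < epsilon:\n",
    "        action = rng.randrange(num_arms)\n",
    "    else:\n",
    "        action = best\n",
    "    # Probability that this policy picks `action`, given the same scores\n",
    "    prob = epsilon / num_arms + ((1.0 - epsilon) if action == best else 0.0)\n",
    "    return action, prob\n",
    "\n",
    "\n",
    "action, prob = epsilon_greedy([0.2, 0.7, 0.1], epsilon=0.1, rng=random.Random(0))\n",
    "print(action, prob)\n",
    "```\n",
    "\n",
    "With `epsilon=0` the policy always exploits (the best arm with probability 1), and with `epsilon=1` it explores uniformly, so every arm has probability `1 / num_arms`."
   ]
  },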
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p align=\"center\">\n",
    "  <img src=\"workflow.png\">\n",
    "</p>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The contextual bandit training workflow is controlled by an experiment manager provided with this example. The client application (for example, a recommender system) calls the SageMaker hosting endpoint that serves the bandits model. The application sends the state (user features) as input and receives an action (recommendation) as a response. The client application presents the recommended action to the user and stores the resulting reward in S3. The SageMaker hosting endpoint also stores inference data (state and action) in S3. The experiment manager joins the inference data with the rewards as they become available, and the joined data is used to update the model with a SageMaker training job. The updated model is evaluated offline and deployed to the SageMaker hosting endpoint if its evaluation score improves upon that of prior models. \n",
    "\n",
    "Below is an overview of the subsequent cells in the notebook: \n",
    "* Configuration: this includes details related to SageMaker and other AWS resources needed for the bandits application. \n",
    "* IAM role setup: this creates the appropriate execution role and shows how to add the additional permissions it needs for specific AWS resources.\n",
    "* Client application (Environment): this shows the simulated client application.\n",
    "* Step-by-step bandits model development: \n",
    " 1. Model Initialization (random or warm-start) \n",
    " 2. Deploy the First Model \n",
    " 3. Initialize the Client Application \n",
    " 4. Reward Ingestion \n",
    " 5. Model Re-training and Re-deployment \n",
    "* Bandits model deployment with the end-to-end loop. \n",
    "* Visualization \n",
    "* Cleanup "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Local Mode\n",
    "\n",
    "To facilitate experimentation, we provide a `local_mode` that runs the contextual bandit example using the SageMaker Notebook instance itself instead of SageMaker training and hosting instances. The workflow remains the same in `local_mode`, but runs much faster for small datasets. Hence, it is a useful tool for experimentation and debugging. However, it will not scale to production use cases with high throughput and large datasets. \n",
    "\n",
    "In `local_mode`, the training, evaluation and hosting are done with the SageMaker VW Docker container. The join is not handled by SageMaker; it is done inside the client application instead. The rest of the textual explanation assumes that the notebook is run in SageMaker mode."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import yaml\n",
    "import sys\n",
    "import numpy as np\n",
    "import time\n",
    "import sagemaker\n",
    "\n",
    "sys.path.append(\"common\")\n",
    "sys.path.append(\"common/sagemaker_rl\")\n",
    "from misc import get_execution_role\n",
    "from markdown_helper import *\n",
    "from IPython.display import Markdown"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Configuration\n",
    "\n",
    "The configuration for the bandits application is specified in the `config.yaml` file shown below. It configures the AWS resources needed. The DynamoDB tables store metadata related to experiments, models and data joins. The `private_resource` entry specifies the SageMaker instance types and counts used for training, evaluation and hosting, and the config also sets the SageMaker container image used for the bandits application. This config file also contains algorithm- and SageMaker-specific settings. Note that all the data generated and used for the bandits application is stored in `s3://sagemaker-{REGION}-{AWS_ACCOUNT_ID}/{experiment_id}/`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pygmentize 'config.yaml'\n",
    "config_file = \"config.yaml\"\n",
    "with open(config_file, \"r\") as yaml_file:\n",
    "    config = yaml.safe_load(yaml_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Please make sure that the `num_arms` parameter in the config is equal to the number of actions in the client application, which is introduced later in this notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### IAM role setup\n",
    "Either get the execution role with `role = sagemaker.get_execution_role()` when running from a SageMaker notebook or, when running from a local machine, use the `get_execution_role` helper from the `misc` module, `role = get_execution_role('role_name')`, to create an execution role."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
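    "try:\n",
    "    sagemaker_role = sagemaker.get_execution_role()\n",
    "except Exception:\n",
    "    # Not running on a SageMaker notebook instance: fall back to the helper, which creates an execution role\n",
    "    sagemaker_role = get_execution_role(\"sagemaker\")\n",
    "\n",
    "print(\"Using SageMaker IAM role arn: \\n{}\".format(sagemaker_role))"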
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Additional permissions for the IAM role\n",
    "The IAM role requires additional permissions for [AWS CloudFormation](https://aws.amazon.com/cloudformation/), [Amazon DynamoDB](https://aws.amazon.com/dynamodb/), [Amazon Kinesis Data Firehose](https://aws.amazon.com/kinesis/data-firehose/) and [Amazon Athena](https://aws.amazon.com/athena/). Make sure the SageMaker role you are using has these permissions; the cell below displays guidance on adding them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "display(Markdown(generate_help_for_experiment_manager_permissions(sagemaker_role)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Client application (Environment)\n",
    "The client application simulates a live environment that uses the SageMaker bandits model to serve recommendations to users. The reward-generation logic resides in the client application. We simulate the online learning loop with feedback using the [Statlog (Shuttle) Data Set](https://archive.ics.uci.edu/ml/datasets/Statlog+(Shuttle)). The data has 7 classes: the agent receives a reward of 1 if it selects the correct class and 0 otherwise.\n",
    "\n",
    "The workflow of the client application is as follows:\n",
    "- The client application picks a context at random, which is sent to the SageMaker endpoint for retrieving an action.\n",
    "- The SageMaker endpoint returns an action, the associated probability and an `event_id`.\n",
    "- Because the simulator is built from the labeled Statlog dataset, it knows the true class for that context and computes the reward by comparing it with the returned action.\n",
    "- The application reports the reward to the experiment manager using S3, along with the corresponding `event_id`.\n",
    "\n",
    "`event_id` is a unique identifier for each interaction. It is used to join inference data `<state, action, action probability>` with the rewards. \n",
    "\n",
    "Later in this notebook, once an endpoint is hosted, we illustrate how the client application interacts with the endpoint and gets the recommended action."
   ]
  },
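  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of one simulated interaction, following the four workflow steps above, might look like the following. Everything here is hypothetical: `DATA` is a tiny stand-in for Statlog rows, `fake_endpoint` replaces the real SageMaker endpoint (and simply picks an action uniformly at random), and the record layout is illustrative rather than the format that `StatlogSimApp` actually writes to S3.\n",
    "\n",
    "```python\n",
    "import random\n",
    "import uuid\n",
    "\n",
    "# Hypothetical (context features, true class) pairs standing in for Statlog rows\n",
    "DATA = [([0.1, 0.5], 1), ([0.9, 0.2], 4), ([0.3, 0.3], 1)]\n",
    "\n",
    "\n",
    "def fake_endpoint(context, num_actions=7, rng=random):\n",
    "    # Stand-in for the endpoint (ignores the context): returns action, probability, event_id\n",
    "    action = rng.randrange(1, num_actions + 1)\n",
    "    return action, 1.0 / num_actions, str(uuid.uuid4())\n",
    "\n",
    "\n",
    "def simulate_interaction(rng=random):\n",
    "    context, true_class = rng.choice(DATA)  # pick a context at random\n",
    "    action, prob, event_id = fake_endpoint(context, rng=rng)  # ask the endpoint for an action\n",
    "    reward = 1 if action == true_class else 0  # compare with the known true class\n",
    "    return {'event_id': event_id, 'reward': reward}  # reward reported with its event_id\n",
    "\n",
    "\n",
    "record = simulate_interaction(random.Random(0))\n",
    "print(record)\n",
    "```"
   ]
  },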
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sys.path.append(\"sim_app\")\n",
    "from statlog_sim_app import StatlogSimApp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Uncomment the line below to see how the simulated client application works\n",
    "# !pygmentize sim_app/statlog_sim_app.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step-by-step bandits model development\n",
    "\n",
    "`ExperimentManager` is the top-level class for all the Bandits/RL and continual learning workflows. Similar to the estimators in the [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk), `ExperimentManager` contains methods for training, deployment and evaluation. It keeps track of the job status and reflects current progress in the workflow.\n",
    "\n",
    "Start the application using the `ExperimentManager` class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from orchestrator.workflow.manager.experiment_manager import ExperimentManager"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The initialization below sets up an AWS CloudFormation stack of additional resources."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# In SageMaker mode, model_id cannot exceed 63 characters, and the evaluation job name\n",
    "# adds a timestamp to the training job name, so keep experiment_id as short as possible.\n",
    "experiment_name = \"bandits-exp-1\"\n",
    "bandits_experiment = ExperimentManager(config, experiment_id=experiment_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1. Model Initialization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To start a new experiment, we need to initialize the first model. In the case where historical data is available and is in the format of `<state, action, action probability, reward>`, we can warm start by learning the policy offline. Otherwise, we can initiate a random policy.\n",
    "\n",
    "**Warm start the policy**\n",
    "\n",
    "We showcase the warm start by generating a batch of `batch_size` randomly selected samples, which we then split into a training set and an evaluation set using the `ratio` parameter (0.8 in the cell below)."
   ]
  },
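  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an illustration of the `ratio` split, a hypothetical helper that shuffles the joined records and puts the first `ratio` fraction into the training set could look like this. Whether `ingest_joined_data` shuffles, and how it rounds the split point, are assumptions here; `split_train_eval` is not part of the experiment manager.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "\n",
    "def split_train_eval(records, ratio=0.8, seed=0):\n",
    "    # Hypothetical helper: shuffle a copy so the split does not depend on record order\n",
    "    shuffled = list(records)\n",
    "    random.Random(seed).shuffle(shuffled)\n",
    "    # The first `ratio` fraction goes to training, the remainder to evaluation\n",
    "    cut = int(len(shuffled) * ratio)\n",
    "    return shuffled[:cut], shuffled[cut:]\n",
    "\n",
    "\n",
    "train, evaluation = split_train_eval(range(100), ratio=0.8)\n",
    "print(len(train), len(evaluation))\n",
    "```"
   ]
  },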
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sim_app_utils import *\n",
    "\n",
    "batch_size = 100\n",
    "warm_start_data_buffer = prepare_statlog_warm_start_data(\n",
    "    data_file=\"sim_app/shuttle.trn\", batch_size=batch_size\n",
    ")\n",
    "\n",
    "# upload to s3\n",
    "bandits_experiment.ingest_joined_data(warm_start_data_buffer, ratio=0.8)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bandits_experiment._jsonify()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bandits_experiment.initialize_first_model(\n",
    "    input_data_s3_prefix=bandits_experiment.last_joined_job_train_data\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Evaluate current model against historical model**\n",
    "\n",
    "After every training cycle, we evaluate whether the newly trained model is better than the one currently deployed by estimating, on the evaluation dataset, how the new model would perform. SageMaker RL supports offline evaluation by performing counterfactual analysis (CFA). By default, we apply the [doubly robust (DR) estimation](https://arxiv.org/pdf/1103.4601.pdf) method. Here the bandit policy minimizes the cost, defined as 1 - reward, so a lower evaluation score indicates better policy performance. In the cells below, the first trained model is compared with a baseline score computed from the historical (warm start) data."
   ]
  },
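  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For intuition, here is a minimal plain-Python sketch of the doubly robust estimator described in the linked paper, written for costs (1 - reward) and a deterministic target policy. It combines a model-based cost estimate for the action the evaluated policy would take with an importance-weighted correction on the logged action. This is an illustration under those assumptions, not the code used by the VW container or `ExperimentManager`; the function and argument names are hypothetical.\n",
    "\n",
    "```python\n",
    "def doubly_robust_cost(logs, policy, cost_model):\n",
    "    # logs: (context, logged_action, logged_prob, cost) tuples, where cost = 1 - reward\n",
    "    # policy: maps a context to the action the evaluated (deterministic) policy would pick\n",
    "    # cost_model: hypothetical regressor (context, action) -> estimated cost\n",
    "    total = 0.0\n",
    "    for context, action, prob, cost in logs:\n",
    "        target = policy(context)\n",
    "        # Direct-method term: model estimate of the cost of the target action\n",
    "        estimate = cost_model(context, target)\n",
    "        if target == action:\n",
    "            # Importance-weighted correction of the model error on the logged action\n",
    "            estimate += (cost - cost_model(context, action)) / prob\n",
    "        total += estimate\n",
    "    return total / len(logs)\n",
    "\n",
    "\n",
    "# Toy usage: the context is the true class; cost is 0 for the correct class, 1 otherwise\n",
    "logs = [(0, 0, 0.5, 0.0), (1, 0, 0.5, 1.0)]\n",
    "\n",
    "\n",
    "def always_zero(context):\n",
    "    return 0\n",
    "\n",
    "\n",
    "score = doubly_robust_cost(logs, always_zero, lambda context, action: 0.5)\n",
    "print(score)\n",
    "```\n",
    "\n",
    "On these two toy records, even a constant (and therefore biased) cost model of 0.5 yields an estimate of 0.5, which matches the true average cost of always choosing action 0, because the importance-weighted correction compensates for the model error."
   ]
  },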
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# evaluate the current model\n",
    "bandits_experiment.evaluate_model(\n",
    "    input_data_s3_prefix=bandits_experiment.last_joined_job_eval_data,\n",
    "    evaluate_model_id=bandits_experiment.last_trained_model_id,\n",
    ")\n",
    "\n",
    "eval_score_last_trained_model = bandits_experiment.get_eval_score(\n",
    "    evaluate_model_id=bandits_experiment.last_trained_model_id,\n",
    "    eval_data_path=bandits_experiment.last_joined_job_eval_data,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# get baseline performance from the historical (warm start) data\n",
    "download_historical_data_from_s3(data_s3_prefix=bandits_experiment.last_joined_job_eval_data)\n",
    "baseline_score = evaluate_historical_data(data_file=\"statlog_warm_start.data\")\n",
    "baseline_score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check the model_id of the last model trained.\n",
    "bandits_experiment.last_trained_model_id"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2. Deploy the First Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once training and evaluation are done, we can deploy the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bandits_experiment.deploy_model(model_id=bandits_experiment.last_trained_model_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can check the experiment state at any point by executing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bandits_experiment._jsonify()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The model just trained appears in both `last_trained_model_id` and `last_hosted_model_id`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. Initialize the Client Application"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that the last trained model is hosted, the client application can send the state to the endpoint and receive the recommended action. There are 7 classes in the Statlog data, corresponding to 7 actions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor = bandits_experiment.predictor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sim_app = StatlogSimApp(predictor=predictor)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Make sure that `num_arms` specified in `config.yaml` is equal to the total number of unique actions in the simulation application."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "assert (\n",
    "    sim_app.num_actions == bandits_experiment.config[\"algor\"][\"algorithms_parameters\"][\"num_arms\"]\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_id, user_context = sim_app.choose_random_user()\n",
    "action, event_id, model_id, action_prob, sample_prob = predictor.get_action(\n",
    "    obs=user_context.tolist()\n",
    ")\n",
    "\n",
    "# Print the prediction response\n",
    "print(\n",
    "    \"Selected action: {}, event ID: {}, model ID: {}, probability: {}\".format(\n",
    "        action, event_id, model_id, action_prob\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4. Reward Ingestion"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The client application generates a reward after receiving the recommended action and stores the tuple `<eventID, reward>` in S3. In this case, the reward is 1 if the predicted action is the true class, and 0 otherwise. The SageMaker hosting endpoint saves all the inferences `<eventID, state, action, action probability>` to S3 using [Amazon Kinesis Data Firehose](https://aws.amazon.com/kinesis/data-firehose/). The experiment manager joins the reward with the state, action and action probability using [Amazon Athena](https://aws.amazon.com/athena/). "
   ]
  },
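  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The experiment manager performs this join with Athena. As a local illustration only, the same idea can be expressed as an inner join on `event_id` with pandas. The column names and sample rows below are assumptions, not the actual schema written by Firehose or the client application.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical inference log, as the endpoint might record it\n",
    "inferences = pd.DataFrame({\n",
    "    'event_id': ['e1', 'e2', 'e3'],\n",
    "    'state': [[0.1, 0.5], [0.9, 0.2], [0.3, 0.3]],\n",
    "    'action': [1, 4, 2],\n",
    "    'action_prob': [0.8, 0.1, 0.05],\n",
    "})\n",
    "# Hypothetical reward log from the client application; the reward for e3 has not arrived yet\n",
    "rewards = pd.DataFrame({'event_id': ['e2', 'e1'], 'reward': [0, 1]})\n",
    "\n",
    "# Inner join on event_id: only interactions whose reward has arrived are kept for training\n",
    "joined = inferences.merge(rewards, on='event_id', how='inner')\n",
    "print(joined[['event_id', 'action', 'action_prob', 'reward']])\n",
    "```\n",
    "\n",
    "Because rewards can arrive late, an inner join simply leaves out interactions that are still waiting for feedback; they can be picked up by a later join."
   ]
  },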
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "local_mode = bandits_experiment.local_mode\n",
    "batch_size = 500  # collect 500 data instances\n",
    "print(\"Collecting batch of experience data...\")\n",
    "\n",
    "# Generate experiences and log them\n",
    "for i in range(batch_size):\n",
    "    user_id, user_context = sim_app.choose_random_user()\n",
    "    action, event_id, model_id, action_prob, sample_prob = predictor.get_action(\n",
    "        obs=user_context.tolist()\n",
    "    )\n",
    "    reward = sim_app.get_reward(\n",
    "        user_id, action, event_id, model_id, action_prob, sample_prob, local_mode\n",
    "    )\n",
    "\n",
    "# Join (observation, action) with rewards (can be delayed) and upload the data to S3\n",
    "if local_mode:\n",
    "    bandits_experiment.ingest_joined_data(sim_app.joined_data_buffer)\n",
    "else:\n",
    "    print(\"Waiting for firehose to flush data to s3...\")\n",
    "    time.sleep(60)  # Wait for firehose to flush data to S3\n",
    "    rewards_s3_prefix = bandits_experiment.ingest_rewards(sim_app.rewards_buffer)\n",
    "    bandits_experiment.join(rewards_s3_prefix)\n",
    "\n",
    "sim_app.clear_buffer()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bandits_experiment.last_joined_job_train_data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check the workflow to see if join job has completed successfully\n",
    "bandits_experiment._jsonify()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 5. Model Re-training and Re-deployment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can train a new model with newly collected experiences, and host the resulting model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bandits_experiment.train_next_model(\n",
    "    input_data_s3_prefix=bandits_experiment.last_joined_job_train_data\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bandits_experiment.last_trained_model_id"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# deployment takes ~10 min if `local_mode` is False\n",
    "bandits_experiment.deploy_model(model_id=bandits_experiment.last_trained_model_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bandits_experiment.last_hosted_model_id"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Bandits model deployment with the end-to-end loop"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above cells explained the individual steps in the training workflow. To train a model to convergence, we will continually train the model based on data collected with client application interactions. We demonstrate the continual training loop in a single cell below.\n",
    "\n",
    "In each loop, before deployment, we evaluate the model just trained (`last_trained_model_id`) against the model that is currently hosted (`last_hosted_model_id`), and deploy the new model only if its evaluation score (cost) is lower than or equal to that of the hosted model. If you want the loops to finish faster, you can set `do_evaluation=False` in the cell below, in which case every newly trained model is deployed.\n",
    "\n",
    "Details of each joining and training job can be tracked in `join_db` and `model_db` respectively. `model_db` also stores the evaluation scores. When you have multiple experiments, you can check their status in `experiment_db`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "do_evaluation = True\n",
    "\n",
    "# You can also monitor your loop progress on CloudWatch Dashboard\n",
    "display(Markdown(bandits_experiment.get_cloudwatch_dashboard_details()))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "start_time = time.time()\n",
    "total_loops = 15  # Increase for higher accuracy\n",
    "batch_size = 500  # Model will be trained after every 500 data instances\n",
    "rewards_list = []\n",
    "\n",
    "local_mode = bandits_experiment.local_mode\n",
    "for loop_no in range(total_loops):\n",
    "    print(\n",
    "        f\"\"\"\n",
    "    #################\n",
    "    #################\n",
    "         Loop {loop_no+1}\n",
    "    #################\n",
    "    #################\n",
    "    \"\"\"\n",
    "    )\n",
    "\n",
    "    # Generate experiences and log them\n",
    "    for i in range(batch_size):\n",
    "        user_id, user_context = sim_app.choose_random_user()\n",
    "        action, event_id, model_id, action_prob, sample_prob = predictor.get_action(\n",
    "            obs=user_context.tolist()\n",
    "        )\n",
    "        reward = sim_app.get_reward(\n",
    "            user_id, action, event_id, model_id, action_prob, sample_prob, local_mode\n",
    "        )\n",
    "        rewards_list.append(reward)\n",
    "\n",
    "    # Publish the mean reward for this batch to CloudWatch for monitoring\n",
    "    bandits_experiment.cw_logger.publish_rewards_for_simulation(\n",
    "        bandits_experiment.experiment_id, sum(rewards_list[-batch_size:]) / batch_size\n",
    "    )\n",
    "\n",
    "    # Local/Athena join\n",
    "    if local_mode:\n",
    "        bandits_experiment.ingest_joined_data(sim_app.joined_data_buffer, ratio=0.85)\n",
    "    else:\n",
    "        print(\"Waiting for firehose to flush data to s3...\")\n",
    "        time.sleep(60)\n",
    "        rewards_s3_prefix = bandits_experiment.ingest_rewards(sim_app.rewards_buffer)\n",
    "        bandits_experiment.join(rewards_s3_prefix, ratio=0.85)\n",
    "\n",
    "    # Train\n",
    "    bandits_experiment.train_next_model(\n",
    "        input_data_s3_prefix=bandits_experiment.last_joined_job_train_data\n",
    "    )\n",
    "\n",
    "    if do_evaluation:\n",
    "        # Evaluate\n",
    "        bandits_experiment.evaluate_model(\n",
    "            input_data_s3_prefix=bandits_experiment.last_joined_job_eval_data,\n",
    "            evaluate_model_id=bandits_experiment.last_trained_model_id,\n",
    "        )\n",
    "        eval_score_last_trained_model = bandits_experiment.get_eval_score(\n",
    "            evaluate_model_id=bandits_experiment.last_trained_model_id,\n",
    "            eval_data_path=bandits_experiment.last_joined_job_eval_data,\n",
    "        )\n",
    "\n",
    "        bandits_experiment.evaluate_model(\n",
    "            input_data_s3_prefix=bandits_experiment.last_joined_job_eval_data,\n",
    "            evaluate_model_id=bandits_experiment.last_hosted_model_id,\n",
    "        )\n",
    "\n",
    "        eval_score_last_hosted_model = bandits_experiment.get_eval_score(\n",
    "            evaluate_model_id=bandits_experiment.last_hosted_model_id,\n",
    "            eval_data_path=bandits_experiment.last_joined_job_eval_data,\n",
    "        )\n",
    "\n",
    "        # Deploy\n",
    "        if eval_score_last_trained_model <= eval_score_last_hosted_model:\n",
    "            bandits_experiment.deploy_model(model_id=bandits_experiment.last_trained_model_id)\n",
    "        else:\n",
    "            print(\"Not deploying model in loop {}\".format(loop_no + 1))\n",
    "    else:\n",
    "        bandits_experiment.deploy_model(model_id=bandits_experiment.last_trained_model_id)\n",
    "\n",
    "    sim_app.clear_buffer()\n",
    "\n",
    "print(f\"Total time taken to complete {total_loops} loops: {time.time() - start_time}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can visualize model performance over the training loop by plotting the rolling mean reward across client interactions. The rolling mean reward is computed over the last `rolling_window` data instances, where each data instance corresponds to a single client interaction. \n",
    "\n",
    "> Note: The plot below cannot be generated if the notebook kernel has been restarted after running the training loop cell above, because the rewards are only kept in memory in `rewards_list`. \n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "\n",
    "%matplotlib inline\n",
    "\n",
    "plt.rcParams[\"figure.figsize\"] = 15, 10\n",
    "lwd = 5  # line width for both curves\n",
    "\n",
    "# Rolling mean of the per-interaction bandit reward over the last `rolling_window` interactions\n",
    "rolling_window = 100\n",
    "rewards_df = pd.DataFrame(rewards_list, columns=[\"bandit\"]).rolling(rolling_window).mean()\n",
    "# Constant reference line: average of the simulator's optimal rewards\n",
    "rewards_df[\"oracle\"] = sum(sim_app.opt_rewards) / len(sim_app.opt_rewards)\n",
    "\n",
    "rewards_df.plot(y=[\"bandit\", \"oracle\"], linewidth=lwd)\n",
    "plt.legend(loc=4, prop={\"size\": 20})\n",
    "plt.tick_params(axis=\"both\", which=\"major\", labelsize=15)\n",
    "plt.xlabel(\"Data instances (models were updated every %s data instances)\" % batch_size, size=20)\n",
    "plt.ylabel(\"Rolling Mean Reward\", size=30)\n",
    "plt.grid()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Get mean rewards"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rewards_df.bandit.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Clean up"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The bandits application above uses three DynamoDB tables (experiment, join and model), whose records are associated with the experiment (for example, `experiment_id='bandits-exp-1'`). Once the experiment has finished, we should remove its records to keep the tables manageable. In addition, a running endpoint incurs costs. Therefore, we delete these components as part of the cleanup process.\n",
    "\n",
    "> Only execute the cleanup cells below when you have finished the current experiment and want to remove everything associated with it. After the cleanup, the CloudWatch metrics will no longer be populated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bandits_experiment.clean_resource(experiment_id=bandits_experiment.experiment_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bandits_experiment.clean_table_records(experiment_id=bandits_experiment.experiment_id)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Notebook CI Test Results\n",
    "\n",
    "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n",
    "\n",
    "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/reinforcement_learning|bandits_statlog_vw_customEnv|bandits_statlog_vw_customEnv.ipynb)\n",
    "\n",
    "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/reinforcement_learning|bandits_statlog_vw_customEnv|bandits_statlog_vw_customEnv.ipynb)\n",
    "\n",
    "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/reinforcement_learning|bandits_statlog_vw_customEnv|bandits_statlog_vw_customEnv.ipynb)\n",
    "\n",
    "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/reinforcement_learning|bandits_statlog_vw_customEnv|bandits_statlog_vw_customEnv.ipynb)\n",
    "\n",
    "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/reinforcement_learning|bandits_statlog_vw_customEnv|bandits_statlog_vw_customEnv.ipynb)\n",
    "\n",
    "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/reinforcement_learning|bandits_statlog_vw_customEnv|bandits_statlog_vw_customEnv.ipynb)\n",
    "\n",
    "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/reinforcement_learning|bandits_statlog_vw_customEnv|bandits_statlog_vw_customEnv.ipynb)\n",
    "\n",
    "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/reinforcement_learning|bandits_statlog_vw_customEnv|bandits_statlog_vw_customEnv.ipynb)\n",
    "\n",
    "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/reinforcement_learning|bandits_statlog_vw_customEnv|bandits_statlog_vw_customEnv.ipynb)\n",
    "\n",
    "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/reinforcement_learning|bandits_statlog_vw_customEnv|bandits_statlog_vw_customEnv.ipynb)\n",
    "\n",
    "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/reinforcement_learning|bandits_statlog_vw_customEnv|bandits_statlog_vw_customEnv.ipynb)\n",
    "\n",
    "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/reinforcement_learning|bandits_statlog_vw_customEnv|bandits_statlog_vw_customEnv.ipynb)\n",
    "\n",
    "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/reinforcement_learning|bandits_statlog_vw_customEnv|bandits_statlog_vw_customEnv.ipynb)\n",
    "\n",
    "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/reinforcement_learning|bandits_statlog_vw_customEnv|bandits_statlog_vw_customEnv.ipynb)\n",
    "\n",
    "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/reinforcement_learning|bandits_statlog_vw_customEnv|bandits_statlog_vw_customEnv.ipynb)\n"
   ]
  }
 ],
 "metadata": {
  "hide_input": false,
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "550.4px"
   },
   "toc_section_display": true,
   "toc_window_display": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}