{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "run_control": {
     "frozen": false,
     "read_only": false
    }
   },
   "source": [
    "# Benchmarking GFLOPS with PyWren and AWS Lambda"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-02-27T10:34:14.685368",
     "start_time": "2017-02-27T10:34:14.671498"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%pylab inline\n",
    "import numpy as np\n",
    "import time\n",
    "import flops_benchmark\n",
    "import pandas as pd\n",
    "import cPickle as pickle\n",
    "import seaborn as sns\n",
    "sns.set_style('whitegrid')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step by Step instructions\n",
    "\n",
    "\n",
    "### Setup Logging (optional)\n",
    "Only activate the below lines if you want to see all debug messages from PyWren. _Note: The output will be rather chatty and lengthy._"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "logger = logging.getLogger()\n",
    "logger.setLevel(logging.INFO)\n",
    "%env PYWREN_LOGLEVEL=INFO"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are going to benchmark the simple function below, which simply generates two matrices and computes their matrix (dot) product. The matrices are of size `MAT_N` and we will compute the product `loopcount` times. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-02-27T09:26:13.230345",
     "start_time": "2017-02-27T09:26:13.219630"
    }
   },
   "outputs": [],
   "source": [
    "def compute_flops(loopcount, MAT_N):\n",
    "    \n",
    "    A = np.arange(MAT_N**2, dtype=np.float64).reshape(MAT_N, MAT_N)\n",
    "    B = np.arange(MAT_N**2, dtype=np.float64).reshape(MAT_N, MAT_N)\n",
    "\n",
    "    t1 = time.time()\n",
    "    for i in range(loopcount):\n",
    "        c = np.sum(np.dot(A, B))\n",
    "\n",
    "    FLOPS = 2 *  MAT_N**3 * loopcount\n",
    "    t2 = time.time()\n",
    "    return FLOPS / (t2-t1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All of the actual benchmark code is in a stand-alone python file, which you can call as follows. It places the output in `small.pickle`.\n",
    "If you are interested in the details you can inspect the [flops_benchmark.py](/edit/Lab-1-Hello-World/benchmark_flops/flops_benchmark.py) file. Here is the relevant code snippet that invokes the distirbuten PyWren functions:\n",
    "\n",
    "```python\n",
    "iters = np.arange(N)\n",
    "\n",
    "def f(x):\n",
    "    return {'flops': compute_flops(loopcount, matn)}\n",
    "\n",
    "pwex = pywren.lambda_executor()\n",
    "futures = pwex.map(f, iters)\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-02-27T10:38:59.263090",
     "start_time": "2017-02-27T10:38:30.640817"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!python flops_benchmark.py --workers=10 --loopcount=10 --matn=1024 --outfile=\"small.pickle\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now plot a histogram of the results: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-02-27T09:26:39.652451",
     "start_time": "2017-02-27T09:26:39.498018"
    }
   },
   "outputs": [],
   "source": [
    "exp_results = pickle.load(open(\"small.pickle\", 'r'))\n",
    "results_df = flops_benchmark.results_to_dataframe(exp_results)\n",
    "sns.distplot(results_df.intra_func_flops/1e9, bins=np.arange(10, 30), kde=False, axlabel='FLOPS measured intra function')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-02-27T09:27:24.537559",
     "start_time": "2017-02-27T09:27:24.528708"
    }
   },
   "source": [
    "# Scaling up\n",
    "Now we will run a larger number of functions simultaneously. Note that this is dependent on the soft limit of parallel AWS Lambda function for your AWS Account. For this workshop we suggest to start off with 100 parallel AWS Lambda function (`--workers=100`). The more functions you run in parallel the larger your potential performance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-02-27T10:01:42.382113",
     "start_time": "2017-02-27T09:57:50.492364"
    }
   },
   "outputs": [],
   "source": [
    "# Parallel Function limit of 100 simultaneous invocations\n",
    "!python flops_benchmark.py --workers=100 --loopcount=10 --matn=4096 --outfile=\"big.pickle\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's plot the intra function FLOPS first:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-02-27T10:07:16.204258",
     "start_time": "2017-02-27T10:07:15.477252"
    }
   },
   "outputs": [],
   "source": [
    "big_exp_results = pickle.load(open(\"big.pickle\", 'r'))\n",
    "big_results_df = flops_benchmark.results_to_dataframe(big_exp_results)\n",
    "sns.distplot(results_df.intra_func_flops/1e9, bins=np.arange(10, 36), kde=False, axlabel='FLOPS measured intra function')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's aggregate this to understand our total GFLOPS across all the parallel executions and plot it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-02-27T10:20:59.609352",
     "start_time": "2017-02-27T10:20:59.602174"
    }
   },
   "outputs": [],
   "source": [
    "est_total_flops = big_results_df['est_flops']\n",
    "total_jobs = len(big_results_df)\n",
    "JOB_GFLOPS = est_total_flops /1e9 /total_jobs \n",
    "# grid jobs running time \n",
    "time_offset = np.min(big_results_df.host_submit_time)\n",
    "max_time = np.max(big_results_df.download_output_timestamp ) - time_offset\n",
    "runtime_bins = np.linspace(0, max_time, max_time, endpoint=False)\n",
    "\n",
    "\n",
    "runtime_flops_hist = np.zeros((len(big_results_df), len(runtime_bins)))\n",
    "for i in range(len(big_results_df)):\n",
    "    row = big_results_df.iloc[i]\n",
    "    s = (row.start_time + row.setup_time) - time_offset\n",
    "    e = row.end_time - time_offset\n",
    "    a, b = np.searchsorted(runtime_bins, [s, e])\n",
    "    if b-a > 0:\n",
    "        runtime_flops_hist[i, a:b] = row.est_flops / float(b-a)\n",
    "        \n",
    "results_by_endtime = big_results_df.sort_values('download_output_timestamp')\n",
    "results_by_endtime['job_endtime_zeroed'] = big_results_df.download_output_timestamp - time_offset\n",
    "results_by_endtime['flops_done'] = results_by_endtime.est_flops.cumsum()\n",
    "results_by_endtime['rolling_flops_rate'] = results_by_endtime.flops_done/results_by_endtime.job_endtime_zeroed\n",
    "\n",
    "    \n",
    "fig = pylab.figure(figsize=(8, 6))\n",
    "ax = fig.add_subplot(1, 1, 1)\n",
    "ax.plot(runtime_flops_hist.sum(axis=0)/1e9, label='peak GFLOPS')\n",
    "ax.plot(results_by_endtime.job_endtime_zeroed, \n",
    "        results_by_endtime.rolling_flops_rate/1e9, label='effective GFLOPS')\n",
    "ax.set_xlabel('time (sec)')\n",
    "ax.set_ylabel(\"GFLOPS\")\n",
    "pylab.legend()\n",
    "ax.grid(False)\n",
    "sns.despine()\n",
    "fig.tight_layout()\n",
    "fig.savefig(\"flops_benchmark.gflops.png\")\n",
    "fig.savefig(\"flops_benchmark.gflops.pdf\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This plot computes two things:\n",
    "* **Peak GLFOPS**: Across all cores, what is the total simultaneous FLOPS that are being computed? \n",
    "* **Effective GFLOPS**: If the job ended at this point in time, what would our aggregate effective GFLOPS have been, including time to launch the jobs and download the results\n",
    "\n",
    "We see \"peak GFLOPS\" peaks in the middle of the job, when all 100 lambdas are running at once. \"Effective GFLOPS\" starts climbing as results quickly return, but stragglers mean that our total effective GFLOPS drops slightly. Still not bad for pure python! "
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.13"
  },
  "toc": {
   "nav_menu": {
    "height": "66px",
    "width": "252px"
   },
   "navigate_menu": true,
   "number_sections": true,
   "sideBar": true,
   "threshold": 4,
   "toc_cell": false,
   "toc_section_display": "block",
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}