{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Cost Calculator\n",
    "\n",
    "This will walk you through calculating the number of metrics in your dataset and then using that value to estimate your costs for Lookout for Metrics.\n",
    "\n",
    "**Note** This is reported as an estimate because it assumes that you may have new entries in terms of values in your dimensions that are not known in your historical dataset, they will of course have an impact on your total costs. Use this as a guide.**END_NOTE**\n",
    "\n",
    "This notebook can be executed in your environment by deploying the `getting_started` resources, then browsing back to this folder inside a SageMaker Notebook Instance.\n",
    "\n",
    "Next upload your historical data into this folder, we will then explore the pricing of a CSV file named `historical.csv` that has been included here. \n",
    "\n",
    "Follow along with the notebook as is first, then once you understand the process, update the filename to match your uploaded content and follow allong to completed the pricing exercise."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "CSV_FILENAME = \"historical.csv\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After updating the filename above to reflect your content, run the cell below to see a sample of your data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>platform</th>\n",
       "      <th>marketplace</th>\n",
       "      <th>timestamp</th>\n",
       "      <th>views</th>\n",
       "      <th>revenue</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>13232</th>\n",
       "      <td>pc_web</td>\n",
       "      <td>de</td>\n",
       "      <td>2021-01-27 06:00:00</td>\n",
       "      <td>269</td>\n",
       "      <td>80.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>70769</th>\n",
       "      <td>mobile_app</td>\n",
       "      <td>jp</td>\n",
       "      <td>2021-05-21 09:00:00</td>\n",
       "      <td>195</td>\n",
       "      <td>58.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20118</th>\n",
       "      <td>pc_web</td>\n",
       "      <td>us</td>\n",
       "      <td>2021-02-09 22:00:00</td>\n",
       "      <td>498</td>\n",
       "      <td>149.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>87217</th>\n",
       "      <td>pc_web</td>\n",
       "      <td>es</td>\n",
       "      <td>2021-06-23 01:00:00</td>\n",
       "      <td>102</td>\n",
       "      <td>30.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1015</th>\n",
       "      <td>mobile_web</td>\n",
       "      <td>us</td>\n",
       "      <td>2021-01-03 00:00:00</td>\n",
       "      <td>440</td>\n",
       "      <td>132.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         platform marketplace            timestamp  views  revenue\n",
       "13232      pc_web          de  2021-01-27 06:00:00    269     80.7\n",
       "70769  mobile_app          jp  2021-05-21 09:00:00    195     58.5\n",
       "20118      pc_web          us  2021-02-09 22:00:00    498    149.4\n",
       "87217      pc_web          es  2021-06-23 01:00:00    102     30.6\n",
       "1015   mobile_web          us  2021-01-03 00:00:00    440    132.0"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.read_csv(CSV_FILENAME)\n",
    "data.sample(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the above cell, we see that `timestamp` was our timestamp field so now we can read the file again with some more specific instructions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>platform</th>\n",
       "      <th>marketplace</th>\n",
       "      <th>views</th>\n",
       "      <th>revenue</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>timestamp</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2021-07-17 09:00:00</th>\n",
       "      <td>mobile_app</td>\n",
       "      <td>jp</td>\n",
       "      <td>185</td>\n",
       "      <td>55.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-04-01 13:00:00</th>\n",
       "      <td>mobile_app</td>\n",
       "      <td>de</td>\n",
       "      <td>783</td>\n",
       "      <td>234.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-08-26 08:00:00</th>\n",
       "      <td>mobile_web</td>\n",
       "      <td>de</td>\n",
       "      <td>180</td>\n",
       "      <td>54.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-04-11 08:00:00</th>\n",
       "      <td>pc_web</td>\n",
       "      <td>fr</td>\n",
       "      <td>211</td>\n",
       "      <td>63.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2021-02-18 21:00:00</th>\n",
       "      <td>mobile_app</td>\n",
       "      <td>es</td>\n",
       "      <td>446</td>\n",
       "      <td>133.8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                       platform marketplace  views  revenue\n",
       "timestamp                                                  \n",
       "2021-07-17 09:00:00  mobile_app          jp    185     55.5\n",
       "2021-04-01 13:00:00  mobile_app          de    783    234.9\n",
       "2021-08-26 08:00:00  mobile_web          de    180     54.0\n",
       "2021-04-11 08:00:00      pc_web          fr    211     63.3\n",
       "2021-02-18 21:00:00  mobile_app          es    446    133.8"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.read_csv(CSV_FILENAME,parse_dates=True, index_col='timestamp',)\n",
    "data.sample(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here in this dataset we now see there are a few colums:\n",
    "\n",
    "Numerical:\n",
    "* Views\n",
    "* Revenue\n",
    "\n",
    "Categorical:\n",
    "* platform\n",
    "* marketplace\n",
    "\n",
    "In the parlance of Lookout for Metrics, this means our Domains are `platform` and `marketplace` and our Measures are `views` and `revenue`. The values within the domains are responsible for a large portion of the number of distinct metrics and the number of columns of measures account for the rest. The basic calculator then for the total number of metrics is:\n",
    "\n",
    "```\n",
    "(distinct_values(domain1) * distinct_values(domain2)) * number_of_measure_columns\n",
    "```\n",
    "\n",
    "In the cell below we first state the number of measure columns, followed by the list of domains that we wish to monitor in our dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "number_of_measure_columns = 2\n",
    "list_of_domains = [\"platform\", \"marketplace\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell below is a function that will take in our data, and the list of domains, and the number of columns and will return the total number of measures, you can simply run it to see the value:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_unique_metrics(input_data, domain_list, number_of_measures):\n",
    "    \"\"\"\n",
    "    \"\"\"\n",
    "    # Assign to 0 first:\n",
    "    metrics = 0\n",
    "    for item in domain_list:\n",
    "        unique_values = input_data.eval(item).nunique()\n",
    "        # Check for the first entry\n",
    "        if metrics <= 0:\n",
    "            metrics += unique_values\n",
    "        # Sort the rest\n",
    "        else:\n",
    "            metrics = metrics * unique_values\n",
    "    # Now combine the number of measures:\n",
    "    metrics = metrics * number_of_measures\n",
    "    return metrics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "42"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "number_of_metrics = generate_unique_metrics(input_data=data, domain_list=list_of_domains, number_of_measures=number_of_measure_columns)\n",
    "number_of_metrics"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we see that there are 42 unique metrics in our data, the next step is to determining the pricing, you can learn more about pricing here: https://aws.amazon.com/lookout-for-metrics/pricing/ . The cell below contains a function that will take in the total count then returns the USD price."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_pricing(number_of_metrics):\n",
    "    assert number_of_metrics>=0\n",
    "    price_tiers = [\n",
    "        ( 50000, 0.05 ),\n",
    "        ( 20000, 0.10 ),\n",
    "        ( 5000, 0.25 ),\n",
    "        ( 1000, 0.50 ),\n",
    "        ( 0, 0.75 ),\n",
    "    ]\n",
    "    price = 0\n",
    "    n = number_of_metrics\n",
    "    for bottom_number_of_metrics, cost_per_metric in price_tiers:\n",
    "        if n > bottom_number_of_metrics:\n",
    "            cost_for_this_tier = (n-bottom_number_of_metrics) * cost_per_metric\n",
    "            price += cost_for_this_tier\n",
    "            n = bottom_number_of_metrics\n",
    "            #print (\"Cost for %d ~ : %.2f\" % (bottom_number_of_metrics,cost_for_this_tier) )\n",
    "    print(\"The total cost monthly for this workload of: \" + str(number_of_metrics) +\" metrics is: $\" + str(format(price, '.2f')))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The total cost monthly for this workload of: 42 metrics is: $31.50\n"
     ]
    }
   ],
   "source": [
    "generate_pricing(number_of_metrics)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}