{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cost Calculator\n",
"\n",
"This notebook walks you through counting the metrics in your dataset and then using that count to estimate your costs for Lookout for Metrics.\n",
"\n",
"**Note:** The result is an estimate. Your live data may contain dimension values that do not appear in your historical dataset, and each new value adds metrics and therefore cost. Use this as a guide.\n",
"\n",
"You can run this notebook in your own environment by deploying the `getting_started` resources and then browsing to this folder inside a SageMaker Notebook Instance.\n",
"\n",
"This folder includes a sample CSV file named `historical.csv`, which the notebook prices by default. To price your own data, upload your historical data into this folder.\n",
"\n",
"Run the notebook as is first. Once you understand the process, update the filename to match your uploaded file and run it again to complete the pricing exercise."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"CSV_FILENAME = \"historical.csv\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After updating the filename above to reflect your content, run the cell below to see a sample of your data:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" platform marketplace timestamp views revenue\n",
"13232 pc_web de 2021-01-27 06:00:00 269 80.7\n",
"70769 mobile_app jp 2021-05-21 09:00:00 195 58.5\n",
"20118 pc_web us 2021-02-09 22:00:00 498 149.4\n",
"87217 pc_web es 2021-06-23 01:00:00 102 30.6\n",
"1015 mobile_web us 2021-01-03 00:00:00 440 132.0"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = pd.read_csv(CSV_FILENAME)\n",
"data.sample(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sample above shows that `timestamp` is our timestamp column, so we can read the file again, this time parsing the dates and using that column as the index."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" platform marketplace views revenue\n",
"timestamp \n",
"2021-07-17 09:00:00 mobile_app jp 185 55.5\n",
"2021-04-01 13:00:00 mobile_app de 783 234.9\n",
"2021-08-26 08:00:00 mobile_web de 180 54.0\n",
"2021-04-11 08:00:00 pc_web fr 211 63.3\n",
"2021-02-18 21:00:00 mobile_app es 446 133.8"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = pd.read_csv(CSV_FILENAME, parse_dates=True, index_col=\"timestamp\")\n",
"data.sample(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This dataset has four columns besides the timestamp index:\n",
"\n",
"Numerical:\n",
"* views\n",
"* revenue\n",
"\n",
"Categorical:\n",
"* platform\n",
"* marketplace\n",
"\n",
"In the parlance of Lookout for Metrics, the categorical columns `platform` and `marketplace` are our dimensions (called domains in this notebook's code) and the numerical columns `views` and `revenue` are our measures. Each distinct combination of dimension values, paired with one measure, is a separate metric, so the total number of metrics is:\n",
"\n",
"```\n",
"(distinct_values(domain1) * distinct_values(domain2)) * number_of_measure_columns\n",
"```\n",
"\n",
"In the cell below we first state the number of measure columns, followed by the list of domains that we wish to monitor in our dataset:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"number_of_measure_columns = 2\n",
"list_of_domains = [\"platform\", \"marketplace\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell below is a function that will take in our data, and the list of domains, and the number of columns and will return the total number of measures, you can simply run it to see the value:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def generate_unique_metrics(input_data, domain_list, number_of_measures):\n",
"    \"\"\"\n",
"    Return the number of distinct metrics in input_data: the product of the\n",
"    distinct value counts of every column in domain_list, multiplied by\n",
"    number_of_measures. This counts every possible combination of values.\n",
"    \"\"\"\n",
"    # Start at 1 so that each domain multiplies the running total:\n",
"    metrics = 1\n",
"    for item in domain_list:\n",
"        # Count the distinct values in this domain column\n",
"        metrics *= input_data[item].nunique()\n",
"    # Now combine the number of measures:\n",
"    return metrics * number_of_measures"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"42"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"number_of_metrics = generate_unique_metrics(input_data=data, domain_list=list_of_domains, number_of_measures=number_of_measure_columns)\n",
"number_of_metrics"
]
},
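{
"cell_type": "markdown",
"metadata": {},
"source": [
"The count can be checked by hand with the formula above. As an assumption that no output in this notebook confirms, suppose `platform` has 3 distinct values and `marketplace` has 7 (the samples show 3 platforms and 5 marketplaces, so at least two marketplace values are inferred rather than observed). With 2 measure columns:\n",
"\n",
"$$\n",
"(3 \\times 7) \\times 2 = 42\n",
"$$\n",
"\n",
"The same arithmetic shows why the result is only an estimate: under the same assumption, a single previously unseen marketplace value would raise the count to $(3 \\times 8) \\times 2 = 48$."
]
},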
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we see that there are 42 unique metrics in our data. The next step is to determine the pricing; you can learn more about pricing here: https://aws.amazon.com/lookout-for-metrics/pricing/ . The function in the cell below takes the total metric count and prints the estimated monthly price in USD."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"def generate_pricing(number_of_metrics):\n",
"    assert number_of_metrics >= 0\n",
"    # (lower bound of the tier, USD per metric per month), highest tier first\n",
"    price_tiers = [\n",
"        (50000, 0.05),\n",
"        (20000, 0.10),\n",
"        (5000, 0.25),\n",
"        (1000, 0.50),\n",
"        (0, 0.75),\n",
"    ]\n",
"    price = 0\n",
"    n = number_of_metrics\n",
"    for bottom_number_of_metrics, cost_per_metric in price_tiers:\n",
"        if n > bottom_number_of_metrics:\n",
"            # Charge this tier's rate only for the metrics that fall inside it\n",
"            cost_for_this_tier = (n - bottom_number_of_metrics) * cost_per_metric\n",
"            price += cost_for_this_tier\n",
"            n = bottom_number_of_metrics\n",
"    print(f\"The total cost monthly for this workload of: {number_of_metrics} metrics is: ${price:.2f}\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The total cost monthly for this workload of: 42 metrics is: $31.50\n"
]
}
],
"source": [
"generate_pricing(number_of_metrics)"
]
},
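{
"cell_type": "markdown",
"metadata": {},
"source": [
"The tiers in `generate_pricing` are applied progressively: each rate is charged only for the metrics that fall inside its band. The 42 metrics above all fall in the first band (up to 1,000 metrics at 0.75 USD each), which reproduces the printed total:\n",
"\n",
"$$\n",
"42 \\times 0.75 = 31.50\n",
"$$\n",
"\n",
"As a hypothetical larger workload of 6,000 metrics (a number chosen only for illustration), three bands would contribute:\n",
"\n",
"$$\n",
"(1000 \\times 0.75) + (4000 \\times 0.50) + (1000 \\times 0.25) = 750 + 2000 + 250 = 3000.00\n",
"$$\n",
"\n",
"Both calculations use the rates hard-coded in the function above, so treat them as a sketch and check the pricing page for current values."
]
},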
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}