{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Evaluating Your Forecast\n", "\n", "So far you have prepared your data, and generated your first Forecast. Now is the time to pull down the predictions from this Predictor, and compare them to the actual observed values. This will let us know the impact of accuracy based on the Forecast.\n", "\n", "You can extend the approaches here to compare multiple models or predictors and to determine the impact of improved accuracy on your use case.\n", "\n", "Overview:\n", "\n", "* Setup\n", "* Obtaining a Prediction\n", "* Plotting the Actual Results\n", "* Plotting the Prediction\n", "* Comparing the Prediction to Actual Results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import the standard Python Libraries that are used in this lesson." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "import time\n", "import dateutil.parser\n", "\n", "import boto3\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The line below will retrieve your shared variables from the earlier notebooks." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%store -r" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once again connect to the Forecast APIs via the SDK." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "session = boto3.Session(region_name=region) \n", "forecast = session.client(service_name='forecast') \n", "forecastquery = session.client(service_name='forecastquery')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Obtaining a Prediction:\n", "\n", "Now that your predictor is active we will query it to get a prediction that will be plotted later." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "forecastResponse = forecastquery.query_forecast(\n", " ForecastArn=forecast_arn_deep_ar,\n", " Filters={\"item_id\":\"client_12\"}\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting the Actual Results\n", "\n", "In the first notebook we created a file of observed values, we are now going to select a given date and customer from that dataframe and are going to plot the actual usage data for that customer. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "actual_df = pd.read_csv(\"data/item-demand-time-validation.csv\", names=['timestamp','value','item'])\n", "actual_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we need to reduce the data to just the day we wish to plot, which is the First of November 2014." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "actual_df = actual_df[(actual_df['timestamp'] >= '2014-11-01') & (actual_df['timestamp'] < '2014-11-02')]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lastly, only grab the items for client_12" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "actual_df = actual_df[(actual_df['item'] == 'client_12')]\n", "actual_df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "actual_df.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting the Prediction:\n", "\n", "Next we need to convert the JSON response from the Predictor to a dataframe that we can plot." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate DF \n", "prediction_df_p10 = pd.DataFrame.from_dict(forecastResponse['Forecast']['Predictions']['p10'])\n", "prediction_df_p10.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot\n", "prediction_df_p10.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above merely did the p10 values, now do the same for p50 and p90." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "prediction_df_p50 = pd.DataFrame.from_dict(forecastResponse['Forecast']['Predictions']['p50'])\n", "prediction_df_p90 = pd.DataFrame.from_dict(forecastResponse['Forecast']['Predictions']['p90'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparing the Prediction to Actual Results\n", "\n", "After obtaining the dataframes the next task is to plot them together to determine the best fit." 
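{ "cell_type": "markdown", "metadata": {}, "source": [ "Optionally, you can overlay the three quantile forecasts on a single chart to get a first look at the prediction interval. This is a minimal sketch that assumes the `Timestamp` and `Value` columns returned by `query_forecast`, as used above." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional: overlay the p10, p50, and p90 forecasts on one axis.\n", "# Assumes the Timestamp/Value columns returned by query_forecast, as used above.\n", "ax = prediction_df_p10.plot(x='Timestamp', y='Value', label='p10')\n", "prediction_df_p50.plot(x='Timestamp', y='Value', label='p50', ax=ax)\n", "prediction_df_p90.plot(x='Timestamp', y='Value', label='p90', ax=ax)" ] },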
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We start by creating a dataframe to house our content, here source will be which dataframe it came from\n", "results_df = pd.DataFrame(columns=['timestamp', 'value', 'source'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import the observed values into the dataframe:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for index, row in actual_df.iterrows():\n", " clean_timestamp = dateutil.parser.parse(row['timestamp'])\n", " results_df = results_df.append({'timestamp' : clean_timestamp , 'value' : row['value'], 'source': 'actual'} , ignore_index=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# To show the new dataframe\n", "results_df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now add the P10, P50, and P90 Values\n", "for index, row in prediction_df_p10.iterrows():\n", " clean_timestamp = dateutil.parser.parse(row['Timestamp'])\n", " results_df = results_df.append({'timestamp' : clean_timestamp , 'value' : row['Value'], 'source': 'p10'} , ignore_index=True)\n", "for index, row in prediction_df_p50.iterrows():\n", " clean_timestamp = dateutil.parser.parse(row['Timestamp'])\n", " results_df = results_df.append({'timestamp' : clean_timestamp , 'value' : row['Value'], 'source': 'p50'} , ignore_index=True)\n", "for index, row in prediction_df_p90.iterrows():\n", " clean_timestamp = dateutil.parser.parse(row['Timestamp'])\n", " results_df = results_df.append({'timestamp' : clean_timestamp , 'value' : row['Value'], 'source': 'p90'} , ignore_index=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "results_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pivot_df = results_df.pivot(columns='source', values='value', index=\"timestamp\")\n", "\n", "pivot_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pivot_df.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you are done exploring this Forecast you can cleanup all the work that was done by executing the cells inside `Cleanup.ipynb` within this folder." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 4 }