# Evaluating Your Forecast

So far you have prepared your data and generated your first Forecast. Now it is time to query that Forecast for predictions and compare them to the actual observed values. This comparison shows how accurate the Forecast is for a given item.

You can extend the approaches here to compare multiple models or predictors and to determine the impact of improved accuracy on your use case.

Overview:

* Setup
* Obtaining a Prediction
* Plotting the Actual Results
* Plotting the Prediction
* Comparing the Prediction to Actual Results

## Setup

Import the standard Python libraries used in this lesson.

In [None]:
import json
import time
import dateutil.parser

import boto3
import pandas as pd

The line below will retrieve your shared variables from the earlier notebooks.

In [None]:
%store -r

Once again connect to the Forecast APIs via the SDK.

In [None]:
session = boto3.Session(region_name=region) 
forecast = session.client(service_name='forecast') 
forecastquery = session.client(service_name='forecastquery')

## Obtaining a Prediction

Now that your Forecast is active, we will query it to get a prediction that will be plotted later.

In [None]:
forecastResponse = forecastquery.query_forecast(
 ForecastArn=forecast_arn_deep_ar,
 Filters={"item_id":"client_12"}
)
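
Before plotting, it can help to check the shape of what the query returned. The sketch below uses a small hand-made dictionary, `sample_response`, laid out like the `forecastResponse['Forecast']['Predictions']` structure that the later cells read from. The values in it and the helper name `summarize_predictions` are illustrative assumptions, not output from the service.

```python
# Hand-made stand-in for a query_forecast response (values are invented).
sample_response = {
    "Forecast": {
        "Predictions": {
            "p10": [
                {"Timestamp": "2014-11-01T00:00:00", "Value": 1.0},
                {"Timestamp": "2014-11-01T01:00:00", "Value": 1.5},
            ],
            "p50": [
                {"Timestamp": "2014-11-01T00:00:00", "Value": 2.0},
                {"Timestamp": "2014-11-01T01:00:00", "Value": 2.5},
            ],
            "p90": [
                {"Timestamp": "2014-11-01T00:00:00", "Value": 3.0},
                {"Timestamp": "2014-11-01T01:00:00", "Value": 3.5},
            ],
        }
    }
}


def summarize_predictions(response):
    """Map each quantile to (number of points, first timestamp, last timestamp)."""
    predictions = response["Forecast"]["Predictions"]
    return {
        quantile: (len(points), points[0]["Timestamp"], points[-1]["Timestamp"])
        for quantile, points in predictions.items()
        if points  # skip quantiles that came back empty
    }


summary = summarize_predictions(sample_response)
for quantile, info in summary.items():
    print(quantile, info)
```

Calling `summarize_predictions(forecastResponse)` on the real response should list one entry per quantile returned for `client_12`.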

## Plotting the Actual Results

In the first notebook we created a file of observed values. We will now select a given date and customer from that dataframe and plot the actual usage data for that customer.

In [None]:
actual_df = pd.read_csv("data/item-demand-time-validation.csv", names=['timestamp','value','item'])
actual_df.head()

Next, reduce the data to just the day we wish to plot: November 1, 2014.

In [None]:
actual_df = actual_df[(actual_df['timestamp'] >= '2014-11-01') & (actual_df['timestamp'] < '2014-11-02')]
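
The filter above compares timestamps as strings, which works as long as they are stored in a sortable `YYYY-MM-DD HH:MM:SS` layout. As an alternative sketch, the same one-day window can be selected on parsed datetimes. The `sample_df` rows below are invented for illustration and only assume the three column names used in this notebook.

```python
import pandas as pd

# Invented rows that straddle the day of interest.
sample_df = pd.DataFrame({
    "timestamp": [
        "2014-10-31 23:00:00",
        "2014-11-01 00:00:00",
        "2014-11-01 23:00:00",
        "2014-11-02 00:00:00",
    ],
    "value": [1.0, 2.0, 3.0, 4.0],
    "item": ["client_12"] * 4,
})

# Parse the strings once so comparisons are done on real datetimes.
sample_df["timestamp"] = pd.to_datetime(sample_df["timestamp"])

# Half-open window: start of the day (inclusive) to start of the next day (exclusive).
day_start = pd.Timestamp("2014-11-01")
day_end = day_start + pd.Timedelta(days=1)

one_day_df = sample_df[
    (sample_df["timestamp"] >= day_start) & (sample_df["timestamp"] < day_end)
]
print(one_day_df)
```

The half-open window keeps midnight on November 1 and drops midnight on November 2, matching the string-based filter.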

Lastly, keep only the rows for client_12.

In [None]:
actual_df = actual_df[(actual_df['item'] == 'client_12')]
actual_df.head()

In [None]:
actual_df.plot()

## Plotting the Prediction

Next we need to convert the JSON response from the query into a dataframe that we can plot.

In [None]:
# Generate DF 
prediction_df_p10 = pd.DataFrame.from_dict(forecastResponse['Forecast']['Predictions']['p10'])
prediction_df_p10.head()

In [None]:
# Plot
prediction_df_p10.plot()

The cells above handled only the p10 values. Now do the same for p50 and p90.

In [None]:
prediction_df_p50 = pd.DataFrame.from_dict(forecastResponse['Forecast']['Predictions']['p50'])
prediction_df_p90 = pd.DataFrame.from_dict(forecastResponse['Forecast']['Predictions']['p90'])
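
If the response contains more quantiles, or different ones, the per-quantile conversion can be written as a loop. The following is a minimal sketch under the assumption that every entry under `Predictions` is a list of `Timestamp`/`Value` records, as in the cells above. `sample_response` and the helper name `predictions_to_frames` are hypothetical.

```python
import pandas as pd

# Hand-made stand-in for a query_forecast response (values are invented).
sample_response = {
    "Forecast": {
        "Predictions": {
            "p10": [{"Timestamp": "2014-11-01T00:00:00", "Value": 1.0}],
            "p50": [{"Timestamp": "2014-11-01T00:00:00", "Value": 2.0}],
            "p90": [{"Timestamp": "2014-11-01T00:00:00", "Value": 3.0}],
        }
    }
}


def predictions_to_frames(response):
    """Build one dataframe per quantile, keyed by the quantile name."""
    predictions = response["Forecast"]["Predictions"]
    return {quantile: pd.DataFrame(points) for quantile, points in predictions.items()}


frames = predictions_to_frames(sample_response)
for quantile, frame in frames.items():
    print(quantile, frame.shape)
```

With the real response, `frames['p50']` would play the role of `prediction_df_p50`.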

## Comparing the Prediction to Actual Results

After obtaining the dataframes, the next task is to plot them together to see how closely the predictions track the actual values.

In [None]:
# Start by creating a dataframe to hold the combined results; 'source' records which dataframe each row came from
results_df = pd.DataFrame(columns=['timestamp', 'value', 'source'])

Import the observed values into the dataframe:

In [None]:
# DataFrame.append was removed in pandas 2.0, so collect the rows and use pd.concat
actual_rows = [
    {'timestamp': dateutil.parser.parse(row['timestamp']), 'value': row['value'], 'source': 'actual'}
    for index, row in actual_df.iterrows()
]
results_df = pd.concat([results_df, pd.DataFrame(actual_rows)], ignore_index=True)

In [None]:
# To show the new dataframe
results_df.head()

In [None]:
# Now add the P10, P50, and P90 Values
# DataFrame.append was removed in pandas 2.0, so collect the rows and use pd.concat
quantile_frames = {'p10': prediction_df_p10, 'p50': prediction_df_p50, 'p90': prediction_df_p90}
for source, prediction_df in quantile_frames.items():
    prediction_rows = [
        {'timestamp': dateutil.parser.parse(row['Timestamp']), 'value': row['Value'], 'source': source}
        for index, row in prediction_df.iterrows()
    ]
    results_df = pd.concat([results_df, pd.DataFrame(prediction_rows)], ignore_index=True)

In [None]:
results_df

In [None]:
pivot_df = results_df.pivot(columns='source', values='value', index="timestamp")

pivot_df

In [None]:
pivot_df.plot()
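
A plot gives a visual impression. To put numbers on the comparison, one option is to compute a per-quantile loss and the share of actual values that fall inside the p10 to p90 band. The sketch below is an assumption-laden illustration, not the notebook's own method: `sample_pivot` is invented data with the same column names as `pivot_df`, and the loss is a weighted quantile loss defined here as twice the pinball loss, normalized by the sum of absolute actual values.

```python
import pandas as pd

# Invented data with the same columns as pivot_df.
sample_pivot = pd.DataFrame(
    {
        "actual": [10.0, 12.0, 8.0, 20.0],
        "p10": [8.0, 9.0, 7.0, 10.0],
        "p50": [10.0, 11.0, 9.0, 14.0],
        "p90": [13.0, 14.0, 11.0, 18.0],
    },
    index=pd.to_datetime([
        "2014-11-01 00:00:00",
        "2014-11-01 01:00:00",
        "2014-11-01 02:00:00",
        "2014-11-01 03:00:00",
    ]),
)


def weighted_quantile_loss(actual, predicted, tau):
    """Twice the summed pinball loss at quantile tau, divided by sum(|actual|)."""
    under = (actual - predicted).clip(lower=0)  # prediction was too low
    over = (predicted - actual).clip(lower=0)   # prediction was too high
    return 2 * (tau * under + (1 - tau) * over).sum() / actual.abs().sum()


def interval_coverage(actual, lower, upper):
    """Fraction of actual values inside the closed interval [lower, upper]."""
    return ((actual >= lower) & (actual <= upper)).mean()


# Keep only timestamps where every column has a value.
aligned = sample_pivot.dropna()

wql = {
    name: weighted_quantile_loss(aligned["actual"], aligned[name], tau)
    for name, tau in [("p10", 0.1), ("p50", 0.5), ("p90", 0.9)]
}
coverage = interval_coverage(aligned["actual"], aligned["p10"], aligned["p90"])

print(wql)
print(coverage)
```

Lower loss values indicate a closer fit at that quantile, and a p10 to p90 band would ideally contain roughly 80% of the actual values. Swapping `sample_pivot` for `pivot_df` applies the same calculation to the real results, provided the actual and predicted timestamps line up.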

Once you are done exploring this Forecast, you can clean up all the work that was done by executing the cells inside `Cleanup.ipynb` within this folder.