Run this notebook first to prepare the dataset that will be imported into Amazon Forecast.

# 1.Download dataset
We use the Online Retail II dataset from the UCI Machine Learning Repository, which contains sales transactions from an e-commerce site.   
https://archive.ics.uci.edu/ml/datasets/Online+Retail+II

In [1]:
! wget https://archive.ics.uci.edu/ml/machine-learning-databases/00502/online_retail_II.xlsx -P ./input

--2020-08-01 05:51:39--  https://archive.ics.uci.edu/ml/machine-learning-databases/00502/online_retail_II.xlsx
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 45622278 (44M) [application/x-httpd-php]
Saving to: ‘./input/online_retail_II.xlsx’


2020-08-01 05:51:41 (19.0 MB/s) - ‘./input/online_retail_II.xlsx’ saved [45622278/45622278]
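
If `wget` is not available, or the notebook is re-run often, the download can also be done from Python. The following is a minimal sketch using only the standard library; the helper name `download_if_missing` is hypothetical and not part of the original notebook. It skips the request when the file is already present.

```python
import urllib.request
from pathlib import Path

# Same URL as in the wget cell above.
DATASET_URL = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "00502/online_retail_II.xlsx"
)


def download_if_missing(url: str, dest_dir: str = "./input") -> Path:
    """Download url into dest_dir unless a non-empty copy already exists."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    if dest.exists() and dest.stat().st_size > 0:
        return dest  # reuse the earlier download
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(dest))
    return dest


# Usage (performs a network request only on the first call):
# path = download_if_missing(DATASET_URL)
```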



# 2.Load dataset
Load the downloaded workbook and add a `sales` column (unit price multiplied by quantity).

In [2]:
import pandas as pd

In [3]:
df = pd.read_excel('./input/online_retail_II.xlsx', sheet_name='Year 2009-2010')

In [4]:
df['sales'] = df['Price'] * df['Quantity']
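
To make the `sales` computation concrete, here is a small sketch on invented rows that reuse the `Price` and `Quantity` column names. The values are made up for illustration. It also shows that a negative quantity (for example a return, if the data contains any) yields a negative `sales` value, which may be worth checking before training.

```python
import pandas as pd

# Toy rows with invented values; only the column names match the workbook.
toy = pd.DataFrame(
    {
        "Quantity": [4, -2],
        "Price": [2.5, 1.25],
    }
)

# Same element-wise product as in the cell above.
toy["sales"] = toy["Price"] * toy["Quantity"]

# Rows with negative sales, e.g. from negative quantities.
negative = toy[toy["sales"] < 0]
print(toy)
print(len(negative), "row(s) with negative sales")
```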

# 3.Build dataset
Create two datasets from the loaded data: one for the initial training run, and one with an extra week of data for automatic retraining by the pipeline.

train: 2009/12/01 - 2010/12/02   
train_added: 2009/12/01 - 2010/12/09
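
The date ranges above can be sketched on a toy frame as follows. The rows are invented; the sketch assumes `InvoiceDate` is a datetime column, as in the loaded data. Using an exclusive upper bound at midnight of the following day keeps every row stamped on 2010/12/02 and nothing later.

```python
import pandas as pd

# Invented rows around the cutoff date.
toy = pd.DataFrame(
    {
        "Country": ["United Kingdom"] * 3,
        "InvoiceDate": pd.to_datetime(
            ["2010-12-02 19:59:00", "2010-12-03 00:00:00", "2010-12-09 20:01:00"]
        ),
        "sales": [9.9, 1.0, 2.0],
    }
)

cutoff = pd.Timestamp("2010-12-03")  # first instant after 2010/12/02

train = toy[toy["InvoiceDate"] < cutoff]  # 2009/12/01 - 2010/12/02
train_added = toy                         # full range, up to 2010/12/09

print(len(train), len(train_added))
```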

In [5]:
df2 = df[['Country', 'InvoiceDate', 'sales']]

In [6]:
df2 = df2.query('Country == "United Kingdom"')

In [7]:
df2.head()

| | Country | InvoiceDate | sales |
|---|---|---|---|
| 0 | United Kingdom | 2009-12-01 07:45:00 | 83.4 |
| 1 | United Kingdom | 2009-12-01 07:45:00 | 81.0 |
| 2 | United Kingdom | 2009-12-01 07:45:00 | 81.0 |
| 3 | United Kingdom | 2009-12-01 07:45:00 | 100.8 |
| 4 | United Kingdom | 2009-12-01 07:45:00 | 30.0 |


In [8]:
!mkdir -p output

In [9]:
df2.to_csv('./output/tr_target_add_20091201_20101209.csv', header=False, index=False)

In [10]:
# Keep rows up to and including 2010-12-02 (exclusive upper bound at midnight of 12-03).
tr1 = df2.query('InvoiceDate < "2010-12-03"')

In [11]:
tr1.tail()

| | Country | InvoiceDate | sales |
|---|---|---|---|
| 508150 | United Kingdom | 2010-12-02 19:59:00 | 3.4 |
| 508151 | United Kingdom | 2010-12-02 19:59:00 | 0.65 |
| 508152 | United Kingdom | 2010-12-02 19:59:00 | 5.95 |
| 508153 | United Kingdom | 2010-12-02 19:59:00 | 5.9 |
| 508154 | United Kingdom | 2010-12-02 19:59:00 | 9.9 |


In [12]:
tr1.to_csv('./output/tr_target_20091201_20101202.csv', header=False, index=False)
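
Because both CSV files are written without a header row, the meaning of each column has to be supplied later through the dataset schema in Amazon Forecast, and the attribute order in that schema has to match the column order in the file. The sketch below shows one plausible schema for these files. It assumes a target time series in the CUSTOM domain and a mapping of `Country` to `item_id`, `InvoiceDate` to `timestamp`, and `sales` to `target_value`; that mapping is an assumption, not something this notebook states.

```python
import json

# Column order of the CSV files written above.
CSV_COLUMNS = ["Country", "InvoiceDate", "sales"]

# Assumed mapping onto target time series attributes, in the same order.
TARGET_SCHEMA = {
    "Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "target_value", "AttributeType": "float"},
    ]
}

# One schema attribute per CSV column.
assert len(TARGET_SCHEMA["Attributes"]) == len(CSV_COLUMNS)
print(json.dumps(TARGET_SCHEMA, indent=2))
```

pandas writes datetime values as `yyyy-MM-dd HH:mm:ss` by default, which is one of the timestamp formats Amazon Forecast accepts on import.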

# 4.Upload dataset to S3
Create a bucket in S3 and upload the dataset.

## Make bucket

In [13]:
import boto3

In [14]:
boto3.__version__

'1.14.16'

In [15]:
sts = boto3.client('sts')
id_info = sts.get_caller_identity()
print(id_info['Account'])

805433377179


In [16]:
s3 = boto3.client('s3')

In [17]:
bucket_name = 'demo-forecast-' + id_info['Account']

In [18]:
bucket_name

'demo-forecast-805433377179'

In [19]:
s3.create_bucket(Bucket=bucket_name)

{'ResponseMetadata': {'RequestId': '3E78EF02EB2C9A75',
  'HostId': 'WEvfjkpLyQGz0gcZavGvEqZ5qLjyKrOIa2dtKAlum3ztL9iaQ8Xj2JoVLjMCabAuYU3Nz4H6ivk=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'WEvfjkpLyQGz0gcZavGvEqZ5qLjyKrOIa2dtKAlum3ztL9iaQ8Xj2JoVLjMCabAuYU3Nz4H6ivk=',
   'x-amz-request-id': '3E78EF02EB2C9A75',
   'date': 'Sat, 01 Aug 2020 05:53:27 GMT',
   'location': '/demo-forecast-805433377179',
   'content-length': '0',
   'server': 'AmazonS3'},
  'RetryAttempts': 0},
 'Location': '/demo-forecast-805433377179'}
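
The `create_bucket` call above passes only the bucket name, which works when the session region is `us-east-1`. In other regions S3 expects an explicit `LocationConstraint`. A minimal sketch of a region-aware variant follows; the helper name `create_bucket_kwargs` is hypothetical.

```python
def create_bucket_kwargs(bucket_name: str, region: str) -> dict:
    """Build keyword arguments for s3.create_bucket for the given region."""
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        # Outside us-east-1, S3 requires an explicit LocationConstraint.
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


# Usage with the client from the cells above (not executed here):
# region = boto3.session.Session().region_name
# s3.create_bucket(**create_bucket_kwargs(bucket_name, region))
```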

## Upload dataset

In [20]:
# Use a separate name so the S3 client stored in `s3` is not overwritten.
s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket(bucket_name)

In [21]:
bucket.upload_file('./output/tr_target_20091201_20101202.csv',
                   'input/tr_target_20091201_20101202.csv')

In [22]:
bucket.upload_file('./output/tr_target_add_20091201_20101209.csv',
                   'input/tr_target_add_20091201_20101209.csv')
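
The two uploads follow the same pattern: the file name is kept and placed under the `input/` prefix. The sketch below derives the object key and the resulting `s3://` URI for each local file, since the URI is what an Amazon Forecast import job takes as its data source path. The helper name `upload_plan` is hypothetical.

```python
from pathlib import PurePosixPath


def upload_plan(local_paths, bucket_name, prefix="input"):
    """Return (local_path, object_key, s3_uri) for each local file."""
    plan = []
    for local in local_paths:
        key = str(PurePosixPath(prefix) / PurePosixPath(local).name)
        plan.append((local, key, f"s3://{bucket_name}/{key}"))
    return plan


# Usage with the bucket object from the cells above (not executed here):
# for local, key, uri in upload_plan(
#     ["./output/tr_target_20091201_20101202.csv"], bucket_name
# ):
#     bucket.upload_file(local, key)
#     print(uri)
```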

## Upload manifest file
Create a manifest file for use in Amazon QuickSight and upload it to S3. The manifest points at the `output/` prefix of the bucket.

In [23]:
import json

In [24]:
# Manifest that tells QuickSight where the CSV files are and how to parse them.
manifest_for_qs = {
  "fileLocations": [
    {
      "URIs": []
    },
    {
      "URIPrefixes": [
        "s3://" + bucket_name + "/output/"
      ]
    }
  ],
  "globalUploadSettings": {
    "format": "CSV",
    "delimiter": ",",
    "textqualifier": "'",
    "containsHeader": "true"
  }
}
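
If the manifest is needed for more than one bucket or prefix, the dictionary above can be produced by a small function. This is a sketch, and the name `build_quicksight_manifest` is hypothetical. It keeps the same upload settings as the cell above and leaves out the empty `URIs` entry, on the assumption that the prefix entry alone is sufficient.

```python
import json


def build_quicksight_manifest(bucket_name: str, prefix: str = "output/") -> dict:
    """Build a QuickSight S3 manifest that covers every file under prefix."""
    return {
        "fileLocations": [
            {"URIPrefixes": [f"s3://{bucket_name}/{prefix}"]}
        ],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": ",",
            "textqualifier": "'",
            "containsHeader": "true",
        },
    }


manifest = build_quicksight_manifest("demo-forecast-example")
print(json.dumps(manifest, indent=2))
```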

In [25]:
!mkdir -p manifest_for_quicksight

In [26]:
with open('./manifest_for_quicksight/manifest_uk_sales_pred.json', 'w') as f:
    json.dump(manifest_for_qs, f, indent=2, ensure_ascii=False)

In [27]:
bucket.upload_file('./manifest_for_quicksight/manifest_uk_sales_pred.json',
                   'manifest_for_quicksight/manifest_uk_sales_pred.json')

# 5.Next
Manually run the forecast with Amazon Forecast, then export the forecast results to S3 and visualize them in Amazon QuickSight.    
When the visualization is complete, run 2_build_forecast_pipeline.ipynb to build the automatic forecast pipeline.  