=== Data partitioning
The curated data in the business-zone S3 bucket is partitioned by *reading type* and *reading date*, as follows: s3://BusinessBucket/<reading_type_value>/<date>/<meter-data-file-in-parquet-format>.

You can find all meter reads for a day on the lowest level of the partition tree. To optimize query performance, the data is stored in a column-based file format (Parquet).

=== Late-arriving data
The data lake handles late-arriving data. If a meter sends data later, the ETL pipeline takes care of storing the late meter read in the correct partition.

=== Meter-data format

The input meter-data format is variable and can be adjusted as described in the section <<Customize this Quick Start,'Customize this Quick Start'>>. The sample input data format of the London meter reads looks like the following:

[cols="1,1,1", options="header"]
.Input schema
|===
|Field
|Type
|Format

|lclid|string|
|stdortou|string|
|datetime|string|yyyy-MM-dd HH:mm:ss.SSSSSSS
|kwh/hh (per half hour)|double|0.000
|acorn|string|
|acorn_grouped|string|
|===

=== Table schemas

This Quick Start uses the following table schema for ETL jobs, ML model training, and API queries. This schema is also used in Amazon Redshift.

[cols="1,1,1,1", options="header"]
.Internal schema
|===
|Field
|Type
|Mandatory
|Format

|meter_id| String| X|
|reading_value| double| X|0.000
|reading_type| String| X|AGG\|INT*
|reading_date_time| Timestamp| X|yyyy-MM-dd HH:mm:ss.SSS
|date_str| String|X| yyyyMMdd
|obis_code| String| |
|week_of_year| int| |
|month| int| |
|year| int| |
|hour| int| |
|minute| int| |
|===

AGG = Aggregated reads

INT = Interval reads

==== Energy-usage forecast

The energy forecast endpoint returns a list of predicted consumptions based on the trained ML model.

[%header,cols=2*]
.Parameter descriptions
|===
|Parameter
|Description

|meter_id
|The ID of the meter for which the forecast should be made.

|data_start
|Start date of meter reads that should be used a basis for the prediction.

|data_end
|End date of meter reads that should be used a basis for the prediction.

|ml_endpoint_name
|The ML endpoint created by SageMaker.
|===

.url structure
----
https://xxxxxxxx.execute-api.us-east-1.amazonaws.com/forecast/<meter_id>?data_start=<>&data_end=<>
----

.example curl command
[source,shell script]
----
curl https://xxxxxxxx.execute-api.us-east-1.amazonaws.com/forecast/MAC004734?data_start=2013-05-01&data_end=2013-10-01
----

.response
[source, json]
----
{
   "consumption":{
      "1380585600000":0.7684248686,
      "1380589200000":0.4294368029,
      "1380592800000":0.3510326743,
      "1380596400000":0.2964941561,
      "1380600000000":0.3064994216,
      "1380603600000":0.4563854337,
      "1380607200000":0.6386958361,
      "1380610800000":0.5963768363,
      "1380614400000":0.5587928891,
      "1380618000000":0.4409750104,
      "1380621600000":0.4719932675
   }
}
----

==== Anomaly detection

The endpoint returns a list of energy-usage anomalies for a specific meter in a given time range.

[%header,cols=2*]
.Parameter descriptions
|===
|Parameter
|Description

|meter_id
|The ID of the meter for which anomalies should be determined.

|data_start
|Start date of the anomaly detection.

|data_end
|End date of the anomaly detection.

|outlier_only
|Determines whether all data or only outliers should be returned for a certain meter.
|===

.url structure
----
https://xxxxxxxx.execute-api.us-east-1.amazonaws.com/anomaly/<meter_id>?data_start=<>&data_end=<>&outlier_only=<>
----

.example curl command
[source,shell script]
----
curl https://xxxxxxxx.execute-api.us-east-1.amazonaws.com/anomaly/MAC000005?data_start=2013-01-01&data_end=2013-12-31&outlier_only=0
----

The output returns a list of reading dates in which anomalies are marked on a daily basis.

.response
[source, json]
----
{
   "meter_id":{
      "1":"MAC000005",
      "2":"MAC000005"
   },
   "ds":{
      "1":"2013-01-02",
      "2":"2013-01-07"
   },
   "consumption":{
      "1":5.7,
      "2":11.436
   },
   "yhat_lower":{
      "1":3.3661955822,
      "2":3.5661772085
   },
   "yhat_upper":{
      "1":9.1361769262,
      "2":8.9443160402
   },
   "anomaly":{
      "1":0,
      "2":1
   },
   "importance":{
      "1":0.0,
      "2":0.217880724
   }
}
----

==== Meter-outage details

The endpoint returns a list of meters that delivered an error code instead of a valid reading. Data includes geographic information (latitude, longitude) which can be used to create a map of current meter outages; for example, to visualize an outage cluster.

[%header,cols=2*]
.Parameter descriptions
|===
|Parameter
|Description

|error_code
|Error code which should be retrieved from the API. The parameter is **optional**. If not set, all errors in the timeframe will be returned.

|start_date_time
|Start of outage time frame.

|end_date_time
|End of outage time frame.

|===

.url structure
----
https://xxxxxxxx.execute-api.us-east-1.amazonaws.com/outage?error_code=<>&start_date_time=<>&end_date_time=<>
----

.example curl command
[source,shell script]
----
curl https://xxxxxxxx.execute-api.us-east-1.amazonaws.com/outage?start_date_time=2013-01-03+09:00:01&end_date_time=2013-01-03+10:59:59
----

Returns a list of meters that had an outage in the requested time range.

.response
[source, json]
----
{
   "Items":[
      {
         "meter_id":"MAC000138",
         "reading_date_time":"2013-01-03 09:30:00.000",
         "date_str":"20130103",
         "lat":40.7177325,
         "long":-74.043845
      },
      {
         "meter_id":"MAC000139",
         "reading_date_time":"2013-01-03 10:00:00.000",
         "date_str":"20130103",
         "lat":40.7177325,
         "long":-74.043845
      }
  ]
   }
----


==== Aggregation API
The aggregation endpoint returns aggregated readings for a certain meter on a daily, weekly, or monthly basis.

[%header,cols=2*]
.Parameter descriptions
|===
|Parameter
|Description

|aggregation level
|Daily, weekly, or monthly aggregation level.

|year
|The year the aggregation should cover.

|meter_id
|The ID of the meter for which the aggregation should be done.

|===

.url structure
----
https://xxxxxx.execute-api.us-east-1.amazonaws.com/consumption/<daily|weekly|monthly>/<year>/<meter_id>
----

.example curl command for monthly aggregated data:
[source,shell script]
----
curl https://xxxxxx.execute-api.us-east-1.amazonaws.com/consumption/daily/2013/MAC001595
----

.response:
[source, json]
----
[
   [
      "MAC001595",
      "20130223",
      3.34
   ],
   [
      "MAC001595",
      "20130221",
      5.316
   ],
   [
      "MAC001595",
      "20130226",
      4.623
   ]
]
----