=== Data partitioning The curated data in the business-zone S3 bucket is partitioned by *reading type* and *reading date*, as follows: s3://BusinessBucket///. You can find all meter reads for a day on the lowest level of the partition tree. To optimize query performance, the data is stored in a column-based file format (Parquet). === Late-arriving data The data lake handles late-arriving data. If a meter sends data later, the ETL pipeline takes care of storing the late meter read in the correct partition. === Meter-data format The input meter-data format is variable and can be adjusted as described in the section <>. The sample input data format of the London meter reads looks like the following: [cols="1,1,1", options="header"] .Input schema |=== |Field |Type |Format |lclid|string| |stdortou|string| |datetime|string|yyyy-MM-dd HH:mm:ss.SSSSSSS |kwh/hh (per half hour)|double|0.000 |acorn|string| |acorn_grouped|string| |=== === Table schemas This Quick Start uses the following table schema for ETL jobs, ML model training, and API queries. This schema is also used in Amazon Redshift. [cols="1,1,1,1", options="header"] .Internal schema |=== |Field |Type |Mandatory |Format |meter_id| String| X| |reading_value| double| X|0.000 |reading_type| String| X|AGG\|INT* |reading_date_time| Timestamp| X|yyyy-MM-dd HH:mm:ss.SSS |date_str| String|X| yyyyMMdd |obis_code| String| | |week_of_year| int| | |month| int| | |year| int| | |hour| int| | |minute| int| | |=== AGG = Aggregated reads INT = Interval reads ==== Energy-usage forecast The energy forecast endpoint returns a list of predicted consumptions based on the trained ML model. [%header,cols=2*] .Parameter descriptions |=== |Parameter |Description |meter_id |The ID of the meter for which the forecast should be made. |data_start |Start date of meter reads that should be used a basis for the prediction. |data_end |End date of meter reads that should be used a basis for the prediction. |ml_endpoint_name |The ML endpoint created by SageMaker. |=== .url structure ---- https://xxxxxxxx.execute-api.us-east-1.amazonaws.com/forecast/?data_start=<>&data_end=<> ---- .example curl command [source,shell script] ---- curl https://xxxxxxxx.execute-api.us-east-1.amazonaws.com/forecast/MAC004734?data_start=2013-05-01&data_end=2013-10-01 ---- .response [source, json] ---- { "consumption":{ "1380585600000":0.7684248686, "1380589200000":0.4294368029, "1380592800000":0.3510326743, "1380596400000":0.2964941561, "1380600000000":0.3064994216, "1380603600000":0.4563854337, "1380607200000":0.6386958361, "1380610800000":0.5963768363, "1380614400000":0.5587928891, "1380618000000":0.4409750104, "1380621600000":0.4719932675 } } ---- ==== Anomaly detection The endpoint returns a list of energy-usage anomalies for a specific meter in a given time range. [%header,cols=2*] .Parameter descriptions |=== |Parameter |Description |meter_id |The ID of the meter for which anomalies should be determined. |data_start |Start date of the anomaly detection. |data_end |End date of the anomaly detection. |outlier_only |Determines whether all data or only outliers should be returned for a certain meter. |=== .url structure ---- https://xxxxxxxx.execute-api.us-east-1.amazonaws.com/anomaly/?data_start=<>&data_end=<>&outlier_only=<> ---- .example curl command [source,shell script] ---- curl https://xxxxxxxx.execute-api.us-east-1.amazonaws.com/anomaly/MAC000005?data_start=2013-01-01&data_end=2013-12-31&outlier_only=0 ---- The output returns a list of reading dates in which anomalies are marked on a daily basis. .response [source, json] ---- { "meter_id":{ "1":"MAC000005", "2":"MAC000005" }, "ds":{ "1":"2013-01-02", "2":"2013-01-07" }, "consumption":{ "1":5.7, "2":11.436 }, "yhat_lower":{ "1":3.3661955822, "2":3.5661772085 }, "yhat_upper":{ "1":9.1361769262, "2":8.9443160402 }, "anomaly":{ "1":0, "2":1 }, "importance":{ "1":0.0, "2":0.217880724 } } ---- ==== Meter-outage details The endpoint returns a list of meters that delivered an error code instead of a valid reading. Data includes geographic information (latitude, longitude) which can be used to create a map of current meter outages; for example, to visualize an outage cluster. [%header,cols=2*] .Parameter descriptions |=== |Parameter |Description |error_code |Error code which should be retrieved from the API. The parameter is **optional**. If not set, all errors in the timeframe will be returned. |start_date_time |Start of outage time frame. |end_date_time |End of outage time frame. |=== .url structure ---- https://xxxxxxxx.execute-api.us-east-1.amazonaws.com/outage?error_code=<>&start_date_time=<>&end_date_time=<> ---- .example curl command [source,shell script] ---- curl https://xxxxxxxx.execute-api.us-east-1.amazonaws.com/outage?start_date_time=2013-01-03+09:00:01&end_date_time=2013-01-03+10:59:59 ---- Returns a list of meters that had an outage in the requested time range. .response [source, json] ---- { "Items":[ { "meter_id":"MAC000138", "reading_date_time":"2013-01-03 09:30:00.000", "date_str":"20130103", "lat":40.7177325, "long":-74.043845 }, { "meter_id":"MAC000139", "reading_date_time":"2013-01-03 10:00:00.000", "date_str":"20130103", "lat":40.7177325, "long":-74.043845 } ] } ---- ==== Aggregation API The aggregation endpoint returns aggregated readings for a certain meter on a daily, weekly, or monthly basis. [%header,cols=2*] .Parameter descriptions |=== |Parameter |Description |aggregation level |Daily, weekly, or monthly aggregation level. |year |The year the aggregation should cover. |meter_id |The ID of the meter for which the aggregation should be done. |=== .url structure ---- https://xxxxxx.execute-api.us-east-1.amazonaws.com/consumption/// ---- .example curl command for monthly aggregated data: [source,shell script] ---- curl https://xxxxxx.execute-api.us-east-1.amazonaws.com/consumption/daily/2013/MAC001595 ---- .response: [source, json] ---- [ [ "MAC001595", "20130223", 3.34 ], [ "MAC001595", "20130221", 5.316 ], [ "MAC001595", "20130226", 4.623 ] ] ----