{ "cells": [ { "cell_type": "markdown", "id": "8d38587f", "metadata": {}, "source": [ "# Visualize your Lookout for Metrics on QuickSight\n", "## Introduction\n", "The notebook generates all the necessary resources to bring your inference results and backtest/live data from Amazon Lookout for Metrics (L4M) to Amazon Quickight for visualization. The services used in this notebook are:\n", "\n", "\n", "-[Amazon QuickSight](https://aws.amazon.com/quicksight/)\n", "\n", "-[AWS Glue](https://aws.amazon.com/glue)\n", "\n", "-[Amazon Athena](https://aws.amazon.com/athena)\n", "\n", "-[Amazon S3](https://aws.amazon.com/s3/)" ] }, { "cell_type": "code", "execution_count": null, "id": "5a88d4e9", "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import time\n", "import uuid \n", "import json" ] }, { "cell_type": "code", "execution_count": null, "id": "b906c988", "metadata": {}, "outputs": [], "source": [ "# Get AWS Region and AWS Account ID \n", "\n", "USING_SAGEMAKER = True\n", "# Change to region = \"us-east-1\" for example to use that region if you ARENT using SageMaker\n", "region = None\n", "\n", "if USING_SAGEMAKER: \n", " with open('/opt/ml/metadata/resource-metadata.json') as notebook_info:\n", " data = json.load(notebook_info)\n", " resource_arn = data['ResourceArn']\n", " region = resource_arn.split(':')[3]\n", "\n", "region_L4M = region\n", "region_user_Quicksight = region\n", "\n", "response = boto3.client('sts').get_caller_identity()\n", "AwsAccountId=response['Account']" ] }, { "cell_type": "code", "execution_count": null, "id": "becad71c", "metadata": {}, "outputs": [], "source": [ "# Define boto3 clients \n", "clientGlue = boto3.client('glue', region_name=region_L4M)\n", "clientS3 = boto3.client('s3', region_name=region_L4M)\n", "clientQuicksight = boto3.client('quicksight', region_name=region_L4M)" ] }, { "cell_type": "code", "execution_count": null, "id": "885ab206", "metadata": {}, "outputs": [], "source": [ "user_name = 'TeamRole/MasterKey'\n", "response = clientQuicksight.describe_user(\n", " UserName=user_name,\n", " AwsAccountId=AwsAccountId,\n", " Namespace='default'\n", ")\n", "user_ARN = response['User']['Arn']\n", "print(user_ARN)" ] }, { "cell_type": "code", "execution_count": null, "id": "c89e6c46", "metadata": {}, "outputs": [], "source": [ "# Load parameters\n", "with open('../ml_ops/params.json') as f:\n", " data = json.load(f)\n", "\n", "bucket_name= data['s3_bucket']\n", "# location of data used as input to backtest detector\n", "inference_data_input_path_s3='s3://'+bucket_name+'/ecommerce/backtest/' \n", "# location of anomalies from backtest \n", "inference_data_output_Lambda_path_s3='s3://'+bucket_name+'/anomalyResults/'\n", "\n", "crawler_role_arn = data['crawler_role_arn'] # /L4M_visualization_glue\n" ] }, { "cell_type": "code", "execution_count": null, "id": "f2442774", "metadata": {}, "outputs": [], "source": [ "print(crawler_role_arn)" ] }, { "cell_type": "markdown", "id": "17a6d12c", "metadata": {}, "source": [ "## Amazon Glue\n", "### Database creation\n", "The following API will create a database in Glue in the region you specified previously. The database allows to store the metadata of the tables built by the crawler below. " ] }, { "cell_type": "code", "execution_count": null, "id": "611641b2", "metadata": {}, "outputs": [], "source": [ "#Database creation\n", "glue_dbname = 'l4m_anomalies_'+str(uuid.uuid4())\n", "response_glueDatabase = clientGlue.create_database(\n", " DatabaseInput={\n", " 'Name': glue_dbname,\n", " 'Description': 'Database to visualize the L4M results on Quicksight'\n", " }\n", ")\n", "print(glue_dbname)" ] }, { "cell_type": "markdown", "id": "5b1c1941", "metadata": {}, "source": [ "### Crawler\n", "Glue Crawler crawls S3 bucket for data files (e.g. CSV files) to create glue tables.\n", "The following API call will create the crawler and the next cell will run the crawler." ] }, { "cell_type": "code", "execution_count": null, "id": "2d2b18bd", "metadata": {}, "outputs": [], "source": [ "crawler_name = 'L4M_visualize_anomalies' + str(uuid.uuid4())\n", "response_crawler = clientGlue.create_crawler(\n", " Name=crawler_name,\n", " Role= crawler_role_arn,\n", " DatabaseName=glue_dbname,\n", " Description= 'Crawler - scan anomalies to visualize L4M results',\n", " Targets={\n", " 'S3Targets':[\n", " {\n", " 'Path': inference_data_input_path_s3\n", " },\n", " {\n", " 'Path': inference_data_output_Lambda_path_s3\n", " }\n", " ]\n", " }\n", ")\n", "print(crawler_name)" ] }, { "cell_type": "code", "execution_count": null, "id": "7f82ff1a", "metadata": {}, "outputs": [], "source": [ "clientGlue.start_crawler(Name=crawler_name)" ] }, { "cell_type": "code", "execution_count": null, "id": "8c507f02", "metadata": {}, "outputs": [], "source": [ "# check crawler job status \n", "finish=False\n", "while finish==False:\n", " \n", " response = clientGlue.get_crawler(\n", " Name=crawler_name, \n", " )\n", " if response['Crawler']['State']=='STOPPING':\n", " time.sleep(20)\n", " print('Crawler has finished')\n", " finish=True\n", " else:\n", " print('Crawler is running')\n", " time.sleep(10)" ] }, { "cell_type": "code", "execution_count": null, "id": "0900b8a0", "metadata": {}, "outputs": [], "source": [ "# Populate data from Glue tables. this will be used for setting up Quicksight datasets\n", "response_glue_tables = clientGlue.get_tables(\n", " DatabaseName=glue_dbname\n", ")\n", "s= response_glue_tables['TableList']\n", "table_dimensioncontributions = next((item for item in s if item['Name'] == 'dimensioncontributions'), None)\n", "table_metricvalueAnomalyscore = next((item for item in s if item['Name'] == 'metricvalue_anomalyscore'), None)\n", "table_liveData = next((item for item in s if item['Name'] == 'backtest'), None)\n", "\n", "#Formating the data to fit the dataset for QuickSight\n", "for item in table_dimensioncontributions['StorageDescriptor']['Columns']:\n", " item['Type'] = item['Type'].upper()\n", " if item['Type']=='DOUBLE' or item['Type']=='BIGINT':\n", " item['Type']='DECIMAL'\n", " \n", "for item in table_metricvalueAnomalyscore['StorageDescriptor']['Columns']:\n", " item['Type'] = item['Type'].upper()\n", " if item['Type']=='DOUBLE' or item['Type']=='BIGINT':\n", " item['Type']='DECIMAL'\n", " \n", "for item in table_liveData['StorageDescriptor']['Columns']:\n", " item['Type'] = item['Type'].upper()\n", " if item['Type']=='DOUBLE' or item['Type']=='BIGINT':\n", " item['Type']='DECIMAL'" ] }, { "cell_type": "markdown", "id": "e748180f", "metadata": {}, "source": [ "## Amazon QuickSight - Visualization in 3 steps \n", "1/ Define data source\n", "2/ Create Dataset\n", "3/ Create and visualize analysis" ] }, { "cell_type": "code", "execution_count": null, "id": "457874fb", "metadata": {}, "outputs": [], "source": [ "dsId = 'L4M_ds_' + str(uuid.uuid4())\n", "response_datasource = clientQuicksight.create_data_source(\n", " AwsAccountId=AwsAccountId,\n", " DataSourceId=dsId,\n", " Name='L4M_Visualization',\n", " Type='ATHENA',\n", " Permissions=[\n", " {\n", " 'Principal': user_ARN,\n", " 'Actions': [ \n", " 'quicksight:DescribeDataSource',\n", " 'quicksight:DescribeDataSourcePermissions',\n", " 'quicksight:UpdateDataSource',\n", " 'quicksight:UpdateDataSourcePermissions',\n", " 'quicksight:DeleteDataSource',\n", " 'quicksight:PassDataSource',\n", " ]\n", " },\n", " ],\n", " \n", ")\n", "print('Data source ID :', dsId)" ] }, { "cell_type": "markdown", "id": "e3214e47", "metadata": {}, "source": [ "### Dataset creation\n", "We will generate a dataset by using the API call [create_data_set()](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/quicksight.html#QuickSight.Client.create_data_set) below. We join the tables *backtestData* and *metricvalueAnomalyscore*. The variable *colummns_to_be_shown* allows to choose what metrics you would like to see in your analysis." ] }, { "cell_type": "code", "execution_count": null, "id": "ddc135e1", "metadata": {}, "outputs": [], "source": [ "colummns_to_be_shown = ['timestamp','marketplace','platform','views','revenue','revenueanomalymetricvalue',\n", " 'revenuegroupscore','viewsanomalymetricvalue','viewsgroupscore']\n", "\n", "\n", "response_dataset_live = clientQuicksight.create_data_set(\n", " AwsAccountId= AwsAccountId,\n", " DataSetId='L4M_liveData'+str(uuid.uuid4()),\n", " Name='L4M_Visualization_dataset_with_backtestData',\n", " PhysicalTableMap={\n", " 'liveData': {\n", " 'RelationalTable': {\n", " 'DataSourceArn': response_datasource['Arn'],\n", " 'Catalog': 'AwsDataCatalog',\n", " 'Schema': glue_dbname,\n", " 'Name': 'backtest',\n", " 'InputColumns': table_liveData['StorageDescriptor']['Columns'] \n", " } \n", " },\n", " 'metricvalueAnomalyscore': {\n", " 'RelationalTable': {\n", " 'DataSourceArn': response_datasource['Arn'],\n", " 'Catalog': 'AwsDataCatalog',\n", " 'Schema': glue_dbname,\n", " 'Name': 'metricvalue_anomalyscore',\n", " 'InputColumns': table_metricvalueAnomalyscore['StorageDescriptor']['Columns'] \n", " } \n", " },\n", " },\n", " \n", " LogicalTableMap={\n", " 'liveData': {\n", " 'Alias': 'alias liveData',\n", " 'Source':{\n", " 'PhysicalTableId': 'liveData'\n", " },\n", " 'DataTransforms': [\n", " {\n", " 'RenameColumnOperation': {\n", " 'ColumnName': 'timestamp',\n", " 'NewColumnName': 'timestampInput'\n", " }\n", " }\n", " ],\n", " \n", " },\n", " 'metricvalueAnomalyscore':{\n", " 'Alias': 'alias metricvalueAnomalyscore',\n", " 'Source':{\n", " 'PhysicalTableId': 'metricvalueAnomalyscore'\n", " },\n", " 'DataTransforms': [\n", " {\n", " 'RenameColumnOperation': {\n", " 'ColumnName': 'timestamp',\n", " 'NewColumnName': 'timestampAnomalyscore'\n", " }\n", " },\n", " {\n", " 'RenameColumnOperation': {\n", " 'ColumnName': 'marketplace',\n", " 'NewColumnName': 'marketplace_anomalyScore'\n", " }\n", " },\n", " {\n", " 'RenameColumnOperation': {\n", " 'ColumnName': 'platform',\n", " 'NewColumnName': 'platform_anomalyScore'\n", " }\n", " }\n", " ],\n", " },\n", " \n", " 'joinWithRight':{\n", " 'Alias': 'RIGHT join of input and output',\n", " 'Source': {\n", " 'JoinInstruction': {\n", " 'LeftOperand': 'liveData',\n", " 'RightOperand': 'metricvalueAnomalyscore',\n", " 'Type': 'LEFT',\n", " 'OnClause': ' timestampInput = timestampAnomalyscore AND marketplace = marketplace_anomalyScore AND platform= platform_anomalyScore' \n", " },\n", " \n", " },\n", " 'DataTransforms': [\n", " {\n", " 'CreateColumnsOperation': {\n", " 'Columns': [\n", " {\n", " 'ColumnName': 'timestamp',\n", " 'ColumnId': 'calculatedTimestamp',\n", " 'Expression': 'parseDate(timestampInput,\"yyyy-MM-dd HH:mm:ss\")'\n", " },\n", " ]\n", " \n", " },\n", " },\n", " \n", " {\n", " 'ProjectOperation': {\n", " 'ProjectedColumns': colummns_to_be_shown\n", " }\n", " }\n", " \n", " \n", " \n", " ],\n", " },\n", " \n", " \n", " },\n", " ImportMode='DIRECT_QUERY', \n", " Permissions= [\n", " {\n", " 'Principal': user_ARN,\n", " \"Actions\": [\n", " \"quicksight:DescribeDataSet\",\n", " \"quicksight:DescribeDataSetPermissions\",\n", " \"quicksight:PassDataSet\",\n", " \"quicksight:DescribeIngestion\",\n", " \"quicksight:ListIngestions\",\n", " 'quicksight:UpdateDataSet',\n", " \"quicksight:DeleteDataSet\"\n", " ]\n", " \n", " } \n", " ],\n", ")\n" ] }, { "cell_type": "markdown", "id": "1ed0824f", "metadata": {}, "source": [ "\n", "The next step is to build the dashboards of the data you would like to see)" ] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 5 }