{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Amazon SageMaker Model Monitor\n", "This notebook shows how to:\n", "* Host a machine learning model in Amazon SageMaker and capture inference requests, results, and metadata\n", "* Analyze a training dataset to generate baseline constraints\n", "* Monitor a live endpoint or batch transforms for violations against constraints\n", "\n", "---\n", "## Background\n", "\n", "Amazon SageMaker provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly. Amazon SageMaker is a fully managed service that encompasses the entire machine learning workflow. You can label and prepare your data, choose an algorithm, train a model, and then tune and optimize it for deployment. You can deploy your models to production with Amazon SageMaker to make predictions at lower cost than was previously possible.\n", "\n", "In addition, Amazon SageMaker enables you to capture the input, output, and metadata for invocations of the models that you deploy. It also enables you to analyze the data and monitor its quality. In this notebook, you learn how Amazon SageMaker enables these capabilities.\n", "\n", "---\n", "## Setup\n", "\n", "To get started, make sure you have completed these prerequisites.\n", "\n", "* Specify an AWS Region to host your model.\n", "* Make sure an IAM role ARN exists that gives Amazon SageMaker access to your data in Amazon Simple Storage Service (Amazon S3). See the documentation for how to fine-tune the permissions needed.\n", "* Create an S3 bucket to store the data used to train your model, any additional model data, and the data captured from model invocations. For demonstration purposes, you are using the same bucket for all of these. In production, you might want to separate them and apply different security policies."
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "isConfigCell": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "RoleArn: arn:aws:iam::802173394839:role/service-role/AmazonSageMaker-ExecutionRole-20210202T065513\n", "Demo Bucket: sagemaker-eu-west-1-802173394839\n", "Capture path: s3://sagemaker-eu-west-1-802173394839/sagemaker/DEMO-ModelMonitor/datacapture\n", "Report path: s3://sagemaker-eu-west-1-802173394839/sagemaker/DEMO-ModelMonitor/reports\n", "Preproc Code path: s3://sagemaker-eu-west-1-802173394839/sagemaker/DEMO-ModelMonitor/code/preprocessor.py\n", "Postproc Code path: s3://sagemaker-eu-west-1-802173394839/sagemaker/DEMO-ModelMonitor/code/postprocessor.py\n", "CPU times: user 607 ms, sys: 118 ms, total: 724 ms\n", "Wall time: 1.13 s\n" ] } ], "source": [ "%%time\n", "# cell 01\n", "\n", "# A handful of configuration values\n", "\n", "import os\n", "import boto3\n", "import re\n", "import json\n", "from sagemaker import get_execution_role, session\n", "\n", "region = boto3.Session().region_name\n", "\n", "role = get_execution_role()\n", "print(\"RoleArn: {}\".format(role))\n", "\n", "# You can use a different bucket, but make sure the role you chose for this notebook\n", "# has the s3:PutObject permission. 
This is the bucket into which the data is captured\n", "bucket = session.Session(boto3.Session()).default_bucket()\n", "print(\"Demo Bucket: {}\".format(bucket))\n", "prefix = 'sagemaker/DEMO-ModelMonitor'\n", "\n", "data_capture_prefix = '{}/datacapture'.format(prefix)\n", "s3_capture_upload_path = 's3://{}/{}'.format(bucket, data_capture_prefix)\n", "reports_prefix = '{}/reports'.format(prefix)\n", "s3_report_path = 's3://{}/{}'.format(bucket, reports_prefix)\n", "code_prefix = '{}/code'.format(prefix)\n", "s3_code_preprocessor_uri = 's3://{}/{}/{}'.format(bucket, code_prefix, 'preprocessor.py')\n", "s3_code_postprocessor_uri = 's3://{}/{}/{}'.format(bucket, code_prefix, 'postprocessor.py')\n", "\n", "print(\"Capture path: {}\".format(s3_capture_upload_path))\n", "print(\"Report path: {}\".format(s3_report_path))\n", "print(\"Preproc Code path: {}\".format(s3_code_preprocessor_uri))\n", "print(\"Postproc Code path: {}\".format(s3_code_postprocessor_uri))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can quickly verify that the execution role for this notebook has the necessary permissions to proceed. Put a simple test object into the S3 bucket you specified above. If this command fails, update the role to have `s3:PutObject` permission on the bucket and try again." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Success! You are all set to proceed.\n" ] } ], "source": [ "# cell 02\n", "# Upload a test object to confirm write access to the bucket\n", "boto3.Session().resource('s3').Bucket(bucket).Object(\"test_upload/test.txt\").upload_file('test_data/upload-test-file.txt')\n", "print(\"Success! 
You are all set to proceed.\")" ] }, { "cell_type": "markdown", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "# Option 1: Model monitoring with real-time endpoints" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## PART A: Capturing real-time inference data from Amazon SageMaker endpoints\n", "Create an endpoint to showcase the data capture capability in action.\n", "\n", "### Upload the pre-trained model to Amazon S3\n", "This code uploads a pre-trained XGBoost model that is ready for you to deploy. This model was trained using the XGB Churn Prediction Notebook in SageMaker. You can also use your own pre-trained model in this step. If you already have a pre-trained model in Amazon S3, you can use it instead by specifying its `s3_key`." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# cell 03\n", "s3_key = os.path.join(prefix, 'xgb-churn-prediction-model.tar.gz')\n", "# Open the model archive in a context manager so the file handle is closed after the upload\n", "with open(\"model/xgb-churn-prediction-model.tar.gz\", 'rb') as model_file:\n", "    boto3.Session().resource('s3').Bucket(bucket).Object(s3_key).upload_fileobj(model_file)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Deploy the model to Amazon SageMaker\n", "Start by deploying a pre-trained churn prediction model. Here, you create the model object with the image and model data."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# cell 04\n", "from time import gmtime, strftime\n", "from sagemaker.model import Model\n", "from sagemaker.image_uris import retrieve\n", "\n", "model_name = \"DEMO-xgb-churn-pred-model-monitor-\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", "# S3 URI of the model artifact referenced by s3_key in cell 03\n", "model_url = 's3://{}/{}'.format(bucket, s3_key)\n", "image_uri = retrieve(region=region, framework='xgboost', version='0.90-2')\n", "\n", "model = Model(image_uri=image_uri, model_data=model_url, role=role, name=model_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To enable data capture for monitoring model data quality, specify the capture option `DataCaptureConfig`. With this configuration, you can capture the request payload, the response payload, or both. The capture configuration applies to all variants. Now deploy the model." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "EndpointName=DEMO-xgb-churn-pred-model-monitor-2021-02-02-12-30-27\n", "---------------!" ] } ], "source": [ "# cell 05\n", "from sagemaker.model_monitor import DataCaptureConfig\n", "\n", "endpoint_name = 'DEMO-xgb-churn-pred-model-monitor-' + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", "print(\"EndpointName={}\".format(endpoint_name))\n", "\n", "data_capture_config = DataCaptureConfig(\n", " enable_capture=True,\n", " sampling_percentage=100,\n", " destination_s3_uri=s3_capture_upload_path)\n", "\n", "predictor = model.deploy(initial_instance_count=1,\n", " instance_type='ml.m4.xlarge',\n", " endpoint_name=endpoint_name,\n", " data_capture_config=data_capture_config)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Invoke the deployed model\n", "\n", "You can now send data to this endpoint to get inferences in real time. 
Because you enabled data capture in the previous steps, the request and response payloads, along with some additional metadata, are saved in the Amazon S3 location that you specified in `DataCaptureConfig`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This step invokes the endpoint with the included sample data for about 2 minutes. Data is captured according to the specified sampling percentage, and capture continues until the data capture option is turned off." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sending test traffic to the endpoint DEMO-xgb-churn-pred-model-monitor-2021-02-02-12-30-27. \n", "Please wait...\n", "Done!\n" ] } ], "source": [ "# cell 06\n", "from sagemaker.predictor import Predictor\n", "from sagemaker.serializers import CSVSerializer\n", "import sagemaker\n", "import time\n", "\n", "predictor = Predictor(endpoint_name=endpoint_name, serializer=CSVSerializer())\n", "\n", "# get a subset of test data for a quick test\n", "!head -120 test_data/test-dataset-input-cols.csv > test_data/test_sample.csv\n", "print(\"Sending test traffic to the endpoint {}. \\nPlease wait...\".format(endpoint_name))\n", "\n", "with open('test_data/test_sample.csv', 'r') as f:\n", "    for row in f:\n", "        payload = row.rstrip('\\n')\n", "        response = predictor.predict(data=payload)\n", "        time.sleep(0.5)\n", "\n", "print(\"Done!\")" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### View captured data\n", "\n", "Now list the data capture files stored in Amazon S3. You should see separate files for different time periods, organized by the hour in which the invocation occurred. 
The format of the Amazon S3 path is:\n", "\n", "`s3://{destination-bucket-prefix}/{endpoint-name}/{variant-name}/yyyy/mm/dd/hh/filename.jsonl`" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found Capture Files:\n", "sagemaker/DEMO-ModelMonitor/datacapture/DEMO-xgb-churn-pred-model-monitor-2021-02-02-12-30-27/AllTraffic/2021/02/02/12/41-57-151-55d98342-3097-4e58-ad99-b193fbc45df6.jsonl\n", " sagemaker/DEMO-ModelMonitor/datacapture/DEMO-xgb-churn-pred-model-monitor-2021-02-02-12-30-27/AllTraffic/2021/02/02/12/42-57-388-0186e29c-95d7-44d2-988a-6224237b6ef9.jsonl\n" ] } ], "source": [ "# cell 07\n", "s3_client = boto3.Session().client('s3')\n", "current_endpoint_capture_prefix = '{}/{}'.format(data_capture_prefix, endpoint_name)\n", "result = s3_client.list_objects_v2(Bucket=bucket, Prefix=current_endpoint_capture_prefix)\n", "# 'Contents' is absent from the response when no capture files exist yet\n", "capture_files = [capture_file.get(\"Key\") for capture_file in result.get('Contents', [])]\n", "print(\"Found Capture Files:\")\n", "print(\"\\n \".join(capture_files))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, view the contents of a single capture file. It contains all the captured data in an Amazon SageMaker-specific JSON Lines format. Take a quick peek at the first few lines of the captured file."
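] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each line of a capture file is a standalone JSON object, so besides printing the raw file (cell 08), you can parse captured keys and records in code. The following cell is a minimal sketch under stated assumptions, not part of the Model Monitor API: `parse_capture_key` and `parse_capture_record` are hypothetical helper names, and the code assumes the S3 key layout shown above and CSV-encoded records with the `captureData`, `endpointInput`, `endpointOutput`, and `eventMetadata` fields that appear in this notebook's captured output. The sample key and record are copied from that output." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# cell 07b (illustrative sketch)\n", "# parse_capture_key and parse_capture_record are hypothetical helpers, not SageMaker APIs.\n", "import json\n", "from datetime import datetime\n", "\n", "\n", "def parse_capture_key(key):\n", "    # Assumed layout: .../{endpoint-name}/{variant-name}/yyyy/mm/dd/hh/{file}.jsonl\n", "    parts = key.split('/')\n", "    endpoint, variant = parts[-7], parts[-6]\n", "    year, month, day, hour = (int(p) for p in parts[-5:-1])\n", "    return endpoint, variant, datetime(year, month, day, hour)\n", "\n", "\n", "def parse_capture_record(line):\n", "    # Returns (input features, model score, inference time) for one capture record\n", "    record = json.loads(line)\n", "    capture = record['captureData']\n", "    for side in ('endpointInput', 'endpointOutput'):\n", "        if capture[side]['encoding'] != 'CSV':\n", "            raise ValueError('this sketch only handles CSV-encoded payloads')\n", "    features = [float(v) for v in capture['endpointInput']['data'].split(',')]\n", "    score = float(capture['endpointOutput']['data'])\n", "    return features, score, record['eventMetadata']['inferenceTime']\n", "\n", "\n", "# Usage with a key and a record taken from this notebook's captured output\n", "sample_key = ('sagemaker/DEMO-ModelMonitor/datacapture/'\n", "              'DEMO-xgb-churn-pred-model-monitor-2021-02-02-12-30-27/AllTraffic/'\n", "              '2021/02/02/12/42-57-388-0186e29c-95d7-44d2-988a-6224237b6ef9.jsonl')\n", "sample_line = json.dumps({\n", "    'captureData': {\n", "        'endpointInput': {'observedContentType': 'text/csv', 'mode': 'INPUT', 'encoding': 'CSV',\n", "                          'data': '92,0,176.3,85,93.4,125,207.2,107,9.6,1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0'},\n", "        'endpointOutput': {'observedContentType': 'text/csv; charset=utf-8', 'mode': 'OUTPUT',\n", "                           'encoding': 'CSV', 'data': '0.039806101471185684'}},\n", "    'eventMetadata': {'eventId': '8ea647eb-96e1-4f4f-ad5b-c3c953cc3fa6',\n", "                      'inferenceTime': '2021-02-02T12:42:57Z'},\n", "    'eventVersion': '0'})\n", "\n", "print(parse_capture_key(sample_key))\n", "features, score, inference_time = parse_capture_record(sample_line)\n", "print(len(features), score, inference_time)"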
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"captureData\":{\"endpointInput\":{\"observedContentType\":\"text/csv\",\"mode\":\"INPUT\",\"data\":\"92,0,176.3,85,93.4,125,207.2,107,9.6,1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0\",\"encoding\":\"CSV\"},\"endpointOutput\":{\"observedContentType\":\"text/csv; charset=utf-8\",\"mode\":\"OUTPUT\",\"data\":\"0.039806101471185684\",\"encoding\":\"CSV\"}},\"eventMetadata\":{\"eventId\":\"8ea647eb-96e1-4f4f-ad5b-c3c953cc3fa6\",\"inferenceTime\":\"2021-02-02T12:42:57Z\"},\"eventVersion\":\"0\"}\n", "{\"captureData\":{\"endpointInput\":{\"observedContentType\":\"text/csv\",\"mode\":\"INPUT\",\"data\":\"138,0,46.5,104,186.0,114,167.5,95,9.6,4,4,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0\",\"encoding\":\"CSV\"},\"endpointOutput\":{\"observedContentType\":\"text/csv; charset=utf-8\",\"mode\":\"OUTPUT\",\"data\":\"0.9562002420425415\",\"encoding\":\"CSV\"}},\"eventMetadata\":{\"eventId\":\"c4d500cc-8020-42cf-8e69-d279b7dd6385\",\"inferenceTime\":\"2021-02-02T12:42:57Z\"},\"eventVersion\":\"0\"}\n", "{\"captureData\":{\"endpointInput\":{\"observedContentType\":\"text/csv\",\"mode\":\"INPUT\",\"data\":\"93,0,176.1,103,199.7,130,263.9,96,8.5,6,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0\",\"encoding\":\"CSV\"},\"endpointOutput\":{\"observedContentType\":\"text/csv; charset=utf-8\",\"mode\":\"OUTPUT\",\"data\":\"0.007474285550415516\",\"encoding\":\"CSV\"}},\"eventMetadata\":{\"eventId\":\"b8d95698-55a4-403f-8c08-2db0bbc82d58\",\"inferenceTime\":\"2021-02-02T12:42:58Z\"},\"eventVersion\":\"0\"}\n", "\n" ] } ], "source": [ "# cell 08\n", "def get_obj_body(obj_key):\n", " return s3_client.get_object(Bucket=bucket, 
Key=obj_key).get('Body').read().decode(\"utf-8\")\n", "\n", "capture_file = get_obj_body(capture_files[-1])\n", "print(capture_file[:2000])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, the following cell pretty-prints the first line of the file as formatted JSON so that you can inspect its structure more easily." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"captureData\": {\n", " \"endpointInput\": {\n", " \"observedContentType\": \"text/csv\",\n", " \"mode\": \"INPUT\",\n", " \"data\": \"92,0,176.3,85,93.4,125,207.2,107,9.6,1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0\",\n", " \"encoding\": \"CSV\"\n", " },\n", " \"endpointOutput\": {\n", " \"observedContentType\": \"text/csv; charset=utf-8\",\n", " \"mode\": \"OUTPUT\",\n", " \"data\": \"0.039806101471185684\",\n", " \"encoding\": \"CSV\"\n", " }\n", " },\n", " \"eventMetadata\": {\n", " \"eventId\": \"8ea647eb-96e1-4f4f-ad5b-c3c953cc3fa6\",\n", " \"inferenceTime\": \"2021-02-02T12:42:57Z\"\n", " },\n", " \"eventVersion\": \"0\"\n", "}\n" ] } ], "source": [ "# cell 09\n", "import json\n", "print(json.dumps(json.loads(capture_file.split('\\n')[0]), indent=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, each inference request is captured in one line of the JSON Lines file. The line contains both the input and the output, merged together. In the example, you provided the content type as `text/csv`, which is reflected in the `observedContentType` value. The `encoding` value records the encoding used for the input and output payloads in the capture format.\n", "\n", "To recap, you observed how to enable capturing the input or output payloads of an endpoint with the `data_capture_config` parameter of `model.deploy`. You have also observed what the captured format looks like in Amazon S3. 
Next, continue to explore how Amazon SageMaker helps with monitoring the data collected in Amazon S3." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## PART B: Model Monitor - Baselining and continuous monitoring" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to collecting the data, Amazon SageMaker lets you monitor and evaluate the data observed by the endpoints. To do this:\n", "1. Create a baseline against which you compare the real-time traffic.\n", "1. Once a baseline is ready, set up a schedule to continuously evaluate and compare against the baseline." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### 1. Constraint suggestion with baseline/training dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The training dataset with which you trained the model is usually a good baseline dataset. Note that the training dataset schema and the inference dataset schema should match exactly (that is, the number and order of the features).\n", "\n", "From the training dataset, you can ask Amazon SageMaker to suggest a set of baseline `constraints` and generate descriptive `statistics` to explore the data. For this example, upload the training dataset that was used to train the included pre-trained model. 
12:51:47 INFO DAGScheduler:54 - Got job 6 (count at StatsGenerator.scala:67) with 1 output partitions\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO DAGScheduler:54 - Final stage: ResultStage 10 (count at StatsGenerator.scala:67)\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO DAGScheduler:54 - Parents of final stage: List(ShuffleMapStage 9)\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO DAGScheduler:54 - Missing parents: List(ShuffleMapStage 9)\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO DAGScheduler:54 - Submitting ShuffleMapStage 9 (MapPartitionsRDD[56] at count at StatsGenerator.scala:67), which has no missing parents\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO MemoryStore:54 - Block broadcast_12 stored as values in memory (estimated size 64.2 KB, free 1458.1 MB)\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO MemoryStore:54 - Block broadcast_12_piece0 stored as bytes in memory (estimated size 24.8 KB, free 1458.1 MB)\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO BlockManagerInfo:54 - Added broadcast_12_piece0 in memory on (size: 24.8 KB, free: 1458.5 MB)\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO SparkContext:54 - Created broadcast 12 from broadcast at DAGScheduler.scala:1039\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO DAGScheduler:54 - Submitting 1 missing tasks from ShuffleMapStage 9 (MapPartitionsRDD[56] at count at StatsGenerator.scala:67) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO YarnScheduler:54 - Adding task set 9.0 with 1 tasks\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO TaskSetManager:54 - Starting task 0.0 in stage 9.0 (TID 9, algo-1, executor 1, partition 0, PROCESS_LOCAL, 8352 bytes)\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO BlockManagerInfo:54 - Added broadcast_12_piece0 in memory on algo-1:45359 (size: 24.8 KB, free: 5.8 GB)\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO TaskSetManager:54 - Finished task 0.0 in stage 9.0 (TID 9) in 103 ms on algo-1 
(executor 1) (1/1)\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO YarnScheduler:54 - Removed TaskSet 9.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO DAGScheduler:54 - ShuffleMapStage 9 (count at StatsGenerator.scala:67) finished in 0.118 s\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO DAGScheduler:54 - looking for newly runnable stages\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO DAGScheduler:54 - running: Set()\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO DAGScheduler:54 - waiting: Set(ResultStage 10)\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO DAGScheduler:54 - failed: Set()\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO DAGScheduler:54 - Submitting ResultStage 10 (MapPartitionsRDD[59] at count at StatsGenerator.scala:67), which has no missing parents\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO MemoryStore:54 - Block broadcast_13 stored as values in memory (estimated size 7.4 KB, free 1458.0 MB)\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO MemoryStore:54 - Block broadcast_13_piece0 stored as bytes in memory (estimated size 3.8 KB, free 1458.0 MB)\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO BlockManagerInfo:54 - Added broadcast_13_piece0 in memory on (size: 3.8 KB, free: 1458.5 MB)\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO SparkContext:54 - Created broadcast 13 from broadcast at DAGScheduler.scala:1039\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO DAGScheduler:54 - Submitting 1 missing tasks from ResultStage 10 (MapPartitionsRDD[59] at count at StatsGenerator.scala:67) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO YarnScheduler:54 - Adding task set 10.0 with 1 tasks\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO TaskSetManager:54 - Starting task 0.0 in stage 10.0 (TID 10, algo-1, executor 1, partition 0, NODE_LOCAL, 7765 bytes)\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO BlockManagerInfo:54 - Added broadcast_13_piece0 
in memory on algo-1:45359 (size: 3.8 KB, free: 5.8 GB)\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO MapOutputTrackerMasterEndpoint:54 - Asked to send map output locations for shuffle 3 to\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO TaskSetManager:54 - Finished task 0.0 in stage 10.0 (TID 10) in 59 ms on algo-1 (executor 1) (1/1)\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO YarnScheduler:54 - Removed TaskSet 10.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO DAGScheduler:54 - ResultStage 10 (count at StatsGenerator.scala:67) finished in 0.082 s\u001b[0m\n", "\u001b[34m2021-02-02 12:51:47 INFO DAGScheduler:54 - Job 6 finished: count at StatsGenerator.scala:67, took 0.205966 s\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48 INFO StatsGenerator:70 - Stats: {\n", " \"version\" : 0.0,\n", " \"dataset\" : {\n", " \"item_count\" : 2333\n", " },\n", " \"features\" : [ {\n", " \"name\" : \"Churn\",\n", " \"inferred_type\" : \"Integral\",\n", " \"numerical_statistics\" : {\n", " \"common\" : {\n", " \"num_present\" : 2333,\n", " \"num_missing\" : 0\n", " },\n", " \"mean\" : 0.1393056150878697,\n", " \"sum\" : 325.0,\n", " \"std_dev\" : 0.34626515951342846,\n", " \"min\" : 0.0,\n", " \"max\" : 1.0,\n", " \"distribution\" : {\n", " \"kll\" : {\n", " \"buckets\" : [ {\n", " \"lower_bound\" : 0.0,\n", " \"upper_bound\" : 0.1,\n", " \"count\" : 2008.0\n", " }, {\n", " \"lower_bound\" : 0.1,\n", " \"upper_bound\" : 0.2,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.2,\n", " \"upper_bound\" : 0.3,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.3,\n", " \"upper_bound\" : 0.4,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.4,\n", " \"upper_bound\" : 0.5,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.5,\n", " \"upper_bound\" : 0.6,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.6,\n", " \"upper_bound\" : 0.7,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 
0.7,\n", " \"upper_bound\" : 0.8,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.8,\n", " \"upper_bound\" : 0.9,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.9,\n", " \"upper_bound\" : 1.0,\n", " \"count\" : 325.0\n", " } ],\n", " \"sketch\" : {\n", " \"parameters\" : {\n", " \"c\" : 0.64,\n", " \"k\" : 2048.0\n", " },\n", " \"data\" : [ [ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0 ], [ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 ] ]\n", " }\n", " }\n", " }\n", " }\n", " }, {\n", " \"name\" : \"Account Length\",\n", " \"inferred_type\" : \"Integral\",\n", " \"numerical_statistics\" : {\n", " \"common\" : {\n", " \"num_present\" : 2333,\n", " \"num_missing\" : 0\n", " },\n", " \"mean\" : 101.2768966995285,\n", " \"sum\" : 236279.0,\n", " \"std_dev\" : 39.552442167470566,\n", " \"min\" : 1.0,\n", " \"max\" : 243.0,\n", " \"distribution\" : {\n", " \"kll\" : {\n", " \"buckets\" : [ {\n", " \"lower_bound\" : 1.0,\n", " \"upper_bound\" : 25.2,\n", " \"count\" : 70.0\n", " }, {\n", " \"lower_bound\" : 25.2,\n", " \"upper_bound\" : 49.4,\n", " \"count\" : 150.0\n", " }, {\n", " \"lower_bound\" : 49.4,\n", " \"upper_bound\" : 73.6,\n", " \"count\" : 353.0\n", " }, {\n", " \"lower_bound\" : 73.6,\n", " \"upper_bound\" : 97.8,\n", " \"count\" : 518.0\n", " }, {\n", " \"lower_bound\" : 97.8,\n", " \"upper_bound\" : 122.0,\n", " \"count\" : 538.0\n", " }, {\n", " \"lower_bound\" : 122.0,\n", " \"upper_bound\" : 146.2,\n", " \"count\" : 401.0\n", " }, {\n", " \"lower_bound\" : 146.2,\n", " \"upper_bound\" : 
170.4,\n", " \"count\" : 208.0\n", " }, {\n", " \"lower_bound\" : 170.4,\n", " \"upper_bound\" : 194.6,\n", " \"count\" : 72.0\n", " }, {\n", " \"lower_bound\" : 194.6,\n", " \"upper_bound\" : 218.8,\n", " \"count\" : 19.0\n", " }, {\n", " \"lower_bound\" : 218.8,\n", " \"upper_bound\" : 243.0,\n", " \"count\" : 4.0\n", " } ],\n", " \"sketch\" : {\n", " \"parameters\" : {\n", " \"c\" : 0.64,\n", " \"k\" : 2048.0\n", " },\n", " \"data\" : [ [ 119.0, 100.0, 111.0, 181.0, 95.0, 104.0, 70.0, 120.0, 88.0, 111.0, 33.0, 106.0, 54.0, 87.0, 94.0, 135.0, 107.0, 159.0, 106.0, 136.0, 116.0, 115.0, 103.0, 95.0, 115.0, 143.0, 48.0, 94.0, 153.0, 94.0, 107.0, 91.0, 141.0, 58.0, 49.0, 41.0, 137.0, 111.0, 71.0, 43.0, 97.0, 3.0, 124.0, 86.0, 87.0, 83.0, 67.0, 46.0, 129.0, 90.0, 97.0, 87.0, 141.0, 136.0, 88.0, 170.0, 44.0, 121.0, 111.0, 105.0, 112.0, 73.0, 147.0, 66.0, 136.0, 119.0, 135.0, 102.0, 169.0, 60.0, 73.0, 83.0, 90.0, 148.0, 59.0, 152.0, 136.0, 112.0, 122.0, 44.0, 122.0, 89.0, 176.0, 64.0, 112.0, 133.0, 52.0, 91.0, 127.0, 153.0, 117.0, 163.0, 76.0, 80.0, 136.0, 91.0, 143.0, 125.0, 126.0, 87.0, 119.0, 13.0, 138.0, 159.0, 111.0, 46.0, 68.0, 107.0, 70.0, 215.0, 22.0, 122.0, 73.0, 75.0, 87.0, 148.0, 105.0, 182.0, 139.0, 105.0, 166.0, 60.0, 76.0, 28.0, 94.0, 146.0, 101.0, 132.0, 93.0, 105.0, 100.0, 134.0, 63.0, 126.0, 166.0, 160.0, 162.0, 70.0, 116.0, 75.0, 74.0, 115.0, 42.0, 132.0, 171.0, 135.0, 99.0, 27.0, 139.0, 76.0, 123.0, 54.0, 70.0, 163.0, 96.0, 62.0, 115.0, 97.0, 137.0, 82.0, 118.0, 64.0, 186.0, 117.0, 117.0, 116.0, 164.0, 103.0, 137.0, 97.0, 144.0, 96.0, 183.0, 42.0, 100.0, 131.0, 88.0, 91.0, 104.0, 63.0, 159.0, 147.0, 123.0, 100.0, 105.0, 163.0, 90.0, 125.0, 64.0, 113.0, 101.0, 123.0, 212.0, 73.0, 44.0, 96.0, 74.0, 77.0, 120.0, 122.0, 87.0, 52.0, 48.0, 61.0, 141.0, 170.0, 17.0, 162.0, 85.0, 160.0, 29.0, 91.0, 96.0, 104.0, 95.0, 84.0, 157.0, 165.0, 57.0, 95.0, 51.0, 97.0, 13.0, 50.0, 46.0, 121.0, 68.0, 72.0, 82.0, 38.0, 41.0, 96.0, 129.0, 31.0, 122.0, 51.0, 109.0, 161.0, 
72.0, 65.0, 129.0, 137.0, 48.0, 134.0, 125.0, 153.0, 103.0, 45.0, 80.0, 57.0, 94.0, 59.0, 72.0, 62.0, 155.0, 96.0, 77.0, 58.0, 134.0, 24.0, 158.0, 89.0, 138.0, 61.0, 123.0, 87.0, 74.0, 37.0, 105.0, 56.0, 64.0, 202.0, 91.0, 120.0, 89.0, 95.0, 92.0, 45.0, 106.0, 125.0, 129.0, 159.0, 99.0 ], [ 1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 5.0, 6.0, 7.0, 9.0, 9.0, 10.0, 11.0, 13.0, 13.0, 13.0, 15.0, 16.0, 16.0, 16.0, 17.0, 19.0, 19.0, 20.0, 21.0, 21.0, 21.0, 22.0, 23.0, 24.0, 24.0, 25.0, 26.0, 27.0, 27.0, 28.0, 28.0, 29.0, 30.0, 31.0, 31.0, 32.0, 32.0, 32.0, 33.0, 33.0, 34.0, 35.0, 35.0, 35.0, 36.0, 36.0, 36.0, 36.0, 37.0, 37.0, 37.0, 38.0, 38.0, 39.0, 39.0, 39.0, 40.0, 40.0, 40.0, 40.0, 41.0, 41.0, 41.0, 41.0, 42.0, 42.0, 43.0, 43.0, 43.0, 44.0, 44.0, 44.0, 45.0, 45.0, 45.0, 45.0, 46.0, 46.0, 46.0, 46.0, 47.0, 47.0, 47.0, 48.0, 48.0, 48.0, 48.0, 49.0, 49.0, 50.0, 50.0, 51.0, 51.0, 51.0, 51.0, 52.0, 52.0, 52.0, 52.0, 52.0, 53.0, 53.0, 53.0, 53.0, 54.0, 54.0, 54.0, 54.0, 54.0, 55.0, 55.0, 55.0, 55.0, 55.0, 55.0, 55.0, 56.0, 56.0, 56.0, 56.0, 57.0, 57.0, 57.0, 57.0, 57.0, 57.0, 58.0, 58.0, 58.0, 58.0, 58.0, 59.0, 59.0, 59.0, 59.0, 59.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 61.0, 61.0, 61.0, 61.0, 61.0, 61.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0, 63.0, 63.0, 63.0, 63.0, 63.0, 63.0, 63.0, 63.0, 64.0, 64.0, 64.0, 64.0, 64.0, 64.0, 64.0, 64.0, 65.0, 65.0, 65.0, 65.0, 65.0, 65.0, 65.0, 65.0, 66.0, 66.0, 66.0, 66.0, 66.0, 66.0, 66.0, 67.0, 67.0, 67.0, 67.0, 67.0, 67.0, 67.0, 67.0, 67.0, 67.0, 67.0, 68.0, 68.0, 68.0, 68.0, 68.0, 68.0, 68.0, 68.0, 68.0, 68.0, 69.0, 69.0, 69.0, 69.0, 69.0, 69.0, 70.0, 70.0, 70.0, 70.0, 70.0, 70.0, 70.0, 70.0, 71.0, 71.0, 71.0, 71.0, 71.0, 71.0, 71.0, 72.0, 72.0, 72.0, 72.0, 72.0, 73.0, 73.0, 73.0, 73.0, 73.0, 73.0, 73.0, 73.0, 73.0, 73.0, 74.0, 74.0, 74.0, 74.0, 74.0, 74.0, 74.0, 74.0, 74.0, 74.0, 75.0, 75.0, 75.0, 75.0, 75.0, 75.0, 75.0, 75.0, 75.0, 75.0, 76.0, 76.0, 76.0, 76.0, 76.0, 76.0, 76.0, 76.0, 77.0, 77.0, 77.0, 77.0, 77.0, 78.0, 78.0, 78.0, 
78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 79.0, 79.0, 79.0, 79.0, 79.0, 79.0, 79.0, 79.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 81.0, 81.0, 81.0, 81.0, 81.0, 81.0, 81.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 91.0, 91.0, 91.0, 91.0, 91.0, 91.0, 91.0, 91.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 94.0, 94.0, 94.0, 94.0, 94.0, 94.0, 94.0, 94.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 96.0, 96.0, 96.0, 96.0, 96.0, 96.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 103.0, 103.0, 103.0, 103.0, 103.0, 103.0, 103.0, 103.0, 103.0, 103.0, 104.0, 104.0, 104.0, 104.0, 104.0, 104.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 108.0, 108.0, 108.0, 108.0, 108.0, 108.0, 108.0, 108.0, 108.0, 109.0, 109.0, 109.0, 109.0, 109.0, 109.0, 109.0, 109.0, 110.0, 110.0, 110.0, 110.0, 
110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 111.0, 111.0, 111.0, 111.0, 111.0, 111.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 113.0, 113.0, 113.0, 113.0, 113.0, 113.0, 113.0, 113.0, 113.0, 113.0, 113.0, 114.0, 114.0, 114.0, 114.0, 114.0, 114.0, 114.0, 114.0, 114.0, 115.0, 115.0, 115.0, 115.0, 115.0, 115.0, 115.0, 115.0, 115.0, 116.0, 116.0, 116.0, 116.0, 116.0, 116.0, 116.0, 116.0, 116.0, 116.0, 116.0, 117.0, 117.0, 117.0, 117.0, 117.0, 117.0, 117.0, 117.0, 117.0, 117.0, 117.0, 117.0, 117.0, 118.0, 118.0, 118.0, 118.0, 118.0, 118.0, 118.0, 118.0, 118.0, 119.0, 119.0, 119.0, 119.0, 119.0, 119.0, 119.0, 119.0, 119.0, 119.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 121.0, 121.0, 121.0, 121.0, 121.0, 121.0, 121.0, 121.0, 122.0, 122.0, 122.0, 122.0, 122.0, 122.0, 122.0, 122.0, 122.0, 122.0, 123.0, 123.0, 123.0, 123.0, 123.0, 123.0, 123.0, 123.0, 123.0, 123.0, 124.0, 124.0, 124.0, 124.0, 124.0, 124.0, 124.0, 124.0, 124.0, 124.0, 125.0, 125.0, 125.0, 125.0, 125.0, 125.0, 125.0, 125.0, 126.0, 126.0, 126.0, 126.0, 126.0, 126.0, 126.0, 127.0, 127.0, 127.0, 127.0, 127.0, 127.0, 127.0, 127.0, 127.0, 127.0, 127.0, 127.0, 128.0, 128.0, 128.0, 128.0, 128.0, 128.0, 128.0, 129.0, 129.0, 129.0, 129.0, 129.0, 129.0, 129.0, 130.0, 130.0, 130.0, 130.0, 130.0, 130.0, 130.0, 131.0, 131.0, 131.0, 131.0, 131.0, 132.0, 132.0, 132.0, 132.0, 132.0, 132.0, 132.0, 132.0, 132.0, 133.0, 133.0, 133.0, 133.0, 133.0, 133.0, 134.0, 134.0, 134.0, 134.0, 135.0, 135.0, 135.0, 135.0, 136.0, 136.0, 136.0, 136.0, 136.0, 136.0, 136.0, 137.0, 137.0, 137.0, 137.0, 137.0, 137.0, 137.0, 138.0, 138.0, 138.0, 138.0, 138.0, 138.0, 138.0, 139.0, 139.0, 139.0, 139.0, 139.0, 139.0, 139.0, 140.0, 140.0, 140.0, 140.0, 140.0, 141.0, 141.0, 141.0, 141.0, 141.0, 141.0, 141.0, 141.0, 142.0, 142.0, 142.0, 142.0, 142.0, 143.0, 143.0, 143.0, 143.0, 143.0, 144.0, 144.0, 144.0, 144.0, 144.0, 144.0, 145.0, 145.0, 145.0, 145.0, 145.0, 145.0, 
146.0, 146.0, 146.0, 146.0, 146.0, 146.0, 147.0, 147.0, 147.0, 147.0, 147.0, 148.0, 148.0, 148.0, 148.0, 148.0, 148.0, 148.0, 148.0, 148.0, 148.0, 149.0, 149.0, 149.0, 149.0, 149.0, 149.0, 149.0, 150.0, 150.0, 150.0, 150.0, 151.0, 151.0, 151.0, 151.0, 151.0, 152.0, 152.0, 153.0, 153.0, 153.0, 154.0, 154.0, 154.0, 154.0, 155.0, 155.0, 155.0, 155.0, 155.0, 155.0, 156.0, 156.0, 156.0, 157.0, 157.0, 157.0, 157.0, 158.0, 158.0, 158.0, 159.0, 159.0, 159.0, 160.0, 160.0, 160.0, 160.0, 161.0, 161.0, 161.0, 161.0, 161.0, 162.0, 162.0, 163.0, 163.0, 163.0, 164.0, 164.0, 165.0, 165.0, 165.0, 166.0, 166.0, 166.0, 166.0, 167.0, 168.0, 168.0, 169.0, 169.0, 170.0, 170.0, 172.0, 172.0, 172.0, 173.0, 173.0, 174.0, 174.0, 174.0, 176.0, 176.0, 177.0, 177.0, 177.0, 179.0, 179.0, 180.0, 180.0, 181.0, 181.0, 182.0, 183.0, 183.0, 184.0, 184.0, 185.0, 185.0, 189.0, 189.0, 190.0, 190.0, 192.0, 193.0, 193.0, 195.0, 197.0, 201.0, 204.0, 205.0, 209.0, 210.0, 217.0, 224.0, 225.0 ] ]\n", " }\n", " }\n", " }\n", " }\n", " }, {\n", " \"name\" : \"VMail Message\",\n", " \"inferred_type\" : \"Integral\",\n", " \"numerical_statistics\" : {\n", " \"common\" : {\n", " \"num_present\" : 2333,\n", " \"num_missing\" : 0\n", " },\n", " \"mean\" : 8.214316330904415,\n", " \"sum\" : 19164.0,\n", " \"std_dev\" : 13.776907846587017,\n", " \"min\" : 0.0,\n", " \"max\" : 51.0,\n", " \"distribution\" : {\n", " \"kll\" : {\n", " \"buckets\" : [ {\n", " \"lower_bound\" : 0.0,\n", " \"upper_bound\" : 5.1,\n", " \"count\" : 1684.0\n", " }, {\n", " \"lower_bound\" : 5.1,\n", " \"upper_bound\" : 10.2,\n", " \"count\" : 2.0\n", " }, {\n", " \"lower_bound\" : 10.2,\n", " \"upper_bound\" : 15.3,\n", " \"count\" : 15.0\n", " }, {\n", " \"lower_bound\" : 15.3,\n", " \"upper_bound\" : 20.4,\n", " \"count\" : 52.0\n", " }, {\n", " \"lower_bound\" : 20.4,\n", " \"upper_bound\" : 25.5,\n", " \"count\" : 127.0\n", " }, {\n", " \"lower_bound\" : 25.5,\n", " \"upper_bound\" : 30.6,\n", " \"count\" : 171.0\n", " }, {\n", " 
\"lower_bound\" : 30.6,\n", " \"upper_bound\" : 35.7,\n", " \"count\" : 135.0\n", " }, {\n", " \"lower_bound\" : 35.7,\n", " \"upper_bound\" : 40.8,\n", " \"count\" : 106.0\n", " }, {\n", " \"lower_bound\" : 40.8,\n", " \"upper_bound\" : 45.9,\n", " \"count\" : 32.0\n", " }, {\n", " \"lower_bound\" : 45.9,\n", " \"upper_bound\" : 51.0,\n", " \"count\" : 9.0\n", " } ],\n", " \"sketch\" : {\n", " \"parameters\" : {\n", " \"c\" : 0.64,\n", " \"k\" : 2048.0\n", " },\n", " \"data\" : [ [ 19.0, 0.0, 0.0, 40.0, 36.0, 0.0, 0.0, 24.0, 0.0, 0.0, 35.0, 0.0, 0.0, 0.0, 0.0, 41.0, 0.0, 0.0, 0.0, 24.0, 0.0, 33.0, 0.0, 37.0, 0.0, 0.0, 43.0, 0.0, 31.0, 28.0, 0.0, 37.0, 0.0, 0.0, 28.0, 34.0, 0.0, 0.0, 0.0, 35.0, 0.0, 36.0, 0.0, 29.0, 0.0, 30.0, 35.0, 0.0, 33.0, 0.0, 0.0, 0.0, 37.0, 0.0, 0.0, 0.0, 0.0, 24.0, 0.0, 24.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 26.0, 0.0, 0.0, 29.0, 20.0, 33.0, 0.0, 0.0, 0.0, 23.0, 0.0, 0.0, 22.0, 0.0, 32.0, 0.0, 34.0, 0.0, 22.0, 0.0, 0.0, 0.0, 0.0, 27.0, 0.0, 0.0, 29.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 23.0, 0.0, 28.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 19.0, 16.0, 0.0, 0.0, 0.0, 38.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 19.0, 46.0, 0.0, 16.0, 0.0, 0.0, 0.0, 22.0, 0.0, 0.0, 0.0, 0.0, 0.0, 39.0, 0.0, 25.0, 33.0, 0.0, 0.0, 0.0, 19.0, 0.0, 36.0, 0.0, 26.0, 0.0, 0.0, 35.0, 30.0, 0.0, 50.0, 28.0, 0.0, 26.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 49.0, 15.0, 0.0, 0.0, 27.0, 0.0, 23.0, 0.0, 0.0, 0.0, 0.0, 21.0, 0.0, 0.0, 0.0, 0.0, 26.0, 33.0, 0.0, 0.0, 0.0, 0.0, 24.0, 36.0, 40.0, 0.0, 0.0, 35.0, 0.0, 0.0, 0.0, 37.0, 0.0, 0.0, 0.0, 27.0, 0.0, 0.0, 0.0, 30.0, 0.0, 28.0, 0.0, 0.0, 0.0, 0.0, 35.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 28.0, 0.0, 0.0, 0.0, 38.0, 0.0, 29.0, 31.0, 0.0, 34.0, 0.0, 0.0, 0.0, 24.0, 0.0, 38.0, 0.0, 0.0, 0.0, 0.0, 0.0, 30.0, 21.0, 0.0, 30.0, 32.0, 0.0, 0.0, 0.0, 17.0, 27.0, 0.0, 0.0, 0.0, 0.0, 31.0, 0.0, 31.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 33.0 ], [ 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0, 12.0, 13.0, 14.0, 14.0, 14.0, 15.0, 15.0, 16.0, 16.0, 16.0, 16.0, 17.0, 17.0, 17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 22.0, 22.0, 22.0, 22.0, 22.0, 22.0, 22.0, 22.0, 
22.0, 22.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 24.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 26.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 29.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 31.0, 31.0, 31.0, 31.0, 31.0, 31.0, 31.0, 31.0, 31.0, 31.0, 31.0, 31.0, 31.0, 31.0, 31.0, 31.0, 31.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 33.0, 33.0, 33.0, 33.0, 33.0, 33.0, 33.0, 33.0, 33.0, 33.0, 33.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 35.0, 35.0, 35.0, 35.0, 35.0, 35.0, 35.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 38.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 40.0, 40.0, 40.0, 40.0, 40.0, 40.0, 41.0, 41.0, 41.0, 42.0, 42.0, 42.0, 42.0, 42.0, 43.0, 43.0, 43.0, 44.0, 44.0, 45.0, 45.0, 46.0, 47.0, 50.0 ] ]\n", " }\n", " }\n", " }\n", " }\n", " }, {\n", " \"name\" : \"Day Mins\",\n", " \"inferred_type\" : \"Fractional\",\n", " \"numerical_statistics\" : {\n", " \"common\" : {\n", " \"num_present\" : 2333,\n", " \"num_missing\" : 0\n", " },\n", " \"mean\" : 180.22648949849963,\n", " \"sum\" : 420468.3999999996,\n", " \"std_dev\" : 53.987178959901556,\n", " \"min\" : 0.0,\n", " \"max\" : 350.8,\n", " \"distribution\" : {\n", " \"kll\" : {\n", " \"buckets\" : [ {\n", " \"lower_bound\" : 0.0,\n", " \"upper_bound\" : 35.08,\n", " \"count\" : 14.0\n", " }, {\n", " \"lower_bound\" : 35.08,\n", " 
\"upper_bound\" : 70.16,\n", " \"count\" : 48.0\n", " }, {\n", " \"lower_bound\" : 70.16,\n", " \"upper_bound\" : 105.24000000000001,\n", " \"count\" : 130.0\n", " }, {\n", " \"lower_bound\" : 105.24000000000001,\n", " \"upper_bound\" : 140.32,\n", " \"count\" : 318.0\n", " }, {\n", " \"lower_bound\" : 140.32,\n", " \"upper_bound\" : 175.4,\n", " \"count\" : 565.0\n", " }, {\n", " \"lower_bound\" : 175.4,\n", " \"upper_bound\" : 210.48000000000002,\n", " \"count\" : 587.0\n", " }, {\n", " \"lower_bound\" : 210.48000000000002,\n", " \"upper_bound\" : 245.56,\n", " \"count\" : 423.0\n", " }, {\n", " \"lower_bound\" : 245.56,\n", " \"upper_bound\" : 280.64,\n", " \"count\" : 180.0\n", " }, {\n", " \"lower_bound\" : 280.64,\n", " \"upper_bound\" : 315.72,\n", " \"count\" : 58.0\n", " }, {\n", " \"lower_bound\" : 315.72,\n", " \"upper_bound\" : 350.8,\n", " \"count\" : 10.0\n", " } ],\n", " \"sketch\" : {\n", " \"parameters\" : {\n", " \"c\" : 0.64,\n", " \"k\" : 2048.0\n", " },\n", " \"data\" : [ [ 178.1, 160.3, 197.1, 105.2, 283.1, 113.6, 232.1, 212.7, 73.3, 176.9, 161.9, 128.6, 190.5, 223.2, 157.9, 173.1, 273.5, 275.8, 119.2, 174.6, 133.3, 145.0, 150.6, 220.2, 109.7, 155.4, 172.0, 235.6, 218.5, 92.7, 90.7, 162.3, 146.5, 210.1, 214.4, 194.4, 237.3, 255.9, 197.9, 200.2, 120.8, 118.1, 131.8, 225.4, 205.2, 272.5, 181.1, 122.2, 119.6, 109.6, 112.7, 136.3, 185.4, 199.6, 218.2, 259.9, 143.2, 218.2, 249.8, 274.7, 167.8, 182.3, 157.0, 207.7, 250.2, 81.9, 246.8, 103.1, 147.2, 252.7, 192.2, 226.4, 145.5, 178.3, 133.1, 214.6, 203.9, 185.4, 140.0, 240.3, 134.2, 141.1, 47.4, 200.4, 167.6, 221.1, 165.5, 175.3, 146.7, 167.7, 184.5, 202.9, 273.3, 194.8, 187.7, 133.7, 209.1, 260.8, 211.6, 156.8, 109.2, 303.2, 240.8, 167.4, 110.4, 90.4, 162.1, 212.1, 214.8, 83.6, 182.1, 170.5, 198.2, 143.2, 184.5, 185.2, 156.5, 69.1, 139.0, 101.4, 274.3, 220.6, 107.3, 121.7, 163.5, 176.6, 118.9, 240.1, 179.3, 246.4, 177.1, 258.8, 211.8, 226.2, 220.7, 166.4, 115.1, 213.4, 155.7, 214.1, 200.4, 133.3, 
155.4, 195.1, 189.8, 197.1, 217.2, 236.7, 192.8, 224.4, 114.4, 206.9, 134.7, 219.6, 183.3, 281.0, 147.9, 144.2, 175.3, 199.3, 294.9, 97.2, 74.3, 161.6, 181.5, 118.0, 238.8, 246.5, 186.5, 202.3, 201.1, 145.8, 116.7, 303.9, 107.2, 211.8, 113.7, 149.0, 118.5, 214.9, 113.9, 225.2, 172.2, 221.7, 150.0, 160.0, 142.4, 182.3, 219.6, 92.6, 238.0, 224.0, 226.0, 224.4, 204.6, 175.8, 193.7, 169.4, 252.1, 173.6, 169.1, 170.9, 230.9, 105.0, 215.6, 285.7, 198.5, 220.6, 96.7, 97.5, 235.0, 109.8, 197.7, 264.0, 129.5, 159.0, 276.2, 207.7, 234.5, 167.6, 276.7, 146.0, 220.4, 131.7, 250.3, 68.7, 207.6, 118.2, 208.8, 137.8, 209.9, 179.5, 216.0, 210.5, 234.1, 181.5, 222.5, 240.4, 109.1, 158.1, 193.0, 205.9, 198.0, 244.1, 240.7, 122.5, 111.8, 155.7, 236.6, 149.3, 181.3, 151.8, 139.9, 248.7, 61.6, 247.6, 185.9, 178.1, 80.3, 235.6, 172.4, 178.7, 225.2, 187.5, 204.4, 134.2, 262.3, 191.1, 109.6, 197.0, 228.6, 115.4, 147.2, 198.8, 129.2, 238.1, 208.0, 211.3, 194.8, 143.2, 143.7, 198.8, 179.1 ], [ 0.0, 2.6, 7.9, 17.6, 19.5, 27.0, 34.0, 37.8, 40.9, 45.0, 47.8, 49.9, 51.1, 51.8, 54.7, 55.3, 55.6, 57.5, 58.4, 58.9, 60.0, 61.3, 61.9, 62.4, 62.8, 62.9, 64.9, 67.4, 68.5, 70.7, 70.9, 72.5, 72.8, 75.8, 77.6, 78.6, 80.3, 81.3, 81.7, 82.3, 82.5, 82.6, 83.2, 83.8, 84.8, 85.7, 85.9, 86.0, 86.3, 87.2, 87.7, 88.5, 89.5, 89.7, 90.0, 91.5, 92.3, 92.8, 93.4, 93.8, 94.7, 95.0, 95.5, 95.9, 96.3, 96.8, 97.5, 98.0, 98.2, 98.4, 99.4, 99.9, 100.8, 101.1, 101.7, 102.1, 102.6, 102.8, 103.2, 103.4, 103.5, 103.7, 104.6, 104.7, 104.9, 105.0, 105.3, 105.8, 105.9, 106.4, 106.7, 107.5, 107.8, 108.3, 108.6, 109.1, 109.4, 109.5, 110.1, 110.5, 111.1, 111.6, 111.9, 112.8, 113.0, 113.3, 114.3, 114.3, 114.8, 115.0, 115.4, 115.4, 115.5, 115.5, 115.6, 115.8, 115.9, 116.2, 116.9, 117.5, 117.6, 117.9, 118.1, 118.4, 119.0, 119.2, 119.3, 119.7, 120.5, 120.7, 120.9, 121.1, 121.7, 122.0, 122.9, 123.1, 123.2, 123.7, 123.7, 124.0, 124.1, 124.3, 124.3, 124.4, 124.5, 124.7, 124.8, 125.0, 125.2, 125.4, 125.5, 125.7, 126.0, 126.1, 126.3, 126.7, 
126.9, 127.3, 127.8, 128.2, 128.3, 128.5, 128.7, 128.8, 129.0, 129.3, 129.4, 129.5, 129.7, 129.9, 130.1, 130.2, 130.5, 130.9, 131.1, 131.6, 131.9, 132.0, 132.1, 132.4, 133.1, 133.3, 133.3, 133.5, 133.8, 134.0, 134.2, 134.3, 134.4, 134.7, 134.8, 134.9, 135.0, 135.1, 135.2, 135.4, 135.8, 135.9, 136.1, 136.1, 136.4, 136.7, 136.8, 137.0, 137.1, 137.4, 137.5, 137.9, 138.1, 138.3, 138.5, 138.7, 138.9, 139.0, 139.2, 139.3, 139.4, 139.6, 139.7, 139.8, 140.1, 140.1, 140.5, 140.7, 141.1, 141.2, 141.4, 141.6, 141.7, 142.0, 142.1, 142.3, 142.3, 142.4, 142.5, 142.5, 142.8, 142.9, 143.3, 143.3, 143.4, 143.5, 143.6, 143.7, 143.7, 143.9, 144.0, 144.2, 144.4, 144.5, 144.6, 144.8, 144.9, 145.0, 145.3, 145.5, 145.6, 145.9, 146.2, 146.3, 146.3, 146.3, 146.4, 146.4, 146.5, 146.6, 146.7, 146.8, 147.0, 147.0, 147.1, 147.2, 147.5, 147.7, 147.9, 148.1, 148.2, 148.2, 148.4, 148.5, 148.5, 148.6, 148.7, 149.0, 149.2, 149.3, 149.4, 149.4, 149.7, 149.7, 149.8, 149.9, 150.0, 150.1, 150.4, 150.5, 150.6, 150.7, 151.0, 151.1, 151.5, 151.5, 151.7, 151.8, 152.0, 152.1, 152.4, 152.6, 152.9, 153.1, 153.2, 153.2, 153.5, 153.5, 153.5, 153.6, 153.7, 153.8, 154.0, 154.0, 154.0, 154.1, 154.2, 154.3, 154.4, 154.5, 154.5, 154.6, 154.6, 154.7, 154.8, 155.0, 155.0, 155.2, 155.3, 155.3, 155.5, 156.0, 156.1, 156.2, 156.4, 156.6, 156.7, 157.0, 157.1, 157.2, 157.3, 157.4, 157.6, 157.6, 157.7, 157.8, 158.0, 158.0, 158.4, 158.6, 158.6, 158.7, 158.7, 158.8, 159.0, 159.1, 159.3, 159.5, 159.7, 159.7, 159.8, 159.9, 160.0, 160.1, 160.1, 160.2, 160.4, 160.4, 160.6, 160.9, 161.1, 161.3, 161.5, 161.5, 161.7, 161.9, 162.0, 162.1, 162.3, 162.4, 162.6, 162.8, 163.0, 163.3, 163.5, 163.6, 163.7, 163.8, 164.1, 164.2, 164.5, 164.8, 164.9, 165.0, 165.3, 165.4, 165.4, 165.8, 165.9, 166.0, 166.1, 166.2, 166.5, 166.5, 166.6, 166.8, 166.9, 166.9, 167.1, 167.3, 167.4, 167.5, 167.8, 167.8, 167.9, 168.0, 168.3, 168.4, 168.6, 168.6, 168.6, 168.8, 169.2, 169.2, 169.3, 169.3, 169.5, 169.5, 169.6, 169.7, 169.8, 169.9, 169.9, 170.2, 170.5, 
170.5, 170.6, 170.7, 170.7, 170.9, 171.2, 171.2, 171.5, 171.6, 171.7, 171.7, 171.8, 172.1, 172.3, 172.5, 172.7, 172.8, 173.0, 173.0, 173.1, 173.2, 173.5, 173.9, 174.1, 174.3, 174.4, 174.4, 174.5, 174.5, 174.7, 174.7, 174.8, 174.9, 175.2, 175.3, 175.4, 175.4, 175.5, 175.5, 175.7, 175.7, 175.8, 175.9, 176.0, 176.2, 176.2, 176.3, 176.4, 176.6, 176.8, 176.9, 177.2, 177.2, 177.3, 177.5, 177.9, 178.2, 178.3, 178.4, 178.7, 178.8, 178.8, 179.1, 179.2, 179.2, 179.3, 179.4, 179.4, 179.5, 179.9, 180.0, 180.0, 180.5, 180.7, 180.7, 180.9, 181.1, 181.4, 181.5, 181.5, 181.6, 181.8, 182.0, 182.1, 182.1, 182.3, 182.6, 182.8, 183.0, 183.1, 183.1, 183.2, 183.3, 183.4, 183.4, 183.6, 183.8, 183.9, 184.0, 184.1, 184.2, 184.5, 184.6, 184.8, 185.0, 185.0, 185.1, 185.3, 185.3, 185.6, 186.0, 186.0, 186.1, 186.2, 186.4, 186.7, 186.8, 187.1, 187.2, 187.3, 187.4, 187.5, 187.7, 187.8, 187.8, 187.9, 188.0, 188.5, 188.8, 188.9, 189.1, 189.2, 189.3, 189.3, 189.5, 189.6, 189.8, 189.8, 190.0, 190.1, 190.2, 190.3, 190.3, 190.4, 190.4, 190.5, 190.7, 190.8, 190.9, 191.0, 191.1, 191.3, 191.3, 191.4, 191.4, 191.9, 191.9, 192.0, 192.1, 192.3, 192.3, 192.6, 192.6, 193.0, 193.0, 193.2, 193.3, 193.4, 193.6, 193.7, 193.8, 193.9, 194.0, 194.2, 194.3, 194.4, 194.6, 194.8, 194.8, 194.9, 195.0, 195.3, 195.4, 195.5, 195.7, 195.9, 196.0, 196.1, 196.1, 196.4, 196.5, 196.6, 196.6, 196.7, 196.8, 197.0, 197.1, 197.2, 197.3, 197.4, 197.6, 197.7, 197.8, 197.9, 198.1, 198.2, 198.3, 198.4, 198.4, 198.6, 198.7, 198.8, 198.9, 199.1, 199.2, 199.3, 199.6, 200.0, 200.2, 200.3, 200.3, 200.6, 201.1, 201.3, 201.3, 201.4, 201.5, 201.8, 201.9, 201.9, 202.0, 202.1, 202.4, 202.6, 202.7, 202.9, 203.1, 203.2, 203.3, 203.3, 203.4, 203.4, 203.5, 203.5, 203.7, 203.9, 204.0, 204.2, 204.4, 204.5, 204.5, 204.7, 204.9, 205.0, 205.1, 205.2, 205.2, 205.4, 205.7, 205.7, 205.9, 206.0, 206.2, 206.2, 206.2, 206.3, 206.5, 206.7, 206.9, 206.9, 207.0, 207.0, 207.2, 207.3, 207.6, 207.6, 207.7, 207.8, 207.9, 208.0, 208.3, 208.7, 208.8, 208.9, 209.1, 
209.2, 209.4, 209.4, 209.7, 209.8, 209.9, 210.2, 210.3, 210.5, 210.6, 210.7, 210.8, 211.0, 211.1, 211.1, 211.2, 211.3, 211.7, 211.9, 212.0, 212.1, 212.4, 212.8, 213.0, 213.0, 213.1, 213.4, 213.5, 213.6, 213.8, 214.0, 214.1, 214.2, 214.2, 214.3, 214.3, 214.6, 214.7, 214.9, 215.1, 215.4, 215.5, 215.6, 215.8, 215.9, 215.9, 216.0, 216.0, 216.2, 216.4, 216.7, 216.8, 216.9, 217.1, 217.2, 217.2, 217.5, 217.8, 218.0, 218.5, 218.7, 218.8, 218.9, 219.2, 219.4, 219.9, 220.0, 220.1, 220.2, 220.3, 220.7, 220.8, 220.9, 221.0, 221.1, 221.1, 221.3, 221.4, 221.6, 221.8, 221.9, 222.1, 222.3, 222.4, 222.6, 222.7, 222.8, 223.0, 223.2, 223.5, 223.9, 224.0, 224.2, 224.5, 224.7, 224.9, 225.0, 225.2, 225.3, 225.5, 225.9, 226.2, 226.5, 226.9, 227.1, 227.2, 227.4, 227.4, 227.5, 227.8, 228.1, 228.4, 228.7, 228.9, 229.3, 229.6, 229.9, 230.1, 230.2, 230.2, 230.4, 230.6, 230.7, 230.9, 231.0, 231.3, 231.8, 231.9, 232.1, 232.4, 232.5, 232.6, 232.8, 233.2, 233.5, 233.8, 234.2, 234.4, 234.5, 234.8, 234.9, 235.1, 235.5, 235.6, 235.7, 236.2, 236.2, 236.4, 236.5, 236.8, 236.8, 237.1, 237.7, 237.8, 238.0, 238.9, 239.1, 239.2, 239.2, 239.5, 239.7, 239.8, 239.8, 239.9, 239.9, 240.2, 240.3, 240.8, 241.1, 241.2, 241.7, 241.8, 241.9, 242.2, 242.4, 242.5, 242.9, 243.0, 243.1, 243.4, 243.4, 243.7, 243.9, 244.3, 244.8, 244.9, 245.0, 245.2, 245.3, 245.5, 245.7, 245.8, 246.2, 246.4, 246.8, 247.2, 247.5, 247.8, 248.6, 248.9, 249.4, 249.6, 249.9, 250.9, 251.4, 251.5, 251.6, 252.3, 252.4, 252.9, 253.0, 253.2, 253.4, 254.1, 254.4, 254.7, 254.9, 255.8, 256.0, 256.5, 256.7, 257.2, 257.7, 258.0, 259.4, 259.8, 260.5, 261.3, 261.4, 261.7, 261.9, 262.2, 262.9, 263.8, 264.4, 265.1, 265.9, 266.0, 266.3, 266.7, 267.4, 268.4, 268.7, 269.0, 269.8, 270.3, 270.7, 271.1, 271.2, 271.5, 271.6, 271.8, 272.4, 272.6, 272.7, 273.2, 273.6, 274.0, 274.6, 275.2, 276.5, 277.0, 277.9, 278.5, 279.1, 279.8, 280.0, 280.4, 281.1, 281.3, 282.5, 283.2, 283.9, 285.7, 286.4, 287.1, 287.4, 288.0, 288.5, 289.5, 290.4, 291.8, 293.3, 295.0, 298.1, 
299.5, 301.5, 305.1, 308.0, 309.9, 312.0, 313.2, 314.1, 315.6, 322.3, 322.4, 326.3, 335.5, 345.3 ] ]\n", " }\n", " }\n", " }\n", " }\n", " }, {\n", " \"name\" : \"Day Calls\",\n", " \"inferred_type\" : \"Integral\",\n", " \"numerical_statistics\" : {\n", " \"common\" : {\n", " \"num_present\" : 2333,\n", " \"num_missing\" : 0\n", " },\n", " \"mean\" : 100.25932276039434,\n", " \"sum\" : 233905.0,\n", " \"std_dev\" : 20.165008436664074,\n", " \"min\" : 0.0,\n", " \"max\" : 165.0,\n", " \"distribution\" : {\n", " \"kll\" : {\n", " \"buckets\" : [ {\n", " \"lower_bound\" : 0.0,\n", " \"upper_bound\" : 16.5,\n", " \"count\" : 2.0\n", " }, {\n", " \"lower_bound\" : 16.5,\n", " \"upper_bound\" : 33.0,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 33.0,\n", " \"upper_bound\" : 49.5,\n", " \"count\" : 14.0\n", " }, {\n", " \"lower_bound\" : 49.5,\n", " \"upper_bound\" : 66.0,\n", " \"count\" : 80.0\n", " }, {\n", " \"lower_bound\" : 66.0,\n", " \"upper_bound\" : 82.5,\n", " \"count\" : 344.0\n", " }, {\n", " \"lower_bound\" : 82.5,\n", " \"upper_bound\" : 99.0,\n", " \"count\" : 636.0\n", " }, {\n", " \"lower_bound\" : 99.0,\n", " \"upper_bound\" : 115.5,\n", " \"count\" : 737.0\n", " }, {\n", " \"lower_bound\" : 115.5,\n", " \"upper_bound\" : 132.0,\n", " \"count\" : 377.0\n", " }, {\n", " \"lower_bound\" : 132.0,\n", " \"upper_bound\" : 148.5,\n", " \"count\" : 127.0\n", " }, {\n", " \"lower_bound\" : 148.5,\n", " \"upper_bound\" : 165.0,\n", " \"count\" : 16.0\n", " } ],\n", " \"sketch\" : {\n", " \"parameters\" : {\n", " \"c\" : 0.64,\n", " \"k\" : 2048.0\n", " },\n", " \"data\" : [ [ 110.0, 138.0, 117.0, 61.0, 112.0, 87.0, 122.0, 73.0, 86.0, 128.0, 85.0, 83.0, 108.0, 109.0, 105.0, 85.0, 104.0, 103.0, 142.0, 76.0, 94.0, 72.0, 125.0, 109.0, 148.0, 112.0, 111.0, 131.0, 130.0, 107.0, 90.0, 107.0, 121.0, 126.0, 78.0, 63.0, 103.0, 97.0, 108.0, 105.0, 96.0, 117.0, 82.0, 79.0, 106.0, 105.0, 59.0, 67.0, 104.0, 88.0, 119.0, 97.0, 87.0, 89.0, 76.0, 68.0, 77.0, 88.0, 
109.0, 99.0, 88.0, 115.0, 79.0, 85.0, 121.0, 75.0, 129.0, 70.0, 115.0, 97.0, 86.0, 117.0, 92.0, 98.0, 114.0, 108.0, 106.0, 114.0, 101.0, 146.0, 85.0, 92.0, 125.0, 80.0, 100.0, 137.0, 78.0, 96.0, 91.0, 104.0, 97.0, 100.0, 66.0, 116.0, 84.0, 75.0, 127.0, 81.0, 70.0, 93.0, 96.0, 133.0, 104.0, 68.0, 103.0, 108.0, 86.0, 95.0, 87.0, 148.0, 94.0, 94.0, 107.0, 92.0, 81.0, 87.0, 102.0, 114.0, 110.0, 48.0, 110.0, 57.0, 140.0, 48.0, 136.0, 88.0, 112.0, 115.0, 93.0, 83.0, 88.0, 85.0, 84.0, 88.0, 106.0, 117.0, 89.0, 86.0, 104.0, 62.0, 87.0, 110.0, 127.0, 100.0, 122.0, 113.0, 112.0, 110.0, 104.0, 121.0, 91.0, 143.0, 96.0, 99.0, 115.0, 66.0, 109.0, 91.0, 96.0, 112.0, 106.0, 80.0, 107.0, 104.0, 95.0, 103.0, 100.0, 47.0, 94.0, 97.0, 99.0, 108.0, 92.0, 106.0, 98.0, 115.0, 67.0, 115.0, 92.0, 86.0, 102.0, 111.0, 92.0, 100.0, 106.0, 104.0, 126.0, 64.0, 126.0, 85.0, 88.0, 99.0, 127.0, 90.0, 117.0, 96.0, 91.0, 102.0, 110.0, 110.0, 105.0, 71.0, 92.0, 78.0, 113.0, 44.0, 123.0, 117.0, 97.0, 113.0, 101.0, 100.0, 68.0, 108.0, 106.0, 80.0, 95.0, 109.0, 130.0, 96.0, 121.0, 121.0, 100.0, 108.0, 100.0, 95.0, 68.0, 106.0, 101.0, 86.0, 105.0, 125.0, 85.0, 101.0, 101.0, 108.0, 74.0, 112.0, 97.0, 104.0, 99.0, 88.0, 70.0, 99.0, 82.0, 145.0, 85.0, 110.0, 69.0, 100.0, 135.0, 98.0, 117.0, 109.0, 103.0, 95.0, 95.0, 111.0, 94.0, 132.0, 114.0, 81.0, 116.0, 124.0, 88.0, 80.0, 114.0, 69.0, 108.0, 110.0, 88.0, 137.0, 121.0, 56.0, 71.0, 65.0, 125.0, 87.0, 133.0, 80.0, 114.0, 107.0, 93.0 ], [ 0.0, 35.0, 40.0, 44.0, 45.0, 49.0, 51.0, 52.0, 53.0, 54.0, 54.0, 55.0, 55.0, 55.0, 56.0, 56.0, 57.0, 57.0, 58.0, 58.0, 59.0, 59.0, 60.0, 61.0, 61.0, 61.0, 61.0, 61.0, 61.0, 62.0, 62.0, 62.0, 63.0, 63.0, 63.0, 63.0, 64.0, 65.0, 65.0, 65.0, 65.0, 65.0, 66.0, 66.0, 66.0, 67.0, 67.0, 67.0, 67.0, 67.0, 67.0, 67.0, 67.0, 67.0, 68.0, 68.0, 68.0, 68.0, 69.0, 69.0, 69.0, 69.0, 70.0, 70.0, 70.0, 70.0, 70.0, 70.0, 70.0, 71.0, 71.0, 71.0, 71.0, 71.0, 71.0, 71.0, 71.0, 72.0, 72.0, 72.0, 72.0, 72.0, 72.0, 72.0, 73.0, 73.0, 73.0, 73.0, 
73.0, 73.0, 73.0, 73.0, 74.0, 74.0, 74.0, 74.0, 74.0, 74.0, 74.0, 74.0, 75.0, 75.0, 75.0, 75.0, 75.0, 75.0, 76.0, 76.0, 76.0, 76.0, 76.0, 76.0, 76.0, 76.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 79.0, 79.0, 79.0, 79.0, 79.0, 79.0, 79.0, 79.0, 79.0, 79.0, 79.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 81.0, 81.0, 81.0, 81.0, 81.0, 81.0, 81.0, 81.0, 81.0, 81.0, 81.0, 81.0, 81.0, 81.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 89.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 90.0, 91.0, 91.0, 91.0, 91.0, 91.0, 91.0, 91.0, 91.0, 91.0, 91.0, 91.0, 91.0, 91.0, 91.0, 91.0, 91.0, 91.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 92.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 93.0, 94.0, 94.0, 94.0, 94.0, 94.0, 94.0, 94.0, 94.0, 94.0, 94.0, 94.0, 94.0, 94.0, 94.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 95.0, 96.0, 96.0, 96.0, 96.0, 96.0, 96.0, 
96.0, 96.0, 96.0, 96.0, 96.0, 96.0, 96.0, 96.0, 96.0, 96.0, 96.0, 96.0, 96.0, 96.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 97.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 101.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 102.0, 103.0, 103.0, 103.0, 103.0, 103.0, 103.0, 103.0, 103.0, 103.0, 103.0, 103.0, 103.0, 103.0, 103.0, 103.0, 103.0, 103.0, 103.0, 104.0, 104.0, 104.0, 104.0, 104.0, 104.0, 104.0, 104.0, 104.0, 104.0, 104.0, 104.0, 104.0, 104.0, 104.0, 104.0, 104.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 105.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 106.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 107.0, 108.0, 108.0, 108.0, 108.0, 108.0, 108.0, 108.0, 108.0, 108.0, 108.0, 108.0, 108.0, 108.0, 108.0, 108.0, 108.0, 108.0, 108.0, 109.0, 109.0, 109.0, 109.0, 109.0, 109.0, 109.0, 109.0, 109.0, 109.0, 109.0, 109.0, 109.0, 109.0, 109.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 110.0, 111.0, 111.0, 111.0, 111.0, 111.0, 111.0, 111.0, 111.0, 111.0, 111.0, 111.0, 
111.0, 111.0, 111.0, 111.0, 111.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 112.0, 113.0, 113.0, 113.0, 113.0, 113.0, 113.0, 113.0, 113.0, 113.0, 113.0, 113.0, 113.0, 113.0, 113.0, 114.0, 114.0, 114.0, 114.0, 114.0, 114.0, 114.0, 114.0, 114.0, 114.0, 114.0, 114.0, 114.0, 114.0, 114.0, 114.0, 114.0, 114.0, 115.0, 115.0, 115.0, 115.0, 115.0, 115.0, 115.0, 115.0, 115.0, 115.0, 115.0, 115.0, 115.0, 115.0, 116.0, 116.0, 116.0, 116.0, 116.0, 116.0, 116.0, 116.0, 116.0, 116.0, 116.0, 116.0, 116.0, 116.0, 116.0, 116.0, 117.0, 117.0, 117.0, 117.0, 117.0, 117.0, 117.0, 117.0, 117.0, 117.0, 117.0, 117.0, 117.0, 117.0, 118.0, 118.0, 118.0, 118.0, 118.0, 118.0, 118.0, 118.0, 118.0, 118.0, 118.0, 119.0, 119.0, 119.0, 119.0, 119.0, 119.0, 119.0, 119.0, 119.0, 119.0, 119.0, 119.0, 119.0, 119.0, 119.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 120.0, 121.0, 121.0, 121.0, 121.0, 121.0, 121.0, 121.0, 121.0, 121.0, 121.0, 121.0, 121.0, 121.0, 121.0, 122.0, 122.0, 122.0, 122.0, 122.0, 122.0, 122.0, 122.0, 122.0, 123.0, 123.0, 123.0, 123.0, 123.0, 123.0, 123.0, 123.0, 123.0, 123.0, 123.0, 123.0, 123.0, 123.0, 124.0, 124.0, 124.0, 124.0, 124.0, 124.0, 124.0, 124.0, 124.0, 124.0, 125.0, 125.0, 125.0, 125.0, 125.0, 125.0, 125.0, 125.0, 126.0, 126.0, 126.0, 126.0, 126.0, 126.0, 126.0, 126.0, 126.0, 126.0, 126.0, 127.0, 127.0, 127.0, 127.0, 127.0, 127.0, 127.0, 127.0, 128.0, 128.0, 128.0, 128.0, 128.0, 128.0, 129.0, 129.0, 129.0, 129.0, 129.0, 129.0, 129.0, 130.0, 130.0, 130.0, 130.0, 130.0, 130.0, 130.0, 130.0, 131.0, 131.0, 131.0, 131.0, 132.0, 132.0, 132.0, 132.0, 132.0, 132.0, 133.0, 133.0, 133.0, 133.0, 133.0, 133.0, 134.0, 134.0, 134.0, 134.0, 134.0, 134.0, 135.0, 135.0, 135.0, 136.0, 136.0, 137.0, 137.0, 137.0, 137.0, 137.0, 138.0, 138.0, 138.0, 138.0, 138.0, 139.0, 139.0, 139.0, 140.0, 140.0, 140.0, 140.0, 
140.0, 140.0, 141.0, 141.0, 141.0, 141.0, 142.0, 142.0, 143.0, 143.0, 145.0, 145.0, 145.0, 146.0, 147.0, 147.0, 149.0, 150.0, 151.0, 151.0, 151.0, 156.0, 158.0, 163.0 ] ]\n", " }\n", " }\n", " }\n", " }\n", " }, {\n", " \"name\" : \"Eve Mins\",\n", " \"inferred_type\" : \"Fractional\",\n", " \"numerical_statistics\" : {\n", " \"common\" : {\n", " \"num_present\" : 2333,\n", " \"num_missing\" : 0\n", " },\n", " \"mean\" : 200.0501071581652,\n", " \"sum\" : 466716.8999999994,\n", " \"std_dev\" : 50.01592824933489,\n", " \"min\" : 31.2,\n", " \"max\" : 361.8,\n", " \"distribution\" : {\n", " \"kll\" : {\n", " \"buckets\" : [ {\n", " \"lower_bound\" : 31.2,\n", " \"upper_bound\" : 64.26,\n", " \"count\" : 7.0\n", " }, {\n", " \"lower_bound\" : 64.26,\n", " \"upper_bound\" : 97.32000000000001,\n", " \"count\" : 43.0\n", " }, {\n", " \"lower_bound\" : 97.32000000000001,\n", " \"upper_bound\" : 130.38,\n", " \"count\" : 135.0\n", " }, {\n", " \"lower_bound\" : 130.38,\n", " \"upper_bound\" : 163.44,\n", " \"count\" : 360.0\n", " }, {\n", " \"lower_bound\" : 163.44,\n", " \"upper_bound\" : 196.5,\n", " \"count\" : 555.0\n", " }, {\n", " \"lower_bound\" : 196.5,\n", " \"upper_bound\" : 229.56,\n", " \"count\" : 587.0\n", " }, {\n", " \"lower_bound\" : 229.56,\n", " \"upper_bound\" : 262.62,\n", " \"count\" : 404.0\n", " }, {\n", " \"lower_bound\" : 262.62,\n", " \"upper_bound\" : 295.68,\n", " \"count\" : 178.0\n", " }, {\n", " \"lower_bound\" : 295.68,\n", " \"upper_bound\" : 328.74,\n", " \"count\" : 49.0\n", " }, {\n", " \"lower_bound\" : 328.74,\n", " \"upper_bound\" : 361.8,\n", " \"count\" : 15.0\n", " } ],\n", " \"sketch\" : {\n", " \"parameters\" : {\n", " \"c\" : 0.64,\n", " \"k\" : 2048.0\n", " },\n", " \"data\" : [ [ 212.8, 221.3, 227.8, 341.3, 286.2, 158.6, 292.3, 257.5, 161.4, 102.8, 151.2, 134.0, 259.7, 127.5, 155.0, 203.9, 183.8, 189.5, 228.4, 176.6, 247.8, 194.5, 169.1, 185.3, 223.8, 290.9, 200.2, 194.8, 134.2, 127.8, 207.5, 233.9, 169.9, 248.9, 235.2, 
254.9, 176.7, 204.1, 181.5, 244.4, 169.8, 221.5, 284.3, 187.1, 99.5, 253.0, 215.9, 167.2, 278.7, 137.6, 217.7, 172.2, 178.5, 211.4, 169.3, 245.0, 169.8, 348.5, 242.4, 193.5, 247.9, 199.2, 103.1, 196.7, 267.1, 253.8, 187.8, 275.0, 161.9, 221.1, 168.6, 234.7, 217.7, 282.6, 221.2, 96.6, 187.6, 191.4, 196.4, 164.6, 227.3, 249.1, 167.8, 131.1, 154.5, 264.9, 205.5, 262.3, 203.5, 246.8, 351.6, 178.6, 263.6, 209.9, 221.0, 195.3, 106.1, 163.7, 216.9, 215.8, 153.1, 170.5, 144.5, 143.8, 137.3, 276.2, 155.0, 150.1, 131.0, 120.9, 164.6, 173.7, 139.1, 209.1, 172.0, 170.4, 140.2, 230.3, 132.9, 159.1, 52.9, 211.1, 238.2, 125.8, 143.7, 162.7, 228.3, 180.4, 188.8, 256.2, 163.7, 129.5, 230.9, 140.3, 177.8, 317.0, 196.8, 204.7, 185.4, 200.9, 309.2, 185.7, 164.1, 148.8, 173.7, 259.4, 246.7, 231.9, 234.4, 147.9, 216.6, 127.8, 235.9, 210.4, 201.4, 160.6, 228.4, 226.7, 241.3, 193.4, 165.7, 186.2, 177.3, 196.3, 205.1, 167.2, 230.0, 195.5, 178.0, 69.2, 303.5, 192.2, 213.8, 232.2, 86.8, 260.5, 165.1, 245.3, 177.8, 198.2, 145.3, 184.9, 162.6, 236.1, 293.8, 189.4, 126.2, 139.8, 303.3, 177.6, 209.6, 210.7, 304.6, 159.5, 205.2, 206.6, 246.1, 184.9, 226.1, 91.7, 169.9, 201.4, 167.6, 180.6, 200.6, 167.5, 270.6, 155.2, 193.8, 268.1, 183.3, 189.6, 250.5, 132.2, 248.9, 167.9, 165.8, 164.8, 195.2, 176.0, 203.7, 203.0, 211.2, 216.5, 260.6, 209.2, 251.6, 167.2, 213.7, 286.3, 121.9, 162.3, 186.9, 250.5, 200.2, 196.9, 169.7, 201.8, 115.7, 322.2, 224.8, 209.3, 273.7, 246.9, 269.4, 273.3, 239.6, 260.3, 197.5, 200.2, 182.4, 209.9, 223.6, 220.0, 255.1, 256.3, 212.0, 236.7, 199.9, 115.9, 256.6, 233.7, 173.4, 146.6, 137.5, 165.0, 198.9, 129.2, 249.3, 222.8, 248.5, 178.7, 175.2, 230.1, 214.1, 187.2, 198.9, 165.7, 213.4, 88.1, 297.8, 195.5, 238.3 ], [ 31.2, 58.9, 60.8, 65.2, 66.5, 71.0, 73.2, 75.3, 77.1, 78.3, 79.3, 80.6, 82.2, 83.9, 87.6, 88.6, 89.7, 90.0, 90.5, 92.0, 93.7, 95.1, 98.3, 101.3, 102.2, 102.6, 103.4, 105.5, 105.7, 106.2, 106.8, 107.9, 108.2, 109.9, 110.2, 110.8, 112.5, 113.2, 113.3, 114.3, 114.5, 
114.7, 115.0, 115.7, 116.5, 116.6, 117.0, 117.9, 118.0, 118.5, 118.7, 118.9, 119.3, 119.6, 120.0, 120.3, 120.4, 120.5, 120.7, 121.0, 121.6, 122.2, 122.8, 123.0, 123.4, 123.5, 123.5, 123.9, 123.9, 124.4, 126.0, 126.9, 127.3, 127.8, 128.7, 128.9, 129.1, 129.3, 129.4, 129.8, 130.1, 130.2, 130.7, 131.1, 131.4, 131.7, 131.8, 132.3, 132.5, 132.9, 133.0, 133.4, 133.9, 134.1, 134.3, 134.5, 134.7, 134.9, 135.0, 135.2, 136.0, 136.1, 136.4, 136.7, 136.9, 137.3, 137.8, 138.1, 138.3, 138.5, 138.7, 138.9, 139.1, 139.5, 139.6, 140.2, 140.9, 140.9, 141.2, 141.4, 141.6, 141.8, 141.9, 142.0, 142.1, 142.3, 142.6, 142.6, 142.7, 143.1, 143.4, 143.7, 143.8, 144.1, 144.3, 144.4, 145.0, 145.1, 145.5, 145.9, 146.4, 146.7, 146.9, 147.0, 147.4, 147.7, 148.0, 148.2, 148.3, 148.5, 148.7, 149.1, 149.3, 149.5, 149.6, 149.9, 150.0, 150.0, 150.1, 150.6, 150.8, 151.3, 151.4, 151.7, 152.0, 152.1, 152.3, 152.5, 152.5, 152.7, 152.7, 152.8, 152.9, 153.1, 153.1, 153.2, 153.3, 153.6, 153.7, 153.8, 154.0, 154.2, 154.5, 154.6, 154.7, 154.9, 154.9, 155.1, 155.4, 155.5, 155.6, 155.8, 156.0, 156.0, 156.1, 156.3, 156.4, 156.9, 157.0, 157.3, 157.5, 157.6, 157.6, 158.0, 158.2, 158.4, 158.6, 158.8, 159.2, 159.4, 159.5, 159.6, 159.6, 159.7, 159.9, 160.1, 160.1, 160.3, 160.5, 160.6, 160.6, 160.7, 160.8, 160.9, 161.1, 161.4, 161.7, 161.7, 161.7, 161.8, 161.9, 161.9, 162.1, 162.3, 162.3, 162.5, 162.5, 162.6, 162.8, 163.0, 163.1, 163.2, 163.3, 163.4, 163.6, 163.7, 164.0, 164.2, 164.5, 164.5, 164.5, 164.7, 164.9, 165.1, 165.2, 165.4, 165.6, 165.8, 165.8, 165.9, 165.9, 166.4, 166.6, 166.7, 166.8, 167.0, 167.1, 167.2, 167.2, 167.3, 167.5, 167.6, 167.7, 167.7, 167.7, 167.8, 167.9, 167.9, 168.0, 168.2, 168.3, 168.4, 168.5, 168.6, 168.7, 169.0, 169.1, 169.3, 169.5, 169.6, 169.8, 169.9, 169.9, 170.0, 170.0, 170.2, 170.5, 170.5, 170.7, 170.8, 170.9, 171.2, 171.4, 171.6, 171.7, 171.8, 171.9, 172.2, 172.7, 172.8, 173.1, 173.3, 173.3, 173.4, 173.5, 173.7, 174.0, 174.4, 174.5, 174.6, 174.8, 174.9, 175.1, 175.2, 175.4, 175.7, 
175.8, 175.9, 176.0, 176.1, 176.2, 176.4, 176.6, 176.6, 176.7, 177.0, 177.2, 177.5, 177.6, 177.8, 177.9, 178.2, 178.3, 178.4, 178.6, 178.8, 178.9, 179.1, 179.3, 179.3, 179.5, 179.7, 179.9, 180.0, 180.0, 180.0, 180.2, 180.3, 180.4, 180.5, 180.5, 180.6, 180.8, 181.0, 181.2, 181.4, 181.6, 181.6, 181.7, 182.0, 182.0, 182.2, 182.2, 182.5, 182.9, 182.9, 183.0, 183.1, 183.4, 183.5, 183.6, 183.6, 183.9, 184.0, 184.1, 184.2, 184.3, 184.5, 184.6, 184.8, 185.0, 185.4, 185.5, 185.5, 185.7, 185.8, 185.9, 186.0, 186.4, 186.6, 186.6, 186.7, 186.8, 187.0, 187.0, 187.1, 187.2, 187.2, 187.3, 187.4, 187.5, 187.5, 187.7, 187.8, 188.0, 188.2, 188.2, 188.4, 188.5, 188.5, 188.6, 188.8, 188.8, 189.0, 189.1, 189.3, 189.3, 189.4, 189.6, 189.6, 189.7, 189.8, 190.0, 190.0, 190.2, 190.3, 190.4, 190.6, 190.7, 190.8, 190.9, 191.1, 191.3, 191.6, 191.8, 191.9, 192.0, 192.2, 192.3, 192.4, 192.6, 192.7, 192.8, 193.0, 193.0, 193.1, 193.3, 193.6, 193.8, 193.9, 194.0, 194.1, 194.4, 194.4, 194.5, 194.6, 194.6, 194.7, 194.8, 194.9, 195.2, 195.5, 195.5, 195.6, 195.7, 195.7, 195.7, 195.8, 195.9, 196.0, 196.2, 196.3, 196.5, 196.6, 196.7, 196.7, 196.8, 197.0, 197.0, 197.3, 197.3, 197.4, 197.4, 197.5, 197.6, 197.6, 197.7, 197.8, 198.0, 198.3, 198.4, 198.5, 198.6, 198.6, 198.9, 199.4, 199.5, 199.6, 199.7, 199.7, 199.9, 200.0, 200.2, 200.3, 200.5, 200.6, 200.7, 200.7, 200.9, 201.0, 201.0, 201.0, 201.1, 201.3, 201.6, 202.2, 202.3, 202.4, 202.4, 202.5, 202.6, 202.6, 202.7, 202.8, 202.9, 203.0, 203.2, 203.4, 203.4, 203.6, 203.7, 203.8, 203.9, 203.9, 204.0, 204.1, 204.2, 204.3, 204.5, 204.6, 204.7, 204.9, 204.9, 205.0, 205.1, 205.2, 205.2, 205.5, 205.5, 205.7, 205.9, 206.0, 206.0, 206.2, 206.4, 206.5, 206.7, 206.8, 206.9, 207.0, 207.0, 207.1, 207.3, 207.5, 207.6, 207.7, 207.9, 207.9, 208.0, 208.0, 208.2, 208.2, 208.4, 208.5, 208.6, 208.8, 208.9, 209.0, 209.1, 209.1, 209.3, 209.4, 209.4, 209.4, 209.5, 209.5, 209.6, 209.8, 209.8, 209.9, 210.0, 210.1, 210.2, 210.2, 210.4, 210.5, 210.6, 210.7, 210.8, 210.9, 211.0, 
211.1, 211.2, 211.3, 211.4, 211.5, 211.5, 211.6, 211.6, 211.7, 211.8, 211.8, 212.0, 212.1, 212.2, 212.3, 212.5, 212.7, 213.0, 213.2, 213.3, 213.4, 213.6, 213.7, 213.7, 213.9, 213.9, 214.1, 214.2, 214.2, 214.3, 214.5, 214.7, 214.8, 215.1, 215.2, 215.3, 215.5, 215.5, 215.6, 215.7, 215.8, 216.2, 216.3, 216.3, 216.4, 216.5, 216.5, 216.6, 216.7, 216.8, 216.9, 217.0, 217.1, 217.2, 217.4, 217.5, 217.7, 218.2, 218.6, 218.8, 218.9, 219.1, 219.1, 219.2, 219.3, 219.5, 219.6, 219.6, 219.7, 219.9, 220.0, 220.2, 220.4, 220.5, 220.6, 220.6, 220.9, 220.9, 221.1, 221.2, 221.4, 221.6, 221.8, 221.9, 222.0, 222.1, 222.2, 222.3, 222.4, 222.7, 223.0, 223.0, 223.1, 223.2, 223.3, 223.5, 223.5, 223.5, 223.6, 224.1, 224.2, 224.3, 224.4, 224.6, 224.7, 224.9, 224.9, 225.0, 225.1, 225.1, 225.1, 225.2, 225.3, 225.5, 225.9, 225.9, 226.1, 226.2, 226.6, 226.7, 226.8, 227.2, 227.4, 227.8, 228.4, 228.6, 228.9, 229.0, 229.4, 229.4, 229.7, 229.9, 230.0, 230.1, 230.1, 230.3, 230.5, 230.7, 230.9, 230.9, 231.3, 231.4, 231.6, 231.7, 231.8, 231.8, 232.1, 232.3, 232.4, 232.6, 232.7, 232.9, 232.9, 233.2, 233.2, 233.5, 233.6, 233.7, 233.7, 233.8, 234.1, 234.5, 234.9, 235.0, 235.3, 235.3, 235.5, 235.8, 235.9, 236.0, 236.0, 236.2, 236.3, 236.5, 236.7, 236.8, 237.0, 237.2, 237.4, 237.7, 237.9, 238.1, 238.3, 238.6, 238.7, 239.5, 239.7, 240.1, 240.1, 240.2, 240.6, 240.7, 240.7, 240.8, 241.0, 241.4, 241.5, 241.6, 241.9, 242.0, 242.2, 242.2, 242.3, 242.6, 242.8, 242.8, 243.1, 243.2, 243.3, 243.5, 243.7, 243.9, 244.0, 244.3, 244.7, 244.8, 245.1, 245.4, 245.6, 246.0, 246.1, 246.3, 246.5, 246.5, 246.5, 246.6, 246.7, 247.0, 247.2, 247.3, 247.6, 247.7, 247.8, 248.1, 248.2, 248.5, 248.6, 248.7, 248.7, 248.8, 249.0, 249.1, 249.3, 249.5, 249.6, 249.7, 249.8, 249.8, 249.9, 250.0, 250.2, 250.7, 250.7, 250.8, 251.1, 251.3, 251.7, 251.8, 252.2, 252.4, 252.7, 253.0, 253.4, 253.4, 253.6, 253.6, 253.9, 254.1, 254.2, 254.5, 255.1, 255.6, 255.7, 255.9, 255.9, 256.1, 256.1, 256.6, 256.6, 256.7, 256.8, 256.8, 256.9, 257.2, 257.2, 
257.5, 257.7, 258.0, 258.2, 258.4, 258.7, 258.8, 259.2, 259.3, 259.6, 259.8, 260.0, 260.2, 260.7, 260.9, 261.5, 261.5, 261.6, 262.0, 262.2, 262.3, 262.6, 263.0, 263.4, 263.7, 264.1, 264.4, 264.5, 264.7, 264.7, 265.0, 265.3, 265.5, 265.7, 265.8, 266.3, 266.5, 266.9, 267.1, 267.4, 267.8, 268.3, 268.5, 268.5, 269.1, 269.5, 269.8, 270.5, 270.8, 271.0, 271.7, 272.8, 273.0, 273.0, 273.6, 273.8, 274.0, 274.4, 274.6, 274.8, 275.0, 275.4, 275.6, 276.0, 276.3, 276.4, 276.4, 276.8, 277.1, 277.5, 278.0, 278.2, 278.5, 279.3, 280.1, 280.8, 281.1, 281.7, 282.8, 283.2, 283.3, 284.5, 285.2, 285.8, 285.9, 286.0, 286.7, 287.3, 287.4, 287.7, 288.0, 288.7, 289.4, 289.8, 290.0, 290.9, 291.3, 292.5, 292.7, 292.8, 293.6, 294.0, 295.7, 298.0, 298.6, 300.9, 301.3, 301.5, 302.6, 303.4, 303.8, 304.9, 307.2, 308.7, 313.2, 313.7, 317.2, 317.8, 318.8, 319.3, 322.3, 327.0, 328.2, 329.3, 330.6, 332.8, 337.1, 347.3, 354.2 ] ]\n", " }\n", " }\n", " }\n", " }\n", " }, {\n", " \"name\" : \"Eve Calls\",\n", " \"inferred_type\" : \"Integral\",\n", " \"numerical_statistics\" : {\n", " \"common\" : {\n", " \"num_present\" : 2333,\n", " \"num_missing\" : 0\n", " },\n", " \"mean\" : 99.57393913416202,\n", " \"sum\" : 232306.0,\n", " \"std_dev\" : 19.67557809959058,\n", " \"min\" : 12.0,\n", " \"max\" : 170.0,\n", " \"distribution\" : {\n", " \"kll\" : {\n", " \"buckets\" : [ {\n", " \"lower_bound\" : 12.0,\n", " \"upper_bound\" : 27.8,\n", " \"count\" : 2.0\n", " }, {\n", " \"lower_bound\" : 27.8,\n", " \"upper_bound\" : 43.6,\n", " \"count\" : 2.0\n", " }, {\n", " \"lower_bound\" : 43.6,\n", " \"upper_bound\" : 59.4,\n", " \"count\" : 44.0\n", " }, {\n", " \"lower_bound\" : 59.4,\n", " \"upper_bound\" : 75.2,\n", " \"count\" : 195.0\n", " }, {\n", " \"lower_bound\" : 75.2,\n", " \"upper_bound\" : 91.0,\n", " \"count\" : 530.0\n", " }, {\n", " \"lower_bound\" : 91.0,\n", " \"upper_bound\" : 106.8,\n", " \"count\" : 708.0\n", " }, {\n", " \"lower_bound\" : 106.8,\n", " \"upper_bound\" : 122.6,\n", " 
\"count\" : 576.0\n", " }, {\n", " \"lower_bound\" : 122.6,\n", " \"upper_bound\" : 138.4,\n", " \"count\" : 215.0\n", " }, {\n", " \"lower_bound\" : 138.4,\n", " \"upper_bound\" : 154.2,\n", " \"count\" : 56.0\n", " }, {\n", " \"lower_bound\" : 154.2,\n", " \"upper_bound\" : 170.0,\n", " \"count\" : 5.0\n", " } ],\n", " \"sketch\" : {\n", " \"parameters\" : {\n", " \"c\" : 0.64,\n", " \"k\" : 2048.0\n", " },\n", " \"data\" : [ [ 100.0, 92.0, 128.0, 79.0, 86.0, 98.0, 112.0, 103.0, 82.0, 56.0, 82.0, 114.0, 108.0, 86.0, 101.0, 107.0, 68.0, 108.0, 139.0, 114.0, 126.0, 157.0, 126.0, 99.0, 87.0, 92.0, 64.0, 107.0, 103.0, 86.0, 109.0, 115.0, 125.0, 108.0, 100.0, 110.0, 84.0, 129.0, 109.0, 88.0, 101.0, 125.0, 119.0, 112.0, 122.0, 83.0, 116.0, 62.0, 88.0, 108.0, 109.0, 108.0, 128.0, 96.0, 60.0, 122.0, 114.0, 108.0, 106.0, 118.0, 81.0, 97.0, 94.0, 112.0, 118.0, 114.0, 121.0, 129.0, 123.0, 121.0, 116.0, 97.0, 114.0, 110.0, 82.0, 82.0, 99.0, 119.0, 77.0, 83.0, 132.0, 126.0, 90.0, 84.0, 90.0, 99.0, 89.0, 122.0, 78.0, 91.0, 80.0, 46.0, 121.0, 93.0, 147.0, 87.0, 80.0, 112.0, 80.0, 68.0, 80.0, 86.0, 92.0, 74.0, 102.0, 77.0, 86.0, 88.0, 114.0, 91.0, 59.0, 109.0, 123.0, 142.0, 103.0, 96.0, 134.0, 109.0, 93.0, 119.0, 109.0, 115.0, 133.0, 112.0, 111.0, 66.0, 97.0, 91.0, 65.0, 101.0, 108.0, 114.0, 137.0, 114.0, 118.0, 129.0, 111.0, 77.0, 118.0, 111.0, 105.0, 111.0, 45.0, 95.0, 85.0, 95.0, 89.0, 92.0, 96.0, 97.0, 123.0, 72.0, 90.0, 99.0, 87.0, 108.0, 117.0, 137.0, 146.0, 120.0, 115.0, 90.0, 116.0, 119.0, 88.0, 106.0, 121.0, 84.0, 106.0, 84.0, 74.0, 89.0, 112.0, 54.0, 122.0, 102.0, 127.0, 105.0, 109.0, 89.0, 146.0, 98.0, 76.0, 70.0, 123.0, 64.0, 118.0, 121.0, 100.0, 92.0, 84.0, 80.0, 83.0, 88.0, 94.0, 84.0, 96.0, 144.0, 103.0, 84.0, 102.0, 80.0, 121.0, 100.0, 81.0, 144.0, 74.0, 121.0, 95.0, 69.0, 79.0, 104.0, 53.0, 75.0, 90.0, 128.0, 119.0, 94.0, 116.0, 89.0, 99.0, 141.0, 79.0, 103.0, 90.0, 69.0, 123.0, 136.0, 87.0, 76.0, 105.0, 139.0, 114.0, 86.0, 121.0, 87.0, 75.0, 102.0, 96.0, 81.0, 
87.0, 86.0, 121.0, 111.0, 85.0, 103.0, 102.0, 103.0, 68.0, 110.0, 108.0, 92.0, 96.0, 118.0, 110.0, 150.0, 98.0, 109.0, 124.0, 129.0, 69.0, 74.0, 88.0, 103.0, 111.0, 71.0, 96.0, 113.0, 119.0, 102.0, 109.0, 70.0, 87.0, 73.0, 68.0, 98.0, 76.0, 97.0, 73.0, 94.0, 98.0, 91.0, 102.0 ], [ 12.0, 42.0, 44.0, 48.0, 48.0, 48.0, 50.0, 52.0, 52.0, 53.0, 54.0, 56.0, 56.0, 57.0, 58.0, 58.0, 58.0, 58.0, 59.0, 59.0, 59.0, 60.0, 60.0, 60.0, 60.0, 60.0, 61.0, 61.0, 61.0, 62.0, 62.0, 63.0, 63.0, 63.0, 63.0, 63.0, 64.0, 64.0, 64.0, 65.0, 65.0, 65.0, 65.0, 65.0, 66.0, 66.0, 66.0, 66.0, 66.0, 67.0, 67.0, 67.0, 67.0, 67.0, 67.0, 67.0, 67.0, 68.0, 68.0, 68.0, 69.0, 69.0, 69.0, 69.0, 70.0, 70.0, 70.0, 70.0, 70.0, 71.0, 71.0, 71.0, 71.0, 71.0, 71.0, 71.0, 71.0, 72.0, 72.0, 72.0, 72.0, 72.0, 72.0, 72.0, 72.0, 73.0, 73.0, 73.0, 73.0, 73.0, 73.0, 73.0, 73.0, 74.0, 74.0, 74.0, 74.0, 74.0, 74.0, 74.0, 74.0, 75.0, 75.0, 75.0, 75.0, 75.0, 76.0, 76.0, 76.0, 76.0, 76.0, 76.0, 76.0, 76.0, 76.0, 76.0, 76.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 77.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 79.0, 79.0, 79.0, 79.0, 79.0, 79.0, 79.0, 79.0, 79.0, 79.0, 79.0, 79.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 81.0, 81.0, 81.0, 81.0, 81.0, 81.0, 81.0, 81.0, 81.0, 81.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 82.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 83.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 84.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 85.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 86.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 88.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 ] ]\n", " }\n", " }\n", " }\n", " }\n", " }, {\n", " \"name\" : \"State_KS\",\n", " \"inferred_type\" : \"Integral\",\n", " \"numerical_statistics\" : {\n", " \"common\" : {\n", " \"num_present\" : 2333,\n", " \"num_missing\" : 0\n", " },\n", " \"mean\" : 0.02186026575225032,\n", " \"sum\" : 51.0,\n", " \"std_dev\" : 0.14622720175634665,\n", " \"min\" : 0.0,\n", " \"max\" : 1.0,\n", " \"distribution\" : {\n", " \"kll\" : {\n", " \"buckets\" : [ {\n", " \"lower_bound\" : 0.0,\n", " \"upper_bound\" : 0.1,\n", " \"count\" : 2282.0\n", " }, {\n", " \"lower_bound\" : 0.1,\n", " \"upper_bound\" : 0.2,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.2,\n", " \"upper_bound\" : 0.3,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.3,\n", " \"upper_bound\" : 0.4,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.4,\n", " \"upper_bound\" : 0.5,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.5,\n", " \"upper_bound\" : 0.6,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.6,\n", " \"upper_bound\" : 0.7,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.7,\n", " \"upper_bound\" : 0.8,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.8,\n", " \"upper_bound\" : 0.9,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.9,\n", " \"upper_bound\" : 1.0,\n", " \"count\" : 51.0\n", " } ],\n", " \"sketch\" : {\n", " \"parameters\" : {\n", " \"c\" : 0.64,\n", " \"k\" : 2048.0\n", " },\n", " \"data\" : [ [ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 ], [ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 ] ]\n", " }\n", " }\n", " }\n", " }\n", " }, {\n", " \"name\" : \"State_KY\",\n", " \"inferred_type\" : \"Integral\",\n", " \"numerical_statistics\" : {\n", " \"common\" : {\n", " \"num_present\" : 2333,\n", " \"num_missing\" : 0\n", " },\n", " \"mean\" : 0.018859837119588514,\n", " \"sum\" : 44.0,\n", " \"std_dev\" : 0.13602993664414845,\n", " \"min\" : 0.0,\n", " \"max\" : 1.0,\n", " \"distribution\" : {\n", " \"kll\" : {\n", " \"buckets\" : [ {\n", " \"lower_bound\" : 0.0,\n", " \"upper_bound\" : 0.1,\n", " \"count\" : 2289.0\n", " }, {\n", " \"lower_bound\" : 0.1,\n", " \"upper_bound\" : 0.2,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.2,\n", " \"upper_bound\" : 0.3,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.3,\n", " \"upper_bound\" : 0.4,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.4,\n", " \"upper_bound\" : 0.5,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.5,\n", " \"upper_bound\" : 0.6,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.6,\n", " \"upper_bound\" : 0.7,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.7,\n", " \"upper_bound\" : 0.8,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.8,\n", " \"upper_bound\" : 0.9,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.9,\n", " \"upper_bound\" : 1.0,\n", " \"count\" : 44.0\n", " } ],\n", " \"sketch\" : {\n", " \"parameters\" : {\n", " \"c\" : 0.64,\n", " \"k\" : 2048.0\n", " },\n", "[TRUNCATED MESSAGE] 6571 bytes are truncated.\n", " }\n", " }\n", " }\n", " }\n", " }, {\n", " \"name\" : \"State_LA\",\n", " \"inferred_type\" : 
\"Integral\",\n", " \"numerical_statistics\" : {\n", " \"common\" : {\n", " \"num_present\" : 2333,\n", " \"num_missing\" : 0\n", " },\n", " \"mean\" : 0.017573939134162022,\n", " \"sum\" : 41.0,\n", " \"std_dev\" : 0.13139671151695853,\n", " \"min\" : 0.0,\n", " \"max\" : 1.0,\n", " \"distribution\" : {\n", " \"kll\" : {\n", " \"buckets\" : [ {\n", " \"lower_bound\" : 0.0,\n", " \"upper_bound\" : 0.1,\n", " \"count\" : 2292.0\n", " }, {\n", " \"lower_bound\" : 0.1,\n", " \"upper_bound\" : 0.2,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.2,\n", " \"upper_bound\" : 0.3,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.3,\n", " \"upper_bound\" : 0.4,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.4,\n", " \"upper_bound\" : 0.5,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.5,\n", " \"upper_bound\" : 0.6,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.6,\n", " \"upper_bound\" : 0.7,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.7,\n", " \"upper_bound\" : 0.8,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.8,\n", " \"upper_bound\" : 0.9,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.9,\n", " \"upper_bound\" : 1.0,\n", " \"count\" : 41.0\n", " } ],\n", " \"sketch\" : {\n", " \"parameters\" : {\n", " \"c\" : 0.64,\n", " \"k\" : 2048.0\n", " },\n", "[TRUNCATED MESSAGE] 6571 bytes are truncated.\n", " }\n", " }\n", " }\n", " }\n", " }, {\n", " \"name\" : \"State_MA\",\n", " \"inferred_type\" : \"Integral\",\n", " \"numerical_statistics\" : {\n", " \"common\" : {\n", " \"num_present\" : 2333,\n", " \"num_missing\" : 0\n", " },\n", " \"mean\" : 0.015430775825117874,\n", " \"sum\" : 36.0,\n", " \"std_dev\" : 0.12325853715890375,\n", " \"min\" : 0.0,\n", " \"max\" : 1.0,\n", " \"distribution\" : {\n", " \"kll\" : {\n", " \"buckets\" : [ {\n", " \"lower_bound\" : 0.0,\n", " \"upper_bound\" : 0.1,\n", " \"count\" : 2298.0\n", " }, {\n", " \"lower_bound\" : 0.1,\n", " 
\"upper_bound\" : 0.2,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.2,\n", " \"upper_bound\" : 0.3,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.3,\n", " \"upper_bound\" : 0.4,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.4,\n", " \"upper_bound\" : 0.5,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.5,\n", " \"upper_bound\" : 0.6,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.6,\n", " \"upper_bound\" : 0.7,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.7,\n", " \"upper_bound\" : 0.8,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.8,\n", " \"upper_bound\" : 0.9,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.9,\n", " \"upper_bound\" : 1.0,\n", " \"count\" : 35.0\n", " } ],\n", " \"sketch\" : {\n", " \"parameters\" : {\n", " \"c\" : 0.64,\n", " \"k\" : 2048.0\n", " },\n", "[TRUNCATED MESSAGE] 6571 bytes are truncated.\n", " }\n", " }\n", " }\n", " }\n", " }, {\n", " \"name\" : \"State_MD\",\n", " \"inferred_type\" : \"Integral\",\n", " \"numerical_statistics\" : {\n", " \"common\" : {\n", " \"num_present\" : 2333,\n", " \"num_missing\" : 0\n", " },\n", " \"mean\" : 0.020574367766823833,\n", " \"sum\" : 48.0,\n", " \"std_dev\" : 0.1419544404300876,\n", " \"min\" : 0.0,\n", " \"max\" : 1.0,\n", " \"distribution\" : {\n", " \"kll\" : {\n", " \"buckets\" : [ {\n", " \"lower_bound\" : 0.0,\n", " \"upper_bound\" : 0.1,\n", " \"count\" : 2285.0\n", " }, {\n", " \"lower_bound\" : 0.1,\n", " \"upper_bound\" : 0.2,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.2,\n", " \"upper_bound\" : 0.3,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.3,\n", " \"upper_bound\" : 0.4,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.4,\n", " \"upper_bound\" : 0.5,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.5,\n", " \"upper_bound\" : 0.6,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.6,\n", " \"upper_bound\" : 0.7,\n", " 
\"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.7,\n", " \"upper_bound\" : 0.8,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.8,\n", " \"upper_bound\" : 0.9,\n", " \"count\" : 0.0\n", " }, {\n", " \"lower_bound\" : 0.9,\n", " \"upper_bound\" : 1.0,\n", " \"count\" : 48.0\n", " } ],\n", " \"sketch\" : {\n", " \"parameters\" : {\n", " \"c\" : 0.64,\n", " \"k\" : 2048.0\n", " },\n", "[TRUNCATED MESSAGE] 6571 bytes are truncated.\n", " }\n", " }\n", " }\n", " }\n", " }, {\n", " \"name\" : \"State_ME\",\n", " \"inferred_type\" : \"Integral\",\n", " \"numerical_statistics\" : {\n", " \"common\" : {\n", " \"num_present\" : 2333,\n", " \"num_missing\" : 0\n", " },\n", " \"mean\" : 0.017573939134162022,\n", " \"sum\" : 41.0,\n", " \"std_dev\" : 0.1313967115169584,\n", " \"min\" : 0.0,\n", " \"max\" : 1.0,\n", " \"distribution\" : {\n", " \"kll\" : {\n", " \"buckets\" : [ {\n", " \"lower_bound\" : 0.0,\n", " \"upper_bound\" : 0.1,\n", " }\u001b[0m\n", "\u001b[34m}\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48 INFO FileUtil:29 - Write to file statistics.json at path /opt/ml/processing/output.\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48 INFO YarnClientSchedulerBackend:54 - Interrupting monitor thread\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48 INFO YarnClientSchedulerBackend:54 - Shutting down all executors\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48 INFO YarnSchedulerBackend$YarnDriverEndpoint:54 - Asking each executor to shut down\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48 INFO SchedulerExtensionServices:54 - Stopping SchedulerExtensionServices\u001b[0m\n", "\u001b[34m(serviceOption=None,\n", " services=List(),\n", " started=false)\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48 INFO YarnClientSchedulerBackend:54 - Stopped\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48 INFO MemoryStore:54 - MemoryStore cleared\u001b[0m\n", 
"\u001b[34m2021-02-02 12:51:48 INFO BlockManager:54 - BlockManager stopped\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48 INFO BlockManagerMaster:54 - BlockManagerMaster stopped\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48 INFO SparkContext:54 - Successfully stopped SparkContext\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48 INFO Main:65 - Completed: Job completed successfully with no violations.\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48 INFO Main:141 - Write to file /opt/ml/output/message.\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48 INFO ShutdownHookManager:54 - Shutdown hook called\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-0d19b328-7885-4672-b8b3-6c1d66a8f479\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-f709120a-6ae8-4f6a-b00e-938a3075fdda\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48,823 - DefaultDataAnalyzer - INFO - Completed spark-submit with return code : 0\u001b[0m\n", "\u001b[34m2021-02-02 12:51:48,824 - DefaultDataAnalyzer - INFO - Spark job completed.\u001b[0m\n", "\n" ] } ], "source": [ "# cell 12\n", "from sagemaker.model_monitor import DefaultModelMonitor\n", "from sagemaker.model_monitor.dataset_format import DatasetFormat\n", "\n", "my_default_monitor = DefaultModelMonitor(\n", " role=role,\n", " instance_count=1,\n", " instance_type='ml.m5.xlarge',\n", " volume_size_in_gb=20,\n", " max_runtime_in_seconds=3600,\n", ")\n", "\n", "my_default_monitor_baseline = my_default_monitor.suggest_baseline(\n", " baseline_dataset=baseline_data_uri+'/training-dataset-with-header.csv',\n", " dataset_format=DatasetFormat.csv(header=True),\n", " output_s3_uri=baseline_results_uri,\n", " wait=True\n", ")" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "#### Explore the generated constraints 
and statistics" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found Files:\n", "sagemaker/DEMO-ModelMonitor/baselining/results/constraints.json\n", " sagemaker/DEMO-ModelMonitor/baselining/results/statistics.json\n" ] } ], "source": [ "# cell 13\n", "s3_client = boto3.Session().client('s3')\n", "result = s3_client.list_objects(Bucket=bucket, Prefix=baseline_results_prefix)\n", "report_files = [report_file.get(\"Key\") for report_file in result.get('Contents')]\n", "print(\"Found Files:\")\n", "print(\"\\n \".join(report_files))" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
0ChurnIntegral233300.139306325.00.3462650.01.0[{'lower_bound': 0.0, 'upper_bound': 0.1, 'cou...0.642048.0[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0,...
1Account LengthIntegral23330101.276897236279.039.5524421.0243.0[{'lower_bound': 1.0, 'upper_bound': 25.2, 'co...0.642048.0[[119.0, 100.0, 111.0, 181.0, 95.0, 104.0, 70....
2VMail MessageIntegral233308.21431619164.013.7769080.051.0[{'lower_bound': 0.0, 'upper_bound': 5.1, 'cou...0.642048.0[[19.0, 0.0, 0.0, 40.0, 36.0, 0.0, 0.0, 24.0, ...
3Day MinsFractional23330180.226489420468.453.9871790.0350.8[{'lower_bound': 0.0, 'upper_bound': 35.08, 'c...0.642048.0[[178.1, 160.3, 197.1, 105.2, 283.1, 113.6, 23...
4Day CallsIntegral23330100.259323233905.020.1650080.0165.0[{'lower_bound': 0.0, 'upper_bound': 16.5, 'co...0.642048.0[[110.0, 138.0, 117.0, 61.0, 112.0, 87.0, 122....
5Eve MinsFractional23330200.050107466716.950.01592831.2361.8[{'lower_bound': 31.2, 'upper_bound': 64.26, '...0.642048.0[[212.8, 221.3, 227.8, 341.3, 286.2, 158.6, 29...
6Eve CallsIntegral2333099.573939232306.019.67557812.0170.0[{'lower_bound': 12.0, 'upper_bound': 27.8, 'c...0.642048.0[[100.0, 92.0, 128.0, 79.0, 86.0, 98.0, 112.0,...
7Night MinsFractional23330201.388598469839.650.62796123.2395.0[{'lower_bound': 23.2, 'upper_bound': 60.37999...0.642048.0[[226.3, 150.4, 214.0, 165.7, 261.7, 187.7, 20...
8Night CallsIntegral23330100.227175233830.019.28202942.0175.0[{'lower_bound': 42.0, 'upper_bound': 55.3, 'c...0.642048.0[[123.0, 120.0, 101.0, 97.0, 129.0, 87.0, 112....
9Intl MinsFractional2333010.25306523920.42.7787660.018.4[{'lower_bound': 0.0, 'upper_bound': 1.8399999...0.642048.0[[10.0, 11.2, 9.3, 6.3, 11.3, 10.5, 0.0, 9.7, ...
\n", "
" ], "text/plain": [ " name inferred_type numerical_statistics.common.num_present \\\n", "0 Churn Integral 2333 \n", "1 Account Length Integral 2333 \n", "2 VMail Message Integral 2333 \n", "3 Day Mins Fractional 2333 \n", "4 Day Calls Integral 2333 \n", "5 Eve Mins Fractional 2333 \n", "6 Eve Calls Integral 2333 \n", "7 Night Mins Fractional 2333 \n", "8 Night Calls Integral 2333 \n", "9 Intl Mins Fractional 2333 \n", "\n", " numerical_statistics.common.num_missing numerical_statistics.mean \\\n", "0 0 0.139306 \n", "1 0 101.276897 \n", "2 0 8.214316 \n", "3 0 180.226489 \n", "4 0 100.259323 \n", "5 0 200.050107 \n", "6 0 99.573939 \n", "7 0 201.388598 \n", "8 0 100.227175 \n", "9 0 10.253065 \n", "\n", " numerical_statistics.sum numerical_statistics.std_dev \\\n", "0 325.0 0.346265 \n", "1 236279.0 39.552442 \n", "2 19164.0 13.776908 \n", "3 420468.4 53.987179 \n", "4 233905.0 20.165008 \n", "5 466716.9 50.015928 \n", "6 232306.0 19.675578 \n", "7 469839.6 50.627961 \n", "8 233830.0 19.282029 \n", "9 23920.4 2.778766 \n", "\n", " numerical_statistics.min numerical_statistics.max \\\n", "0 0.0 1.0 \n", "1 1.0 243.0 \n", "2 0.0 51.0 \n", "3 0.0 350.8 \n", "4 0.0 165.0 \n", "5 31.2 361.8 \n", "6 12.0 170.0 \n", "7 23.2 395.0 \n", "8 42.0 175.0 \n", "9 0.0 18.4 \n", "\n", " numerical_statistics.distribution.kll.buckets \\\n", "0 [{'lower_bound': 0.0, 'upper_bound': 0.1, 'cou... \n", "1 [{'lower_bound': 1.0, 'upper_bound': 25.2, 'co... \n", "2 [{'lower_bound': 0.0, 'upper_bound': 5.1, 'cou... \n", "3 [{'lower_bound': 0.0, 'upper_bound': 35.08, 'c... \n", "4 [{'lower_bound': 0.0, 'upper_bound': 16.5, 'co... \n", "5 [{'lower_bound': 31.2, 'upper_bound': 64.26, '... \n", "6 [{'lower_bound': 12.0, 'upper_bound': 27.8, 'c... \n", "7 [{'lower_bound': 23.2, 'upper_bound': 60.37999... \n", "8 [{'lower_bound': 42.0, 'upper_bound': 55.3, 'c... \n", "9 [{'lower_bound': 0.0, 'upper_bound': 1.8399999... 
\n", "\n", " numerical_statistics.distribution.kll.sketch.parameters.c \\\n", "0 0.64 \n", "1 0.64 \n", "2 0.64 \n", "3 0.64 \n", "4 0.64 \n", "5 0.64 \n", "6 0.64 \n", "7 0.64 \n", "8 0.64 \n", "9 0.64 \n", "\n", " numerical_statistics.distribution.kll.sketch.parameters.k \\\n", "0 2048.0 \n", "1 2048.0 \n", "2 2048.0 \n", "3 2048.0 \n", "4 2048.0 \n", "5 2048.0 \n", "6 2048.0 \n", "7 2048.0 \n", "8 2048.0 \n", "9 2048.0 \n", "\n", " numerical_statistics.distribution.kll.sketch.data \n", "0 [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0,... \n", "1 [[119.0, 100.0, 111.0, 181.0, 95.0, 104.0, 70.... \n", "2 [[19.0, 0.0, 0.0, 40.0, 36.0, 0.0, 0.0, 24.0, ... \n", "3 [[178.1, 160.3, 197.1, 105.2, 283.1, 113.6, 23... \n", "4 [[110.0, 138.0, 117.0, 61.0, 112.0, 87.0, 122.... \n", "5 [[212.8, 221.3, 227.8, 341.3, 286.2, 158.6, 29... \n", "6 [[100.0, 92.0, 128.0, 79.0, 86.0, 98.0, 112.0,... \n", "7 [[226.3, 150.4, 214.0, 165.7, 261.7, 187.7, 20... \n", "8 [[123.0, 120.0, 101.0, 97.0, 129.0, 87.0, 112.... \n", "9 [[10.0, 11.2, 9.3, 6.3, 11.3, 10.5, 0.0, 9.7, ... " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# cell 14\n", "import pandas as pd\n", "\n", "baseline_job = my_default_monitor.latest_baselining_job\n", "schema_df = pd.json_normalize(baseline_job.baseline_statistics().body_dict[\"features\"])\n", "schema_df.head(10)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
1Account LengthIntegral1.0True
2VMail MessageIntegral1.0True
3Day MinsFractional1.0True
4Day CallsIntegral1.0True
5Eve MinsFractional1.0True
6Eve CallsIntegral1.0True
7Night MinsFractional1.0True
8Night CallsIntegral1.0True
9Intl MinsFractional1.0True
\n", "
" ], "text/plain": [ " name inferred_type completeness num_constraints.is_non_negative\n", "0 Churn Integral 1.0 True\n", "1 Account Length Integral 1.0 True\n", "2 VMail Message Integral 1.0 True\n", "3 Day Mins Fractional 1.0 True\n", "4 Day Calls Integral 1.0 True\n", "5 Eve Mins Fractional 1.0 True\n", "6 Eve Calls Integral 1.0 True\n", "7 Night Mins Fractional 1.0 True\n", "8 Night Calls Integral 1.0 True\n", "9 Intl Mins Fractional 1.0 True" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# cell 15\n", "constraints_df = pd.json_normalize(baseline_job.suggested_constraints().body_dict[\"features\"])\n", "constraints_df.head(10)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### 2. Analyzing collected data for data quality issues\n", "\n", "After you have collected the data above, analyze it and monitor its quality with monitoring schedules." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create a schedule" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# cell 16\n", "# First, copy the example scripts to the S3 bucket so that they can be used for pre- and post-processing\n", "boto3.Session().resource('s3').Bucket(bucket).Object(code_prefix+\"/preprocessor.py\").upload_file('preprocessor.py')\n", "boto3.Session().resource('s3').Bucket(bucket).Object(code_prefix+\"/postprocessor.py\").upload_file('postprocessor.py')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can create a model monitoring schedule for the endpoint created earlier. Use the baseline resources (constraints and statistics) to compare against the real-time traffic."
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# cell 17\n", "from sagemaker.model_monitor import CronExpressionGenerator\n", "from time import gmtime, strftime\n", "\n", "mon_schedule_name = 'DEMO-xgb-churn-pred-model-monitor-schedule-' + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", "my_default_monitor.create_monitoring_schedule(\n", "    monitor_schedule_name=mon_schedule_name,\n", "    endpoint_input=predictor.endpoint_name,\n", "    # record_preprocessor_script=pre_processor_script,  # optional; not used in this demo\n", "    post_analytics_processor_script=s3_code_postprocessor_uri,\n", "    output_s3_uri=s3_report_path,\n", "    statistics=my_default_monitor.baseline_statistics(),\n", "    constraints=my_default_monitor.suggested_constraints(),\n", "    schedule_cron_expression=CronExpressionGenerator.hourly(),\n", "    enable_cloudwatch_metrics=True,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Start generating some artificial traffic\n", "The cell below starts a background thread that sends traffic to the endpoint. Note that you need to stop the kernel to terminate this thread. If there is no traffic, the monitoring jobs are marked as `Failed` because there is no data to process."
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# cell 18\n", "from threading import Thread\n", "from time import sleep\n", "import time\n", "\n", "endpoint_name = predictor.endpoint_name\n", "runtime_client = boto3.client('runtime.sagemaker')\n", "\n", "# Repeats the code from above so that this section can run independently.\n", "def invoke_endpoint(ep_name, file_name, runtime_client):\n", "    # Send each CSV row in the file as a separate request, one request per second\n", "    with open(file_name, 'r') as f:\n", "        for row in f:\n", "            payload = row.rstrip('\\n')\n", "            response = runtime_client.invoke_endpoint(EndpointName=ep_name,\n", "                                                      ContentType='text/csv',\n", "                                                      Body=payload)\n", "            response['Body'].read()\n", "            time.sleep(1)\n", "\n", "def invoke_endpoint_forever():\n", "    # Replay the test file in an endless loop\n", "    while True:\n", "        invoke_endpoint(endpoint_name, 'test_data/test-dataset-input-cols.csv', runtime_client)\n", "\n", "thread = Thread(target=invoke_endpoint_forever)\n", "thread.start()\n", "\n", "# Note that you need to stop the kernel to stop the invocations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Describe and inspect the schedule\n", "Describe the schedule and observe that its `MonitoringScheduleStatus` changes from `Pending` to `Scheduled` after a short time." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Schedule status: Pending\n" ] } ], "source": [ "# cell 19\n", "desc_schedule_result = my_default_monitor.describe_schedule()\n", "print('Schedule status: {}'.format(desc_schedule_result['MonitoringScheduleStatus']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### List executions\n", "The schedule starts jobs at the previously specified intervals. Here, you list the executions of the schedule. 
If you run this right after creating the hourly schedule, the list might be empty. You might have to wait until you cross the hour boundary (in UTC) to see executions kick off. 
The code below waits until the first execution appears.\n", "\n", "Note: Even for an hourly schedule, Amazon SageMaker has a buffer period of 20 minutes to schedule your execution. Your execution might start anywhere from zero to ~20 minutes after the hour boundary. This is expected and is done for load balancing in the backend." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2021-02-02-13-09-30\n", "We created a hourly schedule above and it will kick off executions ON the hour (plus 0 - 20 min buffer.\n", "We will have to wait till we hit the hour...\n", "Waiting for the 1st execution to happen...\n", "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2021-02-02-13-09-30\n", "Waiting for the 1st execution to happen...\n", "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2021-02-02-13-09-30\n", "Waiting for the 1st execution to happen...\n", "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2021-02-02-13-09-30\n", "Waiting for the 1st execution to happen...\n", "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2021-02-02-13-09-30\n", "Waiting for the 1st execution to happen...\n", "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2021-02-02-13-09-30\n", "Waiting for the 1st execution to happen...\n", "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2021-02-02-13-09-30\n", "Waiting for the 1st execution to happen...\n", "No executions found for schedule. 
monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2021-02-02-13-09-30\n", "Waiting for the 1st execution to happen...\n", "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2021-02-02-13-09-30\n", "Waiting for the 1st execution to happen...\n", "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2021-02-02-13-09-30\n", "Waiting for the 1st execution to happen...\n", "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2021-02-02-13-09-30\n", "Waiting for the 1st execution to happen...\n" ] } ], "source": [ "# cell 20\n", "mon_executions = my_default_monitor.list_executions()\n", "print(\"We created an hourly schedule above, and it will kick off executions ON the hour (plus a 0-20 min buffer).\\nWe will have to wait until we hit the hour...\")\n", "\n", "# Poll once a minute until the first execution appears\n", "while len(mon_executions) == 0:\n", "    print(\"Waiting for the 1st execution to happen...\")\n", "    time.sleep(60)\n", "    mon_executions = my_default_monitor.list_executions()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Inspect a specific execution (latest execution)\n", "The next cell picks up the latest scheduled execution and waits for it to finish. Here are the possible terminal states and what each of them means:\n", "* Completed - The monitoring execution completed and no issues were found in the violations report.\n", "* CompletedWithViolations - The execution completed, but constraint violations were detected.\n", "* Failed - The monitoring execution failed, possibly due to a client error (for example, incorrect role permissions) or infrastructure issues. Examine `FailureReason` and `ExitMessage` to identify what exactly happened.\n", "* Stopped - The job exceeded its maximum runtime or was stopped manually.\n", "\n", "These are monitoring execution statuses. 
The underlying processing job reports `Completed` in both of the first two cases, and its `ExitMessage` tells you whether violations were found."
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "............................................!Latest execution status: Completed\n", "Latest execution result: CompletedWithViolations: Job completed successfully with 60 violations.\n" ] } ], "source": [ "# cell 21\n", "latest_execution = mon_executions[-1]  # the latest execution is at index -1, the second to last at -2, and so on\n", "time.sleep(60)\n", "latest_execution.wait(logs=False)\n", "\n", "latest_job = latest_execution.describe()\n", "print(\"Latest execution status: {}\".format(latest_job['ProcessingJobStatus']))\n", "print(\"Latest execution result: {}\".format(latest_job['ExitMessage']))\n", "\n", "if latest_job['ProcessingJobStatus'] != 'Completed':\n", "    print(\"====STOP==== \\n No completed executions to inspect further. Please wait until an execution completes or investigate previously reported failures.\")" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Report Uri: s3://sagemaker-eu-west-1-802173394839/sagemaker/DEMO-ModelMonitor/reports/DEMO-xgb-churn-pred-model-monitor-2021-02-02-12-30-27/DEMO-xgb-churn-pred-model-monitor-schedule-2021-02-02-13-09-30/2021/02/02/14\n" ] } ], "source": [ "# cell 22\n", "report_uri = latest_execution.output.destination\n", "print('Report Uri: {}'.format(report_uri))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### List the generated reports" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Report bucket: sagemaker-eu-west-1-802173394839\n", "Report key: sagemaker/DEMO-ModelMonitor/reports/DEMO-xgb-churn-pred-model-monitor-2021-02-02-12-30-27/DEMO-xgb-churn-pred-model-monitor-schedule-2021-02-02-13-09-30/2021/02/02/14\n", "Found Report Files:\n", 
"sagemaker/DEMO-ModelMonitor/reports/DEMO-xgb-churn-pred-model-monitor-2021-02-02-12-30-27/DEMO-xgb-churn-pred-model-monitor-schedule-2021-02-02-13-09-30/2021/02/02/14/constraint_violations.json\n", " sagemaker/DEMO-ModelMonitor/reports/DEMO-xgb-churn-pred-model-monitor-2021-02-02-12-30-27/DEMO-xgb-churn-pred-model-monitor-schedule-2021-02-02-13-09-30/2021/02/02/14/constraints.json\n", " sagemaker/DEMO-ModelMonitor/reports/DEMO-xgb-churn-pred-model-monitor-2021-02-02-12-30-27/DEMO-xgb-churn-pred-model-monitor-schedule-2021-02-02-13-09-30/2021/02/02/14/statistics.json\n" ] } ], "source": [ "# cell 23\n", "from urllib.parse import urlparse\n", "s3uri = urlparse(report_uri)\n", "report_bucket = s3uri.netloc\n", "report_key = s3uri.path.lstrip('/')\n", "print('Report bucket: {}'.format(report_bucket))\n", "print('Report key: {}'.format(report_key))\n", "\n", "s3_client = boto3.Session().client('s3')\n", "result = s3_client.list_objects_v2(Bucket=report_bucket, Prefix=report_key)\n", "# 'Contents' is missing from the response when no objects match the prefix\n", "report_files = [report_file.get(\"Key\") for report_file in result.get('Contents', [])]\n", "print(\"Found Report Files:\")\n", "print(\"\\n \".join(report_files))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Violations report" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the execution found any violations against the baseline, the next cell lists them." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>feature_name</th>\n", " <th>constraint_check_type</th>\n", " <th>description</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n",
" <tr>\n", " <th>0</th>\n", " <td>State_IL</td>\n", " <td>data_type_check</td>\n", " <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.7% of data is Integral.</td>\n", " </tr>\n",
" <tr>\n", " <th>1</th>\n", " <td>State_VT</td>\n", " <td>data_type_check</td>\n", " <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.7% of data is Integral.</td>\n", " </tr>\n",
" <tr>\n", " <th>2</th>\n", " <td>State_AK</td>\n", " <td>data_type_check</td>\n", " <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.7% of data is Integral.</td>\n", " </tr>\n",
" <tr>\n", " <th>3</th>\n", " <td>VMail Plan_no</td>\n", " <td>data_type_check</td>\n", " <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.7% of data is Integral.</td>\n", " </tr>\n",
" <tr>\n", " <th>4</th>\n", " <td>State_ND</td>\n", " <td>data_type_check</td>\n", " <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.7% of data is Integral.</td>\n", " </tr>\n",
" <tr>\n", " <th>5</th>\n", " <td>State_NV</td>\n", " <td>data_type_check</td>\n", " <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.7% of data is Integral.</td>\n", " </tr>\n",
" <tr>\n", " <th>6</th>\n", " <td>Churn</td>\n", " <td>data_type_check</td>\n", " <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral.</td>\n", " </tr>\n",
" <tr>\n", " <th>7</th>\n", " <td>State_AL</td>\n", " <td>data_type_check</td>\n", " <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.7% of data is Integral.</td>\n", " </tr>\n",
" <tr>\n", " <th>8</th>\n", " <td>Area Code_510</td>\n", " <td>data_type_check</td>\n", " <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.7% of data is Integral.</td>\n", " </tr>\n",
" <tr>\n", " <th>9</th>\n", " <td>State_RI</td>\n", " <td>data_type_check</td>\n", " <td>Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.7% of data is Integral.</td>\n", " </tr>\n",
" </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " feature_name constraint_check_type \\\n", "0 State_IL data_type_check \n", "1 State_VT data_type_check \n", "2 State_AK data_type_check \n", "3 VMail Plan_no data_type_check \n", "4 State_ND data_type_check \n", "5 State_NV data_type_check \n", "6 Churn data_type_check \n", "7 State_AL data_type_check \n", "8 Area Code_510 data_type_check \n", "9 State_RI data_type_check \n", "\n", " description \n", "0 Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.7% of data is Integral. \n", "1 Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.7% of data is Integral. \n", "2 Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.7% of data is Integral. \n", "3 Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.7% of data is Integral. \n", "4 Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.7% of data is Integral. \n", "5 Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.7% of data is Integral. \n", "6 Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 0.0% of data is Integral. \n", "7 Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.7% of data is Integral. \n", "8 Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.7% of data is Integral. \n", "9 Data type match requirement is not met. Expected data type: Integral, Expected match: 100.0%. Observed: Only 99.7% of data is Integral. 
" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# cell 24\n", "violations = my_default_monitor.latest_monitoring_constraint_violations()\n", "pd.set_option('display.max_colwidth', None)\n", "constraints_df = pd.json_normalize(violations.body_dict[\"violations\"])\n", "constraints_df.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Other commands\n", "You can also stop and restart the monitoring schedule." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# cell 25\n", "#my_default_monitor.stop_monitoring_schedule()\n", "#my_default_monitor.start_monitoring_schedule()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Delete the resources\n", "\n", "You can keep your endpoint running to continue capturing data. If you do not plan to collect more data or use this endpoint further, delete the endpoint to avoid incurring additional charges. Note that deleting your endpoint does not delete the data that was captured during the model invocations. That data persists in Amazon S3 until you delete it yourself.\n", "\n", "Before you delete the endpoint, however, you must delete the monitoring schedule."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# cell 26\n", "my_default_monitor.delete_monitoring_schedule()\n", "time.sleep(60) # wait a minute so the schedule deletion can finish before deleting the endpoint" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# cell 27\n", "predictor.delete_endpoint()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# cell 28\n", "predictor.delete_model()" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Option 2: Model monitoring with Batch transform" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## PART A: Capturing data from Batch Transform jobs\n", "Create a Batch Transform job to showcase the data capture capability in action.\n", "\n", "### 1) Upload the pre-trained model to Amazon S3\n", "This code uploads a pre-trained XGBoost model that is ready for you to deploy. This model was trained using the XGB Churn Prediction Notebook in SageMaker. You can also use your own pre-trained model in this step. If you already have a pre-trained model in Amazon S3, you can use it instead by setting `s3_key` to its object key.\n", " " ] },
{ "cell_type": "code", "execution_count": 4, "metadata": { "tags": [] }, "outputs": [], "source": [ "s3_key = os.path.join(prefix, \"xgb-churn-prediction-model.tar.gz\")\n", "# upload the local model artifact; the with-block closes the file afterwards\n", "with open(\"model/xgb-churn-prediction-model.tar.gz\", \"rb\") as model_file:\n", "    boto3.Session().resource(\"s3\").Bucket(bucket).Object(s3_key).upload_fileobj(model_file)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [] }, "outputs": [], "source": [ "from time import gmtime, strftime\n", "from sagemaker.model import Model\n", "from sagemaker.image_uris import retrieve\n", "\n", "model_name = \"DEMO-xgb-churn-pred-model-monitor-\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", "# S3 URI of the model artifact referenced by s3_key in the previous cell\n", "model_url = \"s3://{}/{}\".format(bucket, s3_key)\n", "\n", "image_uri = retrieve(\"xgboost\", region, \"0.90-1\")\n", "\n", "model = Model(image_uri=image_uri, model_data=model_url, role=role)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2) Upload the test data that will be used as input for the Batch Transform job" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "upload: test_data/test-dataset-input-cols.csv to s3://sagemaker-us-east-1-609196052567/transform-input/test-dataset-input-cols.csv\n" ] } ], "source": [ "!aws s3 cp test_data/test-dataset-input-cols.csv s3://{bucket}/transform-input/test-dataset-input-cols.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3) Create the Batch Transform Job" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [] }, "outputs": [], "source": [ "from sagemaker.inputs import BatchDataCaptureConfig" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": true, 
"tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:sagemaker:Creating transform job with name: sagemaker-xgboost-2023-03-24-13-40-32-898\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "...................................\u001b[34m[2023-03-24 13:46:14 +0000] [14] [INFO] Starting gunicorn 19.10.0\u001b[0m\n", "\u001b[34m[2023-03-24 13:46:14 +0000] [14] [INFO] Listening at: unix:/tmp/gunicorn.sock (14)\u001b[0m\n", "\u001b[34m[2023-03-24 13:46:14 +0000] [14] [INFO] Using worker: gevent\u001b[0m\n", "\u001b[34m[2023-03-24 13:46:14 +0000] [21] [INFO] Booting worker with pid: 21\u001b[0m\n", "\u001b[34m[2023-03-24 13:46:14 +0000] [22] [INFO] Booting worker with pid: 22\u001b[0m\n", "\u001b[34m[2023-03-24 13:46:14 +0000] [26] [INFO] Booting worker with pid: 26\u001b[0m\n", "\u001b[34m[2023-03-24 13:46:14 +0000] [27] [INFO] Booting worker with pid: 27\u001b[0m\n", "\u001b[34m[2023-03-24:13:46:21:INFO] No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[34m169.254.255.130 - - [24/Mar/2023:13:46:21 +0000] \"GET /ping HTTP/1.1\" 200 0 \"-\" \"Go-http-client/1.1\"\u001b[0m\n", "\u001b[34m[2023-03-24:13:46:21:INFO] No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[34m169.254.255.130 - - [24/Mar/2023:13:46:21 +0000] \"GET /execution-parameters HTTP/1.1\" 200 84 \"-\" \"Go-http-client/1.1\"\u001b[0m\n", "\u001b[34m[2023-03-24:13:46:21:INFO] No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[34m[2023-03-24:13:46:21:INFO] Determined delimiter of CSV input is ','\u001b[0m\n", "\u001b[34m169.254.255.130 - - [24/Mar/2023:13:46:21 +0000] \"POST /invocations HTTP/1.1\" 200 6762 \"-\" \"Go-http-client/1.1\"\u001b[0m\n", "\n", "\u001b[32m2023-03-24T13:46:21.241:[sagemaker logs]: MaxConcurrentTransforms=4, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD\u001b[0m\n" ] } ], "source": [ "transfomer = model.transformer(\n", " instance_count=1,\n", " instance_type=\"ml.m4.xlarge\",\n", " 
accept=\"text/csv\",\n", " assemble_with=\"Line\",\n", ")\n", "\n", "transfomer.transform(\n", " \"s3://{}/transform-input\".format(bucket),\n", " content_type=\"text/csv\",\n", " split_type=\"Line\",\n", " # configure the data capturing\n", " batch_data_capture_config=BatchDataCaptureConfig(\n", " destination_s3_uri=s3_capture_upload_path,\n", " ),\n", " wait=True,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4) Examine the Batch Transform Captured Data\n", "\n", "There are two directories under `s3_capture_upload_path`: `/input` and `/output`. The `/input` directory holds the captured data files for the transform input, and the `/output` directory holds the captured data files for the transform output. Note that batch transform data capture differs from endpoint data capture: it does not copy the payloads themselves to Amazon S3, because that would create a large amount of duplicated data. Instead, batch transform captures data in manifests, which contain the Amazon S3 locations of the source transform input or output.\n", "\n", "Let's take a look at the captured data. 
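" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because the captured files are manifests rather than payloads, you may want to turn them back into full object locations. The next cell is a minimal sketch, assuming (based on the sample manifests printed a few cells below) that each capture file is a JSON array whose first element is `{\"prefix\": <S3 URI>}` and whose remaining elements are object keys relative to that prefix. `expand_manifest` is a hypothetical helper, not a SageMaker SDK function; once `sample_input_content` has been loaded below, `expand_manifest(sample_input_content)` should return the full S3 URI of the transform input file." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Hypothetical helper (a sketch, not part of the SageMaker SDK).\n",
"# Assumes a manifest of the form [{\"prefix\": \"s3://bucket/path\"}, \"key-1\", \"key-2\", ...].\n",
"def expand_manifest(manifest):\n",
"    prefix = manifest[0][\"prefix\"]\n",
"    uris = []\n",
"    for relative_key in manifest[1:]:\n",
"        # Join with exactly one \"/\", whether or not the prefix ends with one or the key starts with one.\n",
"        uris.append(prefix.rstrip(\"/\") + \"/\" + relative_key.lstrip(\"/\"))\n",
"    return uris\n",
"\n",
"\n",
"# Example input copied from the sample input manifest shown below.\n",
"example_manifest = [\n",
"    {\"prefix\": \"s3://sagemaker-us-east-1-609196052567/transform-input\"},\n",
"    \"/test-dataset-input-cols.csv\",\n",
"]\n",
"print(expand_manifest(example_manifest))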
" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2023-03-15 12:40:29 99 sagemaker/DEMO-ModelMonitor/datacapture/input/2023/03/15/12/d2b7486f-1693-4b31-b891-205605cef428.json\n", "2023-03-24 11:45:23 99 sagemaker/DEMO-ModelMonitor/datacapture/input/2023/03/24/11/1da419ce-024e-4f9e-8fc5-b25eb1640777.json\n", "2023-03-24 12:45:27 99 sagemaker/DEMO-ModelMonitor/datacapture/input/2023/03/24/12/8834abd7-4889-4eaa-ab18-85d0af218a46.json\n", "2023-03-24 12:46:05 99 sagemaker/DEMO-ModelMonitor/datacapture/input/2023/03/24/12/97a9ca4e-c887-4b5a-ad74-7154e654a4c7.json\n", "2023-03-24 12:46:51 99 sagemaker/DEMO-ModelMonitor/datacapture/input/2023/03/24/12/d4990cfa-cbf4-4f08-961b-a33a5107d548.json\n", "2023-03-24 13:46:22 99 sagemaker/DEMO-ModelMonitor/datacapture/input/2023/03/24/13/caca8d16-5b96-4ac7-9614-7e4b567f57f0.json\n" ] } ], "source": [ "!aws s3 ls {s3_capture_upload_path}/input/ --recursive" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "tags": [] }, "outputs": [], "source": [ "s3 = boto3.client(\"s3\")\n", "\n", "captured_input_s3_key = [\n", " k[\"Key\"]\n", " for k in s3.list_objects_v2(Bucket=bucket, Prefix=f\"{data_capture_prefix}/input/\")[\"Contents\"]\n", "]\n", "assert len(captured_input_s3_key) > 0" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "tags": [] }, "outputs": [], "source": [ "sample_input_body = s3.get_object(Bucket=bucket, Key=captured_input_s3_key[0])[\"Body\"]\n", "sample_input_content = json.loads(sample_input_body.read())" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[{'prefix': 's3://sagemaker-us-east-1-609196052567/transform-input'},\n", " '/test-dataset-input-cols.csv']" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sample_input_content" ] }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "To avoid data duplication, the captured data are manifest files. Each manifest is a JSON file that lists the Amazon S3 locations of the source objects: a `prefix` entry followed by object keys relative to that prefix." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2023-03-15 12:40:29 129 sagemaker/DEMO-ModelMonitor/datacapture/output/2023/03/15/12/689dcd0e-b680-4027-b029-a67323bb95ac.json\n", "2023-03-24 11:45:23 129 sagemaker/DEMO-ModelMonitor/datacapture/output/2023/03/24/11/b10c356a-5cfa-4465-8943-cc75a301d93e.json\n", "2023-03-24 12:46:51 129 sagemaker/DEMO-ModelMonitor/datacapture/output/2023/03/24/12/ad75cf2c-38fe-4f9f-9677-02670165a223.json\n", "2023-03-24 12:46:05 129 sagemaker/DEMO-ModelMonitor/datacapture/output/2023/03/24/12/df05677b-2692-4805-8689-02ad744af0c6.json\n", "2023-03-24 12:45:27 129 sagemaker/DEMO-ModelMonitor/datacapture/output/2023/03/24/12/eaa0a70a-54de-47d6-a14a-7227771b941c.json\n", "2023-03-24 13:46:22 129 sagemaker/DEMO-ModelMonitor/datacapture/output/2023/03/24/13/a8129c9d-deba-4bc1-9fa8-e9f3dcf9611f.json\n" ] } ], "source": [ "!aws s3 ls {s3_capture_upload_path}/output/ --recursive" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "tags": [] }, "outputs": [], "source": [ "captured_output_s3_key = [\n", " k[\"Key\"]\n", " for k in s3.list_objects_v2(Bucket=bucket, Prefix=f\"{data_capture_prefix}/output/\")[\"Contents\"]\n", "]\n", "assert len(captured_output_s3_key) > 0\n", "sample_output_body = s3.get_object(Bucket=bucket, Key=captured_output_s3_key[0])[\"Body\"]\n", "sample_output_content = json.loads(sample_output_body.read())" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[{'prefix': 's3://sagemaker-us-east-1-609196052567/sagemaker-xgboost-2023-03-15-12-34-45-249/'},\n", " 'test-dataset-input-cols.csv.out']" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], 
"source": [ "sample_output_content" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To recap, you observed how you can enable capturing the input or output payloads of your batch transform job with the `batch_data_capture_config` parameter. You have also observed what the captured format looks like in Amazon S3. Next, continue to explore how Amazon SageMaker helps with monitoring the data collected in Amazon S3." ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## PART B: Model Monitor - Baselining and continuous monitoring\n", "\n", "### 5) Create a Baseline that will be used by Model Monitor\n", "\n", "In addition to collecting the data, Amazon SageMaker provides the capability for you to monitor and evaluate the data observed by Batch transform. For this:\n", "1. Create a baseline against which you compare the data captured from batch transform jobs.\n", "1. Once a baseline is ready, set up a schedule to continuously evaluate and compare against the baseline.\n", "\n", "In general, this can be done in parallel with the transform job.\n", "\n", "The training dataset with which you trained the model is usually a good baseline dataset. Note that the training dataset schema and the inference dataset schema should match exactly (i.e., the number and order of the features).\n", "\n", "From the training dataset, you can ask Amazon SageMaker to suggest a set of baseline `constraints` and generate descriptive `statistics` to explore the data. For this example, upload the training dataset that was used to train the pre-trained model included in this example. If you already have it in Amazon S3, you can point to it directly."
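 ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because the baseline and the monitored data must agree on the number and order of features, it can be worth comparing column lists before baselining. The next cell is a minimal sketch: `find_schema_mismatches` is a hypothetical helper, and the column lists in the example are illustrative only (the feature names are taken from the violations report above). It assumes you pass feature columns only, for example after removing the label column from the training header." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"# Hypothetical helper (a sketch): report differences in feature count and order.\n",
"def find_schema_mismatches(baseline_columns, inference_columns):\n",
"    problems = []\n",
"    if len(baseline_columns) != len(inference_columns):\n",
"        problems.append(\"feature count differs: {} vs {}\".format(len(baseline_columns), len(inference_columns)))\n",
"    # zip() stops at the shorter list, so only positions present in both lists are compared.\n",
"    for position, (expected, observed) in enumerate(zip(baseline_columns, inference_columns)):\n",
"        if expected != observed:\n",
"            problems.append(\"position {}: expected {!r}, got {!r}\".format(position, expected, observed))\n",
"    return problems\n",
"\n",
"\n",
"# Illustrative example: the first two features are swapped in the inference data.\n",
"print(find_schema_mismatches([\"State_AK\", \"State_AL\", \"VMail Plan_no\"], [\"State_AL\", \"State_AK\", \"VMail Plan_no\"]))"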
] }, { "cell_type": "code", "execution_count": 16, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Baseline data uri: s3://sagemaker-us-east-1-609196052567/sagemaker/DEMO-ModelMonitor/baselining/data\n", "Baseline results uri: s3://sagemaker-us-east-1-609196052567/sagemaker/DEMO-ModelMonitor/baselining/results\n" ] } ], "source": [ "# copy over the training dataset to Amazon S3 (if you already have it in Amazon S3, you could reuse it)\n", "baseline_prefix = prefix + \"/baselining\"\n", "baseline_data_prefix = baseline_prefix + \"/data\"\n", "baseline_results_prefix = baseline_prefix + \"/results\"\n", "\n", "baseline_data_uri = \"s3://{}/{}\".format(bucket, baseline_data_prefix)\n", "baseline_results_uri = \"s3://{}/{}\".format(bucket, baseline_results_prefix)\n", "print(\"Baseline data uri: {}\".format(baseline_data_uri))\n", "print(\"Baseline results uri: {}\".format(baseline_results_uri))" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "tags": [] }, "outputs": [], "source": [ "s3_key = os.path.join(baseline_prefix, \"data\", \"training-dataset-with-header.csv\")\n", "# upload the training dataset (with header); the with-block closes the file afterwards\n", "with open(\"test_data/training-dataset-with-header.csv\", \"rb\") as training_data_file:\n", "    boto3.Session().resource(\"s3\").Bucket(bucket).Object(s3_key).upload_fileobj(training_data_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that you have the training data ready in Amazon S3, start a job to `suggest` constraints. `DefaultModelMonitor.suggest_baseline(..)` starts a `ProcessingJob` using an Amazon SageMaker-provided Model Monitor container to generate the baseline constraints and statistics."
] }, { "cell_type": "code", "execution_count": 18, "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: .\n", "INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.\n", "INFO:sagemaker:Creating processing-job with name baseline-suggestion-job-2023-03-24-13-46-53-013\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "..............................\u001b[34m2023-03-24 13:51:45,679 - matplotlib.font_manager - INFO - Generating new fontManager, this may take some time...\u001b[0m\n", "\u001b[34m2023-03-24 13:51:46.273599: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory\u001b[0m\n", "\u001b[34m2023-03-24 13:51:46.273633: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.\u001b[0m\n", "\u001b[34m2023-03-24 13:51:48.051127: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory\u001b[0m\n", "\u001b[34m2023-03-24 13:51:48.051163: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)\u001b[0m\n", "\u001b[34m2023-03-24 13:51:48.051187: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ip-10-2-82-156.ec2.internal): /proc/driver/nvidia/version does not exist\u001b[0m\n", "\u001b[34m2023-03-24 13:51:48.051466: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA\u001b[0m\n", 
"\u001b[34mTo enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\u001b[0m\n", "\u001b[34m2023-03-24 13:51:49,768 - __main__ - INFO - All params:{'ProcessingJobArn': 'arn:aws:sagemaker:us-east-1:609196052567:processing-job/baseline-suggestion-job-2023-03-24-13-46-53-013', 'ProcessingJobName': 'baseline-suggestion-job-2023-03-24-13-46-53-013', 'Environment': {'dataset_format': '{\"csv\": {\"header\": true, \"output_columns_position\": \"START\"}}', 'dataset_source': '/opt/ml/processing/input/baseline_dataset_input', 'output_path': '/opt/ml/processing/output', 'publish_cloudwatch_metrics': 'Disabled'}, 'AppSpecification': {'ImageUri': '156813124566.dkr.ecr.us-east-1.amazonaws.com/sagemaker-model-monitor-analyzer', 'ContainerEntrypoint': None, 'ContainerArguments': None}, 'ProcessingInputs': [{'InputName': 'baseline_dataset_input', 'AppManaged': False, 'S3Input': {'LocalPath': '/opt/ml/processing/input/baseline_dataset_input', 'S3Uri': 's3://sagemaker-us-east-1-609196052567/sagemaker/DEMO-ModelMonitor/baselining/data/training-dataset-with-header.csv', 'S3DataDistributionType': 'FullyReplicated', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3CompressionType': 'None', 'S3DownloadMode': 'StartOfJob'}, 'DatasetDefinition': None}], 'ProcessingOutputConfig': {'Outputs': [{'OutputName': 'monitoring_output', 'AppManaged': False, 'S3Output': {'LocalPath': '/opt/ml/processing/output', 'S3Uri': 's3://sagemaker-us-east-1-609196052567/sagemaker/DEMO-ModelMonitor/baselining/results', 'S3UploadMode': 'EndOfJob'}, 'FeatureStoreOutput': None}], 'KmsKeyId': None}, 'ProcessingResources': {'ClusterConfig': {'InstanceCount': 1, 'InstanceType': 'ml.m5.xlarge', 'VolumeSizeInGB': 20, 'VolumeKmsKeyId': None}}, 'RoleArn': 'arn:aws:iam::609196052567:role/service-role/AmazonSageMaker-ExecutionRole-20221212T101059', 'StoppingCondition': {'MaxRuntimeInSeconds': 3600}}\u001b[0m\n", "\u001b[34m2023-03-24 13:51:49,768 - __main__ - INFO - Current 
Environment:{'dataset_format': '{\"csv\": {\"header\": true, \"output_columns_position\": \"START\"}}', 'dataset_source': '/opt/ml/processing/input/baseline_dataset_input', 'output_path': '/opt/ml/processing/output', 'publish_cloudwatch_metrics': 'Disabled'}\u001b[0m\n", "\u001b[34m2023-03-24 13:51:49,768 - DefaultDataAnalyzer - INFO - Performing analysis with input: {\"dataset_source\": \"/opt/ml/processing/input/baseline_dataset_input\", \"dataset_format\": {\"csv\": {\"header\": true, \"output_columns_position\": \"START\"}}, \"output_path\": \"/opt/ml/processing/output\", \"monitoring_input_type\": null, \"analysis_type\": null, \"problem_type\": null, \"inference_attribute\": null, \"probability_attribute\": null, \"ground_truth_attribute\": null, \"probability_threshold_attribute\": null, \"positive_label\": null, \"record_preprocessor_script\": null, \"post_analytics_processor_script\": null, \"baseline_constraints\": null, \"baseline_statistics\": null, \"start_time\": null, \"end_time\": null, \"metric_time\": null, \"cloudwatch_metrics_directory\": \"/opt/ml/output/metrics/cloudwatch\", \"publish_cloudwatch_metrics\": \"Disabled\", \"sagemaker_endpoint_name\": null, \"sagemaker_monitoring_schedule_name\": null, \"output_message_file\": \"/opt/ml/output/message\", \"detect_outliers\": null, \"detect_drift\": null, \"image_data\": null, \"report_enabled\": false, \"auto_ml_job_detail\": null}\u001b[0m\n", "\u001b[34m2023-03-24 13:51:49,768 - DefaultDataAnalyzer - INFO - Bootstrapping yarn\u001b[0m\n", "\u001b[34m2023-03-24 13:51:49,768 - bootstrap - INFO - Copy aws jars\u001b[0m\n", "\u001b[34m2023-03-24 13:51:49,826 - bootstrap - INFO - Copy cluster config\u001b[0m\n", "\u001b[34m2023-03-24 13:51:49,827 - bootstrap - INFO - Write runtime cluster config\u001b[0m\n", "\u001b[34m2023-03-24 13:51:49,827 - bootstrap - INFO - Resource Config is: {'current_host': 'algo-1', 'hosts': ['algo-1']}\u001b[0m\n", "\u001b[34m2023-03-24 13:51:49,841 - bootstrap - INFO - 
Finished Yarn configuration files setup.\u001b[0m\n", "\u001b[34m2023-03-24 13:51:49,841 - bootstrap - INFO - Starting spark process for master node algo-1\u001b[0m\n", "\u001b[34m2023-03-24 13:51:49,841 - bootstrap - INFO - Running command: /usr/hadoop-3.0.0/bin/hdfs namenode -format -force\u001b[0m\n", "\u001b[34mWARNING: /usr/hadoop-3.0.0/logs does not exist. Creating.\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,345 INFO namenode.NameNode: STARTUP_MSG: \u001b[0m\n", "\u001b[34m/************************************************************\u001b[0m\n", "\u001b[34mSTARTUP_MSG: Starting NameNode\u001b[0m\n", "\u001b[34mSTARTUP_MSG: host = algo-1/\u001b[0m\n", "\u001b[34mSTARTUP_MSG: args = [-format, -force]\u001b[0m\n", "\u001b[34mSTARTUP_MSG: version = 3.0.0\u001b[0m\n", "\u001b[34mSTARTUP_MSG: classpath = /usr/hadoop-3.0.0/etc/hadoop:/usr/hadoop-3.0.0/share/hadoop/common/lib/curator-recipes-2.12.0.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/commons-beanutils-1.9.3.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jetty-security-9.3.19.v20170502.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/zookeeper-3.4.9.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/kerb-common-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jetty-http-9.3.19.v20170502.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/commons-net-3.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jaxb-api-2.2.11.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/kerby-xdr-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jsr311-api-1.1.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jersey-core-1.19.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jetty-server-9.3.19.v20170502.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jsr305-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/junit-4.11.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/woodstox-core-5.0.3.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/kerb-simplekdc-1.0.1.jar:/usr/hadoop-3.0.0/share/had
oop/common/lib/mockito-all-1.8.5.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jsch-0.1.54.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/log4j-1.2.17.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/kerby-pkix-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jetty-xml-9.3.19.v20170502.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jersey-json-1.19.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/kerby-util-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/kerb-core-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/commons-lang-2.6.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jersey-servlet-1.19.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/gson-2.2.4.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/metrics-core-3.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/hadoop-auth-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jetty-webapp-9.3.19.v20170502.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jcip-annotations-1.0-1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/paranamer-2.3.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/re2j-1.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jackson-annotations-2.7.8.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/slf4j-api-1.7.25.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/hamcrest-core-1.3.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/commons-configuration2-2.1.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/curator-framework-2.12.0.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/httpcore-4.4.4.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/token-provider-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jetty-servlet-9.3.19.v20170502.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/asm-5.0.4.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jackson-core-2.7.8.jar:/usr/hadoop-3.
0.0/share/hadoop/common/lib/jsp-api-2.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jul-to-slf4j-1.7.25.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/nimbus-jose-jwt-4.41.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/avro-1.7.7.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/commons-cli-1.2.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/javax.servlet-api-3.1.0.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/json-smart-2.3.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jersey-server-1.19.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/guava-11.0.2.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/commons-lang3-3.4.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/kerb-util-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/snappy-java-1.0.5.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/kerb-server-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/accessors-smart-1.2.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/commons-compress-1.4.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/kerb-crypto-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jetty-io-9.3.19.v20170502.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/kerb-admin-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jetty-util-9.3.19.v20170502.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/xz-1.0.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jackson-databind-2.7.8.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/kerb-identity-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jettison-1.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/hadoop-annotations-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/commons-collections-3.2.2.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/commons-math3-3.1.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/kerby-asn1-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/commons-io-2.4.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/netty-3.10.5.Final.ja
r:/usr/hadoop-3.0.0/share/hadoop/common/lib/commons-codec-1.4.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/curator-client-2.12.0.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/kerb-client-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/httpclient-4.5.2.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/commons-logging-1.1.3.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/stax2-api-3.1.4.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/kerby-config-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/hadoop-aws-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/common/lib/aws-java-sdk-bundle-1.11.199.jar:/usr/hadoop-3.0.0/share/hadoop/common/hadoop-kms-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/common/hadoop-common-3.0.0-tests.jar:/usr/hadoop-3.0.0/share/hadoop/common/hadoop-common-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/common/hadoop-nfs-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/curator-recipes-2.12.0.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/commons-beanutils-1.9.3.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jetty-security-9.3.19.v20170502.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/zookeeper-3.4.9.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/kerb-common-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jetty-http-9.3.19.v20170502.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/commons-net-3.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jaxb-api-2.2.11.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/kerby-xdr-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jsr311-api-1.1.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jersey-core-1.19.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/netty-all-4.0.23.Final.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jetty-server-9.3.19.v20170502.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jsr305-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/usr/hadoop
-3.0.0/share/hadoop/hdfs/lib/woodstox-core-5.0.3.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/kerb-simplekdc-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jsch-0.1.54.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/kerby-pkix-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jetty-xml-9.3.19.v20170502.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jersey-json-1.19.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/kerby-util-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/kerb-core-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jackson-jaxrs-1.9.13.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jersey-servlet-1.19.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/gson-2.2.4.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/hadoop-auth-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/json-simple-1.1.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jetty-webapp-9.3.19.v20170502.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jcip-annotations-1.0-1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/paranamer-2.3.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jackson-xc-1.9.13.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/re2j-1.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jackson-annotations-2.7.8.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jaxb-impl-2.2.3-1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/okhttp-2.4.0.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/commons-configuration2-2.1.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/curator-framework-2.12.0.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/httpcore-4.4.4.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/token-provider-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jetty-servlet-9.3.19.v20170502.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/asm-5.0.4.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jackson-core-2.7.8.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/nimbus-jose-jwt-4.41.
1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/avro-1.7.7.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/javax.servlet-api-3.1.0.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/json-smart-2.3.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jersey-server-1.19.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/guava-11.0.2.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/htrace-core4-4.1.0-incubating.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/commons-lang3-3.4.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/kerb-util-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/snappy-java-1.0.5.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/kerb-server-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/accessors-smart-1.2.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/commons-compress-1.4.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/kerb-crypto-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jetty-io-9.3.19.v20170502.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/kerb-admin-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jetty-util-9.3.19.v20170502.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/xz-1.0.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/okio-1.4.0.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jackson-databind-2.7.8.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/kerb-identity-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jettison-1.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/hadoop-annotations-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/commons-collections-3.2.2.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jetty-util-ajax-9.3.19.v20170502.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/commons-math3-3.1.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/kerby-asn1-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/commons-io-2.4.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/netty-3.10.5.Final.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/usr/hadoop-3.0.0/share/hadoop/hdf
s/lib/curator-client-2.12.0.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/kerb-client-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/httpclient-4.5.2.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/stax2-api-3.1.4.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/lib/kerby-config-1.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/hadoop-hdfs-client-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/hadoop-hdfs-native-client-3.0.0-tests.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/hadoop-hdfs-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/hadoop-hdfs-3.0.0-tests.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/hadoop-hdfs-native-client-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/hadoop-hdfs-client-3.0.0-tests.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/hadoop-hdfs-nfs-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/hdfs/hadoop-hdfs-httpfs-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/mapreduce/hadoop-mapreduce-client-app-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/mapreduce/hadoop-mapreduce-client-nativetask-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-tests.jar:/usr/hadoop-3.0.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/mapreduce/hadoop-mapreduce-client-common-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/hbase-hadoop-compat-1.2.6.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/findbugs-annotations-1.3.9-1.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/javax.i
nject-1.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/jersey-guice-1.19.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/jackson-jaxrs-json-provider-2.7.8.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/java-util-1.9.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/jackson-module-jaxb-annotations-2.7.8.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/hbase-protocol-1.2.6.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/htrace-core-3.1.0-incubating.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/hbase-annotations-1.2.6.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/metrics-core-3.0.1.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/jsp-api-2.1-6.1.14.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/geronimo-jcache_1.0_spec-1.0-alpha-1.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/ehcache-3.3.1.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/fst-2.50.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/commons-httpclient-3.1.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/guice-4.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/aopalliance-1.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/commons-math-2.2.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/joni-2.1.2.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/jasper-compiler-5.5.23.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/jamon-runtime-2.4.1.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/hbase-prefix-tree-1.2.6.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/jcodings-1.0.8.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/jsp-2.1-6.1.14.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/metrics-core-2.2.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/disruptor-3.3.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/hbase-procedure-1.2.6.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/jersey-client-1.19.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/commons-csv-1.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/hbase-server-1.2.6.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/jackson-jaxrs-base-2.7.8.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/guice-servlet-4.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/hbase-client-1.2.6.jar
:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/mssql-jdbc-6.2.1.jre7.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/jasper-runtime-5.5.23.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/HikariCP-java7-2.4.12.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/hbase-common-1.2.6.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/commons-el-1.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/hbase-hadoop2-compat-1.2.6.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/servlet-api-2.5-6.1.14.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/lib/json-io-2.5.1.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-server-timeline-pluginstorage-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-server-router-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-server-nodemanager-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-server-tests-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-registry-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-server-web-proxy-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-server-common-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-api-3.0\u001b[0m\n", "\u001b[34m.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-server-timelineservice-hbase-tests-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-server-timelineservice-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-server-timelineservice-hbase-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-common-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-3.0.0.jar:/usr/hadoop-3.0.0/share/hadoop/yarn/hadoop-yarn-client-3.0.0.jar\u001b[0m\n", "\u001b[34mSTARTUP_MSG: build = 
https://git-wip-us.apache.org/repos/asf/hadoop.git -r c25427ceca461ee979d30edd7a4b0f50718e6533; compiled by 'andrew' on 2017-12-08T19:16Z\u001b[0m\n", "\u001b[34mSTARTUP_MSG: java = 1.8.0_362\u001b[0m\n", "\u001b[34m************************************************************/\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,353 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,357 INFO namenode.NameNode: createNameNode [-format, -force]\u001b[0m\n", "\u001b[34mFormatting using clusterid: CID-54ded3ab-bf27-4fb1-87bc-cc65ff0cc195\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,885 INFO namenode.FSEditLog: Edit logging is async:true\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,897 INFO namenode.FSNamesystem: KeyProvider: null\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,898 INFO namenode.FSNamesystem: fsLock is fair: true\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,901 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,906 INFO namenode.FSNamesystem: fsOwner = root (auth:SIMPLE)\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,906 INFO namenode.FSNamesystem: supergroup = supergroup\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,906 INFO namenode.FSNamesystem: isPermissionEnabled = true\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,906 INFO namenode.FSNamesystem: HA Enabled: false\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,939 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. 
Disabling file IO profiling\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,952 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,952 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,956 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,960 INFO blockmanagement.BlockManager: The block deletion will start around 2023 Mar 24 13:51:50\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,962 INFO util.GSet: Computing capacity for map BlocksMap\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,962 INFO util.GSet: VM type = 64-bit\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,963 INFO util.GSet: 2.0% max memory 3.1 GB = 63.8 MB\u001b[0m\n", "\u001b[34m2023-03-24 13:51:50,963 INFO util.GSet: capacity = 2^23 = 8388608 entries\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,043 INFO blockmanagement.BlockManager: dfs.block.access.token.enable = false\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,047 INFO Configuration.deprecation: No unit for dfs.namenode.safemode.extension(30000) assuming MILLISECONDS\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,047 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,047 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,047 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,047 INFO blockmanagement.BlockManager: defaultReplication = 3\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,048 INFO blockmanagement.BlockManager: maxReplication = 512\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,048 INFO blockmanagement.BlockManager: minReplication = 
1\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,048 INFO blockmanagement.BlockManager: maxReplicationStreams = 2\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,048 INFO blockmanagement.BlockManager: redundancyRecheckInterval = 3000ms\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,048 INFO blockmanagement.BlockManager: encryptDataTransfer = false\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,048 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,075 INFO util.GSet: Computing capacity for map INodeMap\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,075 INFO util.GSet: VM type = 64-bit\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,076 INFO util.GSet: 1.0% max memory 3.1 GB = 31.9 MB\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,076 INFO util.GSet: capacity = 2^22 = 4194304 entries\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,077 INFO namenode.FSDirectory: ACLs enabled? false\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,078 INFO namenode.FSDirectory: POSIX ACL inheritance enabled? true\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,078 INFO namenode.FSDirectory: XAttrs enabled? 
true\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,078 INFO namenode.NameNode: Caching file names occurring more than 10 times\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,083 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: false, skipCaptureAccessTimeOnlyChange: false, snapshotDiffAllowSnapRootDescendant: true\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,086 INFO util.GSet: Computing capacity for map cachedBlocks\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,086 INFO util.GSet: VM type = 64-bit\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,087 INFO util.GSet: 0.25% max memory 3.1 GB = 8.0 MB\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,087 INFO util.GSet: capacity = 2^20 = 1048576 entries\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,093 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,094 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,094 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,097 INFO namenode.FSNamesystem: Retry cache on namenode is enabled\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,097 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,098 INFO util.GSet: Computing capacity for map NameNodeRetryCache\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,098 INFO util.GSet: VM type = 64-bit\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,099 INFO util.GSet: 0.029999999329447746% max memory 3.1 GB = 979.8 KB\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,099 INFO util.GSet: capacity = 2^17 = 131072 entries\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,122 INFO namenode.FSImage: Allocated new BlockPoolId: BP-2064403019-\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,137 INFO common.Storage: Storage directory /opt/amazon/hadoop/hdfs/namenode has been 
successfully formatted.\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,145 INFO namenode.FSImageFormatProtobuf: Saving image file /opt/amazon/hadoop/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 using no compression\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,230 INFO namenode.FSImageFormatProtobuf: Image file /opt/amazon/hadoop/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 of size 386 bytes saved in 0 seconds.\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,243 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,248 INFO namenode.NameNode: SHUTDOWN_MSG: \u001b[0m\n", "\u001b[34m/************************************************************\u001b[0m\n", "\u001b[34mSHUTDOWN_MSG: Shutting down NameNode at algo-1/\u001b[0m\n", "\u001b[34m************************************************************/\u001b[0m\n", "\u001b[34m2023-03-24 13:51:51,259 - bootstrap - INFO - Running command: /usr/hadoop-3.0.0/bin/hdfs --daemon start namenode\u001b[0m\n", "\u001b[34m2023-03-24 13:51:53,325 - bootstrap - INFO - Failed to run /usr/hadoop-3.0.0/bin/hdfs --daemon start namenode, return code 1\u001b[0m\n", "\u001b[34m2023-03-24 13:51:53,325 - bootstrap - INFO - Running command: /usr/hadoop-3.0.0/bin/hdfs --daemon start datanode\u001b[0m\n", "\u001b[34m2023-03-24 13:51:55,408 - bootstrap - INFO - Failed to run /usr/hadoop-3.0.0/bin/hdfs --daemon start datanode, return code 1\u001b[0m\n", "\u001b[34m2023-03-24 13:51:55,408 - bootstrap - INFO - Running command: /usr/hadoop-3.0.0/bin/yarn --daemon start resourcemanager\u001b[0m\n", "\u001b[34mWARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR.\u001b[0m\n", "\u001b[34mWARNING: /var/log/yarn/ does not exist. 
Creating.\u001b[0m\n", "\u001b[34m2023-03-24 13:51:57,515 - bootstrap - INFO - Failed to run /usr/hadoop-3.0.0/bin/yarn --daemon start resourcemanager, return code 1\u001b[0m\n", "\u001b[34m2023-03-24 13:51:57,515 - bootstrap - INFO - Running command: /usr/hadoop-3.0.0/bin/yarn --daemon start nodemanager\u001b[0m\n", "\u001b[34mWARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR.\u001b[0m\n", "\u001b[34m2023-03-24 13:51:59,621 - bootstrap - INFO - Failed to run /usr/hadoop-3.0.0/bin/yarn --daemon start nodemanager, return code 1\u001b[0m\n", "\u001b[34m2023-03-24 13:51:59,621 - bootstrap - INFO - Running command: /usr/hadoop-3.0.0/bin/yarn --daemon start proxyserver\u001b[0m\n", "\u001b[34mWARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR.\u001b[0m\n", "\u001b[34m2023-03-24 13:52:02,197 - bootstrap - INFO - Failed to run /usr/hadoop-3.0.0/bin/yarn --daemon start proxyserver, return code 1\u001b[0m\n", "\u001b[34m2023-03-24 13:52:02,199 - DefaultDataAnalyzer - INFO - Total number of hosts in the cluster: 1\u001b[0m\n", "\u001b[34m2023-03-24 13:52:12,206 - DefaultDataAnalyzer - INFO - Running command: bin/spark-submit --master yarn --deploy-mode client --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider --conf spark.serializer=org.apache.spark.serializer.KryoSerializer /opt/amazon/sagemaker-data-analyzer-1.0-jar-with-dependencies.jar --analytics_input /tmp/spark_job_config.json\u001b[0m\n", "\u001b[34m2023-03-24 13:52:14,080 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable\u001b[0m\n", "\u001b[34m2023-03-24 13:52:14,556 INFO Main: Start analyzing with args: --analytics_input /tmp/spark_job_config.json\u001b[0m\n", "\u001b[34m2023-03-24 13:52:14,603 INFO Main: Analytics input path: DataAnalyzerParams(/tmp/spark_job_config.json,yarn)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:14,615 INFO FileUtil: Read file from path /tmp/spark_job_config.json.\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,095 INFO spark.SparkContext: Running Spark version 3.3.0\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,123 INFO resource.ResourceUtils: ==============================================================\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,123 INFO resource.ResourceUtils: No custom resources configured for spark.driver.\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,124 INFO resource.ResourceUtils: ==============================================================\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,124 INFO spark.SparkContext: Submitted application: SageMakerDataAnalyzer\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,152 INFO resource.ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 3, script: , vendor: , memory -> name: memory, amount: 11544, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,166 INFO resource.ResourceProfile: Limiting resource is cpus at 3 tasks per executor\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,168 INFO resource.ResourceProfileManager: Added ResourceProfile id: 0\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,223 INFO spark.SecurityManager: Changing view acls to: root\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,223 INFO spark.SecurityManager: Changing modify acls to: root\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,223 INFO spark.SecurityManager: Changing view acls groups to: \u001b[0m\n", "\u001b[34m2023-03-24 
13:52:15,224 INFO spark.SecurityManager: Changing modify acls groups to: \u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,224 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,609 INFO util.Utils: Successfully started service 'sparkDriver' on port 35181.\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,644 INFO spark.SparkEnv: Registering MapOutputTracker\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,685 INFO spark.SparkEnv: Registering BlockManagerMaster\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,706 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,707 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,754 INFO spark.SparkEnv: Registering BlockManagerMasterHeartbeat\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,787 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-b215e912-f7c8-4847-8fd9-b2d137725536\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,811 INFO memory.MemoryStore: MemoryStore started with capacity 1458.6 MiB\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,869 INFO spark.SparkEnv: Registering OutputCommitCoordinator\u001b[0m\n", "\u001b[34m2023-03-24 13:52:15,919 INFO spark.SparkContext: Added JAR file:/opt/amazon/sagemaker-data-analyzer-1.0-jar-with-dependencies.jar at spark:// with timestamp 1679665935090\u001b[0m\n", "\u001b[34m2023-03-24 13:52:16,492 INFO client.RMProxy: Connecting to ResourceManager at /\u001b[0m\n", "\u001b[34m2023-03-24 13:52:17,427 INFO conf.Configuration: resource-types.xml not found\u001b[0m\n", "\u001b[34m2023-03-24 13:52:17,427 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.\u001b[0m\n", 
"\u001b[34m2023-03-24 13:52:17,435 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (15743 MB per container)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:17,436 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead\u001b[0m\n", "\u001b[34m2023-03-24 13:52:17,436 INFO yarn.Client: Setting up container launch context for our AM\u001b[0m\n", "\u001b[34m2023-03-24 13:52:17,437 INFO yarn.Client: Setting up the launch environment for our AM container\u001b[0m\n", "\u001b[34m2023-03-24 13:52:17,444 INFO yarn.Client: Preparing resources for our AM container\u001b[0m\n", "\u001b[34m2023-03-24 13:52:17,553 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.\u001b[0m\n", "\u001b[34m2023-03-24 13:52:19,325 INFO yarn.Client: Uploading resource file:/tmp/spark-716b77da-e42e-4441-8aca-4a00d6bde7b9/__spark_libs__3947784375249728031.zip -> hdfs://\u001b[0m\n", "\u001b[34m2023-03-24 13:52:21,582 INFO yarn.Client: Uploading resource file:/tmp/spark-716b77da-e42e-4441-8aca-4a00d6bde7b9/__spark_conf__807357486602545913.zip -> hdfs://\u001b[0m\n", "\u001b[34m2023-03-24 13:52:22,033 INFO spark.SecurityManager: Changing view acls to: root\u001b[0m\n", "\u001b[34m2023-03-24 13:52:22,033 INFO spark.SecurityManager: Changing modify acls to: root\u001b[0m\n", "\u001b[34m2023-03-24 13:52:22,034 INFO spark.SecurityManager: Changing view acls groups to: \u001b[0m\n", "\u001b[34m2023-03-24 13:52:22,034 INFO spark.SecurityManager: Changing modify acls groups to: \u001b[0m\n", "\u001b[34m2023-03-24 13:52:22,034 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:52:22,067 INFO yarn.Client: 
Submitting application application_1679665917129_0001 to ResourceManager\u001b[0m\n", "\u001b[34m2023-03-24 13:52:22,285 INFO impl.YarnClientImpl: Submitted application application_1679665917129_0001\u001b[0m\n", "\u001b[34m2023-03-24 13:52:23,293 INFO yarn.Client: Application report for application_1679665917129_0001 (state: ACCEPTED)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:23,296 INFO yarn.Client: \u001b[0m\n", "\u001b[34m#011 client token: N/A\u001b[0m\n", "\u001b[34m#011 diagnostics: [Fri Mar 24 13:52:22 +0000 2023] Scheduler has assigned a container for AM, waiting for AM container to be launched\u001b[0m\n", "\u001b[34m#011 ApplicationMaster host: N/A\u001b[0m\n", "\u001b[34m#011 ApplicationMaster RPC port: -1\u001b[0m\n", "\u001b[34m#011 queue: default\u001b[0m\n", "\u001b[34m#011 start time: 1679665942174\u001b[0m\n", "\u001b[34m#011 final status: UNDEFINED\u001b[0m\n", "\u001b[34m#011 tracking URL: http://algo-1:8088/proxy/application_1679665917129_0001/\u001b[0m\n", "\u001b[34m#011 user: root\u001b[0m\n", "\u001b[34m2023-03-24 13:52:24,300 INFO yarn.Client: Application report for application_1679665917129_0001 (state: ACCEPTED)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:25,303 INFO yarn.Client: Application report for application_1679665917129_0001 (state: ACCEPTED)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:26,306 INFO yarn.Client: Application report for application_1679665917129_0001 (state: ACCEPTED)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:27,311 INFO yarn.Client: Application report for application_1679665917129_0001 (state: ACCEPTED)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:28,046 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. 
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> algo-1, PROXY_URI_BASES -> http://algo-1:8088/proxy/application_1679665917129_0001), /proxy/application_1679665917129_0001\u001b[0m\n", "\u001b[34m2023-03-24 13:52:28,315 INFO yarn.Client: Application report for application_1679665917129_0001 (state: RUNNING)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:28,316 INFO yarn.Client: \u001b[0m\n", "\u001b[34m#011 client token: N/A\u001b[0m\n", "\u001b[34m#011 diagnostics: N/A\u001b[0m\n", "\u001b[34m#011 ApplicationMaster host:\u001b[0m\n", "\u001b[34m#011 ApplicationMaster RPC port: -1\u001b[0m\n", "\u001b[34m#011 queue: default\u001b[0m\n", "\u001b[34m#011 start time: 1679665942174\u001b[0m\n", "\u001b[34m#011 final status: UNDEFINED\u001b[0m\n", "\u001b[34m#011 tracking URL: http://algo-1:8088/proxy/application_1679665917129_0001/\u001b[0m\n", "\u001b[34m#011 user: root\u001b[0m\n", "\u001b[34m2023-03-24 13:52:28,318 INFO cluster.YarnClientSchedulerBackend: Application application_1679665917129_0001 has started running.\u001b[0m\n", "\u001b[34m2023-03-24 13:52:28,331 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38407.\u001b[0m\n", "\u001b[34m2023-03-24 13:52:28,331 INFO netty.NettyBlockTransferService: Server created on\u001b[0m\n", "\u001b[34m2023-03-24 13:52:28,336 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy\u001b[0m\n", "\u001b[34m2023-03-24 13:52:28,362 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver,, 38407, None)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:28,367 INFO storage.BlockManagerMasterEndpoint: Registering block manager with 1458.6 MiB RAM, BlockManagerId(driver,, 38407, None)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:28,377 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver,, 38407, None)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:28,378 
INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver,, 38407, None)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:28,554 INFO util.log: Logging initialized @16106ms to org.sparkproject.jetty.util.log.Slf4jLog\u001b[0m\n", "\u001b[34m2023-03-24 13:52:29,627 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:32,940 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) ( with ID 1, ResourceProfileId 0\u001b[0m\n", "\u001b[34m2023-03-24 13:52:33,132 INFO storage.BlockManagerMasterEndpoint: Registering block manager algo-1:44423 with 5.8 GiB RAM, BlockManagerId(1, algo-1, 44423, None)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:46,374 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000000000(ns)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:46,484 WARN spark.SparkContext: Spark is not running in local mode, therefore the checkpoint directory must not be on the local filesystem. 
Directory '/tmp' appears to be on the local filesystem.\u001b[0m\n", "\u001b[34m2023-03-24 13:52:46,529 INFO internal.SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.\u001b[0m\n", "\u001b[34m2023-03-24 13:52:46,531 INFO internal.SharedState: Warehouse path is 'file:/usr/spark-3.3.0/spark-warehouse'.\u001b[0m\n", "\u001b[34m2023-03-24 13:52:47,555 INFO datasources.InMemoryFileIndex: It took 44 ms to list leaf files for 1 paths.\u001b[0m\n", "\u001b[34m2023-03-24 13:52:47,744 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 416.9 KiB, free 1458.2 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,051 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 39.2 KiB, free 1458.2 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,054 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on (size: 39.2 KiB, free: 1458.6 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,059 INFO spark.SparkContext: Created broadcast 0 from csv at DatasetReader.scala:99\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,407 INFO input.FileInputFormat: Total input files to process : 1\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,409 INFO input.FileInputFormat: Total input files to process : 1\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,414 INFO input.CombineFileInputFormat: DEBUG: Terminated node allocation with : CompletedNodes: 1, size left: 375873\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,475 INFO spark.SparkContext: Starting job: csv at DatasetReader.scala:99\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,499 INFO scheduler.DAGScheduler: Got job 0 (csv at DatasetReader.scala:99) with 1 output partitions\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,499 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (csv at DatasetReader.scala:99)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,500 INFO scheduler.DAGScheduler: Parents of final stage: List()\u001b[0m\n", 
"\u001b[34m2023-03-24 13:52:48,502 INFO scheduler.DAGScheduler: Missing parents: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,509 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at csv at DatasetReader.scala:99), which has no missing parents\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,567 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 7.3 KiB, free 1458.1 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,571 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.2 KiB, free 1458.1 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,571 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on (size: 4.2 KiB, free: 1458.6 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,572 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1513\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,592 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at csv at DatasetReader.scala:99) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,593 INFO cluster.YarnScheduler: Adding task set 0.0 with 1 tasks resource profile 0\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,646 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (algo-1, executor 1, partition 0, PROCESS_LOCAL, 4641 bytes) taskResourceAssignments Map()\u001b[0m\n", "\u001b[34m2023-03-24 13:52:48,941 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on algo-1:44423 (size: 4.2 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:49,894 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on algo-1:44423 (size: 39.2 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:50,284 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1655 ms on algo-1 (executor 1) (1/1)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:50,286 INFO 
cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2023-03-24 13:52:50,293 INFO scheduler.DAGScheduler: ResultStage 0 (csv at DatasetReader.scala:99) finished in 1.756 s\u001b[0m\n", "\u001b[34m2023-03-24 13:52:50,298 INFO scheduler.DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job\u001b[0m\n", "\u001b[34m2023-03-24 13:52:50,298 INFO cluster.YarnScheduler: Killing all running tasks in stage 0: Stage finished\u001b[0m\n", "\u001b[34m2023-03-24 13:52:50,300 INFO scheduler.DAGScheduler: Job 0 finished: csv at DatasetReader.scala:99, took 1.825291 s\u001b[0m\n", "\u001b[34m2023-03-24 13:52:50,574 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on in memory (size: 39.2 KiB, free: 1458.6 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:50,579 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on algo-1:44423 in memory (size: 39.2 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:50,610 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on in memory (size: 4.2 KiB, free: 1458.6 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:50,635 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on algo-1:44423 in memory (size: 4.2 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,018 INFO datasources.FileSourceStrategy: Pushed Filters: \u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,020 INFO datasources.FileSourceStrategy: Post-Scan Filters: \u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,024 INFO datasources.FileSourceStrategy: Output Data Schema: struct\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,080 WARN util.package: Truncated the string representation of a plan since it was too large. 
This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,353 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 416.5 KiB, free 1458.2 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,368 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 39.1 KiB, free 1458.2 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,369 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on (size: 39.1 KiB, free: 1458.6 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,372 INFO spark.SparkContext: Created broadcast 2 from head at DataAnalyzer.scala:100\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,392 INFO execution.FileSourceScanExec: Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,443 INFO spark.SparkContext: Starting job: head at DataAnalyzer.scala:100\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,445 INFO scheduler.DAGScheduler: Got job 1 (head at DataAnalyzer.scala:100) with 1 output partitions\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,445 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (head at DataAnalyzer.scala:100)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,445 INFO scheduler.DAGScheduler: Parents of final stage: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,448 INFO scheduler.DAGScheduler: Missing parents: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,449 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[11] at head at DataAnalyzer.scala:100), which has no missing parents\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,530 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 29.8 KiB, free 1458.1 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,533 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 11.2 KiB, free 1458.1 
MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,534 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on (size: 11.2 KiB, free: 1458.6 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,535 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1513\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,536 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[11] at head at DataAnalyzer.scala:100) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,536 INFO cluster.YarnScheduler: Adding task set 1.0 with 1 tasks resource profile 0\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,542 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1) (algo-1, executor 1, partition 0, PROCESS_LOCAL, 4969 bytes) taskResourceAssignments Map()\u001b[0m\n", "\u001b[34m2023-03-24 13:52:53,614 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on algo-1:44423 (size: 11.2 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:54,654 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on algo-1:44423 (size: 39.1 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:54,988 INFO storage.BlockManagerInfo: Added rdd_7_0 in memory on algo-1:44423 (size: 188.0 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:55,178 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 1639 ms on algo-1 (executor 1) (1/1)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:55,179 INFO scheduler.DAGScheduler: ResultStage 1 (head at DataAnalyzer.scala:100) finished in 1.726 s\u001b[0m\n", "\u001b[34m2023-03-24 13:52:55,181 INFO scheduler.DAGScheduler: Job 1 is finished. 
Cancelling potential speculative or zombie tasks for this job\u001b[0m\n", "\u001b[34m2023-03-24 13:52:55,183 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2023-03-24 13:52:55,183 INFO cluster.YarnScheduler: Killing all running tasks in stage 1: Stage finished\u001b[0m\n", "\u001b[34m2023-03-24 13:52:55,183 INFO scheduler.DAGScheduler: Job 1 finished: head at DataAnalyzer.scala:100, took 1.739630 s\u001b[0m\n", "\u001b[34m2023-03-24 13:52:55,643 INFO codegen.CodeGenerator: Code generated in 340.987124 ms\u001b[0m\n", "\u001b[34m2023-03-24 13:52:56,512 INFO scheduler.DAGScheduler: Registering RDD 16 (collect at AnalysisRunner.scala:326) as input to shuffle 0\u001b[0m\n", "\u001b[34m2023-03-24 13:52:56,517 INFO scheduler.DAGScheduler: Got map stage job 2 (collect at AnalysisRunner.scala:326) with 1 output partitions\u001b[0m\n", "\u001b[34m2023-03-24 13:52:56,518 INFO scheduler.DAGScheduler: Final stage: ShuffleMapStage 2 (collect at AnalysisRunner.scala:326)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:56,518 INFO scheduler.DAGScheduler: Parents of final stage: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:52:56,521 INFO scheduler.DAGScheduler: Missing parents: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:52:56,526 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 2 (MapPartitionsRDD[16] at collect at AnalysisRunner.scala:326), which has no missing parents\u001b[0m\n", "\u001b[34m2023-03-24 13:52:56,560 INFO memory.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 125.9 KiB, free 1458.0 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:56,567 INFO memory.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 37.6 KiB, free 1458.0 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:56,568 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on (size: 37.6 KiB, free: 1458.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:56,569 INFO 
spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1513\u001b[0m\n", "\u001b[34m2023-03-24 13:52:56,571 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 2 (MapPartitionsRDD[16] at collect at AnalysisRunner.scala:326) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2023-03-24 13:52:56,571 INFO cluster.YarnScheduler: Adding task set 2.0 with 1 tasks resource profile 0\u001b[0m\n", "\u001b[34m2023-03-24 13:52:56,583 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2) (algo-1, executor 1, partition 0, PROCESS_LOCAL, 4958 bytes) taskResourceAssignments Map()\u001b[0m\n", "\u001b[34m2023-03-24 13:52:56,623 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on algo-1:44423 (size: 37.6 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,820 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 3240 ms on algo-1 (executor 1) (1/1)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,820 INFO cluster.YarnScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,823 INFO scheduler.DAGScheduler: ShuffleMapStage 2 (collect at AnalysisRunner.scala:326) finished in 3.292 s\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,823 INFO scheduler.DAGScheduler: looking for newly runnable stages\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,824 INFO scheduler.DAGScheduler: running: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,824 INFO scheduler.DAGScheduler: waiting: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,825 INFO scheduler.DAGScheduler: failed: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,937 INFO spark.SparkContext: Starting job: collect at AnalysisRunner.scala:326\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,939 INFO scheduler.DAGScheduler: Got job 3 (collect at AnalysisRunner.scala:326) with 1 output partitions\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,940 INFO 
scheduler.DAGScheduler: Final stage: ResultStage 4 (collect at AnalysisRunner.scala:326)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,940 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 3)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,940 INFO scheduler.DAGScheduler: Missing parents: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,942 INFO scheduler.DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[19] at collect at AnalysisRunner.scala:326), which has no missing parents\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,966 INFO memory.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 178.7 KiB, free 1457.8 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,968 INFO memory.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 49.3 KiB, free 1457.7 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,969 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on (size: 49.3 KiB, free: 1458.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,970 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1513\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,970 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 4 (MapPartitionsRDD[19] at collect at AnalysisRunner.scala:326) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,971 INFO cluster.YarnScheduler: Adding task set 4.0 with 1 tasks resource profile 0\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,974 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 4.0 (TID 3) (algo-1, executor 1, partition 0, NODE_LOCAL, 4464 bytes) taskResourceAssignments Map()\u001b[0m\n", "\u001b[34m2023-03-24 13:52:59,994 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on algo-1:44423 (size: 49.3 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,050 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 
to\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,375 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 3) in 402 ms on algo-1 (executor 1) (1/1)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,375 INFO cluster.YarnScheduler: Removed TaskSet 4.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,376 INFO scheduler.DAGScheduler: ResultStage 4 (collect at AnalysisRunner.scala:326) finished in 0.423 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,377 INFO scheduler.DAGScheduler: Job 3 is finished. Cancelling potential speculative or zombie tasks for this job\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,377 INFO cluster.YarnScheduler: Killing all running tasks in stage 4: Stage finished\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,377 INFO scheduler.DAGScheduler: Job 3 finished: collect at AnalysisRunner.scala:326, took 0.439992 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,407 INFO codegen.CodeGenerator: Code generated in 22.972413 ms\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,726 INFO codegen.CodeGenerator: Code generated in 28.578099 ms\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,798 INFO spark.SparkContext: Starting job: treeReduce at KLLRunner.scala:107\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,799 INFO scheduler.DAGScheduler: Got job 4 (treeReduce at KLLRunner.scala:107) with 1 output partitions\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,799 INFO scheduler.DAGScheduler: Final stage: ResultStage 5 (treeReduce at KLLRunner.scala:107)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,799 INFO scheduler.DAGScheduler: Parents of final stage: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,800 INFO scheduler.DAGScheduler: Missing parents: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,802 INFO scheduler.DAGScheduler: Submitting ResultStage 5 (MapPartitionsRDD[29] at treeReduce at KLLRunner.scala:107), which has no missing parents\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,830 INFO memory.MemoryStore: Block broadcast_6 
stored as values in memory (estimated size 49.2 KiB, free 1457.7 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,833 INFO memory.MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 19.7 KiB, free 1457.7 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,834 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on (size: 19.7 KiB, free: 1458.4 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,835 INFO spark.SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:1513\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,836 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 5 (MapPartitionsRDD[29] at treeReduce at KLLRunner.scala:107) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,836 INFO cluster.YarnScheduler: Adding task set 5.0 with 1 tasks resource profile 0\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,838 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 5.0 (TID 4) (algo-1, executor 1, partition 0, PROCESS_LOCAL, 4969 bytes) taskResourceAssignments Map()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:00,863 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on algo-1:44423 (size: 19.7 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,212 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 5.0 (TID 4) in 375 ms on algo-1 (executor 1) (1/1)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,213 INFO cluster.YarnScheduler: Removed TaskSet 5.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,215 INFO scheduler.DAGScheduler: ResultStage 5 (treeReduce at KLLRunner.scala:107) finished in 0.410 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,216 INFO scheduler.DAGScheduler: Job 4 is finished. 
Cancelling potential speculative or zombie tasks for this job\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,216 INFO cluster.YarnScheduler: Killing all running tasks in stage 5: Stage finished\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,217 INFO scheduler.DAGScheduler: Job 4 finished: treeReduce at KLLRunner.scala:107, took 0.419225 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,697 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on algo-1:44423 in memory (size: 11.2 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,704 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on in memory (size: 11.2 KiB, free: 1458.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,759 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on in memory (size: 19.7 KiB, free: 1458.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,786 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on algo-1:44423 in memory (size: 19.7 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,879 INFO storage.BlockManagerInfo: Removed broadcast_4_piece0 on in memory (size: 37.6 KiB, free: 1458.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,882 INFO storage.BlockManagerInfo: Removed broadcast_4_piece0 on algo-1:44423 in memory (size: 37.6 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,939 INFO storage.BlockManagerInfo: Removed broadcast_5_piece0 on algo-1:44423 in memory (size: 49.3 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,940 INFO storage.BlockManagerInfo: Removed broadcast_5_piece0 on in memory (size: 49.3 KiB, free: 1458.6 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,971 INFO codegen.CodeGenerator: Code generated in 163.90282 ms\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,980 INFO scheduler.DAGScheduler: Registering RDD 34 (collect at AnalysisRunner.scala:326) as input to shuffle 1\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,981 INFO scheduler.DAGScheduler: Got map stage job 5 (collect at 
AnalysisRunner.scala:326) with 1 output partitions\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,981 INFO scheduler.DAGScheduler: Final stage: ShuffleMapStage 6 (collect at AnalysisRunner.scala:326)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,982 INFO scheduler.DAGScheduler: Parents of final stage: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,983 INFO scheduler.DAGScheduler: Missing parents: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,986 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 6 (MapPartitionsRDD[34] at collect at AnalysisRunner.scala:326), which has no missing parents\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,993 INFO memory.MemoryStore: Block broadcast_7 stored as values in memory (estimated size 87.0 KiB, free 1458.1 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,995 INFO memory.MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 27.4 KiB, free 1458.0 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,996 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on (size: 27.4 KiB, free: 1458.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,997 INFO spark.SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1513\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,997 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 6 (MapPartitionsRDD[34] at collect at AnalysisRunner.scala:326) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2023-03-24 13:53:01,998 INFO cluster.YarnScheduler: Adding task set 6.0 with 1 tasks resource profile 0\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,000 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 6.0 (TID 5) (algo-1, executor 1, partition 0, PROCESS_LOCAL, 4958 bytes) taskResourceAssignments Map()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,016 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on algo-1:44423 (size: 27.4 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 
13:53:02,234 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 6.0 (TID 5) in 235 ms on algo-1 (executor 1) (1/1)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,235 INFO cluster.YarnScheduler: Removed TaskSet 6.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,236 INFO scheduler.DAGScheduler: ShuffleMapStage 6 (collect at AnalysisRunner.scala:326) finished in 0.249 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,237 INFO scheduler.DAGScheduler: looking for newly runnable stages\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,238 INFO scheduler.DAGScheduler: running: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,241 INFO scheduler.DAGScheduler: waiting: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,242 INFO scheduler.DAGScheduler: failed: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,618 INFO codegen.CodeGenerator: Code generated in 212.80832 ms\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,636 INFO spark.SparkContext: Starting job: collect at AnalysisRunner.scala:326\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,639 INFO scheduler.DAGScheduler: Got job 6 (collect at AnalysisRunner.scala:326) with 1 output partitions\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,639 INFO scheduler.DAGScheduler: Final stage: ResultStage 8 (collect at AnalysisRunner.scala:326)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,640 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 7)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,640 INFO scheduler.DAGScheduler: Missing parents: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,643 INFO scheduler.DAGScheduler: Submitting ResultStage 8 (MapPartitionsRDD[37] at collect at AnalysisRunner.scala:326), which has no missing parents\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,647 INFO memory.MemoryStore: Block broadcast_8 stored as values in memory (estimated size 67.4 KiB, free 1458.0 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,651 INFO memory.MemoryStore: Block 
broadcast_8_piece0 stored as bytes in memory (estimated size 19.9 KiB, free 1458.0 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,652 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on (size: 19.9 KiB, free: 1458.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,653 INFO spark.SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:1513\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,654 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 8 (MapPartitionsRDD[37] at collect at AnalysisRunner.scala:326) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,654 INFO cluster.YarnScheduler: Adding task set 8.0 with 1 tasks resource profile 0\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,656 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 8.0 (TID 6) (algo-1, executor 1, partition 0, NODE_LOCAL, 4464 bytes) taskResourceAssignments Map()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,683 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on algo-1:44423 (size: 19.9 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,690 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,800 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 8.0 (TID 6) in 144 ms on algo-1 (executor 1) (1/1)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,800 INFO cluster.YarnScheduler: Removed TaskSet 8.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,801 INFO scheduler.DAGScheduler: ResultStage 8 (collect at AnalysisRunner.scala:326) finished in 0.157 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,806 INFO scheduler.DAGScheduler: Job 6 is finished. 
Cancelling potential speculative or zombie tasks for this job\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,807 INFO cluster.YarnScheduler: Killing all running tasks in stage 8: Stage finished\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,808 INFO scheduler.DAGScheduler: Job 6 finished: collect at AnalysisRunner.scala:326, took 0.170745 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:02,947 INFO codegen.CodeGenerator: Code generated in 91.523785 ms\u001b[0m\n", "\u001b[34m2023-03-24 13:53:03,097 INFO spark.SparkContext: Starting job: countByKey at ColumnProfiler.scala:592\u001b[0m\n", "\u001b[34m2023-03-24 13:53:03,103 INFO scheduler.DAGScheduler: Registering RDD 45 (countByKey at ColumnProfiler.scala:592) as input to shuffle 2\u001b[0m\n", "\u001b[34m2023-03-24 13:53:03,104 INFO scheduler.DAGScheduler: Got job 7 (countByKey at ColumnProfiler.scala:592) with 1 output partitions\u001b[0m\n", "\u001b[34m2023-03-24 13:53:03,104 INFO scheduler.DAGScheduler: Final stage: ResultStage 10 (countByKey at ColumnProfiler.scala:592)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:03,105 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 9)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:03,105 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 9)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:03,109 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 9 (MapPartitionsRDD[45] at countByKey at ColumnProfiler.scala:592), which has no missing parents\u001b[0m\n", "\u001b[34m2023-03-24 13:53:03,125 INFO memory.MemoryStore: Block broadcast_9 stored as values in memory (estimated size 41.9 KiB, free 1457.9 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:03,131 INFO memory.MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 17.3 KiB, free 1457.9 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:03,140 INFO storage.BlockManagerInfo: Added broadcast_9_piece0 in memory on (size: 17.3 KiB, free: 1458.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:03,141 
INFO spark.SparkContext: Created broadcast 9 from broadcast at DAGScheduler.scala:1513\u001b[0m\n", "\u001b[34m2023-03-24 13:53:03,142 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 9 (MapPartitionsRDD[45] at countByKey at ColumnProfiler.scala:592) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2023-03-24 13:53:03,142 INFO cluster.YarnScheduler: Adding task set 9.0 with 1 tasks resource profile 0\u001b[0m\n", "\u001b[34m2023-03-24 13:53:03,145 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 9.0 (TID 7) (algo-1, executor 1, partition 0, PROCESS_LOCAL, 4958 bytes) taskResourceAssignments Map()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:03,172 INFO storage.BlockManagerInfo: Added broadcast_9_piece0 in memory on algo-1:44423 (size: 17.3 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,867 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 9.0 (TID 7) in 1723 ms on algo-1 (executor 1) (1/1)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,867 INFO cluster.YarnScheduler: Removed TaskSet 9.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,869 INFO scheduler.DAGScheduler: ShuffleMapStage 9 (countByKey at ColumnProfiler.scala:592) finished in 1.757 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,870 INFO scheduler.DAGScheduler: looking for newly runnable stages\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,871 INFO scheduler.DAGScheduler: running: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,871 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 10)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,872 INFO scheduler.DAGScheduler: failed: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,872 INFO scheduler.DAGScheduler: Submitting ResultStage 10 (ShuffledRDD[46] at countByKey at ColumnProfiler.scala:592), which has no missing parents\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,877 INFO memory.MemoryStore: Block broadcast_10 stored as values in memory 
(estimated size 5.1 KiB, free 1457.9 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,879 INFO memory.MemoryStore: Block broadcast_10_piece0 stored as bytes in memory (estimated size 3.0 KiB, free 1457.9 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,881 INFO storage.BlockManagerInfo: Added broadcast_10_piece0 in memory on (size: 3.0 KiB, free: 1458.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,882 INFO spark.SparkContext: Created broadcast 10 from broadcast at DAGScheduler.scala:1513\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,883 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 10 (ShuffledRDD[46] at countByKey at ColumnProfiler.scala:592) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,883 INFO cluster.YarnScheduler: Adding task set 10.0 with 1 tasks resource profile 0\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,885 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 10.0 (TID 8) (algo-1, executor 1, partition 0, NODE_LOCAL, 4282 bytes) taskResourceAssignments Map()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,898 INFO storage.BlockManagerInfo: Added broadcast_10_piece0 in memory on algo-1:44423 (size: 3.0 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,906 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 2 to\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,954 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 10.0 (TID 8) in 70 ms on algo-1 (executor 1) (1/1)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,954 INFO cluster.YarnScheduler: Removed TaskSet 10.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,955 INFO scheduler.DAGScheduler: ResultStage 10 (countByKey at ColumnProfiler.scala:592) finished in 0.081 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,956 INFO scheduler.DAGScheduler: Job 7 is finished. 
Cancelling potential speculative or zombie tasks for this job\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,956 INFO cluster.YarnScheduler: Killing all running tasks in stage 10: Stage finished\u001b[0m\n", "\u001b[34m2023-03-24 13:53:04,958 INFO scheduler.DAGScheduler: Job 7 finished: countByKey at ColumnProfiler.scala:592, took 1.860750 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,164 INFO scheduler.DAGScheduler: Registering RDD 51 (collect at AnalysisRunner.scala:326) as input to shuffle 3\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,165 INFO scheduler.DAGScheduler: Got map stage job 8 (collect at AnalysisRunner.scala:326) with 1 output partitions\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,165 INFO scheduler.DAGScheduler: Final stage: ShuffleMapStage 11 (collect at AnalysisRunner.scala:326)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,165 INFO scheduler.DAGScheduler: Parents of final stage: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,165 INFO scheduler.DAGScheduler: Missing parents: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,166 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 11 (MapPartitionsRDD[51] at collect at AnalysisRunner.scala:326), which has no missing parents\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,172 INFO memory.MemoryStore: Block broadcast_11 stored as values in memory (estimated size 94.9 KiB, free 1457.8 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,173 INFO memory.MemoryStore: Block broadcast_11_piece0 stored as bytes in memory (estimated size 30.0 KiB, free 1457.8 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,174 INFO storage.BlockManagerInfo: Added broadcast_11_piece0 in memory on (size: 30.0 KiB, free: 1458.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,175 INFO spark.SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:1513\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,175 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 11 (MapPartitionsRDD[51] at collect at 
AnalysisRunner.scala:326) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,175 INFO cluster.YarnScheduler: Adding task set 11.0 with 1 tasks resource profile 0\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,177 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 11.0 (TID 9) (algo-1, executor 1, partition 0, PROCESS_LOCAL, 4958 bytes) taskResourceAssignments Map()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,186 INFO storage.BlockManagerInfo: Added broadcast_11_piece0 in memory on algo-1:44423 (size: 30.0 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,394 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 11.0 (TID 9) in 218 ms on algo-1 (executor 1) (1/1)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,395 INFO cluster.YarnScheduler: Removed TaskSet 11.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,395 INFO scheduler.DAGScheduler: ShuffleMapStage 11 (collect at AnalysisRunner.scala:326) finished in 0.228 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,396 INFO scheduler.DAGScheduler: looking for newly runnable stages\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,396 INFO scheduler.DAGScheduler: running: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,396 INFO scheduler.DAGScheduler: waiting: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,396 INFO scheduler.DAGScheduler: failed: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,466 INFO spark.SparkContext: Starting job: collect at AnalysisRunner.scala:326\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,472 INFO scheduler.DAGScheduler: Got job 9 (collect at AnalysisRunner.scala:326) with 1 output partitions\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,472 INFO scheduler.DAGScheduler: Final stage: ResultStage 13 (collect at AnalysisRunner.scala:326)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,472 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 12)\u001b[0m\n", "\u001b[34m2023-03-24 
13:53:05,472 INFO scheduler.DAGScheduler: Missing parents: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,473 INFO scheduler.DAGScheduler: Submitting ResultStage 13 (MapPartitionsRDD[54] at collect at AnalysisRunner.scala:326), which has no missing parents\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,482 INFO memory.MemoryStore: Block broadcast_12 stored as values in memory (estimated size 179.7 KiB, free 1457.6 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,484 INFO memory.MemoryStore: Block broadcast_12_piece0 stored as bytes in memory (estimated size 49.3 KiB, free 1457.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,485 INFO storage.BlockManagerInfo: Added broadcast_12_piece0 in memory on (size: 49.3 KiB, free: 1458.4 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,486 INFO spark.SparkContext: Created broadcast 12 from broadcast at DAGScheduler.scala:1513\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,486 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 13 (MapPartitionsRDD[54] at collect at AnalysisRunner.scala:326) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,486 INFO cluster.YarnScheduler: Adding task set 13.0 with 1 tasks resource profile 0\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,488 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 13.0 (TID 10) (algo-1, executor 1, partition 0, NODE_LOCAL, 4464 bytes) taskResourceAssignments Map()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,505 INFO storage.BlockManagerInfo: Added broadcast_12_piece0 in memory on algo-1:44423 (size: 49.3 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,519 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,615 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 13.0 (TID 10) in 127 ms on algo-1 (executor 1) (1/1)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,615 INFO cluster.YarnScheduler: Removed 
TaskSet 13.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,616 INFO scheduler.DAGScheduler: ResultStage 13 (collect at AnalysisRunner.scala:326) finished in 0.141 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,617 INFO scheduler.DAGScheduler: Job 9 is finished. Cancelling potential speculative or zombie tasks for this job\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,618 INFO cluster.YarnScheduler: Killing all running tasks in stage 13: Stage finished\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,618 INFO scheduler.DAGScheduler: Job 9 finished: collect at AnalysisRunner.scala:326, took 0.151236 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,780 INFO codegen.CodeGenerator: Code generated in 19.797831 ms\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,815 INFO spark.SparkContext: Starting job: treeReduce at KLLRunner.scala:107\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,817 INFO scheduler.DAGScheduler: Got job 10 (treeReduce at KLLRunner.scala:107) with 1 output partitions\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,817 INFO scheduler.DAGScheduler: Final stage: ResultStage 14 (treeReduce at KLLRunner.scala:107)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,817 INFO scheduler.DAGScheduler: Parents of final stage: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,817 INFO scheduler.DAGScheduler: Missing parents: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,818 INFO scheduler.DAGScheduler: Submitting ResultStage 14 (MapPartitionsRDD[64] at treeReduce at KLLRunner.scala:107), which has no missing parents\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,828 INFO memory.MemoryStore: Block broadcast_13 stored as values in memory (estimated size 49.3 KiB, free 1457.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,831 INFO memory.MemoryStore: Block broadcast_13_piece0 stored as bytes in memory (estimated size 19.8 KiB, free 1457.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,832 INFO storage.BlockManagerInfo: Added broadcast_13_piece0 in 
memory on (size: 19.8 KiB, free: 1458.4 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,832 INFO spark.SparkContext: Created broadcast 13 from broadcast at DAGScheduler.scala:1513\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,833 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 14 (MapPartitionsRDD[64] at treeReduce at KLLRunner.scala:107) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,833 INFO cluster.YarnScheduler: Adding task set 14.0 with 1 tasks resource profile 0\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,835 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 14.0 (TID 11) (algo-1, executor 1, partition 0, PROCESS_LOCAL, 4969 bytes) taskResourceAssignments Map()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,848 INFO storage.BlockManagerInfo: Added broadcast_13_piece0 in memory on algo-1:44423 (size: 19.8 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,963 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 14.0 (TID 11) in 128 ms on algo-1 (executor 1) (1/1)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,963 INFO cluster.YarnScheduler: Removed TaskSet 14.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,964 INFO scheduler.DAGScheduler: ResultStage 14 (treeReduce at KLLRunner.scala:107) finished in 0.143 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,965 INFO scheduler.DAGScheduler: Job 10 is finished. 
Cancelling potential speculative or zombie tasks for this job\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,966 INFO cluster.YarnScheduler: Killing all running tasks in stage 14: Stage finished\u001b[0m\n", "\u001b[34m2023-03-24 13:53:05,967 INFO scheduler.DAGScheduler: Job 10 finished: treeReduce at KLLRunner.scala:107, took 0.151521 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,293 INFO codegen.CodeGenerator: Code generated in 125.622282 ms\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,300 INFO scheduler.DAGScheduler: Registering RDD 69 (collect at AnalysisRunner.scala:326) as input to shuffle 4\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,301 INFO scheduler.DAGScheduler: Got map stage job 11 (collect at AnalysisRunner.scala:326) with 1 output partitions\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,301 INFO scheduler.DAGScheduler: Final stage: ShuffleMapStage 15 (collect at AnalysisRunner.scala:326)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,301 INFO scheduler.DAGScheduler: Parents of final stage: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,302 INFO scheduler.DAGScheduler: Missing parents: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,303 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 15 (MapPartitionsRDD[69] at collect at AnalysisRunner.scala:326), which has no missing parents\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,313 INFO memory.MemoryStore: Block broadcast_14 stored as values in memory (estimated size 86.1 KiB, free 1457.4 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,319 INFO memory.MemoryStore: Block broadcast_14_piece0 stored as bytes in memory (estimated size 27.0 KiB, free 1457.4 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,320 INFO storage.BlockManagerInfo: Added broadcast_14_piece0 in memory on (size: 27.0 KiB, free: 1458.4 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,321 INFO spark.SparkContext: Created broadcast 14 from broadcast at DAGScheduler.scala:1513\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,321 INFO 
scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 15 (MapPartitionsRDD[69] at collect at AnalysisRunner.scala:326) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,321 INFO cluster.YarnScheduler: Adding task set 15.0 with 1 tasks resource profile 0\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,323 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 15.0 (TID 12) (algo-1, executor 1, partition 0, PROCESS_LOCAL, 4958 bytes) taskResourceAssignments Map()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,342 INFO storage.BlockManagerInfo: Added broadcast_14_piece0 in memory on algo-1:44423 (size: 27.0 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,575 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 15.0 (TID 12) in 252 ms on algo-1 (executor 1) (1/1)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,575 INFO cluster.YarnScheduler: Removed TaskSet 15.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,578 INFO scheduler.DAGScheduler: ShuffleMapStage 15 (collect at AnalysisRunner.scala:326) finished in 0.269 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,578 INFO scheduler.DAGScheduler: looking for newly runnable stages\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,578 INFO scheduler.DAGScheduler: running: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,578 INFO scheduler.DAGScheduler: waiting: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,578 INFO scheduler.DAGScheduler: failed: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,870 INFO storage.BlockManagerInfo: Removed broadcast_10_piece0 on in memory (size: 3.0 KiB, free: 1458.4 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,875 INFO codegen.CodeGenerator: Code generated in 193.234663 ms\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,876 INFO storage.BlockManagerInfo: Removed broadcast_10_piece0 on algo-1:44423 in memory (size: 3.0 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,897 
INFO spark.SparkContext: Starting job: collect at AnalysisRunner.scala:326\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,905 INFO scheduler.DAGScheduler: Got job 12 (collect at AnalysisRunner.scala:326) with 1 output partitions\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,906 INFO scheduler.DAGScheduler: Final stage: ResultStage 17 (collect at AnalysisRunner.scala:326)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,907 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 16)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,907 INFO scheduler.DAGScheduler: Missing parents: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,908 INFO scheduler.DAGScheduler: Submitting ResultStage 17 (MapPartitionsRDD[72] at collect at AnalysisRunner.scala:326), which has no missing parents\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,910 INFO memory.MemoryStore: Block broadcast_15 stored as values in memory (estimated size 66.8 KiB, free 1457.3 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,915 INFO memory.MemoryStore: Block broadcast_15_piece0 stored as bytes in memory (estimated size 19.7 KiB, free 1457.3 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,917 INFO storage.BlockManagerInfo: Added broadcast_15_piece0 in memory on (size: 19.7 KiB, free: 1458.4 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,917 INFO spark.SparkContext: Created broadcast 15 from broadcast at DAGScheduler.scala:1513\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,918 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 17 (MapPartitionsRDD[72] at collect at AnalysisRunner.scala:326) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,918 INFO cluster.YarnScheduler: Adding task set 17.0 with 1 tasks resource profile 0\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,919 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 17.0 (TID 13) (algo-1, executor 1, partition 0, NODE_LOCAL, 4464 bytes) taskResourceAssignments Map()\u001b[0m\n", 
"\u001b[34m2023-03-24 13:53:06,944 INFO storage.BlockManagerInfo: Added broadcast_15_piece0 in memory on algo-1:44423 (size: 19.7 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,959 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 4 to\u001b[0m\n", "\u001b[34m2023-03-24 13:53:06,996 INFO storage.BlockManagerInfo: Removed broadcast_11_piece0 on in memory (size: 30.0 KiB, free: 1458.4 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,017 INFO storage.BlockManagerInfo: Removed broadcast_11_piece0 on algo-1:44423 in memory (size: 30.0 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,086 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 17.0 (TID 13) in 166 ms on algo-1 (executor 1) (1/1)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,086 INFO cluster.YarnScheduler: Removed TaskSet 17.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,087 INFO scheduler.DAGScheduler: ResultStage 17 (collect at AnalysisRunner.scala:326) finished in 0.178 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,087 INFO scheduler.DAGScheduler: Job 12 is finished. 
Cancelling potential speculative or zombie tasks for this job\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,087 INFO cluster.YarnScheduler: Killing all running tasks in stage 17: Stage finished\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,088 INFO scheduler.DAGScheduler: Job 12 finished: collect at AnalysisRunner.scala:326, took 0.183827 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,154 INFO storage.BlockManagerInfo: Removed broadcast_7_piece0 on algo-1:44423 in memory (size: 27.4 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,156 INFO storage.BlockManagerInfo: Removed broadcast_7_piece0 on in memory (size: 27.4 KiB, free: 1458.4 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,197 INFO spark.SparkContext: Starting job: countByKey at ColumnProfiler.scala:592\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,199 INFO scheduler.DAGScheduler: Registering RDD 80 (countByKey at ColumnProfiler.scala:592) as input to shuffle 5\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,201 INFO scheduler.DAGScheduler: Got job 13 (countByKey at ColumnProfiler.scala:592) with 1 output partitions\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,201 INFO scheduler.DAGScheduler: Final stage: ResultStage 19 (countByKey at ColumnProfiler.scala:592)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,201 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 18)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,202 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 18)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,204 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 18 (MapPartitionsRDD[80] at countByKey at ColumnProfiler.scala:592), which has no missing parents\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,222 INFO memory.MemoryStore: Block broadcast_16 stored as values in memory (estimated size 41.9 KiB, free 1457.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,225 INFO memory.MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 17.4 
KiB, free 1457.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,230 INFO storage.BlockManagerInfo: Added broadcast_16_piece0 in memory on (size: 17.4 KiB, free: 1458.4 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,231 INFO spark.SparkContext: Created broadcast 16 from broadcast at DAGScheduler.scala:1513\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,232 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 18 (MapPartitionsRDD[80] at countByKey at ColumnProfiler.scala:592) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,233 INFO cluster.YarnScheduler: Adding task set 18.0 with 1 tasks resource profile 0\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,234 INFO storage.BlockManagerInfo: Removed broadcast_13_piece0 on algo-1:44423 in memory (size: 19.8 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,236 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 18.0 (TID 14) (algo-1, executor 1, partition 0, PROCESS_LOCAL, 4958 bytes) taskResourceAssignments Map()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,237 INFO storage.BlockManagerInfo: Removed broadcast_13_piece0 on in memory (size: 19.8 KiB, free: 1458.4 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,249 INFO storage.BlockManagerInfo: Added broadcast_16_piece0 in memory on algo-1:44423 (size: 17.4 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,258 INFO storage.BlockManagerInfo: Removed broadcast_14_piece0 on in memory (size: 27.0 KiB, free: 1458.4 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,263 INFO storage.BlockManagerInfo: Removed broadcast_14_piece0 on algo-1:44423 in memory (size: 27.0 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,331 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on in memory (size: 19.9 KiB, free: 1458.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,332 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 18.0 (TID 14) in 97 ms on algo-1 (executor 1) 
(1/1)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,334 INFO scheduler.DAGScheduler: ShuffleMapStage 18 (countByKey at ColumnProfiler.scala:592) finished in 0.126 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,335 INFO scheduler.DAGScheduler: looking for newly runnable stages\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,335 INFO scheduler.DAGScheduler: running: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,335 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 19)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,336 INFO scheduler.DAGScheduler: failed: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,336 INFO scheduler.DAGScheduler: Submitting ResultStage 19 (ShuffledRDD[81] at countByKey at ColumnProfiler.scala:592), which has no missing parents\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,338 INFO memory.MemoryStore: Block broadcast_17 stored as values in memory (estimated size 5.1 KiB, free 1457.7 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,341 INFO memory.MemoryStore: Block broadcast_17_piece0 stored as bytes in memory (estimated size 3.0 KiB, free 1457.7 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,335 INFO cluster.YarnScheduler: Removed TaskSet 18.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,342 INFO storage.BlockManagerInfo: Added broadcast_17_piece0 in memory on (size: 3.0 KiB, free: 1458.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,346 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on algo-1:44423 in memory (size: 19.9 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,349 INFO spark.SparkContext: Created broadcast 17 from broadcast at DAGScheduler.scala:1513\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,351 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 19 (ShuffledRDD[81] at countByKey at ColumnProfiler.scala:592) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,351 INFO cluster.YarnScheduler: Adding task set 
19.0 with 1 tasks resource profile 0\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,353 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 19.0 (TID 15) (algo-1, executor 1, partition 0, NODE_LOCAL, 4282 bytes) taskResourceAssignments Map()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,372 INFO storage.BlockManagerInfo: Removed broadcast_12_piece0 on in memory (size: 49.3 KiB, free: 1458.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,374 INFO storage.BlockManagerInfo: Removed broadcast_12_piece0 on algo-1:44423 in memory (size: 49.3 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,375 INFO storage.BlockManagerInfo: Added broadcast_17_piece0 in memory on algo-1:44423 (size: 3.0 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,381 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 5 to\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,401 INFO storage.BlockManagerInfo: Removed broadcast_9_piece0 on in memory (size: 17.3 KiB, free: 1458.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,403 INFO storage.BlockManagerInfo: Removed broadcast_9_piece0 on algo-1:44423 in memory (size: 17.3 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,427 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 19.0 (TID 15) in 74 ms on algo-1 (executor 1) (1/1)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,427 INFO cluster.YarnScheduler: Removed TaskSet 19.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,428 INFO scheduler.DAGScheduler: ResultStage 19 (countByKey at ColumnProfiler.scala:592) finished in 0.090 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,428 INFO scheduler.DAGScheduler: Job 13 is finished. 
Cancelling potential speculative or zombie tasks for this job\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,428 INFO cluster.YarnScheduler: Killing all running tasks in stage 19: Stage finished\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,429 INFO scheduler.DAGScheduler: Job 13 finished: countByKey at ColumnProfiler.scala:592, took 0.230575 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,606 INFO scheduler.DAGScheduler: Registering RDD 86 (collect at AnalysisRunner.scala:326) as input to shuffle 6\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,606 INFO scheduler.DAGScheduler: Got map stage job 14 (collect at AnalysisRunner.scala:326) with 1 output partitions\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,606 INFO scheduler.DAGScheduler: Final stage: ShuffleMapStage 20 (collect at AnalysisRunner.scala:326)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,606 INFO scheduler.DAGScheduler: Parents of final stage: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,607 INFO scheduler.DAGScheduler: Missing parents: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,607 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 20 (MapPartitionsRDD[86] at collect at AnalysisRunner.scala:326), which has no missing parents\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,612 INFO memory.MemoryStore: Block broadcast_18 stored as values in memory (estimated size 94.9 KiB, free 1457.9 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,614 INFO memory.MemoryStore: Block broadcast_18_piece0 stored as bytes in memory (estimated size 30.0 KiB, free 1457.9 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,615 INFO storage.BlockManagerInfo: Added broadcast_18_piece0 in memory on (size: 30.0 KiB, free: 1458.5 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,616 INFO spark.SparkContext: Created broadcast 18 from broadcast at DAGScheduler.scala:1513\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,616 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 20 (MapPartitionsRDD[86] at collect at 
AnalysisRunner.scala:326) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,616 INFO cluster.YarnScheduler: Adding task set 20.0 with 1 tasks resource profile 0\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,618 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 20.0 (TID 16) (algo-1, executor 1, partition 0, PROCESS_LOCAL, 4958 bytes) taskResourceAssignments Map()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,634 INFO storage.BlockManagerInfo: Added broadcast_18_piece0 in memory on algo-1:44423 (size: 30.0 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,801 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 20.0 (TID 16) in 183 ms on algo-1 (executor 1) (1/1)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,801 INFO cluster.YarnScheduler: Removed TaskSet 20.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,802 INFO scheduler.DAGScheduler: ShuffleMapStage 20 (collect at AnalysisRunner.scala:326) finished in 0.194 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,803 INFO scheduler.DAGScheduler: looking for newly runnable stages\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,803 INFO scheduler.DAGScheduler: running: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,803 INFO scheduler.DAGScheduler: waiting: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,804 INFO scheduler.DAGScheduler: failed: Set()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,865 INFO spark.SparkContext: Starting job: collect at AnalysisRunner.scala:326\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,866 INFO scheduler.DAGScheduler: Got job 15 (collect at AnalysisRunner.scala:326) with 1 output partitions\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,866 INFO scheduler.DAGScheduler: Final stage: ResultStage 22 (collect at AnalysisRunner.scala:326)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,866 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 21)\u001b[0m\n", "\u001b[34m2023-03-24 
13:53:07,866 INFO scheduler.DAGScheduler: Missing parents: List()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,867 INFO scheduler.DAGScheduler: Submitting ResultStage 22 (MapPartitionsRDD[89] at collect at AnalysisRunner.scala:326), which has no missing parents\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,874 INFO memory.MemoryStore: Block broadcast_19 stored as values in memory (estimated size 179.8 KiB, free 1457.7 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,876 INFO memory.MemoryStore: Block broadcast_19_piece0 stored as bytes in memory (estimated size 49.3 KiB, free 1457.7 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,877 INFO storage.BlockManagerInfo: Added broadcast_19_piece0 in memory on (size: 49.3 KiB, free: 1458.4 MiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,879 INFO spark.SparkContext: Created broadcast 19 from broadcast at DAGScheduler.scala:1513\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,879 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 22 (MapPartitionsRDD[89] at collect at AnalysisRunner.scala:326) (first 15 tasks are for partitions Vector(0))\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,879 INFO cluster.YarnScheduler: Adding task set 22.0 with 1 tasks resource profile 0\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,881 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 22.0 (TID 17) (algo-1, executor 1, partition 0, NODE_LOCAL, 4464 bytes) taskResourceAssignments Map()\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,894 INFO storage.BlockManagerInfo: Added broadcast_19_piece0 in memory on algo-1:44423 (size: 49.3 KiB, free: 5.8 GiB)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:07,908 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 6 to\u001b[0m\n", "\u001b[34m2023-03-24 13:53:08,033 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 22.0 (TID 17) in 152 ms on algo-1 (executor 1) (1/1)\u001b[0m\n", "\u001b[34m2023-03-24 13:53:08,033 INFO cluster.YarnScheduler: Removed 
TaskSet 22.0, whose tasks have all completed, from pool \u001b[0m\n", "\u001b[34m2023-03-24 13:53:08,034 INFO scheduler.DAGScheduler: ResultStage 22 (collect at AnalysisRunner.scala:326) finished in 0.166 s\u001b[0m\n", "\u001b[34m2023-03-24 13:53:08,037 INFO scheduler.DAGScheduler: Job 15 is finished. \n", "
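| | name | inferred_type | numerical_statistics.common.num_present | numerical_statistics.common.num_missing | numerical_statistics.mean | numerical_statistics.sum | numerical_statistics.std_dev | numerical_statistics.min | numerical_statistics.max | numerical_statistics.distribution.kll.buckets | numerical_statistics.distribution.kll.sketch.parameters.c | numerical_statistics.distribution.kll.sketch.parameters.k | numerical_statistics.distribution.kll.sketch.data |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Churn | Integral | 2333 | 0 | 0.139306 | 325.0 | 0.346265 | 0.0 | 1.0 | [{'lower_bound': 0.0, 'upper_bound': 0.1, 'cou... | 0.64 | 2048.0 | [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0,... |
| 1 | Account Length | Integral | 2333 | 0 | 101.276897 | 236279.0 | 39.552442 | 1.0 | 243.0 | [{'lower_bound': 1.0, 'upper_bound': 25.2, 'co... | 0.64 | 2048.0 | [[119.0, 100.0, 111.0, 181.0, 95.0, 104.0, 70.... |
| 2 | VMail Message | Integral | 2333 | 0 | 8.214316 | 19164.0 | 13.776908 | 0.0 | 51.0 | [{'lower_bound': 0.0, 'upper_bound': 5.1, 'cou... | 0.64 | 2048.0 | [[19.0, 0.0, 0.0, 40.0, 36.0, 0.0, 0.0, 24.0, ... |
| 3 | Day Mins | Fractional | 2333 | 0 | 180.226489 | 420468.4 | 53.987179 | 0.0 | 350.8 | [{'lower_bound': 0.0, 'upper_bound': 35.08, 'c... | 0.64 | 2048.0 | [[178.1, 160.3, 197.1, 105.2, 283.1, 113.6, 23... |
| 4 | Day Calls | Integral | 2333 | 0 | 100.259323 | 233905.0 | 20.165008 | 0.0 | 165.0 | [{'lower_bound': 0.0, 'upper_bound': 16.5, 'co... | 0.64 | 2048.0 | [[110.0, 138.0, 117.0, 61.0, 112.0, 87.0, 122.... |
| 5 | Eve Mins | Fractional | 2333 | 0 | 200.050107 | 466716.9 | 50.015928 | 31.2 | 361.8 | [{'lower_bound': 31.2, 'upper_bound': 64.26, '... | 0.64 | 2048.0 | [[212.8, 221.3, 227.8, 341.3, 286.2, 158.6, 29... |
| 6 | Eve Calls | Integral | 2333 | 0 | 99.573939 | 232306.0 | 19.675578 | 12.0 | 170.0 | [{'lower_bound': 12.0, 'upper_bound': 27.8, 'c... | 0.64 | 2048.0 | [[100.0, 92.0, 128.0, 79.0, 86.0, 98.0, 112.0,... |
| 7 | Night Mins | Fractional | 2333 | 0 | 201.388598 | 469839.6 | 50.627961 | 23.2 | 395.0 | [{'lower_bound': 23.2, 'upper_bound': 60.37999... | 0.64 | 2048.0 | [[226.3, 150.4, 214.0, 165.7, 261.7, 187.7, 20... |
| 8 | Night Calls | Integral | 2333 | 0 | 100.227175 | 233830.0 | 19.282029 | 42.0 | 175.0 | [{'lower_bound': 42.0, 'upper_bound': 55.3, 'c... | 0.64 | 2048.0 | [[123.0, 120.0, 101.0, 97.0, 129.0, 87.0, 112.... |
| 9 | Intl Mins | Fractional | 2333 | 0 | 10.253065 | 23920.4 | 2.778766 | 0.0 | 18.4 | [{'lower_bound': 0.0, 'upper_bound': 1.8399999... | 0.64 | 2048.0 | [[10.0, 11.2, 9.3, 6.3, 11.3, 10.5, 0.0, 9.7, ... |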
\n", "
" ], "text/plain": [ " name inferred_type numerical_statistics.common.num_present \\\n", "0 Churn Integral 2333 \n", "1 Account Length Integral 2333 \n", "2 VMail Message Integral 2333 \n", "3 Day Mins Fractional 2333 \n", "4 Day Calls Integral 2333 \n", "5 Eve Mins Fractional 2333 \n", "6 Eve Calls Integral 2333 \n", "7 Night Mins Fractional 2333 \n", "8 Night Calls Integral 2333 \n", "9 Intl Mins Fractional 2333 \n", "\n", " numerical_statistics.common.num_missing numerical_statistics.mean \\\n", "0 0 0.139306 \n", "1 0 101.276897 \n", "2 0 8.214316 \n", "3 0 180.226489 \n", "4 0 100.259323 \n", "5 0 200.050107 \n", "6 0 99.573939 \n", "7 0 201.388598 \n", "8 0 100.227175 \n", "9 0 10.253065 \n", "\n", " numerical_statistics.sum numerical_statistics.std_dev \\\n", "0 325.0 0.346265 \n", "1 236279.0 39.552442 \n", "2 19164.0 13.776908 \n", "3 420468.4 53.987179 \n", "4 233905.0 20.165008 \n", "5 466716.9 50.015928 \n", "6 232306.0 19.675578 \n", "7 469839.6 50.627961 \n", "8 233830.0 19.282029 \n", "9 23920.4 2.778766 \n", "\n", " numerical_statistics.min numerical_statistics.max \\\n", "0 0.0 1.0 \n", "1 1.0 243.0 \n", "2 0.0 51.0 \n", "3 0.0 350.8 \n", "4 0.0 165.0 \n", "5 31.2 361.8 \n", "6 12.0 170.0 \n", "7 23.2 395.0 \n", "8 42.0 175.0 \n", "9 0.0 18.4 \n", "\n", " numerical_statistics.distribution.kll.buckets \\\n", "0 [{'lower_bound': 0.0, 'upper_bound': 0.1, 'cou... \n", "1 [{'lower_bound': 1.0, 'upper_bound': 25.2, 'co... \n", "2 [{'lower_bound': 0.0, 'upper_bound': 5.1, 'cou... \n", "3 [{'lower_bound': 0.0, 'upper_bound': 35.08, 'c... \n", "4 [{'lower_bound': 0.0, 'upper_bound': 16.5, 'co... \n", "5 [{'lower_bound': 31.2, 'upper_bound': 64.26, '... \n", "6 [{'lower_bound': 12.0, 'upper_bound': 27.8, 'c... \n", "7 [{'lower_bound': 23.2, 'upper_bound': 60.37999... \n", "8 [{'lower_bound': 42.0, 'upper_bound': 55.3, 'c... \n", "9 [{'lower_bound': 0.0, 'upper_bound': 1.8399999... 
\n", "\n", " numerical_statistics.distribution.kll.sketch.parameters.c \\\n", "0 0.64 \n", "1 0.64 \n", "2 0.64 \n", "3 0.64 \n", "4 0.64 \n", "5 0.64 \n", "6 0.64 \n", "7 0.64 \n", "8 0.64 \n", "9 0.64 \n", "\n", " numerical_statistics.distribution.kll.sketch.parameters.k \\\n", "0 2048.0 \n", "1 2048.0 \n", "2 2048.0 \n", "3 2048.0 \n", "4 2048.0 \n", "5 2048.0 \n", "6 2048.0 \n", "7 2048.0 \n", "8 2048.0 \n", "9 2048.0 \n", "\n", " numerical_statistics.distribution.kll.sketch.data \n", "0 [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0,... \n", "1 [[119.0, 100.0, 111.0, 181.0, 95.0, 104.0, 70.... \n", "2 [[19.0, 0.0, 0.0, 40.0, 36.0, 0.0, 0.0, 24.0, ... \n", "3 [[178.1, 160.3, 197.1, 105.2, 283.1, 113.6, 23... \n", "4 [[110.0, 138.0, 117.0, 61.0, 112.0, 87.0, 122.... \n", "5 [[212.8, 221.3, 227.8, 341.3, 286.2, 158.6, 29... \n", "6 [[100.0, 92.0, 128.0, 79.0, 86.0, 98.0, 112.0,... \n", "7 [[226.3, 150.4, 214.0, 165.7, 261.7, 187.7, 20... \n", "8 [[123.0, 120.0, 101.0, 97.0, 129.0, 87.0, 112.... \n", "9 [[10.0, 11.2, 9.3, 6.3, 11.3, 10.5, 0.0, 9.7, ... " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "baseline_job = my_default_monitor.latest_baselining_job\n", "schema_df = pd.json_normalize(baseline_job.baseline_statistics().body_dict[\"features\"])\n", "schema_df.head(10)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " name inferred_type completeness num_constraints.is_non_negative\n", "0 Churn Integral 1.0 True\n", "1 Account Length Integral 1.0 True\n", "2 VMail Message Integral 1.0 True\n", "3 Day Mins Fractional 1.0 True\n", "4 Day Calls Integral 1.0 True\n", "5 Eve Mins Fractional 1.0 True\n", "6 Eve Calls Integral 1.0 True\n", "7 Night Mins Fractional 1.0 True\n", "8 Night Calls Integral 1.0 True\n", "9 Intl Mins Fractional 1.0 True" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "constraints_df = pd.json_normalize(\n", " baseline_job.suggested_constraints().body_dict[\"features\"]\n", ")\n", "constraints_df.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6) Monitoring Schedule\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a schedule" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can create a model monitoring schedule. Use the baseline resources (constraints and statistics) to compare against the batch transform inference inputs and outputs." 
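Conceptually, each scheduled run profiles the newly captured data and checks it against the baseline `constraints.json`. The sketch below is a hypothetical, heavily simplified illustration of three such checks (column count, completeness, and non-negativity) in plain Python; it is not how Model Monitor is implemented. The `constraints` dictionary is a trimmed-down stand-in for the real file, `check_batch` is an illustrative name, and only `missing_column_check` is a check type taken from this notebook's violations report; the other two labels are made up for the example.

```python
import csv
import io


def _to_float(value):
    """Return value as a float, or None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def check_batch(csv_text, constraints):
    """Compare a headerless CSV batch with simplified baseline constraints.

    `constraints` is a hypothetical, trimmed-down stand-in for constraints.json:
    {"features": [{"name": str, "completeness": float,
                   "num_constraints": {"is_non_negative": bool}}, ...]}
    Columns are matched to features by position, as with header=False input.
    Returns a list of violation dicts shaped like the violations report.
    """
    features = constraints["features"]
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return []  # nothing to check

    violations = []
    # Column count; the only check type here taken from the notebook's report.
    n_cols = min(len(row) for row in rows)
    if n_cols < len(features):
        violations.append({
            "feature_name": "Missing columns",
            "constraint_check_type": "missing_column_check",
            "description": (
                f"Number of columns in current dataset: {n_cols}, "
                f"Number of columns in baseline constraints: {len(features)}"
            ),
        })
        return violations  # positions no longer line up, so stop here

    for idx, feature in enumerate(features):
        present = [row[idx].strip() for row in rows if row[idx].strip() != ""]
        # Completeness: fraction of rows with a non-empty value in this column.
        completeness = len(present) / len(rows)
        if completeness < feature.get("completeness", 0.0):
            violations.append({
                "feature_name": feature["name"],
                "constraint_check_type": "completeness_check",  # illustrative label
                "description": f"Completeness {completeness:.2f} is below "
                               f"baseline {feature['completeness']:.2f}",
            })
        # Non-negativity, when the baseline suggested it for this feature.
        if feature.get("num_constraints", {}).get("is_non_negative"):
            numbers = [_to_float(v) for v in present]
            if any(x is not None and x < 0 for x in numbers):
                violations.append({
                    "feature_name": feature["name"],
                    "constraint_check_type": "non_negative_check",  # illustrative label
                    "description": "Found negative values",
                })
    return violations


constraints = {"features": [
    {"name": "Churn", "completeness": 1.0, "num_constraints": {"is_non_negative": True}},
    {"name": "Account Length", "completeness": 1.0, "num_constraints": {"is_non_negative": True}},
]}
# One violation: "Account Length" contains a negative value.
print(check_batch("0,101\n1,-5\n", constraints))
```

Matching columns by position mirrors the headerless CSV format (`header=False`) used for the schedule. With real data you would rely on the violations report that each execution writes under `s3_report_path` instead.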
] }, { "cell_type": "code", "execution_count": 22, "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:sagemaker.model_monitor.model_monitoring:Creating Monitoring Schedule with name: DEMO-xgb-churn-pred-model-monitor-schedule-2023-03-24-13-54-13\n" ] } ], "source": [ "from sagemaker.model_monitor import CronExpressionGenerator\n", "from sagemaker.model_monitor import BatchTransformInput\n", "from sagemaker.model_monitor import MonitoringDatasetFormat\n", "from time import gmtime, strftime\n", "\n", "statistics_path = \"{}/statistics.json\".format(baseline_results_uri)\n", "constraints_path = \"{}/constraints.json\".format(baseline_results_uri)\n", "\n", "mon_schedule_name = \"DEMO-xgb-churn-pred-model-monitor-schedule-\" + strftime(\n", " \"%Y-%m-%d-%H-%M-%S\", gmtime()\n", ")\n", "my_default_monitor.create_monitoring_schedule(\n", " monitor_schedule_name=mon_schedule_name,\n", " batch_transform_input=BatchTransformInput(\n", " data_captured_destination_s3_uri=s3_capture_upload_path,\n", " destination=\"/opt/ml/processing/input\",\n", " dataset_format=MonitoringDatasetFormat.csv(header=False),\n", " ),\n", " output_s3_uri=s3_report_path,\n", " statistics=statistics_path,\n", " constraints=constraints_path,\n", " schedule_cron_expression=CronExpressionGenerator.hourly(),\n", " enable_cloudwatch_metrics=True,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 7) Describe and inspect the schedule\n", "\n", "Right after creation, the schedule's MonitoringScheduleStatus is Pending. Describe the schedule again after a short while and observe that the status changes to Scheduled."
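Rather than re-running the describe cell by hand, you could poll until the status leaves `Pending`. A minimal sketch, assuming only that the describe call returns a dict with a `MonitoringScheduleStatus` key (as the next cell uses it); `wait_for_schedule` is a hypothetical helper, and the fake describe function at the end exists only so the example runs without AWS.

```python
import time


def wait_for_schedule(describe, pending=("Pending",), poll_seconds=10.0,
                      timeout_seconds=600.0, sleep=time.sleep):
    """Poll describe() until MonitoringScheduleStatus is no longer pending.

    `describe` is any zero-argument callable returning a dict with a
    "MonitoringScheduleStatus" key. Returns the first non-pending status and
    raises TimeoutError if the status is still pending after the timeout.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = describe()["MonitoringScheduleStatus"]
        if status not in pending:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Schedule still {status} after {timeout_seconds} seconds"
            )
        sleep(poll_seconds)


# Offline demo: a fake describe function and a no-op sleep.
fake_statuses = iter(["Pending", "Pending", "Scheduled"])
print(wait_for_schedule(lambda: {"MonitoringScheduleStatus": next(fake_statuses)},
                        sleep=lambda seconds: None))  # prints Scheduled
```

In this notebook you would call `wait_for_schedule(my_default_monitor.describe_schedule)`. Passing `sleep` in as a parameter keeps the helper testable without real waiting.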
] }, { "cell_type": "code", "execution_count": 23, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Schedule status: Pending\n" ] } ], "source": [ "desc_schedule_result = my_default_monitor.describe_schedule()\n", "print(\"Schedule status: {}\".format(desc_schedule_result[\"MonitoringScheduleStatus\"]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### List executions\n", "The schedule starts jobs at the interval you specified. Here, you list the executions of the schedule. If you run this right after creating the hourly schedule, the list is probably empty: executions start only after the next hour boundary (in UTC), so the code below polls once a minute until the first execution appears.\n", "\n", "Note: Even for an hourly schedule, Amazon SageMaker uses a buffer period of up to 20 minutes to schedule your execution, so an execution can start anywhere from zero to about 20 minutes after the hour boundary. This is expected and is done for load balancing in the backend." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2023-03-24-13-54-13\n", "We created a hourly schedule above and it will kick off executions ON the hour (plus 0 - 20 min buffer.\n", "We will have to wait till we hit the hour...\n", "Waiting for the 1st execution to happen...\n", "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2023-03-24-13-54-13\n", "Waiting for the 1st execution to happen...\n", "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2023-03-24-13-54-13\n", "Waiting for the 1st execution to happen...\n", "No executions found for schedule. 
monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2023-03-24-13-54-13\n", "Waiting for the 1st execution to happen...\n", "No executions found for schedule. monitoring_schedule_name: DEMO-xgb-churn-pred-model-monitor-schedule-2023-03-24-13-54-13\n", "Waiting for the 1st execution to happen...\n", "Waiting for the 1st execution to happen...\n", "Waiting for the 1st execution to happen...\n", "Waiting for the 1st execution to happen...\n" ] } ], "source": [ "import time\n", "\n", "mon_executions = my_default_monitor.list_executions()\n", "print(\n", " \"We created an hourly schedule above; it kicks off executions on the hour (plus a 0-20 min buffer).\\nWe will have to wait until we hit the hour...\"\n", ")\n", "\n", "# Poll once a minute until the first execution shows up.\n", "while len(mon_executions) == 0:\n", " print(\"Waiting for the 1st execution to happen...\")\n", " time.sleep(60)\n", " mon_executions = my_default_monitor.list_executions()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inspect a specific execution (latest execution)\n", "In the previous cell, you waited until the schedule produced at least one execution. The next cell picks the latest one and waits for it to reach a terminal state. Here are the possible terminal states and what each of them means: \n", "* Completed - The monitoring execution completed and no violations were found.\n", "* CompletedWithViolations - The execution completed, but constraint violations were detected.\n", "* Failed - The monitoring execution failed, possibly due to a client error (such as incorrect role permissions) or an infrastructure issue. Examine FailureReason and ExitMessage to identify what exactly happened.\n", "* Stopped - The job exceeded its maximum runtime or was stopped manually.\n", "\n", "The next cell reads the underlying processing job. As its output shows, ProcessingJobStatus can be Completed even when violations were found; the violations are then reported in ExitMessage, which starts with CompletedWithViolations."
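In this notebook's recorded output, ProcessingJobStatus is `Completed` while the exit message reads `CompletedWithViolations: Job completed successfully with 1 violations.`, so it can help to fold both fields into one of the terminal states listed above. A small sketch, assuming that exit-message convention holds in general (an inference from this output, not documented behavior); `execution_outcome` is a hypothetical helper.

```python
def execution_outcome(job_desc):
    """Fold a processing-job description into one monitoring outcome string.

    Assumption (taken from this notebook's output, not from API docs): when
    violations are found, ProcessingJobStatus is "Completed" and ExitMessage
    starts with "CompletedWithViolations".
    """
    status = job_desc.get("ProcessingJobStatus")
    if status == "Completed":
        exit_message = job_desc.get("ExitMessage") or ""
        if exit_message.startswith("CompletedWithViolations"):
            return "CompletedWithViolations"
        return "Completed"
    # Failed, Stopped, or a non-terminal state such as InProgress or Stopping.
    return status


print(execution_outcome({
    "ProcessingJobStatus": "Completed",
    "ExitMessage": "CompletedWithViolations: Job completed successfully with 1 violations.",
}))  # prints CompletedWithViolations
```

In the notebook, `execution_outcome(latest_execution.describe())` would return `CompletedWithViolations` for the run recorded here.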
] }, { "cell_type": "code", "execution_count": 25, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ".........................................................!Latest execution status: Completed\n", "Latest execution result: CompletedWithViolations: Job completed successfully with 1 violations.\n" ] } ], "source": [ "# The latest execution's index is -1, the second to last is -2, and so on.\n", "latest_execution = mon_executions[-1]\n", "latest_execution.wait(logs=False)\n", "\n", "# Describe the processing job once and reuse the result.\n", "latest_job = latest_execution.describe()\n", "print(\"Latest execution status: {}\".format(latest_job[\"ProcessingJobStatus\"]))\n", "print(\"Latest execution result: {}\".format(latest_job.get(\"ExitMessage\")))\n", "\n", "if latest_job[\"ProcessingJobStatus\"] != \"Completed\":\n", " print(\n", " \"====STOP==== \\n No completed executions to inspect further. Please wait until an execution completes or investigate previously reported failures.\"\n", " )" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Report Uri: s3://sagemaker-us-east-1-609196052567/sagemaker/DEMO-ModelMonitor/reports/DEMO-xgb-churn-pred-model-monitor-schedule-2023-03-24-13-54-13/2023/03/24/14\n" ] } ], "source": [ "report_uri = latest_execution.output.destination\n", "print(\"Report Uri: {}\".format(report_uri))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### List the generated reports" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Report bucket: sagemaker-us-east-1-609196052567\n", "Report key: sagemaker/DEMO-ModelMonitor/reports/DEMO-xgb-churn-pred-model-monitor-schedule-2023-03-24-13-54-13/2023/03/24/14\n", "Found Report Files:\n", 
"sagemaker/DEMO-ModelMonitor/reports/DEMO-xgb-churn-pred-model-monitor-schedule-2023-03-24-13-54-13/2023/03/24/14/constraint_violations.json\n" ] } ], "source": [ "from urllib.parse import urlparse\n", "\n", "s3uri = urlparse(report_uri)\n", "report_bucket = s3uri.netloc\n", "report_key = s3uri.path.lstrip(\"/\")\n", "print(\"Report bucket: {}\".format(report_bucket))\n", "print(\"Report key: {}\".format(report_key))\n", "\n", "s3_client = boto3.Session().client(\"s3\")\n", "result = s3_client.list_objects_v2(Bucket=report_bucket, Prefix=report_key)\n", "# \"Contents\" is absent from the response when no objects match the prefix.\n", "report_files = [report_file[\"Key\"] for report_file in result.get(\"Contents\", [])]\n", "print(\"Found Report Files:\")\n", "print(\"\\n \".join(report_files))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Violations report" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the latest execution found any violations of the baseline constraints, they are listed here." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:2: FutureWarning: Passing a negative integer is deprecated in version 1.0 and will not be supported in future version. Instead, use None to not limit the column width.\n", " \n", "/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:3: FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead\n", " This is separate from the ipykernel package so we can avoid doing imports until\n" ] }, { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " feature_name constraint_check_type \\\n", "0 Missing columns missing_column_check \n", "\n", " description \n", "0 There are missing columns in current dataset. Number of columns in current dataset: 69, Number of columns in baseline constraints: 70 " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "violations = my_default_monitor.latest_monitoring_constraint_violations()\n", "pd.set_option(\"display.max_colwidth\", None)  # None: do not truncate long descriptions\n", "violations_df = pd.json_normalize(violations.body_dict[\"violations\"])\n", "violations_df.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Other commands\n", "You can also stop and restart the monitoring schedule." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "# my_default_monitor.stop_monitoring_schedule()\n", "# my_default_monitor.start_monitoring_schedule()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 8) Delete the resources\n" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "# my_default_monitor.stop_monitoring_schedule()\n", "# my_default_monitor.delete_monitoring_schedule()\n", "# time.sleep(60) # actually wait for the deletion" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# predictor.delete_model()" ] } ], "metadata": { "anaconda-cloud": {}, "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": 
"General purpose", "gpuNum": 0, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 128, "name": 
"ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 21, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 28, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 29, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 32, "name": 
"ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, 
"memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 } ], "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0" }, "language_info": { "codemirror_mode": 
{ "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" }, "notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License." }, "nbformat": 4, "nbformat_minor": 4 }