{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Optimizing data for analysis with Amazon Athena and AWS Glue" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will continue our open data analytics workflow, starting with the AWS Console and then moving to the notebook. Using [AWS Glue](https://aws.amazon.com/glue/) we can automate creating a metadata catalog based on flat files stored on Amazon S3. Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL.\n", "\n", "### Glue Data Catalog\n", "\n", "We have sourced the open dataset from the [Registry of Open Data on AWS](https://registry.opendata.aws/). We also stored the data on S3. Now we are ready to extract, transform, and load the data for analytics. We will use the AWS Glue service to do this. The first step is to create a logical database entry in the data catalog. Note that we are not creating a physical database, which would require resources. This is just a metadata placeholder for the flat file we copied into S3.\n", "\n", "> When naming the data catalog, choose a short name without hyphens. This will make SQL queries more readable and also avoid certain errors when running these queries.\n", "\n", "![Glue Data Catalog](https://s3.amazonaws.com/cloudstory/notebooks-media/glue-data-catalog.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also set up the notebook for accessing the AWS Glue service using the ``Boto3`` Python SDK. 
The ``pandas`` and ``IPython`` dependencies are imported for output formatting purposes only. We also import ``numpy``, a popular numerical computing library. Charts and visualizations will be supported by the ``seaborn`` and ``matplotlib`` libraries. To access the Glue service API we create a Glue client, along with an S3 client that we use later. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import pandas as pd\n", "import numpy as np\n", "from IPython.display import display, Markdown\n", "import seaborn as sns\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "glue = boto3.client('glue')\n", "s3 = boto3.client('s3')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### List Glue Databases\n", "We will recreate the AWS Console GUI experience using SDK calls by creating the ``list_glue_databases`` function. We simply get the data catalogs in one statement and iterate over the results in the next one." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def list_glue_databases():\n", "    # Fetch the databases registered in the Glue Data Catalog\n", "    glue_database = glue.get_databases()\n", "\n", "    for db in glue_database['DatabaseList']:\n", "        print(db['Name'])" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "default\n", "odoc\n", "sampledb\n", "taxicatalog\n" ] } ], "source": [ "list_glue_databases()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Glue Crawler\n", "Next, we create a logical table using a Glue crawler. This is again just a table metadata definition, while the actual data is still stored only in the flat file on S3. For this notebook we will define and run the default Glue Crawler to extract and load the metadata schema from our flat file. 
This requires selecting a data store (S3 in this case), defining an IAM role that gives Glue access to S3, selecting a schedule if the crawler should run repeatedly, and choosing an output destination for the crawler results. \n", "\n", "> Please ensure that the flat file is stored on S3 within its own folder and that you point at the folder when picking the data source during crawler definition. If you point directly to a flat file when running the crawler, it may return zero results when querying using Amazon Athena.\n", "\n", "Glue uses the folder name as the logical table name. Keeping our data source files in a folder has the added advantage that we can incrementally update the data by adding more files or updating the original file. Glue picks up these changes according to the crawler run schedule.\n", "\n", "![Glue Crawler](https://s3.amazonaws.com/cloudstory/notebooks-media/glue-crawler.png)\n", "\n", "### Glue Table Metadata\n", "This results in extraction of table metadata stored within our data catalog. The schema with data types is extracted and stored in the Glue Data Catalog. Note that the default Glue Crawler understands well-formed CSV files with the first row as a comma-separated list of column names, and the following rows representing ordered data records. The Glue Crawler automatically guesses data types based on the contents of the flat file.\n", "\n", "![Table Metadata](https://s3.amazonaws.com/cloudstory/notebooks-media/table-metadata.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Transform Data Using Athena\n", "\n", "Transforming big data in a notebook environment is not viable. Instead we can use Amazon Athena for large data transforms and bring the results back into our notebook.\n", "\n", "![Athena Transform Data](https://s3.amazonaws.com/cloudstory/notebooks-media/athena-transform-data.png)\n", "\n", "We will use the following query to create a well-formed table transformed from our original table. 
Note that we specify the output location explicitly so that the default location defined by the Athena WorkGroup is not used. We also specify the format as ``TEXTFILE``; otherwise the default ``PARQUET`` format is used, which may generate errors when sampling this data.\n", "\n", "```SQL\n", "CREATE TABLE \n", "IF NOT EXISTS \"taxicatalog\".\"many_trips_well_formed\" \n", "WITH (\n", "    external_location = 's3://open-data-analytics-taxi-trips/many-trips-well-formed/',\n", "    format = 'TEXTFILE',\n", "    field_delimiter = ','\n", ")\n", "AS SELECT vendorid AS vendor,\n", "    passenger_count AS passengers,\n", "    trip_distance AS distance,\n", "    ratecodeid AS rate,\n", "    pulocationid AS pick_location,\n", "    dolocationid AS drop_location,\n", "    payment_type AS payment_type,\n", "    fare_amount AS fare,\n", "    extra AS extra_fare,\n", "    mta_tax AS tax,\n", "    tip_amount AS tip,\n", "    tolls_amount AS toll,\n", "    improvement_surcharge AS surcharge,\n", "    total_amount AS total_fare,\n", "    tpep_pickup_datetime AS pick_when,\n", "    tpep_dropoff_datetime AS drop_when\n", "FROM \"taxicatalog\".\"many_trips\";\n", "```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### List Glue Tables\n", "In the spirit of the AWS Open Data Analytics API, we will recreate the AWS Console feature which lists the tables and displays their metadata within a single reusable function. We get the list of table metadata stored within our data catalog by passing the ``database`` parameter. Next we iterate over each table object and display the name, source data location, number of records (estimate), average record size, data size in MB, and the name of the crawler used to extract the table metadata. We also display the list of column names and data types extracted as schema from the flat file stored on S3." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def list_glue_tables(database, verbose=True):\n", "    glue_tables = glue.get_tables(DatabaseName=database)\n", "\n", "    for table in glue_tables['TableList']:\n", "        display(Markdown('**Table: ' + table['Name'] + '**'))\n", "        display(Markdown('Location: ' + table['StorageDescriptor']['Location']))\n", "        # CreatedBy is a path-like identifier; its last segment names the creator\n", "        created = table['CreatedBy'].split('/')\n", "        display(Markdown('Created by: ' + created[-1]))\n", "        # Crawler statistics exist only for tables created by a Glue crawler\n", "        if verbose and created[-1] == 'AWS-Crawler':\n", "            display(Markdown(f'Records: {int(table[\"Parameters\"][\"recordCount\"]):,}'))\n", "            display(Markdown(f'Average Record Size: {table[\"Parameters\"][\"averageRecordSize\"]} Bytes'))\n", "            display(Markdown(f'Dataset Size: {float(table[\"Parameters\"][\"sizeKey\"])/1024/1024:3.0f} MB'))\n", "            display(Markdown(f'Crawler: {table[\"Parameters\"][\"UPDATED_BY_CRAWLER\"]}'))\n", "        if verbose:\n", "            df_columns = pd.DataFrame.from_dict(table[\"StorageDescriptor\"][\"Columns\"])\n", "            display(df_columns[['Name', 'Type']])\n", "            display(Markdown('---'))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "**Table: many_trips**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "Location: s3://open-data-analytics-taxi-trips/many-trips/" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "Created by: AWS-Crawler" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "**Table: many_trips_well_formed**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "Location: s3://open-data-analytics-taxi-trips/many-trips-well-formed" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "Created by: manav" ], "text/plain": [ "" 
] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "list_glue_tables('taxicatalog', verbose=False)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "athena = boto3.client('athena')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Athena Query\n", "Our next action is to bring the data created within Athena into the notebook environment using a ``pandas`` DataFrame. This can be done using the ``athena_query`` function, which calls the Amazon Athena API to execute a query and store the output within a bucket and folder. This output is then read into a DataFrame, which the function returns." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "import time\n", "\n", "def athena_query(query, bucket, folder):\n", "    output = 's3://' + bucket + '/' + folder + '/'\n", "    response = athena.start_query_execution(QueryString=query, \n", "                        ResultConfiguration={'OutputLocation': output})\n", "    qid = response['QueryExecutionId']\n", "    # Poll until the query leaves the QUEUED and RUNNING states\n", "    while True:\n", "        response = athena.get_query_execution(QueryExecutionId=qid)\n", "        state = response['QueryExecution']['Status']['State']\n", "        if state not in ('QUEUED', 'RUNNING'):\n", "            break\n", "        time.sleep(1)\n", "    # Stop early if the query failed or was cancelled\n", "    if state != 'SUCCEEDED':\n", "        raise RuntimeError('Athena query ' + qid + ' ended in state ' + state)\n", "    # Athena writes the result set as <QueryExecutionId>.csv in the output location\n", "    key = folder + '/' + qid + '.csv'\n", "    data_source = {'Bucket': bucket, 'Key': key}\n", "    url = s3.generate_presigned_url(ClientMethod = 'get_object', Params = data_source)\n", "    data = pd.read_csv(url)\n", "    return data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To explore the data within Athena we will run a query that returns a sample of up to one thousand records." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>vendor</th><th>passengers</th><th>distance</th><th>rate</th><th>pick_location</th><th>drop_location</th><th>payment_type</th><th>fare</th><th>extra_fare</th><th>tax</th><th>tip</th><th>toll</th><th>surcharge</th><th>total_fare</th><th>pick_when</th><th>drop_when</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>2</td><td>1</td><td>1.25</td><td>1</td><td>237</td><td>236</td><td>1</td><td>9.0</td><td>0.0</td><td>0.5</td><td>0.00</td><td>0.0</td><td>0.3</td><td>9.80</td><td>2018-06-06 10:43:34</td><td>2018-06-06 10:54:58</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>1</td><td>1.20</td><td>1</td><td>158</td><td>90</td><td>2</td><td>7.5</td><td>0.0</td><td>0.5</td><td>0.00</td><td>0.0</td><td>0.3</td><td>8.30</td><td>2018-06-06 10:06:22</td><td>2018-06-06 10:15:21</td></tr>\n",
"    <tr><th>2</th><td>1</td><td>1</td><td>3.30</td><td>1</td><td>234</td><td>236</td><td>1</td><td>17.0</td><td>0.0</td><td>0.5</td><td>3.55</td><td>0.0</td><td>0.3</td><td>21.35</td><td>2018-06-06 10:17:20</td><td>2018-06-06 10:43:07</td></tr>\n",
"    <tr><th>3</th><td>1</td><td>1</td><td>0.90</td><td>1</td><td>236</td><td>140</td><td>1</td><td>7.0</td><td>0.0</td><td>0.5</td><td>1.55</td><td>0.0</td><td>0.3</td><td>9.35</td><td>2018-06-06 10:48:28</td><td>2018-06-06 10:57:08</td></tr>\n",
"    <tr><th>4</th><td>1</td><td>1</td><td>1.00</td><td>1</td><td>141</td><td>162</td><td>1</td><td>7.0</td><td>0.0</td><td>0.5</td><td>1.95</td><td>0.0</td><td>0.3</td><td>9.75</td><td>2018-06-06 10:59:28</td><td>2018-06-06 11:08:05</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " vendor passengers distance rate pick_location drop_location \\\n", "0 2 1 1.25 1 237 236 \n", "1 1 1 1.20 1 158 90 \n", "2 1 1 3.30 1 234 236 \n", "3 1 1 0.90 1 236 140 \n", "4 1 1 1.00 1 141 162 \n", "\n", " payment_type fare extra_fare tax tip toll surcharge total_fare \\\n", "0 1 9.0 0.0 0.5 0.00 0.0 0.3 9.80 \n", "1 2 7.5 0.0 0.5 0.00 0.0 0.3 8.30 \n", "2 1 17.0 0.0 0.5 3.55 0.0 0.3 21.35 \n", "3 1 7.0 0.0 0.5 1.55 0.0 0.3 9.35 \n", "4 1 7.0 0.0 0.5 1.95 0.0 0.3 9.75 \n", "\n", " pick_when drop_when \n", "0 2018-06-06 10:43:34 2018-06-06 10:54:58 \n", "1 2018-06-06 10:06:22 2018-06-06 10:15:21 \n", "2 2018-06-06 10:17:20 2018-06-06 10:43:07 \n", "3 2018-06-06 10:48:28 2018-06-06 10:57:08 \n", "4 2018-06-06 10:59:28 2018-06-06 11:08:05 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bucket = 'open-data-analytics-taxi-trips'\n", "folder = 'queries'\n", "query = 'SELECT * FROM \"taxicatalog\".\"many_trips_well_formed\" TABLESAMPLE BERNOULLI(100) LIMIT 1000;'\n", "\n", "df = athena_query(query, bucket, folder)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we will determine statistical correlation between various features (columns) within the given set of samples (records)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>vendor</th><th>passengers</th><th>distance</th><th>rate</th><th>pick_location</th><th>drop_location</th><th>payment_type</th><th>fare</th><th>extra_fare</th><th>tax</th><th>tip</th><th>toll</th><th>surcharge</th><th>total_fare</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>vendor</th><td>1.000000</td><td>0.283619</td><td>0.015401</td><td>0.055897</td><td>-0.024097</td><td>-0.005115</td><td>0.013634</td><td>-0.014083</td><td>NaN</td><td>-0.009308</td><td>-0.024436</td><td>0.025840</td><td>NaN</td><td>-0.015078</td></tr>\n",
"    <tr><th>passengers</th><td>0.283619</td><td>1.000000</td><td>0.033053</td><td>0.051624</td><td>-0.021166</td><td>0.003783</td><td>0.020200</td><td>0.035106</td><td>NaN</td><td>-0.022736</td><td>0.008936</td><td>0.003765</td><td>NaN</td><td>0.033289</td></tr>\n",
"    <tr><th>distance</th><td>0.015401</td><td>0.033053</td><td>1.000000</td><td>0.119010</td><td>-0.119491</td><td>-0.148011</td><td>-0.068732</td><td>0.917127</td><td>NaN</td><td>-0.080828</td><td>0.389773</td><td>0.401863</td><td>NaN</td><td>0.903529</td></tr>\n",
"    <tr><th>rate</th><td>0.055897</td><td>0.051624</td><td>0.119010</td><td>1.000000</td><td>-0.042557</td><td>-0.053956</td><td>-0.007774</td><td>0.185992</td><td>NaN</td><td>-0.501256</td><td>0.083778</td><td>0.246460</td><td>NaN</td><td>0.184445</td></tr>\n",
"    <tr><th>pick_location</th><td>-0.024097</td><td>-0.021166</td><td>-0.119491</td><td>-0.042557</td><td>1.000000</td><td>0.150656</td><td>-0.009998</td><td>-0.129692</td><td>NaN</td><td>0.010869</td><td>-0.028087</td><td>-0.153488</td><td>NaN</td><td>-0.127936</td></tr>\n",
"    <tr><th>drop_location</th><td>-0.005115</td><td>0.003783</td><td>-0.148011</td><td>-0.053956</td><td>0.150656</td><td>1.000000</td><td>0.003079</td><td>-0.162211</td><td>NaN</td><td>0.090225</td><td>-0.042135</td><td>-0.087721</td><td>NaN</td><td>-0.154017</td></tr>\n",
"    <tr><th>payment_type</th><td>0.013634</td><td>0.020200</td><td>-0.068732</td><td>-0.007774</td><td>-0.009998</td><td>0.003079</td><td>1.000000</td><td>-0.073051</td><td>NaN</td><td>-0.015087</td><td>-0.776507</td><td>-0.068458</td><td>NaN</td><td>-0.212893</td></tr>\n",
"    <tr><th>fare</th><td>-0.014083</td><td>0.035106</td><td>0.917127</td><td>0.185992</td><td>-0.129692</td><td>-0.162211</td><td>-0.073051</td><td>1.000000</td><td>NaN</td><td>-0.091508</td><td>0.425216</td><td>0.395950</td><td>NaN</td><td>0.983444</td></tr>\n",
"    <tr><th>extra_fare</th><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>tax</th><td>-0.009308</td><td>-0.022736</td><td>-0.080828</td><td>-0.501256</td><td>0.010869</td><td>0.090225</td><td>-0.015087</td><td>-0.091508</td><td>NaN</td><td>1.000000</td><td>0.012988</td><td>-0.148891</td><td>NaN</td><td>-0.089695</td></tr>\n",
"    <tr><th>tip</th><td>-0.024436</td><td>0.008936</td><td>0.389773</td><td>0.083778</td><td>-0.028087</td><td>-0.042135</td><td>-0.776507</td><td>0.425216</td><td>NaN</td><td>0.012988</td><td>1.000000</td><td>0.267483</td><td>NaN</td><td>0.555170</td></tr>\n",
"    <tr><th>toll</th><td>0.025840</td><td>0.003765</td><td>0.401863</td><td>0.246460</td><td>-0.153488</td><td>-0.087721</td><td>-0.068458</td><td>0.395950</td><td>NaN</td><td>-0.148891</td><td>0.267483</td><td>1.000000</td><td>NaN</td><td>0.403146</td></tr>\n",
"    <tr><th>surcharge</th><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>total_fare</th><td>-0.015078</td><td>0.033289</td><td>0.903529</td><td>0.184445</td><td>-0.127936</td><td>-0.154017</td><td>-0.212893</td><td>0.983444</td><td>NaN</td><td>-0.089695</td><td>0.555170</td><td>0.403146</td><td>NaN</td><td>1.000000</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " vendor passengers distance rate pick_location \\\n", "vendor 1.000000 0.283619 0.015401 0.055897 -0.024097 \n", "passengers 0.283619 1.000000 0.033053 0.051624 -0.021166 \n", "distance 0.015401 0.033053 1.000000 0.119010 -0.119491 \n", "rate 0.055897 0.051624 0.119010 1.000000 -0.042557 \n", "pick_location -0.024097 -0.021166 -0.119491 -0.042557 1.000000 \n", "drop_location -0.005115 0.003783 -0.148011 -0.053956 0.150656 \n", "payment_type 0.013634 0.020200 -0.068732 -0.007774 -0.009998 \n", "fare -0.014083 0.035106 0.917127 0.185992 -0.129692 \n", "extra_fare NaN NaN NaN NaN NaN \n", "tax -0.009308 -0.022736 -0.080828 -0.501256 0.010869 \n", "tip -0.024436 0.008936 0.389773 0.083778 -0.028087 \n", "toll 0.025840 0.003765 0.401863 0.246460 -0.153488 \n", "surcharge NaN NaN NaN NaN NaN \n", "total_fare -0.015078 0.033289 0.903529 0.184445 -0.127936 \n", "\n", " drop_location payment_type fare extra_fare tax \\\n", "vendor -0.005115 0.013634 -0.014083 NaN -0.009308 \n", "passengers 0.003783 0.020200 0.035106 NaN -0.022736 \n", "distance -0.148011 -0.068732 0.917127 NaN -0.080828 \n", "rate -0.053956 -0.007774 0.185992 NaN -0.501256 \n", "pick_location 0.150656 -0.009998 -0.129692 NaN 0.010869 \n", "drop_location 1.000000 0.003079 -0.162211 NaN 0.090225 \n", "payment_type 0.003079 1.000000 -0.073051 NaN -0.015087 \n", "fare -0.162211 -0.073051 1.000000 NaN -0.091508 \n", "extra_fare NaN NaN NaN NaN NaN \n", "tax 0.090225 -0.015087 -0.091508 NaN 1.000000 \n", "tip -0.042135 -0.776507 0.425216 NaN 0.012988 \n", "toll -0.087721 -0.068458 0.395950 NaN -0.148891 \n", "surcharge NaN NaN NaN NaN NaN \n", "total_fare -0.154017 -0.212893 0.983444 NaN -0.089695 \n", "\n", " tip toll surcharge total_fare \n", "vendor -0.024436 0.025840 NaN -0.015078 \n", "passengers 0.008936 0.003765 NaN 0.033289 \n", "distance 0.389773 0.401863 NaN 0.903529 \n", "rate 0.083778 0.246460 NaN 0.184445 \n", "pick_location -0.028087 -0.153488 NaN -0.127936 \n", "drop_location 
-0.042135 -0.087721 NaN -0.154017 \n", "payment_type -0.776507 -0.068458 NaN -0.212893 \n", "fare 0.425216 0.395950 NaN 0.983444 \n", "extra_fare NaN NaN NaN NaN \n", "tax 0.012988 -0.148891 NaN -0.089695 \n", "tip 1.000000 0.267483 NaN 0.555170 \n", "toll 0.267483 1.000000 NaN 0.403146 \n", "surcharge NaN NaN NaN NaN \n", "total_fare 0.555170 0.403146 NaN 1.000000 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corr = df.corr(method='spearman')\n", "corr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can drop the features which show ``NaN`` correlation, here ``extra_fare`` and ``surcharge``. A ``NaN`` correlation typically means the column holds a constant value within the sample." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "df = df.drop(columns=['extra_fare', 'surcharge'])" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "corr = df.corr(method='spearman')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Heatmap\n", "The heatmap completes the data science workflow: sourcing big data, wrangling it into a well-formed schema using Amazon Athena, bringing an adequate sample from Athena into the notebook environment, conducting exploratory data analysis, and finally visualizing the results." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "def heatmap(corr):\n", "    sns.set(style=\"white\")\n", "\n", "    # Generate a mask for the upper triangle\n", "    # (the np.bool alias was removed from recent NumPy releases, so use the builtin bool)\n", "    mask = np.zeros_like(corr, dtype=bool)\n", "    mask[np.triu_indices_from(mask)] = True\n", "\n", "    # Set up the matplotlib figure\n", "    f, ax = plt.subplots(figsize=(11, 9))\n", "\n", "    # Generate a custom diverging colormap\n", "    cmap = sns.diverging_palette(220, 10, as_cmap=True)\n", "\n", "    # Draw the heatmap with the mask and correct aspect ratio\n", "    sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0, annot=True, fmt=\"3.2f\",\n", "                square=True, linewidths=.5, cbar_kws={\"shrink\": .5})" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "heatmap(corr)" ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }