{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "[![AWS SDK for pandas](_static/logo.png \"AWS SDK for pandas\")](https://github.com/aws/aws-sdk-pandas)\n", "\n", "# 18 - QuickSight\n", "\n", "For this tutorial we will use the public AWS COVID-19 data lake.\n", "\n", "References:\n", "\n", "* [A public data lake for analysis of COVID-19 data](https://aws.amazon.com/blogs/big-data/a-public-data-lake-for-analysis-of-covid-19-data/)\n", "* [Exploring the public AWS COVID-19 data lake](https://aws.amazon.com/blogs/big-data/exploring-the-public-aws-covid-19-data-lake/)\n", "* [CloudFormation template](https://covid19-lake.s3.us-east-2.amazonaws.com/cfn/CovidLakeStack.template.json)\n", "\n", "*Please, install the CloudFormation template above to have access to the public data lake.*\n", "\n", "*P.S. To be able to access the public data lake, you must allow explicitly QuickSight to access the related external bucket.*" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "import awswrangler as wr\n", "from time import sleep" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "List users of QuickSight account" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/plain": [ "[{'username': 'dev', 'role': 'ADMIN'}]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[{\"username\": user[\"UserName\"], \"role\": user[\"Role\"]} for user in wr.quicksight.list_users('default')]" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DatabaseDescription
0aws_sdk_pandasAWS SDK for pandas Test Arena - Glue Database
1awswrangler_test
2covid-19
3defaultDefault Hive database
\n", "
" ], "text/plain": [ " Database Description\n", "0 aws_sdk_pandas AWS SDK for pandas Test Arena - Glue Database\n", "1 awswrangler_test \n", "2 covid-19 \n", "3 default Default Hive database" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wr.catalog.databases()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DatabaseTableDescriptionColumnsPartitions
0covid-19alleninstitute_comprehend_medicalComprehend Medical results run against Allen I...paper_id, date, dx_name, test_name, procedure_...
1covid-19alleninstitute_metadataMetadata on papers pulled from the Allen Insti...cord_uid, sha, source_x, title, doi, pmcid, pu...
2covid-19country_codesLookup table for country codescountry, alpha-2 code, alpha-3 code, numeric c...
3covid-19county_populationsLookup table for population for each county ba...id, id2, county, state, population estimate 2018
4covid-19covid_knowledge_graph_edgesAWS Knowledge Graph for COVID-19 dataid, label, from, to, score
5covid-19covid_knowledge_graph_nodes_authorAWS Knowledge Graph for COVID-19 dataid, label, first, last, full_name
6covid-19covid_knowledge_graph_nodes_conceptAWS Knowledge Graph for COVID-19 dataid, label, entity, concept
7covid-19covid_knowledge_graph_nodes_institutionAWS Knowledge Graph for COVID-19 dataid, label, institution, country, settlement
8covid-19covid_knowledge_graph_nodes_paperAWS Knowledge Graph for COVID-19 dataid, label, doi, sha_code, publish_time, source...
9covid-19covid_knowledge_graph_nodes_topicAWS Knowledge Graph for COVID-19 dataid, label, topic, topic_num
10covid-19covid_testing_states_dailyUSA total test daily trend by state. Sourced ...date, state, positive, negative, pending, hosp...
11covid-19covid_testing_us_dailyUSA total test daily trend. Sourced from covi...date, states, positive, negative, posneg, pend...
12covid-19covid_testing_us_totalUSA total tests. Sourced from covidtracking.c...positive, negative, posneg, hospitalized, deat...
13covid-19covidcast_dataCMU Delphi's COVID-19 Surveillance Datadata_source, signal, geo_type, time_value, geo...
14covid-19covidcast_metadataCMU Delphi's COVID-19 Surveillance Metadatadata_source, signal, time_type, geo_type, min_...
15covid-19enigma_jhuJohns Hopkins University Consolidated data on ...fips, admin2, province_state, country_region, ...
16covid-19enigma_jhu_timeseriesJohns Hopkins University data on COVID-19 case...uid, fips, iso2, iso3, code3, admin2, latitude...
17covid-19hospital_bedsData on hospital beds and their utilization in...objectid, hospital_name, hospital_type, hq_add...
18covid-19nytimes_countiesData on COVID-19 cases from NY Times at US cou...date, county, state, fips, cases, deaths
19covid-19nytimes_statesData on COVID-19 cases from NY Times at US sta...date, state, fips, cases, deaths
20covid-19prediction_models_county_predictionsCounty-level Predictions Data. Sourced from Yu...countyfips, countyname, statename, severity_co...
21covid-19prediction_models_severity_indexSeverity Index models. Sourced from Yu Group a...severity_1-day, severity_2-day, severity_3-day...
22covid-19tableau_covid_datahubCOVID-19 data that has been gathered and unifi...country_short_name, country_alpha_3_code, coun...
23covid-19tableau_jhuJohns Hopkins University data on COVID-19 case...case_type, cases, difference, date, country_re...
24covid-19us_state_abbreviationsLookup table for US state abbreviationsstate, abbreviation
25covid-19world_cases_deaths_testingData on confirmed cases, deaths, and testing. ...iso_code, location, date, total_cases, new_cas...
\n", "
" ], "text/plain": [ " Database Table \\\n", "0 covid-19 alleninstitute_comprehend_medical \n", "1 covid-19 alleninstitute_metadata \n", "2 covid-19 country_codes \n", "3 covid-19 county_populations \n", "4 covid-19 covid_knowledge_graph_edges \n", "5 covid-19 covid_knowledge_graph_nodes_author \n", "6 covid-19 covid_knowledge_graph_nodes_concept \n", "7 covid-19 covid_knowledge_graph_nodes_institution \n", "8 covid-19 covid_knowledge_graph_nodes_paper \n", "9 covid-19 covid_knowledge_graph_nodes_topic \n", "10 covid-19 covid_testing_states_daily \n", "11 covid-19 covid_testing_us_daily \n", "12 covid-19 covid_testing_us_total \n", "13 covid-19 covidcast_data \n", "14 covid-19 covidcast_metadata \n", "15 covid-19 enigma_jhu \n", "16 covid-19 enigma_jhu_timeseries \n", "17 covid-19 hospital_beds \n", "18 covid-19 nytimes_counties \n", "19 covid-19 nytimes_states \n", "20 covid-19 prediction_models_county_predictions \n", "21 covid-19 prediction_models_severity_index \n", "22 covid-19 tableau_covid_datahub \n", "23 covid-19 tableau_jhu \n", "24 covid-19 us_state_abbreviations \n", "25 covid-19 world_cases_deaths_testing \n", "\n", " Description \\\n", "0 Comprehend Medical results run against Allen I... \n", "1 Metadata on papers pulled from the Allen Insti... \n", "2 Lookup table for country codes \n", "3 Lookup table for population for each county ba... \n", "4 AWS Knowledge Graph for COVID-19 data \n", "5 AWS Knowledge Graph for COVID-19 data \n", "6 AWS Knowledge Graph for COVID-19 data \n", "7 AWS Knowledge Graph for COVID-19 data \n", "8 AWS Knowledge Graph for COVID-19 data \n", "9 AWS Knowledge Graph for COVID-19 data \n", "10 USA total test daily trend by state. Sourced ... \n", "11 USA total test daily trend. Sourced from covi... \n", "12 USA total tests. Sourced from covidtracking.c... \n", "13 CMU Delphi's COVID-19 Surveillance Data \n", "14 CMU Delphi's COVID-19 Surveillance Metadata \n", "15 Johns Hopkins University Consolidated data on ... \n", "16 Johns Hopkins University data on COVID-19 case... \n", "17 Data on hospital beds and their utilization in... \n", "18 Data on COVID-19 cases from NY Times at US cou... \n", "19 Data on COVID-19 cases from NY Times at US sta... \n", "20 County-level Predictions Data. Sourced from Yu... \n", "21 Severity Index models. Sourced from Yu Group a... \n", "22 COVID-19 data that has been gathered and unifi... \n", "23 Johns Hopkins University data on COVID-19 case... \n", "24 Lookup table for US state abbreviations \n", "25 Data on confirmed cases, deaths, and testing. ... \n", "\n", " Columns Partitions \n", "0 paper_id, date, dx_name, test_name, procedure_... \n", "1 cord_uid, sha, source_x, title, doi, pmcid, pu... \n", "2 country, alpha-2 code, alpha-3 code, numeric c... \n", "3 id, id2, county, state, population estimate 2018 \n", "4 id, label, from, to, score \n", "5 id, label, first, last, full_name \n", "6 id, label, entity, concept \n", "7 id, label, institution, country, settlement \n", "8 id, label, doi, sha_code, publish_time, source... \n", "9 id, label, topic, topic_num \n", "10 date, state, positive, negative, pending, hosp... \n", "11 date, states, positive, negative, posneg, pend... \n", "12 positive, negative, posneg, hospitalized, deat... \n", "13 data_source, signal, geo_type, time_value, geo... \n", "14 data_source, signal, time_type, geo_type, min_... \n", "15 fips, admin2, province_state, country_region, ... \n", "16 uid, fips, iso2, iso3, code3, admin2, latitude... \n", "17 objectid, hospital_name, hospital_type, hq_add... \n", "18 date, county, state, fips, cases, deaths \n", "19 date, state, fips, cases, deaths \n", "20 countyfips, countyname, statename, severity_co... \n", "21 severity_1-day, severity_2-day, severity_3-day... \n", "22 country_short_name, country_alpha_3_code, coun... \n", "23 case_type, cases, difference, date, country_re... \n", "24 state, abbreviation \n", "25 iso_code, location, date, total_cases, new_cas... " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wr.catalog.tables(database=\"covid-19\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Create data source of QuickSight\n", "Note: data source stores the connection information." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "wr.quicksight.create_athena_data_source(\n", " name=\"covid-19\",\n", " workgroup=\"primary\",\n", " allowed_to_manage={\"users\": [\"dev\"]},\n", ")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DatabaseTableDescriptionColumnsPartitions
0covid-19nytimes_countiesData on COVID-19 cases from NY Times at US cou...date, county, state, fips, cases, deaths
1covid-19nytimes_statesData on COVID-19 cases from NY Times at US sta...date, state, fips, cases, deaths
\n", "
" ], "text/plain": [ " Database Table \\\n", "0 covid-19 nytimes_counties \n", "1 covid-19 nytimes_states \n", "\n", " Description \\\n", "0 Data on COVID-19 cases from NY Times at US cou... \n", "1 Data on COVID-19 cases from NY Times at US sta... \n", "\n", " Columns Partitions \n", "0 date, county, state, fips, cases, deaths \n", "1 date, state, fips, cases, deaths " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wr.catalog.tables(database=\"covid-19\", name_contains=\"nyt\")" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datecountystatefipscasesdeaths
02020-01-21SnohomishWashington5306110
12020-01-22SnohomishWashington5306110
22020-01-23SnohomishWashington5306110
32020-01-24CookIllinois1703110
42020-01-24SnohomishWashington5306110
52020-01-25OrangeCalifornia0605910
62020-01-25CookIllinois1703110
72020-01-25SnohomishWashington5306110
82020-01-26MaricopaArizona0401310
92020-01-26Los AngelesCalifornia0603710
\n", "
" ], "text/plain": [ " date county state fips cases deaths\n", "0 2020-01-21 Snohomish Washington 53061 1 0\n", "1 2020-01-22 Snohomish Washington 53061 1 0\n", "2 2020-01-23 Snohomish Washington 53061 1 0\n", "3 2020-01-24 Cook Illinois 17031 1 0\n", "4 2020-01-24 Snohomish Washington 53061 1 0\n", "5 2020-01-25 Orange California 06059 1 0\n", "6 2020-01-25 Cook Illinois 17031 1 0\n", "7 2020-01-25 Snohomish Washington 53061 1 0\n", "8 2020-01-26 Maricopa Arizona 04013 1 0\n", "9 2020-01-26 Los Angeles California 06037 1 0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wr.athena.read_sql_query(\"SELECT * FROM nytimes_counties limit 10\", database=\"covid-19\", ctas_approach=False)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
datecountystatefipsconfirmeddeathspopulationcounty2Hospitalhospital_fipslicensed_bedsstaffed_bedsicu_bedsbed_utilizationpotential_increase_bed_capacity
02020-04-12ParkMontana300677016736Park030067252540.4325480
12020-04-12RavalliMontana300813043172Ravalli030081252550.5677810
22020-04-12Silver BowMontana3009311034993Silver Bow0300939871110.55145727
32020-04-12ClayNebraska31035206214Clay<NA><NA><NA><NA><NA>NaN<NA>
42020-04-12CumingNebraska31039208940Cuming031039252540.2044930
................................................
2276842020-06-11HockleyTexas4821928122980Hockley048219484880.1206050
2276852020-06-11HudspethTexas482291104795Hudspeth<NA><NA><NA><NA><NA>NaN<NA>
2276862020-06-11JonesTexas48253633019817Jones04825345710.71859138
2276872020-06-11La SalleTexas48283407531La Salle<NA><NA><NA><NA><NA>NaN<NA>
2276882020-06-11LimestoneTexas4829336123519Limestone048293786990.1639409
\n", "

227689 rows × 15 columns

\n", "
" ], "text/plain": [ " date county state fips confirmed deaths population \\\n", "0 2020-04-12 Park Montana 30067 7 0 16736 \n", "1 2020-04-12 Ravalli Montana 30081 3 0 43172 \n", "2 2020-04-12 Silver Bow Montana 30093 11 0 34993 \n", "3 2020-04-12 Clay Nebraska 31035 2 0 6214 \n", "4 2020-04-12 Cuming Nebraska 31039 2 0 8940 \n", "... ... ... ... ... ... ... ... \n", "227684 2020-06-11 Hockley Texas 48219 28 1 22980 \n", "227685 2020-06-11 Hudspeth Texas 48229 11 0 4795 \n", "227686 2020-06-11 Jones Texas 48253 633 0 19817 \n", "227687 2020-06-11 La Salle Texas 48283 4 0 7531 \n", "227688 2020-06-11 Limestone Texas 48293 36 1 23519 \n", "\n", " county2 Hospital hospital_fips licensed_beds staffed_beds \\\n", "0 Park 0 30067 25 25 \n", "1 Ravalli 0 30081 25 25 \n", "2 Silver Bow 0 30093 98 71 \n", "3 Clay \n", "4 Cuming 0 31039 25 25 \n", "... ... ... ... ... ... \n", "227684 Hockley 0 48219 48 48 \n", "227685 Hudspeth \n", "227686 Jones 0 48253 45 7 \n", "227687 La Salle \n", "227688 Limestone 0 48293 78 69 \n", "\n", " icu_beds bed_utilization potential_increase_bed_capacity \n", "0 4 0.432548 0 \n", "1 5 0.567781 0 \n", "2 11 0.551457 27 \n", "3 NaN \n", "4 4 0.204493 0 \n", "... ... ... ... \n", "227684 8 0.120605 0 \n", "227685 NaN \n", "227686 1 0.718591 38 \n", "227687 NaN \n", "227688 9 0.163940 9 \n", "\n", "[227689 rows x 15 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = \"\"\"\n", "SELECT \n", " j.*, \n", " co.Population, \n", " co.county AS county2, \n", " hb.* \n", "FROM \n", " (\n", " SELECT \n", " date, \n", " county, \n", " state, \n", " fips, \n", " cases as confirmed, \n", " deaths \n", " FROM \"covid-19\".nytimes_counties\n", " ) j \n", " LEFT OUTER JOIN (\n", " SELECT \n", " DISTINCT county, \n", " state, \n", " \"population estimate 2018\" AS Population \n", " FROM \n", " \"covid-19\".county_populations \n", " WHERE \n", " state IN (\n", " SELECT \n", " DISTINCT state \n", " FROM \n", " \"covid-19\".nytimes_counties\n", " ) \n", " AND county IN (\n", " SELECT \n", " DISTINCT county as county \n", " FROM \"covid-19\".nytimes_counties\n", " )\n", " ) co ON co.county = j.county \n", " AND co.state = j.state \n", " LEFT OUTER JOIN (\n", " SELECT \n", " count(objectid) as Hospital, \n", " fips as hospital_fips, \n", " sum(num_licensed_beds) as licensed_beds, \n", " sum(num_staffed_beds) as staffed_beds, \n", " sum(num_icu_beds) as icu_beds, \n", " avg(bed_utilization) as bed_utilization, \n", " sum(\n", " potential_increase_in_bed_capac\n", " ) as potential_increase_bed_capacity \n", " FROM \"covid-19\".hospital_beds \n", " WHERE \n", " fips in (\n", " SELECT \n", " DISTINCT fips \n", " FROM \n", " \"covid-19\".nytimes_counties\n", " ) \n", " GROUP BY \n", " 2\n", " ) hb ON hb.hospital_fips = j.fips\n", "\"\"\"\n", "\n", "wr.athena.read_sql_query(sql, database=\"covid-19\", ctas_approach=False)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Create Dataset with custom SQL option" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "wr.quicksight.create_athena_dataset(\n", " name=\"covid19-nytimes-usa\",\n", " sql=sql,\n", " sql_name='CustomSQL',\n", " data_source_name=\"covid-19\",\n", " import_mode='SPICE',\n", " allowed_to_manage={\"users\": [\"dev\"]},\n", ")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "ingestion_id = wr.quicksight.create_ingestion(\"covid19-nytimes-usa\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Wait ingestion" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "while wr.quicksight.describe_ingestion(ingestion_id=ingestion_id, dataset_name=\"covid19-nytimes-usa\")[\"IngestionStatus\"] not in [\"COMPLETED\", \"FAILED\"]:\n", " sleep(1)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Describe last ingestion" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'RowsIngested': 227689, 'RowsDropped': 0}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wr.quicksight.describe_ingestion(ingestion_id=ingestion_id, dataset_name=\"covid19-nytimes-usa\")[\"RowInfo\"]" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "List all ingestions" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'time': datetime.datetime(2020, 6, 12, 15, 13, 46, 996000, tzinfo=tzlocal()),\n", " 'source': 'MANUAL'},\n", " {'time': datetime.datetime(2020, 6, 12, 15, 13, 42, 344000, tzinfo=tzlocal()),\n", " 'source': 'MANUAL'}]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[{\"time\": user[\"CreatedTime\"], \"source\": user[\"RequestSource\"]} for user in wr.quicksight.list_ingestions(\"covid19-nytimes-usa\")]" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Create new dataset from a table directly" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "wr.quicksight.create_athena_dataset(\n", " name=\"covid-19-tableau_jhu\",\n", " table=\"tableau_jhu\",\n", " data_source_name=\"covid-19\",\n", " database=\"covid-19\",\n", " import_mode='DIRECT_QUERY',\n", " rename_columns={\n", " \"cases\": \"Count_of_Cases\", \n", " \"combined_key\": \"County\"\n", " },\n", " cast_columns_types={\n", " \"Count_of_Cases\": \"INTEGER\"\n", " },\n", " tag_columns={\n", " \"combined_key\": [{\"ColumnGeographicRole\": \"COUNTY\"}]\n", " },\n", " allowed_to_manage={\"users\": [\"dev\"]},\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Cleaning up" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "wr.quicksight.delete_data_source(\"covid-19\")\n", "wr.quicksight.delete_dataset(\"covid19-nytimes-usa\")\n", "wr.quicksight.delete_dataset(\"covid-19-tableau_jhu\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.14", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.14" }, "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } } }, "nbformat": 4, "nbformat_minor": 4 }