# How to use SQL to query data in S3 Bucket with Amazon Athena and AWS SDK for .NET This Project provides a sample implementation that will show how to leverage [Amazon Athena](https://aws.amazon.com/athena/) from .NET Core Application using [AWS SDK for .NET](https://docs.aws.amazon.com/sdk-for-net/v3/developer-guide/welcome.html) to run standard SQL to analyze a large amount of data in [Amazon S3](https://aws.amazon.com/s3/). To showcase a more realistic use-case, it includes a WebApp UI developed using [ReactJs](https://reactjs.org/). this WebApp contains components to demonstrate fetching COVID-19 data from API Server that uses AWS SDK for .NET to connect to Amazon Athena and run SQL Standard query from datasets on Amazon S3 files from a Data Lake account. This Data Lake account is the [aws-covid19-lake](https://registry.opendata.aws/aws-covid19-lake/) account, made available on [Registry of Open Data on AWS](https://registry.opendata.aws/) Those ReatJs Components call .NET Core API that runs Amazon Athena Query, check the execution status, and list results. Each menu presents different views. **Menu option _Testing By Date_**: Shows a filter by Date that presents a table with the following data: Date, State, Positive, Negative, Pending, Hospitalized, Death, Positive Increase **Menu option _Testing By State_**: Shows a filter by State that presents a table with the following data: Date, State, Positive, Negative, Pending, Hospitalized, Death Positive Increase **Menu option _Hospitals (Run&Go)_**: Run a request to the API server, get 200 with the Query ID, check the status of the execution; when the execution it's completed, it presents a table with the following data: Name, State, Type, ZipCode, Licensed Beds, Staffed Beds, Potential Increase in Beds **Menu option _Hospitals (Run&Go)_**: Run request to the API server, wait for the result and presents a table with the following data: Name, State, Type, Zip Code, Licensed Beds, Staffed Beds, Potential Increase in Beds # Steps To run this project follow the instructions bellow: ## 1) Deploy Glue Catalog & Athena Database/Tables ```bash #1) Deploy aws cloudformation create-stack --stack-name covid-lake-stack --template-url https://covid19-lake.s3.us-east-2.amazonaws.com/cfn/CovidLakeStack.template.json --region us-west-2 #2) Check deployment Status aws cloudformation describe-stacks --stack-name covid-lake-stack --region us-west-2 ``` Below the result of status check, wait for **"StackStatus": "CREATE_COMPLETE"** ```json { "Stacks": [ { "StackId": "arn:aws:cloudformation:us-west-2:XXXXXXXX9152:stack/covid-lake-stack/xxxxxxxx-100d-11eb-87ef-xxxxxxxxxxx", "StackName": "covid-lake-stack", "CreationTime": "2020-10-17T00:12:09.151Z", "RollbackConfiguration": {}, "StackStatus": "CREATE_COMPLETE", "DisableRollback": false, "NotificationARNs": [], "Tags": [], "EnableTerminationProtection": false, "DriftInformation": { "StackDriftStatus": "NOT_CHECKED" } } ] } ``` ## 2) Create S3 bucket for Athena Result ```bash #1) Deploy S3 Bucket aws cloudformation deploy --stack-name athena-results-netcore --template-file ./src/cloud-formation-templates/s3-athena-result.template.yaml --region us-west-2 #2) Check deployment Status aws cloudformation describe-stacks --stack-name athena-results-netcore --region us-west-2 ``` Below the result of status check, wait for **"StackStatus": "CREATE_COMPLETE"** and copy output Bucket Name **"OutputValue": "s3://athena-results-netcore-s3bucket-xxxxxxxxxxxx/athena/results/",** you will need this to run your code ```json { "Stacks": [ { "StackId": "arn:aws:cloudformation:us-west-2:XXXXXXXX9152:stack/athena-results-netcore/xxxxxxxx-100c-11eb-889f-xxxxxxxxxxx", "StackName": "athena-results-netcore", "Description": "Amazon S3 bucket to store Athena query results", "CreationTime": "2020-10-17T00:02:44.968Z", "LastUpdatedTime": "2020-10-17T00:21:13.692Z", "RollbackConfiguration": { "RollbackTriggers": [] }, "StackStatus": "CREATE_COMPLETE", "DisableRollback": false, "NotificationARNs": [], "Outputs": [ { "OutputKey": "BucketName", "OutputValue": "s3://athena-results-netcore-s3bucket-xxxxxxxxxxxx/athena/results/", "Description": "Name of the Amazon S3 bucket to store Athena query results" } ], "Tags": [], "EnableTerminationProtection": false, "DriftInformation": { "StackDriftStatus": "NOT_CHECKED" } } ] } ``` ## 3) COVID-19 Analisys (optional) Some SQL Query that you can try on your own using [Amazon Athena Console UI]((https://us-west-2.console.aws.amazon.com/athena/home?region=us-west-2#query/)). This step is optional for this demo, but it helps you explore and learn more about Amazon Athena using Console UI ```sql -- The following query returns the growth of confirmed cases for the past 7 days joined side-by-side with hospital bed availability, broken down by US county: SELECT cases.fips, admin2 as county, province_state, confirmed, growth_count, sum(num_licensed_beds) as num_licensed_beds, sum(num_staffed_beds) as num_staffed_beds, sum(num_icu_beds) as num_icu_beds FROM "covid-19"."hospital_beds" beds, ( SELECT fips, admin2, province_state, confirmed, last_value(confirmed) over (partition by fips order by last_update) - first_value(confirmed) over (partition by fips order by last_update) as growth_count, first_value(last_update) over (partition by fips order by last_update desc) as most_recent, last_update FROM "covid-19"."enigma_jhu" WHERE from_iso8601_timestamp(last_update) > now() - interval '200' day AND country_region = 'US') cases WHERE beds.fips = cases.fips AND last_update = most_recent GROUP BY cases.fips, confirmed, growth_count, admin2, province_state ORDER BY growth_count desc --Last 10 records regarding Testing and deaths SELECT * FROM "covid-19"."world_cases_deaths_testing" order by "date" desc limit 10; -- Last 10 records regarding Testing and deaths with JOIN on us_state_abbreviations to list State name SELECT date, positive, negative, pending, hospitalized, death, total, deathincrease, hospitalizedincrease, negativeincrease, positiveincrease, sta.state AS state_abbreviation, abb.state FROM "covid-19"."covid_testing_states_daily" sta JOIN "covid-19"."us_state_abbreviations" abb ON sta.state = abb.abbreviation limit 500; ``` ## 4) Build & Run .NET Web Application 1) Go to the app root dir ```bash cd ./src/app/AthenaNetCore/ ``` 2) Create AWS Credential file, **_for security precaution the file extension *.env is added to .gitignore to avoid accidental commit_** ```bash code aws-credentials-do-not-commit.env #You can use any text editor eg: vi -> vi aws-credentials-do-not-commit.env ``` Below example of env file content, replace the XXXX... with your real AWS Credential, and add to S3_RESULT the output result you got from steep 2) ```ini AWS_DEFAULT_REGION=us-west-2 AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXXXXXXX AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX AWS_SESSION_TOKEN=XXXXX #(Optional, used only in case of temporary token, you'll need to remove this comment on the .env file) S3_RESULT_BUCKET_NAME=s3://athena-results-netcore-s3bucket-xxxxxxxxxxxx/athena/results/ #paste the bucket name you've copied on the step 2, you'll need to remove this comment on the .env file) ``` 3) Build .NET APP using docker-compose ```bash docker-compose -f ./docker-compose.yml build ``` 4) Run .NET APP docker-compose ```bash docker-compose -f ./docker-compose.yml up ``` 5) Test .NET APP via URL <http://localhost:8089/> 6) Clean up ```bash # 1) Clean local resources docker-compose down -v # 2) Clean s3 objects created by Athena to store Results metadata aws s3 rm --recursive s3://athena-results-netcore-s3bucket-xxxxxxxxxxxx/athena/results/ # 3) Delete S3 bucket aws cloudformation delete-stack --stack-name athena-results-netcore --region us-west-2 # 4) Delete Athena Tables aws cloudformation delete-stack --stack-name covid-lake-stack ``` # References <https://aws.amazon.com/blogs/big-data/a-public-data-lake-for-analysis-of-covid-19-data/> <https://docs.aws.amazon.com/athena/latest/ug/code-samples.html> <https://aws.amazon.com/blogs/apn/using-athena-express-to-simplify-sql-queries-on-amazon-athena/> <https://docs.aws.amazon.com/sdk-for-net/v3/developer-guide/net-dg-config-creds.html> <https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html> <https://docs.aws.amazon.com/sdk-for-net/latest/developer-guide/creds-assign.html> <https://github.com/awsdocs/aws-cloud9-user-guide/blob/master/LICENSE-SAMPLECODE>