{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[](https://github.com/aws/aws-sdk-pandas)\n", "\n", "# 8 - Redshift - COPY & UNLOAD\n", "\n", "`Amazon Redshift` has two SQL command that help to load and unload large amount of data staging it on `Amazon S3`:\n", "\n", "1 - [COPY](https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html)\n", "\n", "2 - [UNLOAD](https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html)\n", "\n", "Let's take a look and how awswrangler can use it." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install the optional modules first\n", "!pip install 'awswrangler[redshift]'" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import awswrangler as wr\n", "\n", "con = wr.redshift.connect(\"aws-sdk-pandas-redshift\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Enter your bucket name:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " ···········································\n" ] } ], "source": [ "import getpass\n", "bucket = getpass.getpass()\n", "path = f\"s3://{bucket}/stage/\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Enter your IAM ROLE ARN:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " ····················································································\n" ] } ], "source": [ "iam_role = getpass.getpass()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Creating a DataFrame from the NOAA's CSV files\n", "\n", "[Reference](https://registry.opendata.aws/noaa-ghcn/)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | id | \n", "dt | \n", "element | \n", "value | \n", "m_flag | \n", "q_flag | \n", "s_flag | \n", "obs_time | \n", "
---|---|---|---|---|---|---|---|---|
0 | \n", "AG000060590 | \n", "1897-01-01 | \n", "TMAX | \n", "170 | \n", "NaN | \n", "NaN | \n", "E | \n", "NaN | \n", "
1 | \n", "AG000060590 | \n", "1897-01-01 | \n", "TMIN | \n", "-14 | \n", "NaN | \n", "NaN | \n", "E | \n", "NaN | \n", "
2 | \n", "AG000060590 | \n", "1897-01-01 | \n", "PRCP | \n", "0 | \n", "NaN | \n", "NaN | \n", "E | \n", "NaN | \n", "
3 | \n", "AGE00135039 | \n", "1897-01-01 | \n", "TMAX | \n", "140 | \n", "NaN | \n", "NaN | \n", "E | \n", "NaN | \n", "
4 | \n", "AGE00135039 | \n", "1897-01-01 | \n", "TMIN | \n", "40 | \n", "NaN | \n", "NaN | \n", "E | \n", "NaN | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
3923594 | \n", "UZM00038457 | \n", "1897-12-31 | \n", "TMIN | \n", "-145 | \n", "NaN | \n", "NaN | \n", "r | \n", "NaN | \n", "
3923595 | \n", "UZM00038457 | \n", "1897-12-31 | \n", "PRCP | \n", "4 | \n", "NaN | \n", "NaN | \n", "r | \n", "NaN | \n", "
3923596 | \n", "UZM00038457 | \n", "1897-12-31 | \n", "TAVG | \n", "-95 | \n", "NaN | \n", "NaN | \n", "r | \n", "NaN | \n", "
3923597 | \n", "UZM00038618 | \n", "1897-12-31 | \n", "PRCP | \n", "66 | \n", "NaN | \n", "NaN | \n", "r | \n", "NaN | \n", "
3923598 | \n", "UZM00038618 | \n", "1897-12-31 | \n", "TAVG | \n", "-45 | \n", "NaN | \n", "NaN | \n", "r | \n", "NaN | \n", "
3923599 rows × 8 columns
\n", "\n", " | id | \n", "dt | \n", "element | \n", "value | \n", "m_flag | \n", "q_flag | \n", "s_flag | \n", "obs_time | \n", "
---|---|---|---|---|---|---|---|---|
0 | \n", "AG000060590 | \n", "1897-01-01 | \n", "TMAX | \n", "170 | \n", "<NA> | \n", "<NA> | \n", "E | \n", "<NA> | \n", "
1 | \n", "AG000060590 | \n", "1897-01-01 | \n", "PRCP | \n", "0 | \n", "<NA> | \n", "<NA> | \n", "E | \n", "<NA> | \n", "
2 | \n", "AGE00135039 | \n", "1897-01-01 | \n", "TMIN | \n", "40 | \n", "<NA> | \n", "<NA> | \n", "E | \n", "<NA> | \n", "
3 | \n", "AGE00147705 | \n", "1897-01-01 | \n", "TMAX | \n", "164 | \n", "<NA> | \n", "<NA> | \n", "E | \n", "<NA> | \n", "
4 | \n", "AGE00147705 | \n", "1897-01-01 | \n", "PRCP | \n", "0 | \n", "<NA> | \n", "<NA> | \n", "E | \n", "<NA> | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
3923594 | \n", "USW00094967 | \n", "1897-12-31 | \n", "TMAX | \n", "-144 | \n", "<NA> | \n", "<NA> | \n", "6 | \n", "<NA> | \n", "
3923595 | \n", "USW00094967 | \n", "1897-12-31 | \n", "PRCP | \n", "0 | \n", "P | \n", "<NA> | \n", "6 | \n", "<NA> | \n", "
3923596 | \n", "UZM00038457 | \n", "1897-12-31 | \n", "TMAX | \n", "-49 | \n", "<NA> | \n", "<NA> | \n", "r | \n", "<NA> | \n", "
3923597 | \n", "UZM00038457 | \n", "1897-12-31 | \n", "PRCP | \n", "4 | \n", "<NA> | \n", "<NA> | \n", "r | \n", "<NA> | \n", "
3923598 | \n", "UZM00038618 | \n", "1897-12-31 | \n", "PRCP | \n", "66 | \n", "<NA> | \n", "<NA> | \n", "r | \n", "<NA> | \n", "
7847198 rows × 8 columns
\n", "