{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "# Generate Credit Card Transactions\n",
    "**This notebook generates credit card transactions and randomly injects fraud chain attacks.**\n",
    "\n",
    "**THIS NOTEBOOK CAN BE RUN IN PARALLEL WITH `1_setup.ipynb`**\n",
    "\n",
    "**Recommended settings to run this notebook in SageMaker Studio:**\n",
    "\n",
    "- Image: Data Science\n",
    "- Kernel: Python3\n",
    "- Instance type: <font color='blue'>ml.m5.large (2 vCPU + 8 GiB)</font>\n",
    "\n",
    "---\n",
    "\n",
    "## Contents\n",
    "\n",
    "1. [Background](#Background)\n",
    "1. [Setup](#Setup)\n",
    "1. [Generate Transactions](#Generate-Transactions)\n",
    "1. [Inject Fradulent Transactions](#Inject-Fradulent-Transactions)\n",
    "1. [Save Generated Data](#Save-Generated-Data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "### Background\n",
    "This notebook generates random credit card transactions for 10K users over a period of 5 months. In an ideal scenario, these historical transactions would be accumulated into a data lake/store for batch processing so as to derive insights and analytics about this data. Credit card numbers can be bought in bulk on the dark web through previous leaks or hacks of organizations that store this sensitive data. Fraudsters will buy these card lists and attempt to make as many transactions as possible with the stolen numbers until the card is blocked. These fraud chain attacks typically happen in a short time frame and can be easily spotted amongst historical transactions. This is because the velocity of transactions during the attack significantly differs from that of cardholder’s usual spending pattern. This notebook is optional to run. The generated data already exists in the `./data` folder for you to use. Re-run this notebook if you desire to re-populate fresh data or understand the whole process of how this dataset was generated."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "### Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "#### Prerequisites "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "!pip install Faker"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "#### Imports "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "from botocore.client import ClientError\n",
    "from collections import defaultdict\n",
    "from faker import Faker\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import sagemaker\n",
    "import datetime\n",
    "import hashlib\n",
    "import random\n",
    "import boto3\n",
    "import math\n",
    "import os"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "#### Seed for Reproducibility"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "faker = Faker()\n",
    "faker.seed_locale('en_US', 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "SEED = 123\n",
    "random.seed(SEED)\n",
    "np.random.seed(SEED)\n",
    "faker.seed_instance(SEED)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "#### Constants "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "TOTAL_UNIQUE_TRANSACTIONS = 5400000 # 5.4 Million\n",
    "TOTAL_UNIQUE_USERS = 10000\n",
    "BUCKET = sagemaker.Session().default_bucket()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "### Generate Transactions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "#### Generate Unique Credit Card Numbers \n",
    "<p> Credit card numbers are uniquely assigned to users. Since, there are 10K users, we would want to generate 10K unique card numbers.</p>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def generate_unique_credit_card_numbers(n: int) -> list:\n",
    "    cc_ids = set()\n",
    "    for _ in range(n):\n",
    "        cc_id = faker.credit_card_number(card_type='visa')\n",
    "        cc_ids.add(cc_id)\n",
    "    return list(cc_ids) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "credit_card_numbers = generate_unique_credit_card_numbers(TOTAL_UNIQUE_USERS)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "assert len(credit_card_numbers) == 10000 \n",
    "assert len(credit_card_numbers[0]) == 16 # validate if generated number is 16-digit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# inspect random sample of credit card numbers \n",
    "random.sample(credit_card_numbers, 5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "#### Generate Time Series\n",
    "<p>Generate 5.4 Million random timestamps spread across a period of 5 months (2022-01-01 to 2022-06-01) in sorted order.</p>\n",
    "<b>Note:</b> The timestamps are NOT unique themselves. We can have 2 or more transactions occurring at the same time coming from different users. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def generate_timestamps(n: int) -> list:\n",
    "    start = datetime.datetime.strptime('2022-01-01 00:00:00', '%Y-%m-%d %H:%M:%S')\n",
    "    end = datetime.datetime.strptime('2022-06-01 00:01:00', '%Y-%m-%d %H:%M:%S')\n",
    "    timestamps = list()\n",
    "    for _ in range(n):\n",
    "        timestamp = faker.date_time_between(start_date=start, end_date=end, tzinfo=None).strftime('%Y-%m-%d %H:%M:%S')\n",
    "        timestamps.append(timestamp)\n",
    "    timestamps = sorted(timestamps)\n",
    "    return timestamps"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "timestamps = generate_timestamps(TOTAL_UNIQUE_TRANSACTIONS)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "assert len(timestamps) == TOTAL_UNIQUE_TRANSACTIONS"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# inspect random sample of timestamps\n",
    "random.sample(timestamps, 5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "#### Generate Random Transaction Amounts \n",
    "<p>The transaction amounts are presumed to follow Pareto distribution, as it is logical for consumers to make many more smaller purchases than large ones. The break down of the distribution is shown in the table below.</p>\n",
    "\n",
    "\n",
    "| Percentage        | Range (Amount in $)     |\n",
    "| :-------------: | :----------: |\n",
    "|  5\\% | 0.01 to 1    |\n",
    "| 7.5\\%   | 1 to 10 |\n",
    "| 52.5\\%   | 10 to 100 |\n",
    "| 25\\%   | 100 to 1000 |\n",
    "| 10\\%   | 1000 to 10000 |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def get_random_transaction_amount(start: float, end: float) -> float:\n",
    "    amt = round(np.random.uniform(start, end), 2)\n",
    "    return amt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "distribution_percentages = {0.05: (0.01, 1.01), \n",
    "                            0.075: (1, 11.01),\n",
    "                            0.525: (10, 100.01),\n",
    "                            0.25: (100, 1000.01),\n",
    "                            0.10: (1000, 10000.01)}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "amounts = []\n",
    "\n",
    "for percentage, span in distribution_percentages.items():\n",
    "    n = int(TOTAL_UNIQUE_TRANSACTIONS * percentage)\n",
    "    start, end = span\n",
    "    for _ in range(n):\n",
    "        amounts.append(get_random_transaction_amount(start, end+1))\n",
    "        \n",
    "random.shuffle(amounts)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "assert len(amounts) == TOTAL_UNIQUE_TRANSACTIONS"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# inspect random sample of transaction amounts\n",
    "random.sample(amounts, 5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "#### Generate Credit Card Transactions\n",
    "<br>\n",
    "<div style=\"text-align: justify\">\n",
    "Using the random credit card numbers, timestamps and transaction amounts generated in the above steps, \n",
    "we can generate random credit card transactions by combining them. The transaction id for the transaction is the md5\n",
    "hash of the above mentioned entities.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def generate_transaction_id(timestamp: str, credit_card_number: str, transaction_amount: float) -> str:\n",
    "    hashable = f'{timestamp}{credit_card_number}{transaction_amount}'\n",
    "    hexdigest = hashlib.md5(hashable.encode('utf-8')).hexdigest()\n",
    "    return hexdigest"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "transactions = []\n",
    "for timestamp, amount in zip(timestamps, amounts):\n",
    "    credit_card_number = random.choice(credit_card_numbers)\n",
    "    transaction_id = generate_transaction_id(timestamp, credit_card_number, amount)\n",
    "    transactions.append({'tid': transaction_id, \n",
    "                         'datetime': timestamp, \n",
    "                         'cc_num': credit_card_number, \n",
    "                         'amount': amount, \n",
    "                         'fraud_label': 0})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "assert len(transactions) == TOTAL_UNIQUE_TRANSACTIONS"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# inspect random sample of credit card transactions\n",
    "random.sample(transactions, 1)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "### Inject Fradulent Transactions\n",
    "<p> A typical fraud chain looks like the one as shown in the image below.</p>\n",
    "\n",
    "![SegmentLocal](images/fraud_pattern.png \"connection\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "FRAUD_RATIO = 0.0025 # percentage of transactions that are fraudulent\n",
    "NUMBER_OF_FRAUDULENT_TRANSACTIONS = int(FRAUD_RATIO * TOTAL_UNIQUE_TRANSACTIONS)\n",
    "ATTACK_CHAIN_LENGTHS = [3, 4, 5, 6, 7, 8, 9, 10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "#### Create Transaction Chains "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "visited = set()\n",
    "chains = defaultdict(list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def size(chains: dict) -> int:\n",
    "    counts = {key: len(values)+1 for (key, values) in chains.items()}\n",
    "    return sum(counts.values())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def create_attack_chain(i: int):\n",
    "    chain_length = random.choice(ATTACK_CHAIN_LENGTHS)\n",
    "    for j in range(1, chain_length):\n",
    "        if i+j not in visited:\n",
    "            if size(chains) == NUMBER_OF_FRAUDULENT_TRANSACTIONS:\n",
    "                break\n",
    "            chains[i].append(i+j)\n",
    "            visited.add(i+j)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "while size(chains) < NUMBER_OF_FRAUDULENT_TRANSACTIONS:\n",
    "    i = random.choice(range(TOTAL_UNIQUE_TRANSACTIONS))\n",
    "    if i not in visited:\n",
    "        create_attack_chain(i)\n",
    "        visited.add(i)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "assert size(chains) == NUMBER_OF_FRAUDULENT_TRANSACTIONS"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "#### Modify Transactions with Fraud Chain Attacks "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def generate_timestamps_for_fraud_attacks(timestamp: str, chain_length: int) -> list:\n",
    "    timestamps = []\n",
    "    timestamp = datetime.datetime.strptime(timestamp, '%Y-%m-%d %H:%M:%S')\n",
    "    for _ in range(chain_length):\n",
    "        # interval in seconds between fraudulent attacks\n",
    "        delta = random.randint(30, 120)\n",
    "        current = timestamp + datetime.timedelta(seconds=delta)\n",
    "        timestamps.append(current.strftime('%Y-%m-%d %H:%M:%S'))\n",
    "        timestamp = current\n",
    "    return timestamps "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def generate_amounts_for_fraud_attacks(chain_length: int) -> list:\n",
    "    amounts = []\n",
    "    for percentage, span in distribution_percentages.items():\n",
    "        n = math.ceil(chain_length * percentage)\n",
    "        start, end = span\n",
    "        for _ in range(n):\n",
    "            amounts.append(get_random_transaction_amount(start, end+1))\n",
    "    return amounts[:chain_length]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "for key, chain in chains.items():\n",
    "    transaction = transactions[key]\n",
    "    timestamp = transaction['datetime']\n",
    "    cc_num = transaction['cc_num']\n",
    "    amount = transaction['amount']\n",
    "    transaction['fraud_label'] = 1\n",
    "    inject_timestamps = generate_timestamps_for_fraud_attacks(timestamp, len(chain))\n",
    "    inject_amounts = generate_amounts_for_fraud_attacks(len(chain))\n",
    "    random.shuffle(inject_amounts)\n",
    "    for i, idx in enumerate(chain):\n",
    "            original_transaction = transactions[idx]\n",
    "            inject_timestamp = inject_timestamps[i]\n",
    "            original_transaction['datetime'] = inject_timestamp\n",
    "            original_transaction['fraud_label'] = 1\n",
    "            original_transaction['cc_num'] = cc_num\n",
    "            original_transaction['amount'] = inject_amounts[i]\n",
    "            original_transaction['tid'] = generate_transaction_id(inject_timestamp, cc_num, amount)\n",
    "            transactions[idx] = original_transaction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "#### Transform Transactions to Pandas DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "transactions_df = pd.DataFrame(transactions)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "fraud_transactions = transactions_df[transactions_df.fraud_label.eq(1)]\n",
    "fraud_transactions.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "assert fraud_transactions.count()[0] == NUMBER_OF_FRAUDULENT_TRANSACTIONS"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "### Save Generated Data\n",
    "<p> The generated raw transactions data will be used by the next step = SageMaker PySpark Processing Job to do aggregations on the raw data columns and derive new features which are useful for model training in the later steps.\n",
    "The generated data is saved locally and then copied to S3 bucket.</p>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "#### Save Transactions Data to Local Folder ./data and upload to S3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "data_dir = os.path.join(os.getcwd(), 'data/raw')\n",
    "os.makedirs(data_dir, exist_ok=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "transactions_df.to_csv(f'{data_dir}/transactions.csv', index=False)\n",
    "transactions_df.to_csv(f's3://{BUCKET}/raw/transactions.csv', index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "availableInstances": [
   {
    "_defaultOrder": 0,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 4,
    "name": "ml.t3.medium",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 1,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.t3.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 2,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.t3.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 3,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.t3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 4,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.m5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 5,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.m5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 6,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.m5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 7,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.m5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 8,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.m5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 9,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.m5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 10,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.m5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 11,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.m5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 12,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.m5d.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 13,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.m5d.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 14,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.m5d.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 15,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.m5d.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 16,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.m5d.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 17,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.m5d.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 18,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.m5d.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 19,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.m5d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 20,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": true,
    "memoryGiB": 0,
    "name": "ml.geospatial.interactive",
    "supportedImageNames": [
     "sagemaker-geospatial-v1-0"
    ],
    "vcpuNum": 0
   },
   {
    "_defaultOrder": 21,
    "_isFastLaunch": true,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 4,
    "name": "ml.c5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 22,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.c5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 23,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.c5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 24,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.c5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 25,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 72,
    "name": "ml.c5.9xlarge",
    "vcpuNum": 36
   },
   {
    "_defaultOrder": 26,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 96,
    "name": "ml.c5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 27,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 144,
    "name": "ml.c5.18xlarge",
    "vcpuNum": 72
   },
   {
    "_defaultOrder": 28,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.c5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 29,
    "_isFastLaunch": true,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.g4dn.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 30,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.g4dn.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 31,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.g4dn.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 32,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.g4dn.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 33,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.g4dn.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 34,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.g4dn.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 35,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 61,
    "name": "ml.p3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 36,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 244,
    "name": "ml.p3.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 37,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 488,
    "name": "ml.p3.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 38,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.p3dn.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 39,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.r5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 40,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.r5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 41,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.r5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 42,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.r5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 43,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.r5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 44,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.r5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 45,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 512,
    "name": "ml.r5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 46,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.r5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 47,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.g5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 48,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.g5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 49,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.g5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 50,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.g5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 51,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.g5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 52,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.g5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 53,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.g5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 54,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.g5.48xlarge",
    "vcpuNum": 192
   }
  ],
  "instance_type": "ml.g4dn.xlarge",
  "kernelspec": {
   "display_name": "Python 3 (Data Science)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  },
  "vscode": {
   "interpreter": {
    "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}