{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Module 1: Prepare datasets \n",
    "**This notebook generates and transforms datasets into a ML ready state to be ingested into SageMaker Feature Store.**\n",
    "\n",
    "**Note:** Please set kernel to `Python 3 (Data Science)` and select instance to `ml.t3.medium`\n",
    "\n",
    "\n",
    "---\n",
    "\n",
    "## Contents\n",
    "\n",
    "1. [Background](#Background)\n",
    "1. [Setup](#Setup)\n",
    "1. [Generate Online Grocery shopping dataset](#Generate-Online-Grocery-shopping-dataset)\n",
    "1. [Transform raw features into Machine Learning ready features](#Transform-raw-features-into-Machine-Learning-ready-features)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Background\n",
    "\n",
    "This notebook generates a relational set of files describing customers’ orders over time.\n",
    "\n",
    "We have preset the notebook to generate a sample of 100,000 synthetic grocery orders from a total of 10,000 synthetically generated customers list. For each customer, the notebook generates between 1 to 10 of their orders, with products purchased in each order. The notebook also generates a timestamp on which the order was placed. The goal of the generated dataset and the example notebooks contained in this repository is to illustrate how to use **SageMaker Feature Store** to predict which products will be in a user’s next order and to enable a grocery retail store to revolutionize how consumers discover and purchase groceries online."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Prerequisites"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%capture\n",
    "!pip install faker\n",
    "!pip install python-dateutil"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create sub-directory structure for data files\n",
    "\n",
    "Note: If you re-run this cell, you will get warnings from mkdir command that says 'File exists'; these warnings can be ignored"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# create sub-directories\n",
    "!mkdir ../data/test\n",
    "!mkdir ../data/train\n",
    "!mkdir ../data/transformed\n",
    "!mkdir ../data/partitions\n",
    "!mkdir ../data/validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Imports "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.preprocessing import MinMaxScaler, LabelEncoder\n",
    "from datetime import datetime, timezone, date\n",
    "from faker import Faker\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import hashlib\n",
    "import logging\n",
    "import random"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set locale and seed for reproducability "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "faker = Faker()\n",
    "faker.seed_locale('en_US', 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "SEED = 123\n",
    "random.seed(SEED)\n",
    "np.random.seed(SEED)\n",
    "faker.seed_instance(SEED)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "logger = logging.getLogger('__name__')\n",
    "logger.setLevel(logging.DEBUG)\n",
    "logger.addHandler(logging.StreamHandler())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "logger.info(f'Using Pandas version: {pd.__version__}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Helper functions  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_timestamp(start, end) -> str:\n",
    "    start = datetime.strptime(start, '%Y-%m-%d %H:%M:%S')\n",
    "    end = datetime.strptime(end, '%Y-%m-%d %H:%M:%S')\n",
    "    timestamp = faker.date_time_between(start_date=start, end_date=end, tzinfo=None).strftime('%Y-%m-%d %H:%M:%S')\n",
    "    return timestamp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_date(start, end) -> str:\n",
    "    start = datetime.strptime(start, '%Y-%m-%d')\n",
    "    end = datetime.strptime(end, '%Y-%m-%d')\n",
    "    date = faker.date_between_dates(date_start=start, date_end=end).strftime('%Y-%m-%d')\n",
    "    return date"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_md5_hash(string: str) -> str:\n",
    "    hash_object = hashlib.md5(string.encode())\n",
    "    return hash_object.hexdigest()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_event_timestamp():\n",
    "    # naive datetime representing local time\n",
    "    naive_dt = datetime.now()\n",
    "    # take timezone into account\n",
    "    aware_dt = naive_dt.astimezone()\n",
    "    # time in UTC\n",
    "    utc_dt = aware_dt.astimezone(timezone.utc)\n",
    "    # transform to ISO-8601 format\n",
    "    event_time = utc_dt.isoformat(timespec='milliseconds')\n",
    "    event_time = event_time.replace('+00:00', 'Z')\n",
    "    return event_time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Generate Online Grocery shopping dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generate random synthetic Customer profiles "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Customer:\n",
    "    def __init__(self):\n",
    "        self.customer_id = None\n",
    "        self.name = None\n",
    "        self.sex = None\n",
    "        self.state = None\n",
    "        self.age = None\n",
    "        self.is_married = None\n",
    "        self.active_since = None\n",
    "        self.event_time = None\n",
    "        \n",
    "    def as_dict(self):\n",
    "        return {'customer_id': self.customer_id, \n",
    "                'name': self.name,\n",
    "                'sex': self.sex, \n",
    "                'state': self.state, \n",
    "                'age': self.age, \n",
    "                'is_married': self.is_married, \n",
    "                'active_since': self.active_since,\n",
    "                'event_time': self.event_time\n",
    "               }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_customer(i) -> Customer:\n",
    "    customer = Customer()\n",
    "    profile = faker.profile()\n",
    "    customer.customer_id = f'C{i}'\n",
    "    customer.name = profile['name'].lower()\n",
    "    customer.sex = profile['sex']\n",
    "    customer.state = faker.state().lower()\n",
    "    customer.age = random.randint(18, 91)\n",
    "    customer.is_married = faker.boolean()\n",
    "    customer.active_since = generate_timestamp('2016-01-01 00:00:00', '2020-01-01 00:00:01')\n",
    "    customer.event_time = generate_event_timestamp()\n",
    "    return customer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "customer = generate_customer(1)\n",
    "customer.__dict__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "customers = []\n",
    "customer_ids = []\n",
    "n = 10000  # number of synthetic customers to generate\n",
    "for i in range(n):\n",
    "    customer = generate_customer(i+1)\n",
    "    customers.append(customer)\n",
    "    customer_ids.append(customer.customer_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "customers_df = pd.DataFrame([customer.as_dict() for customer in customers])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "customers_df.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generate random synthetic purchase orders "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Load list of `products` raw data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "products_df = pd.read_csv('../data/raw/product_category_mapping.csv')\n",
    "products_df['product_name'] = products_df['product_name'].str.lower()\n",
    "products_df['product_category'] = products_df['product_category'].str.lower()\n",
    "products_df.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Add event timestamp to the feature records "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "event_timestamps = [generate_event_timestamp() for _ in range(len(products_df))]\n",
    "products_df['event_time'] = event_timestamps\n",
    "products_df.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "product_ids = products_df['product_id'].tolist()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Generate purchase orders specific to customers "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Order:\n",
    "    def __init__(self):\n",
    "        self.order_id = None\n",
    "        self.customer_id = None\n",
    "        self.product_id = None\n",
    "        self.purchase_amount = None\n",
    "        self.is_reordered = None \n",
    "        self.purchased_on = None\n",
    "        self.event_time = None\n",
    "        \n",
    "    def as_dict(self):\n",
    "        return {'order_id': self.order_id, \n",
    "                'customer_id': self.customer_id, \n",
    "                'product_id': self.product_id,\n",
    "                'purchase_amount': self.purchase_amount,\n",
    "                'is_reordered': self.is_reordered,\n",
    "                'purchased_on': self.purchased_on, \n",
    "                'event_time': self.event_time}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_order(i) -> Order:\n",
    "    order = Order()\n",
    "    order.order_id = f'O{i}'\n",
    "    order.customer_id = random.choice(customer_ids)\n",
    "    order.product_id = random.choice(product_ids)\n",
    "    order.purchase_amount = random.randint(1, 101) + round(random.random(), 2)\n",
    "    order.is_reordered = random.choice([1, 1, 0])  # assume chance of reordering is twice as that of not reordering\n",
    "    order.purchased_on = generate_timestamp('2020-01-01 00:01:01', '2021-06-01 00:00:01')\n",
    "    order.event_time = generate_event_timestamp()\n",
    "    return order"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "order = generate_order(1)\n",
    "order.__dict__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "orders = []\n",
    "n = 100000  # number of synthetic orders to generate\n",
    "for i in range(n):\n",
    "    order = generate_order(i+1)\n",
    "    orders.append(order)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "orders_df = pd.DataFrame([order.as_dict() for order in orders])\n",
    "orders_df.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "orders_df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Write generated customers, products and orders data to local directory"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "customers_df.to_csv('../data/raw/customers.csv', index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "products_df.to_csv('../data/raw/products.csv', index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "orders_df.to_csv('../data/raw/orders.csv', index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Transform raw features into Machine Learning ready features"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### I) Transform raw `customers` data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "customers_df = pd.read_csv('../data/raw/customers.csv')\n",
    "customers_df.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "label_encoder = LabelEncoder()\n",
    "min_max_scaler = MinMaxScaler()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "customers_df.drop('name', axis=1, inplace=True)\n",
    "customers_df.drop('state', axis=1, inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bins = [18, 30, 40, 50, 60, 70, 90]\n",
    "labels = ['18-29', '30-39', '40-49', '50-59', '60-69', '70-plus']\n",
    "customers_df['age_range'] = pd.cut(customers_df.age, bins, labels=labels, include_lowest=True)\n",
    "customers_df = pd.concat([customers_df, pd.get_dummies(customers_df['age_range'], prefix='age')], axis=1)\n",
    "customers_df.drop('age', axis=1, inplace=True)\n",
    "customers_df.drop('age_range', axis=1, inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "customers_df['sex'] = label_encoder.fit_transform(customers_df['sex'])\n",
    "customers_df['is_married'] = label_encoder.fit_transform(customers_df['is_married'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "customers_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "customers_df['active_since'] =  pd.to_datetime(customers_df['active_since'], format='%Y-%m-%d %H:%M:%S')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_delta_in_days(date_time) -> int:\n",
    "    today = date.today()\n",
    "    delta = today - date_time.date()\n",
    "    return delta.days"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "customers_df['n_days_active'] = customers_df['active_since'].apply(lambda x: get_delta_in_days(x))\n",
    "customers_df['n_days_active'] = min_max_scaler.fit_transform(customers_df[['n_days_active']])\n",
    "customers_df.drop('active_since', axis=1, inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "customers_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "customers_df.to_csv('../data/transformed/customers.csv', index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### II) Transform raw `products` data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "products_df = pd.read_csv('../data/raw/products.csv')\n",
    "products_df.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "products_df.drop('product_name', axis=1, inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "products_df = pd.concat([products_df, pd.get_dummies(products_df['product_category'], prefix='category')], axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "products_df.drop('product_category', axis=1, inplace=True)\n",
    "products_df.columns = products_df.columns.str.replace(' ', '_')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "products_df.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "products_df.to_csv('../data/transformed/products.csv', index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### III) Transform raw `orders` data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "orders_df = pd.read_csv('../data/raw/orders.csv')\n",
    "orders_df.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "orders_df['purchased_on'] =  pd.to_datetime(orders_df['purchased_on'], format='%Y-%m-%d %H:%M:%S')\n",
    "orders_df['n_days_since_last_purchase'] = orders_df['purchased_on'].apply(lambda x: get_delta_in_days(x))\n",
    "orders_df['n_days_since_last_purchase'] = min_max_scaler.fit_transform(orders_df[['n_days_since_last_purchase']])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "orders_df['purchase_amount'] = min_max_scaler.fit_transform(orders_df[['purchase_amount']])\n",
    "orders_df['is_reordered'] = label_encoder.fit_transform(orders_df['is_reordered'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "orders_df.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generate partitioned orders data by month"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from datetime import datetime\n",
    "from dateutil.relativedelta import relativedelta\n",
    "\n",
    "print(f'Total Orders Count = {orders_df.shape[0]}') \n",
    "partitions_path = '../data/partitions'\n",
    "start_date_str = '2020-01-01 00:00:00'\n",
    "end_date_str = '2021-06-01 00:00:01'\n",
    "date_format = '%Y-%m-%d %H:%M:%S'\n",
    "start_date = datetime.strptime(start_date_str, date_format)\n",
    "print(f'start_date = {start_date}')\n",
    "end_date = datetime.strptime(end_date_str, date_format)\n",
    "print(f'end_date = {end_date}')\n",
    "a_month = relativedelta(months=1)\n",
    "print(f'a_month = {a_month}')\n",
    "current_start_date = start_date\n",
    "print(f'current_start_date = {current_start_date}')\n",
    "current_end_date = start_date + a_month\n",
    "print(f'current_end_date = {current_end_date}')\n",
    "print(f'----')\n",
    "if not os.path.exists(partitions_path):\n",
    "    os.makedirs(partitions_path)\n",
    "\n",
    "while current_end_date <= end_date:\n",
    "    print(f'Dates between {current_start_date} and {current_end_date}')\n",
    "    partitions_df = orders_df[orders_df['purchased_on'].between(current_start_date, current_end_date)].copy()\n",
    "    partitions_df.drop('purchased_on', axis=1, inplace=True)\n",
    "    partition = f'{current_start_date.strftime(\"%Y\")}-{int(current_start_date.strftime(\"%m\"))}'    \n",
    "    current_partitions_path = f'{partitions_path}/{partition}'\n",
    "    print(current_partitions_path)\n",
    "    if not os.path.exists(current_partitions_path):\n",
    "        os.makedirs(current_partitions_path)\n",
    "    print(f'Partitions Orders Count = {partitions_df.shape[0]}')\n",
    "    partitions_df.to_csv(f'{current_partitions_path}/partition.csv', index=False)\n",
    "    partitions_df.iloc[0:0]\n",
    "    current_start_date = current_end_date\n",
    "    current_end_date = current_start_date + a_month\n",
    "    print(f'----')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "orders_df.drop('purchased_on', axis=1, inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "orders_df.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "orders_df.to_csv('../data/transformed/orders.csv', index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "availableInstances": [
   {
    "_defaultOrder": 0,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 4,
    "name": "ml.t3.medium",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 1,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.t3.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 2,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.t3.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 3,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.t3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 4,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.m5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 5,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.m5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 6,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.m5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 7,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.m5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 8,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.m5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 9,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.m5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 10,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.m5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 11,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.m5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 12,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.m5d.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 13,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.m5d.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 14,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.m5d.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 15,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.m5d.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 16,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.m5d.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 17,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.m5d.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 18,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.m5d.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 19,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.m5d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 20,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": true,
    "memoryGiB": 0,
    "name": "ml.geospatial.interactive",
    "supportedImageNames": [
     "sagemaker-geospatial-v1-0"
    ],
    "vcpuNum": 0
   },
   {
    "_defaultOrder": 21,
    "_isFastLaunch": true,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 4,
    "name": "ml.c5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 22,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.c5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 23,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.c5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 24,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.c5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 25,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 72,
    "name": "ml.c5.9xlarge",
    "vcpuNum": 36
   },
   {
    "_defaultOrder": 26,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 96,
    "name": "ml.c5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 27,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 144,
    "name": "ml.c5.18xlarge",
    "vcpuNum": 72
   },
   {
    "_defaultOrder": 28,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.c5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 29,
    "_isFastLaunch": true,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.g4dn.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 30,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.g4dn.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 31,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.g4dn.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 32,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.g4dn.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 33,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.g4dn.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 34,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.g4dn.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 35,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 61,
    "name": "ml.p3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 36,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 244,
    "name": "ml.p3.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 37,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 488,
    "name": "ml.p3.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 38,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.p3dn.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 39,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.r5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 40,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.r5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 41,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.r5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 42,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.r5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 43,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.r5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 44,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.r5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 45,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 512,
    "name": "ml.r5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 46,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.r5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 47,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.g5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 48,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.g5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 49,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.g5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 50,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.g5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 51,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.g5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 52,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.g5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 53,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.g5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 54,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.g5.48xlarge",
    "vcpuNum": 192
   }
  ],
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3 (Data Science)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/datascience-1.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}