{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Module 1: Prepare datasets \n", "**This notebook generates and transforms datasets into a ML ready state to be ingested into SageMaker Feature Store.**\n", "\n", "**Note:** Please set kernel to `Python 3 (Data Science)` and select instance to `ml.t3.medium`\n", "\n", "\n", "---\n", "\n", "## Contents\n", "\n", "1. [Background](#Background)\n", "1. [Setup](#Setup)\n", "1. [Generate Online Grocery shopping dataset](#Generate-Online-Grocery-shopping-dataset)\n", "1. [Transform raw features into Machine Learning ready features](#Transform-raw-features-into-Machine-Learning-ready-features)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Background\n", "\n", "This notebook generates a relational set of files describing customers’ orders over time.\n", "\n", "We have preset the notebook to generate a sample of 100,000 synthetic grocery orders from a total of 10,000 synthetically generated customers list. For each customer, the notebook generates between 1 to 10 of their orders, with products purchased in each order. The notebook also generates a timestamp on which the order was placed. The goal of the generated dataset and the example notebooks contained in this repository is to illustrate how to use **SageMaker Feature Store** to predict which products will be in a user’s next order and to enable a grocery retail store to revolutionize how consumers discover and purchase groceries online." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Prerequisites" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "%%capture\n", "!pip install faker\n", "!pip install python-dateutil" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create sub-directory structure for data files\n", "\n", "Note: If you re-run this cell, you will get warnings from mkdir command that says 'File exists'; these warnings can be ignored" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# create sub-directories\n", "!mkdir ../data/test\n", "!mkdir ../data/train\n", "!mkdir ../data/transformed\n", "!mkdir ../data/partitions\n", "!mkdir ../data/validation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Imports " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.preprocessing import MinMaxScaler, LabelEncoder\n", "from datetime import datetime, timezone, date\n", "from faker import Faker\n", "import pandas as pd\n", "import numpy as np\n", "import hashlib\n", "import logging\n", "import random" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set locale and seed for reproducability " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "faker = Faker()\n", "faker.seed_locale('en_US', 0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "SEED = 123\n", "random.seed(SEED)\n", "np.random.seed(SEED)\n", "faker.seed_instance(SEED)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "logger = logging.getLogger('__name__')\n", "logger.setLevel(logging.DEBUG)\n", "logger.addHandler(logging.StreamHandler())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "logger.info(f'Using Pandas version: {pd.__version__}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Helper functions " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def generate_timestamp(start, end) -> str:\n", " start = datetime.strptime(start, '%Y-%m-%d %H:%M:%S')\n", " end = datetime.strptime(end, '%Y-%m-%d %H:%M:%S')\n", " timestamp = faker.date_time_between(start_date=start, end_date=end, tzinfo=None).strftime('%Y-%m-%d %H:%M:%S')\n", " return timestamp" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def generate_date(start, end) -> str:\n", " start = datetime.strptime(start, '%Y-%m-%d')\n", " end = datetime.strptime(end, '%Y-%m-%d')\n", " date = faker.date_between_dates(date_start=start, date_end=end).strftime('%Y-%m-%d')\n", " return date" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_md5_hash(string: str) -> str:\n", " hash_object = hashlib.md5(string.encode())\n", " return hash_object.hexdigest()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def generate_event_timestamp():\n", " # naive datetime representing local time\n", " naive_dt = datetime.now()\n", " # take timezone into account\n", " aware_dt = naive_dt.astimezone()\n", " # time in UTC\n", " utc_dt = aware_dt.astimezone(timezone.utc)\n", " # transform to ISO-8601 format\n", " event_time = utc_dt.isoformat(timespec='milliseconds')\n", " event_time = event_time.replace('+00:00', 'Z')\n", " return event_time" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Generate Online Grocery shopping dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generate random synthetic Customer profiles " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Customer:\n", " def __init__(self):\n", " self.customer_id = None\n", " self.name = None\n", " self.sex = None\n", " self.state = None\n", " self.age = None\n", " self.is_married = None\n", " self.active_since = None\n", " self.event_time = None\n", " \n", " def as_dict(self):\n", " return {'customer_id': self.customer_id, \n", " 'name': self.name,\n", " 'sex': self.sex, \n", " 'state': self.state, \n", " 'age': self.age, \n", " 'is_married': self.is_married, \n", " 'active_since': self.active_since,\n", " 'event_time': self.event_time\n", " }" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def generate_customer(i) -> Customer:\n", " customer = Customer()\n", " profile = faker.profile()\n", " customer.customer_id = f'C{i}'\n", " customer.name = profile['name'].lower()\n", " customer.sex = profile['sex']\n", " customer.state = faker.state().lower()\n", " customer.age = random.randint(18, 91)\n", " customer.is_married = faker.boolean()\n", " customer.active_since = generate_timestamp('2016-01-01 00:00:00', '2020-01-01 00:00:01')\n", " customer.event_time = generate_event_timestamp()\n", " return customer" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "customer = generate_customer(1)\n", "customer.__dict__" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "customers = []\n", "customer_ids = []\n", "n = 10000 # number of synthetic customers to generate\n", "for i in range(n):\n", " customer = generate_customer(i+1)\n", " customers.append(customer)\n", " customer_ids.append(customer.customer_id)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "customers_df = pd.DataFrame([customer.as_dict() for customer in customers])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "customers_df.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generate random synthetic purchase orders " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load list of `products` raw data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "products_df = pd.read_csv('../data/raw/product_category_mapping.csv')\n", "products_df['product_name'] = products_df['product_name'].str.lower()\n", "products_df['product_category'] = products_df['product_category'].str.lower()\n", "products_df.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Add event timestamp to the feature records " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "event_timestamps = [generate_event_timestamp() for _ in range(len(products_df))]\n", "products_df['event_time'] = event_timestamps\n", "products_df.head(5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "product_ids = products_df['product_id'].tolist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Generate purchase orders specific to customers " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Order:\n", " def __init__(self):\n", " self.order_id = None\n", " self.customer_id = None\n", " self.product_id = None\n", " self.purchase_amount = None\n", " self.is_reordered = None \n", " self.purchased_on = None\n", " self.event_time = None\n", " \n", " def as_dict(self):\n", " return {'order_id': self.order_id, \n", " 'customer_id': self.customer_id, \n", " 'product_id': self.product_id,\n", " 'purchase_amount': self.purchase_amount,\n", " 'is_reordered': self.is_reordered,\n", " 'purchased_on': self.purchased_on, \n", " 'event_time': self.event_time}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def generate_order(i) -> Order:\n", " order = Order()\n", " order.order_id = f'O{i}'\n", " order.customer_id = random.choice(customer_ids)\n", " order.product_id = random.choice(product_ids)\n", " order.purchase_amount = random.randint(1, 101) + round(random.random(), 2)\n", " order.is_reordered = random.choice([1, 1, 0]) # assume chance of reordering is twice as that of not reordering\n", " order.purchased_on = generate_timestamp('2020-01-01 00:01:01', '2021-06-01 00:00:01')\n", " order.event_time = generate_event_timestamp()\n", " return order" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "order = generate_order(1)\n", "order.__dict__" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "orders = []\n", "n = 100000 # number of synthetic orders to generate\n", "for i in range(n):\n", " order = generate_order(i+1)\n", " orders.append(order)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "orders_df = pd.DataFrame([order.as_dict() for order in orders])\n", "orders_df.head(5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "orders_df.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Write generated customers, products and orders data to local directory" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "customers_df.to_csv('../data/raw/customers.csv', index=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "products_df.to_csv('../data/raw/products.csv', index=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "orders_df.to_csv('../data/raw/orders.csv', index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Transform raw features into Machine Learning ready features" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### I) Transform raw `customers` data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "customers_df = pd.read_csv('../data/raw/customers.csv')\n", "customers_df.head(5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "label_encoder = LabelEncoder()\n", "min_max_scaler = MinMaxScaler()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "customers_df.drop('name', axis=1, inplace=True)\n", "customers_df.drop('state', axis=1, inplace=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bins = [18, 30, 40, 50, 60, 70, 90]\n", "labels = ['18-29', '30-39', '40-49', '50-59', '60-69', '70-plus']\n", "customers_df['age_range'] = pd.cut(customers_df.age, bins, labels=labels, include_lowest=True)\n", "customers_df = pd.concat([customers_df, pd.get_dummies(customers_df['age_range'], prefix='age')], axis=1)\n", "customers_df.drop('age', axis=1, inplace=True)\n", "customers_df.drop('age_range', axis=1, inplace=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "customers_df['sex'] = label_encoder.fit_transform(customers_df['sex'])\n", "customers_df['is_married'] = label_encoder.fit_transform(customers_df['is_married'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "customers_df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "customers_df['active_since'] = pd.to_datetime(customers_df['active_since'], format='%Y-%m-%d %H:%M:%S')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_delta_in_days(date_time) -> int:\n", " today = date.today()\n", " delta = today - date_time.date()\n", " return delta.days" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "customers_df['n_days_active'] = customers_df['active_since'].apply(lambda x: get_delta_in_days(x))\n", "customers_df['n_days_active'] = min_max_scaler.fit_transform(customers_df[['n_days_active']])\n", "customers_df.drop('active_since', axis=1, inplace=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "customers_df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "customers_df.to_csv('../data/transformed/customers.csv', index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### II) Transform raw `products` data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "products_df = pd.read_csv('../data/raw/products.csv')\n", "products_df.head(5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "products_df.drop('product_name', axis=1, inplace=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "products_df = pd.concat([products_df, pd.get_dummies(products_df['product_category'], prefix='category')], axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "products_df.drop('product_category', axis=1, inplace=True)\n", "products_df.columns = products_df.columns.str.replace(' ', '_')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "products_df.head(5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "products_df.to_csv('../data/transformed/products.csv', index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### III) Transform raw `orders` data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "orders_df = pd.read_csv('../data/raw/orders.csv')\n", "orders_df.head(5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "orders_df['purchased_on'] = pd.to_datetime(orders_df['purchased_on'], format='%Y-%m-%d %H:%M:%S')\n", "orders_df['n_days_since_last_purchase'] = orders_df['purchased_on'].apply(lambda x: get_delta_in_days(x))\n", "orders_df['n_days_since_last_purchase'] = min_max_scaler.fit_transform(orders_df[['n_days_since_last_purchase']])\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "orders_df['purchase_amount'] = min_max_scaler.fit_transform(orders_df[['purchase_amount']])\n", "orders_df['is_reordered'] = label_encoder.fit_transform(orders_df['is_reordered'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "orders_df.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generate partitioned orders data by month" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "from datetime import datetime\n", "from dateutil.relativedelta import relativedelta\n", "\n", "print(f'Total Orders Count = {orders_df.shape[0]}') \n", "partitions_path = '../data/partitions'\n", "start_date_str = '2020-01-01 00:00:00'\n", "end_date_str = '2021-06-01 00:00:01'\n", "date_format = '%Y-%m-%d %H:%M:%S'\n", "start_date = datetime.strptime(start_date_str, date_format)\n", "print(f'start_date = {start_date}')\n", "end_date = datetime.strptime(end_date_str, date_format)\n", "print(f'end_date = {end_date}')\n", "a_month = relativedelta(months=1)\n", "print(f'a_month = {a_month}')\n", "current_start_date = start_date\n", "print(f'current_start_date = {current_start_date}')\n", "current_end_date = start_date + a_month\n", "print(f'current_end_date = {current_end_date}')\n", "print(f'----')\n", "if not os.path.exists(partitions_path):\n", " os.makedirs(partitions_path)\n", "\n", "while current_end_date <= end_date:\n", " print(f'Dates between {current_start_date} and {current_end_date}')\n", " partitions_df = orders_df[orders_df['purchased_on'].between(current_start_date, current_end_date)].copy()\n", " partitions_df.drop('purchased_on', axis=1, inplace=True)\n", " partition = f'{current_start_date.strftime(\"%Y\")}-{int(current_start_date.strftime(\"%m\"))}' \n", " current_partitions_path = f'{partitions_path}/{partition}'\n", " print(current_partitions_path)\n", " if not os.path.exists(current_partitions_path):\n", " os.makedirs(current_partitions_path)\n", " print(f'Partitions Orders Count = {partitions_df.shape[0]}')\n", " partitions_df.to_csv(f'{current_partitions_path}/partition.csv', index=False)\n", " partitions_df.iloc[0:0]\n", " current_start_date = current_end_date\n", " current_end_date = current_start_date + a_month\n", " print(f'----')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "orders_df.drop('purchased_on', axis=1, inplace=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "orders_df.head(5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "orders_df.to_csv('../data/transformed/orders.csv', index=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 } ], "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/datascience-1.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 4 }