{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Detect Heart Failure from Clinical Records with SageMaker Feature Store\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/ml-lifecycle|feature_store|FS_demo.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook runs with the `Python 3 (Data Science)` kernel.\n", "\n", "Note:\n", "\n", "The following policies need to be attached to the execution role that you use to run this notebook:\n", "\n", "* AmazonSageMakerFullAccess\n", "* AmazonSageMakerFeatureStoreAccess\n", "* AmazonS3FullAccess\n", "\n", "Note that the `AmazonS3FullAccess` policy is not attached to your role by default if you choose to `create a new role` when you start your SageMaker Studio instance. If the required policies above are not listed under `Policy name`, go to the IAM console, find your role, choose `Attach Policies` under `Permissions`, select the missing policies from the list, and then choose `Attach policy`. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Contents\n", "* [Background](#1)\n", "* [Set Up SageMaker Feature Store](#2)\n", "* [Inspect Dataset](#3)\n", "* [Prepare Data for Feature Store](#4)\n", "* [Create Features](#5)\n", "* [Work with Your FeatureGroup](#10)\n", "* [Build a Training Dataset](#6)\n", "* [Train and Deploy the Model](#7)\n", "* [SageMaker Feature Store During Inference](#8)\n", "* [Clean Up Resources](#9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Background\n", "\n", "SageMaker Feature Store is a SageMaker capability that makes it easy for customers to create and manage curated features for machine learning (ML) development. It serves as the single source of truth to store, retrieve, remove, track, share, discover, and control access to features.\n", "SageMaker Feature Store enables data ingestion via a high-TPS (transactions per second) API and data consumption via the online and offline stores.\n", "\n", "\n", "This notebook demonstrates the APIs provided by SageMaker Feature Store by walking through the process of training a heart failure detection model with clinical records data. It shows how the dataset can be ingested into Feature Store, queried to create a training dataset, and quickly accessed during inference.\n", "\n", "### Terminology\n", "* `Feature group` – A FeatureGroup is the main Feature Store resource that contains the metadata for all the data stored in Amazon SageMaker Feature Store. A feature group is a logical grouping of features, defined in the feature store, to describe records. A feature group’s definition is composed of a list of feature definitions, a record identifier name, and configurations for its online and offline store. \n", "\n", "* `Feature definition` – A FeatureDefinition consists of a name and one of the following data types: Integral, String, or Fractional. A FeatureGroup contains a list of feature definitions. 
\n", "\n", "* `Record identifier name` – Each feature group is defined with a record identifier name. The record identifier name must refer to one of the names of a feature defined in the feature group's feature definitions. \n", "\n", "* `Event time` – A point in time when a new event occurs that corresponds to the creation or update of a record in a feature group. All records in the feature group must have a corresponding event time. It can be used to track changes to a record over time. The online store contains the record with the latest event time for each record identifier value, whereas the offline store contains all historic records.\n", "\n", "* `Online Store` – The low-latency, high-availability cache for a feature group that enables real-time lookup of records. The online store allows quick access to the latest value for a record via the GetRecord API. A feature group contains an OnlineStoreConfig controlling where the data is stored.\n", "\n", "* `Offline store` – The OfflineStore stores historical data in your S3 bucket. It is used when low (sub-second) latency reads are not needed, for example, when you want to store and serve features for exploration, model training, and batch inference. A feature group contains an OfflineStoreConfig controlling where the data is stored." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Set Up SageMaker Feature Store\n", "Let's start by setting up the SageMaker Python SDK and the boto3 clients. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# s3fs lets pandas read s3:// paths directly\n", "!pip install s3fs\n", "\n", "import boto3\n", "import sagemaker\n", "from sagemaker.session import Session\n", "\n", "\n", "region = boto3.Session().region_name\n", "\n", "boto_session = boto3.Session(region_name=region)\n", "\n", "sagemaker_client = boto_session.client(service_name=\"sagemaker\", region_name=region)\n", "featurestore_runtime = boto_session.client(\n", "    service_name=\"sagemaker-featurestore-runtime\", region_name=region\n", ")\n", "\n", "feature_store_session = Session(\n", "    boto_session=boto_session,\n", "    sagemaker_client=sagemaker_client,\n", "    sagemaker_featurestore_runtime_client=featurestore_runtime,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Set Up S3 Bucket For The OfflineStore\n", "SageMaker Feature Store writes the data in the `OfflineStore` of a `FeatureGroup` to an S3 bucket that you own. To write to your S3 bucket, SageMaker Feature Store assumes an IAM role, also owned by you, that has access to the bucket. The same bucket can be reused across FeatureGroups; data in the bucket is partitioned by FeatureGroup." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Use the SageMaker default bucket; replace it with your own bucket name if desired\n", "default_s3_bucket_name = feature_store_session.default_bucket()\n", "prefix = \"feature-store-demo\"\n", "\n", "print(default_s3_bucket_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Set Up IAM Role" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from sagemaker import get_execution_role\n", "\n", "# You can modify the following to use a role of your choosing. 
See the documentation for how to create this.\n", "role = get_execution_role()\n", "print(role)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## Inspect Dataset\n", "\n", "The [Heart Failure Clinical Dataset](https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records) contains the electronic medical records of 299 patients with heart failure, collected in 2015, which quantify their symptoms, body features, and clinical laboratory test values.\n", "\n", "The dataset contains one table with thirteen (13) columns:\n", "\n", "- age: age of the patient (years)\n", "- anaemia: decrease of red blood cells or hemoglobin (boolean)\n", "- high blood pressure: if the patient has hypertension (boolean)\n", "- creatinine phosphokinase (CPK): level of the CPK enzyme in the blood (mcg/L)\n", "- diabetes: if the patient has diabetes (boolean)\n", "- ejection fraction: percentage of blood leaving the heart at each contraction (percentage)\n", "- platelets: platelets in the blood (kiloplatelets/mL)\n", "- sex: woman or man (binary)\n", "- serum creatinine: level of serum creatinine in the blood (mg/dL)\n", "- serum sodium: level of serum sodium in the blood (mEq/L)\n", "- smoking: if the patient smokes or not (boolean)\n", "- time: follow-up period (days). To clarify, this `time` column is not the event time column described earlier; it is the number of days between the patient's last visit and the follow-up that checks whether the patient has had a heart failure. 
We will create the event time column later in this demo.\n", "- (target) death event: if the patient died during the follow-up period (boolean)\n", "\n", "The objective of the model is to predict patients’ survival from their clinical records data.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import pandas as pd\n", "from IPython.display import display" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# download the dataset from the SageMaker example files bucket in your region\n", "!mkdir -p data\n", "s3 = boto3.client(\"s3\")\n", "s3.download_file(\n", "    f\"sagemaker-example-files-prod-{region}\",\n", "    \"datasets/tabular/uci_heart_failure/heart_failure_clinical_records_dataset.csv\",\n", "    \"data/clinical_records_dataset.csv\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "s3.upload_file(\n", "    \"data/clinical_records_dataset.csv\",\n", "    default_s3_bucket_name,\n", "    f\"{prefix}/data/clinical_records_dataset.csv\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "clinical_data_file_name = \"clinical_records_dataset.csv\"\n", "clinical_data_path = \"s3://{}/{}/data/{}\".format(\n", "    default_s3_bucket_name, prefix, clinical_data_file_name\n", ")\n", "clinical = pd.read_csv(clinical_data_path)\n", "pd.set_option(\"display.max_columns\", 500)\n", "clinical.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "print(\"fraction of values missing in each column:\")\n", "clinical.isnull().sum() / len(clinical)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dataset contains no missing values, and all columns are either numerical or binary, so no preprocessing or feature engineering is needed in this case. 
Depending on your data and use case, you should examine your data and decide whether any preprocessing or feature engineering steps are needed before you ingest your data into Feature Store." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## Prepare Data for Feature Store\n", "In the Amazon SageMaker Feature Store API, a feature is an attribute of a record. You can define a name and type for every feature stored in Feature Store. The name uniquely identifies a feature within a feature group, and the type identifies the data type of the feature's values. The supported data types are String, Integral, and Fractional. \n", "\n", "Take a look at the data types and make sure they are all correct and readable by Feature Store. The SageMaker Feature Store Python SDK maps the pandas `string` dtype to the String feature type.\n", "\n", "In SageMaker Feature Store, a `record` is a collection of values for features for a single record identifier value. Two features are flagged as the record identifier and the event time, and the combination of a record identifier value and an event time uniquely identifies a record within a feature group. The raw data contains neither column, so we need to create both.\n", "\n", "* Record identifier name: In this case, we create a unique ID for each patient to serve as the record identifier. Make sure the identifier is unique for each instance.\n", "* Event time feature name: The event time is a point in time when a new event occurs that corresponds to the creation or update of a record in a feature group, and it can be used to track changes to a record over time. When no timestamp is available, as in this use case, you can append an EventTime column to your data. 
In the following code, you can see how EventTime is appended to the clinical data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create a unique ID for each patient" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Add an ID for each patient by turning the row index into a column\n", "clinical.reset_index(inplace=True)\n", "clinical.rename(columns={\"index\": \"patient_id\"}, inplace=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "clinical.dtypes" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# We want this ID to be treated as a string ID\n", "clinical[\"patient_id\"] = clinical[\"patient_id\"].astype(object)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create a timestamp for each record" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import time\n", "\n", "current_time_sec = int(round(time.time()))\n", "# append an EventTime feature (Unix epoch seconds, stored as float64 so it maps to Fractional)\n", "clinical[\"EventTime\"] = pd.Series([current_time_sec] * len(clinical), dtype=\"float64\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Check data types for each column" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def cast_object_to_string(data_frame):\n", "    for label in data_frame.columns:\n", "        if data_frame.dtypes[label] == \"object\":\n", "            data_frame[label] = data_frame[label].astype(\"str\").astype(\"string\")\n", "\n", "\n", "# cast object dtype to string. 
The SageMaker Feature Store Python SDK then maps the string dtype to the String feature type.\n", "cast_object_to_string(clinical)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "clinical.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Create Features\n", "In this step, we create the FeatureGroup representing the patients' clinical records and then ingest the data into it.\n", "\n", "#### Assign a feature group name" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from time import gmtime, strftime, sleep\n", "\n", "clinical_feature_group_name = \"clinical-feature-group-\" + strftime(\"%d-%H-%M-%S\", gmtime())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create a FeatureGroup object" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from sagemaker.feature_store.feature_group import FeatureGroup\n", "\n", "clinical_feature_group = FeatureGroup(\n", "    name=clinical_feature_group_name, sagemaker_session=feature_store_session\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Define Identifier\n", "In this step, we specify a record identifier name and an event time feature name." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# record identifier and event time feature names\n", "record_identifier_feature_name = \"patient_id\"\n", "event_time_feature_name = \"EventTime\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Load feature definitions into the feature group\n", "We can now load the feature definitions by passing a data frame containing the feature data. The SageMaker Feature Store Python SDK auto-detects the data schema from the input data. 
For developers using a schema rather than automatic detection, see the [Export Feature Groups from Data Wrangler example](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-data-export.html#data-wrangler-data-export-feature-store) for code that shows how to load the schema, map it, and add it as a FeatureDefinition that you can use to create the FeatureGroup. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# the trailing semicolon suppresses display of the returned feature definitions\n", "clinical_feature_group.load_feature_definitions(data_frame=clinical);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create FeatureGroup\n", "In this step, we use the `create` function to create the feature group. The online store is not created by default, so you must set `enable_online_store=True` if you want to enable it. The `s3_uri` is the S3 location of your offline store. Check the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/feature-store-create-feature-group.html) for a list of other parameters you can define." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def wait_for_feature_group_creation_complete(feature_group):\n", "    status = feature_group.describe().get(\"FeatureGroupStatus\")\n", "    while status == \"Creating\":\n", "        print(\"Waiting for Feature Group Creation\")\n", "        time.sleep(5)\n", "        status = feature_group.describe().get(\"FeatureGroupStatus\")\n", "    if status != \"Created\":\n", "        raise RuntimeError(f\"Failed to create feature group {feature_group.name}\")\n", "    print(f\"FeatureGroup {feature_group.name} successfully created.\")\n", "\n", "\n", "clinical_feature_group.create(\n", "    s3_uri=f\"s3://{default_s3_bucket_name}/{prefix}\",  # offline feature store bucket\n", "    record_identifier_name=record_identifier_feature_name,\n", "    event_time_feature_name=event_time_feature_name,\n", "    role_arn=role,\n", "    enable_online_store=True,\n", ")\n", "wait_for_feature_group_creation_complete(feature_group=clinical_feature_group)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Work with Your FeatureGroup\n", "#### Check FeatureGroup Info\n", "Creating a feature group takes some time, and you must wait until its status is `Created` before you can use it. You can check the status and other details with the DescribeFeatureGroup and ListFeatureGroups APIs." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "clinical_feature_group.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "sagemaker_client.list_feature_groups()  # use boto client to list FeatureGroups" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Put Records into the Feature Store \n", "After the FeatureGroups have been created, we can put data into them by using the PutRecord API. This API can handle high TPS and is designed to be called by different streams. 
The data from all of these Put requests is buffered and written to S3 in chunks; the files are written to the offline store within a few minutes of ingestion. You can use the `ingest` function to load your feature data: pass in a data frame of feature data, set the number of workers, and choose whether to wait for it to return. To accelerate ingestion in this example, we specify multiple workers to do the job concurrently. Ingesting the data into the clinical FeatureGroup should take less than a minute." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "clinical_feature_group.ingest(data_frame=clinical, max_workers=3, wait=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Get Records from a Feature Group\n", "We can use the `get_record` function to retrieve a specific record from the online store by its record identifier value." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "record_identifier_value = str(200)\n", "\n", "featurestore_runtime.get_record(\n", "    FeatureGroupName=clinical_feature_group_name,\n", "    RecordIdentifierValueAsString=record_identifier_value,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Generate Hive DDL Commands\n", "The SageMaker Python SDK’s `FeatureGroup` class also provides the functionality to generate Hive DDL commands. The schema of the table is generated from the feature definitions: columns are named after the features, and column data types are inferred from the feature types." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "print(clinical_feature_group.as_hive_ddl())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's wait for the data to appear in our offline store before moving forward to creating a dataset. This will take approximately 5 minutes. 
SageMaker Feature Store adds metadata for each record that's ingested into the offline store." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "%%time\n", "s3_client = boto3.client(\"s3\", region_name=region)\n", "\n", "account_id = boto3.client(\"sts\").get_caller_identity()[\"Account\"]\n", "print(account_id)\n", "\n", "clinical_feature_group_s3_prefix = \"/\".join(\n", "    clinical_feature_group.describe()\n", "    .get(\"OfflineStoreConfig\")\n", "    .get(\"S3StorageConfig\")\n", "    .get(\"ResolvedOutputS3Uri\")\n", "    .split(\"/\")[3:]\n", ")\n", "\n", "offline_store_contents = None\n", "while offline_store_contents is None:\n", "    objects_in_bucket = s3_client.list_objects(\n", "        Bucket=default_s3_bucket_name, Prefix=clinical_feature_group_s3_prefix\n", "    )\n", "    if \"Contents\" in objects_in_bucket and len(objects_in_bucket[\"Contents\"]) >= 1:\n", "        offline_store_contents = objects_in_bucket[\"Contents\"]\n", "    else:\n", "        print(\"Waiting for data in offline store...\\n\")\n", "        sleep(60)\n", "\n", "print(\"Data available.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build a Training Dataset\n", "SageMaker Feature Store automatically builds an AWS Glue Data Catalog table when you create a feature group; you can turn this off if you want. In this example, we create a training dataset with feature values from the clinical FeatureGroup by using the auto-built catalog: we run an Athena query that does a simple `SELECT *` on the FeatureGroup's offline store in S3.\n", "\n", "For testing purposes, we leave 9 records out of the training dataset (the query below returns at most 290 of the 299 records), so that we can use those 9 records as test data for inference. You can also do a train/test split. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "clinical_query = clinical_feature_group.athena_query()\n", "clinical_table = clinical_query.table_name" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Athena query\n", "query_string = 'SELECT * FROM \"' + clinical_table + '\" LIMIT 290'\n", "\n", "# run the Athena query and load the output into a pandas DataFrame\n", "clinical_query.run(\n", "    query_string=query_string,\n", "    output_location=\"s3://\" + default_s3_bucket_name + \"/\" + prefix + \"/query_results/\",\n", ")\n", "clinical_query.wait()\n", "dataset = clinical_query.as_dataframe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# collect the IDs of the patients left out of the training dataset\n", "id_for_test = []\n", "for i in range(299):\n", "    if i not in dataset[\"patient_id\"].unique():\n", "        id_for_test.append(i)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Prepare dataset for training" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Build the S3 path of the query results (the CSV file that Athena writes to output_location).\n", "query_execution = clinical_query.get_query_execution()\n", "query_result = (\n", "    \"s3://\"\n", "    + default_s3_bucket_name\n", "    + \"/\"\n", "    + prefix\n", "    + \"/query_results/\"\n", "    + query_execution[\"QueryExecution\"][\"QueryExecutionId\"]\n", "    + \".csv\"\n", ")\n", "print(query_result)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Select useful columns for training with target column as the first.\n", "dataset = dataset[\n", "    [\n", "        \"death_event\",\n", "        \"age\",\n", "        \"anaemia\",\n", "        \"creatinine_phosphokinase\",\n", "        \"diabetes\",\n", "        \"ejection_fraction\",\n", "        \"high_blood_pressure\",\n", "        \"platelets\",\n", "        \"serum_creatinine\",\n", "        \"serum_sodium\",\n", "        \"sex\",\n", "        
\"smoking\",\n", "        \"time\",\n", "    ]\n", "]\n", "# Write to a local CSV without header and index column, then upload it to S3.\n", "dataset.to_csv(\"dataset.csv\", header=False, index=False)\n", "s3_client.upload_file(\"dataset.csv\", default_s3_bucket_name, prefix + \"/training_input/dataset.csv\")\n", "dataset_uri_prefix = \"s3://\" + default_s3_bucket_name + \"/\" + prefix + \"/training_input/\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "dataset.head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train and Deploy the Model\n", "For model training, we use XGBoost, a SageMaker built-in algorithm, to predict the `death_event` target, that is, whether a patient is likely to die during the follow-up period. SageMaker built-in algorithms provide highly optimized implementations of popular machine learning algorithms, simplifying machine learning development and accelerating training and deployment. We retrieve the SageMaker XGBoost container image and construct a generic SageMaker estimator." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "training_image = sagemaker.image_uris.retrieve(\"xgboost\", region, \"1.0-1\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "training_output_path = \"s3://\" + default_s3_bucket_name + \"/\" + prefix + \"/training_output\"\n", "\n", "from sagemaker.estimator import Estimator\n", "\n", "training_model = Estimator(\n", "    training_image,\n", "    role,\n", "    instance_count=1,\n", "    instance_type=\"ml.m5.2xlarge\",\n", "    volume_size=5,\n", "    max_run=3600,\n", "    input_mode=\"File\",\n", "    output_path=training_output_path,\n", "    sagemaker_session=feature_store_session,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because the goal of this example is to showcase Feature Store capabilities, not necessarily to achieve the best result, we keep training costs down. 
In this example, we skip hyperparameter tuning and use default values for all hyperparameters except `objective` and `num_round`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "training_model.set_hyperparameters(objective=\"binary:logistic\", num_round=50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Point the training input to the dataset we just created" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "train_data = sagemaker.inputs.TrainingInput(\n", "    dataset_uri_prefix,\n", "    distribution=\"FullyReplicated\",\n", "    content_type=\"text/csv\",\n", "    s3_data_type=\"S3Prefix\",\n", ")\n", "data_channels = {\"train\": train_data}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "training_model.fit(inputs=data_channels, logs=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Set Up Hosting for the Model\n", "Once training is done, we can deploy the trained model as an Amazon SageMaker real-time hosted endpoint, which lets us make predictions (inference) with the model. Deploy the endpoint as follows; this takes 8-10 minutes to complete." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "predictor = training_model.deploy(initial_instance_count=1, instance_type=\"ml.m5.xlarge\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## SageMaker Feature Store During Inference\n", "SageMaker Feature Store is useful for supplementing inference requests with data because of its low-latency GetRecord functionality. For this demo, we are given a patient ID and query our online FeatureGroup to build the inference request.\n", "\n", "From the patient IDs we left out of the training set, we can choose one to test real-time inference. 
In this example we choose patient `194`, but you can choose any ID from the left-out ID list; if `194` is not in your list, pick one that is." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "id_for_test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To retrieve the record for the chosen patient ID (the record identifier value) from the online store, we use the `get_record` function." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "patient_id = str(194)\n", "\n", "\n", "# Helper to parse the feature value from the record.\n", "def get_feature_value(record, feature_name):\n", "    return str(list(filter(lambda r: r[\"FeatureName\"] == feature_name, record))[0][\"ValueAsString\"])\n", "\n", "\n", "clinical_response = featurestore_runtime.get_record(\n", "    FeatureGroupName=clinical_feature_group_name, RecordIdentifierValueAsString=patient_id\n", ")\n", "clinical_record = clinical_response[\"Record\"]\n", "clinical_record" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we extract the feature values from the retrieved record, excluding the record identifier, the event time, and the target variable, and build a list of values, in the same column order used for training, as the input to the predictor. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "inference_request = [\n", "    get_feature_value(clinical_record, \"age\"),\n", "    get_feature_value(clinical_record, \"anaemia\"),\n", "    get_feature_value(clinical_record, \"creatinine_phosphokinase\"),\n", "    get_feature_value(clinical_record, \"diabetes\"),\n", "    get_feature_value(clinical_record, \"ejection_fraction\"),\n", "    get_feature_value(clinical_record, \"high_blood_pressure\"),\n", "    get_feature_value(clinical_record, \"platelets\"),\n", "    get_feature_value(clinical_record, \"serum_creatinine\"),\n", "    get_feature_value(clinical_record, \"serum_sodium\"),\n", "    get_feature_value(clinical_record, \"sex\"),\n", "    get_feature_value(clinical_record, \"smoking\"),\n", "    get_feature_value(clinical_record, \"time\"),\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The predictor calls our hosted model and returns a prediction. The model correctly predicts that patient `194` is very likely (78% chance) to have a death event." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import json\n", "\n", "results = predictor.predict(\",\".join(inference_request), initial_args={\"ContentType\": \"text/csv\"})\n", "prediction = json.loads(results)\n", "print(prediction)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean Up Resources\n", "To avoid ongoing costs, delete the model endpoint and the FeatureGroup when you are done with this demo. Deleting a feature group does not delete the data already written to its offline store in S3, nor the AWS Glue tables created for it." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "predictor.delete_endpoint()\n", "clinical_feature_group.delete()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Further Reading\n", "* [SageMaker Feature Store Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/feature-store.html)\n", "* [Store, Discover, and Share Machine Learning Features with Amazon SageMaker Feature Store](https://aws.amazon.com/blogs/aws/new-store-discover-and-share-machine-learning-features-with-amazon-sagemaker-feature-store/?sc_icampaign=launch_sagemaker-feature-store_reinvent20&sc_ichannel=ha&sc_icontent=awssm-6216&sc_iplace=ribbon&trk=ha_awssm-6216) \n", "* [Using streaming ingestion with Amazon SageMaker Feature Store to make ML-backed decisions in near-real time](https://aws.amazon.com/blogs/machine-learning/using-streaming-ingestion-with-amazon-sagemaker-feature-store-to-make-ml-backed-decisions-in-near-real-time/)\n", "* [Fraud Detection using SageMaker Feature Store](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-featurestore/sagemaker_featurestore_fraud_detection_python_sdk.ipynb)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/ml-lifecycle|feature_store|FS_demo.ipynb)\n", "\n", "![This us-east-2 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/ml-lifecycle|feature_store|FS_demo.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/ml-lifecycle|feature_store|FS_demo.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/ml-lifecycle|feature_store|FS_demo.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/ml-lifecycle|feature_store|FS_demo.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/ml-lifecycle|feature_store|FS_demo.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/ml-lifecycle|feature_store|FS_demo.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/ml-lifecycle|feature_store|FS_demo.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/ml-lifecycle|feature_store|FS_demo.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/ml-lifecycle|feature_store|FS_demo.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/ml-lifecycle|feature_store|FS_demo.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/ml-lifecycle|feature_store|FS_demo.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/ml-lifecycle|feature_store|FS_demo.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/ml-lifecycle|feature_store|FS_demo.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/ml-lifecycle|feature_store|FS_demo.ipynb)\n" ] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", 
"vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", 
"supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": 
false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, 
"hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated 
computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 }, { "_defaultOrder": 55, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 56, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4de.24xlarge", "vcpuNum": 96 } ], "kernelspec": { "display_name": "Python 3 (Data Science 3.0)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/sagemaker-data-science-310-v1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 4 }