{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Fraud Detection with Amazon SageMaker FeatureStore\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "\n", "\n", "---" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "\n", "This notebook works well with the `Python 3 (Data Science)` kernel.\n", "\n", "The following policies need to be attached to the execution role:\n", "- AmazonSageMakerFullAccess\n", "- AmazonS3FullAccess\n", "\n", "## Contents\n", "1. [Background](#Background)\n", "1. [Setup SageMaker FeatureStore](#Setup-SageMaker-FeatureStore)\n", "1. [Inspect Dataset](#Inspect-Dataset)\n", "1. [Ingest Data into FeatureStore](#Ingest-Data-into-FeatureStore)\n", "1. [Build Training Dataset](#Build-Training-Dataset)\n", "1. [Train and Deploy the Model](#Train-and-Deploy-the-Model)\n", "1. [SageMaker FeatureStore During Inference](#SageMaker-FeatureStore-During-Inference)\n", "1. [Cleanup Resources](#Cleanup-Resources)\n", "\n", "## Background\n", "\n", "Amazon SageMaker FeatureStore is a SageMaker capability that makes it easy to create and manage curated data for machine learning (ML) development. SageMaker FeatureStore supports data ingestion through a high-throughput (high-TPS) API and data consumption through the online and offline stores. \n", "\n", "This notebook demonstrates the SageMaker FeatureStore APIs by walking through the process of training a fraud detection model. It shows how the dataset's tables can be ingested into FeatureStore, queried to build a training dataset, and quickly accessed during inference. 
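To make the split between the online and offline stores concrete, here is a toy, in-memory Python model. It is a hypothetical sketch, not the SageMaker FeatureStore API: the class `ToyFeatureGroup`, its methods, and the feature names are illustrative only. It assumes, as described for the real service, that the online store holds only the most recent record per identifier (by event time) while the offline store keeps every ingested record as history.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Toy, in-memory model of a feature group. Hypothetical: this is NOT the
# SageMaker FeatureStore API, only an illustration of online vs offline storage.
@dataclass
class ToyFeatureGroup:
    record_identifier_name: str  # feature that identifies a record
    event_time_name: str  # feature that holds the record's event time
    online: Dict[str, dict] = field(default_factory=dict)  # latest record per identifier
    offline: List[dict] = field(default_factory=list)  # append-only history

    def put_record(self, record: dict) -> None:
        # Every ingested record is appended to the offline history.
        self.offline.append(dict(record))
        key = record[self.record_identifier_name]
        current = self.online.get(key)
        # The online store keeps only the record with the newest event time.
        if current is None or record[self.event_time_name] >= current[self.event_time_name]:
            self.online[key] = dict(record)

    def get_record(self, key: str) -> dict:
        # Fast lookup of the latest value, analogous in spirit to GetRecord.
        return self.online[key]


fg = ToyFeatureGroup('customer_id', 'event_time')
fg.put_record({'customer_id': 'c1', 'event_time': 1.0, 'txn_count': 3})
fg.put_record({'customer_id': 'c1', 'event_time': 2.0, 'txn_count': 5})
print(fg.get_record('c1')['txn_count'])  # 5
print(len(fg.offline))  # 2
```

In this model, ingesting a record with an older event time still adds a row to the offline history but leaves the online value unchanged, which is why the offline store is the one to query when building a training dataset.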
\n", "\n", "\n", "### Terminology\n", "\n", "A **FeatureGroup** is the main resource that contains the metadata for all the data stored in SageMaker FeatureStore. A FeatureGroup contains a list of FeatureDefinitions. A **FeatureDefinition** consists of a name and one of the following feature types: Integral, Fractional, or String. The FeatureGroup also contains an **OnlineStoreConfig** and an **OfflineStoreConfig** that control where the data is stored. Enabling the online store allows quick access to the latest value for a Record via the GetRecord API. The offline store keeps historical data in your S3 bucket. \n", "\n", "Once a FeatureGroup is created, data can be added to it as **Records**. A Record can be thought of as a row in a table. Each Record has a unique **RecordIdentifier** along with values for all other FeatureDefinitions in the FeatureGroup. " ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Setup SageMaker FeatureStore\n", "\n", "Let's start by setting up the SageMaker Python SDK and the boto3 clients. 
Note that this notebook requires a `boto3` version above `1.17.21`. Because `boto3` is imported before the upgrade, you may need to restart the kernel for the new version to take effect." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import sagemaker\n", "\n", "original_boto3_version = boto3.__version__\n", "%pip install 'boto3>1.17.21'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.session import Session\n", "\n", "region = boto3.Session().region_name\n", "\n", "boto_session = boto3.Session(region_name=region)\n", "\n", "sagemaker_client = boto_session.client(service_name=\"sagemaker\", region_name=region)\n", "featurestore_runtime = boto_session.client(\n", " service_name=\"sagemaker-featurestore-runtime\", region_name=region\n", ")\n", "\n", "feature_store_session = Session(\n", " boto_session=boto_session,\n", " sagemaker_client=sagemaker_client,\n", " sagemaker_featurestore_runtime_client=featurestore_runtime,\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### S3 Bucket Setup For The OfflineStore\n", "\n", "SageMaker FeatureStore writes the data in the OfflineStore of a FeatureGroup to an S3 bucket that you own. To write to that bucket, SageMaker FeatureStore assumes an IAM role, also owned by you, that has access to it.\n", "Note that the same bucket can be reused across FeatureGroups; data in the bucket is partitioned by FeatureGroup." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Set the default S3 bucket name, which is referenced throughout the notebook." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# You can modify the following to use a bucket of your choosing\n", "default_s3_bucket_name = feature_store_session.default_bucket()\n", "prefix = \"sagemaker-featurestore-demo\"\n", "\n", "print(default_s3_bucket_name)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Set up the IAM role. This role gives SageMaker FeatureStore access to your S3 bucket. \n", "\n", "