{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SageMaker distributed data parallel and SageMaker Debugger example for TensorFlow\n", "\n", "1. [Introduction](#Introduction)\n", "2. [Prerequisites](#Prerequisites)\n", "3. [Setup](#Setup)\n", "4. [Dataset](#Dataset)\n", "    1. [Process the Data (Optional)](#Process-the-Data-(Optional))\n", "    2. [Preview The Data](#Preview-The-Data)\n", "5. [Build A Distributed Training Job](#Build-A-Distributed-Training-Job)\n", "    1. [Preview the Training script](#Preview-the-Training-script)\n", "    2. [Configure SageMaker Debugger rules](#Configure-SageMaker-Debugger-rules)\n", "    3. [Configure & Run Training Estimator](#Configure-&-Run-Training-Estimator)\n", "6. [Analyze Debugger Output](#Analyze-Debugger-in-SageMaker-Studio)\n", "7. [Next steps & Clean Up](#Next-steps-&-Clean-Up)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "This lab covers advanced training topics such as debugging and distributed training. By the end of this lab, you will have worked with two SageMaker features: SageMaker Debugger and Amazon SageMaker's distributed training library. [SageMaker Debugger](https://docs.aws.amazon.com/sagemaker/latest/dg/train-debugger.html) lets you attach a debugging process to your training job. It monitors training at a much more granular time interval and automatically profiles the instance to help you identify performance bottlenecks.\n", "\n", "[Amazon SageMaker's distributed library](https://docs.aws.amazon.com/sagemaker/latest/dg/distributed-training.html) helps you train deep learning models faster and at lower cost. The [data parallel](https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel.html) feature in this library is a distributed data parallel training framework for PyTorch, TensorFlow, and MXNet. 
This is a TensorFlow example using the [Caltech Birds (CUB 200 2011)](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) dataset.\n", "\n", "**NOTE:** This example requires SageMaker Python SDK v2.x and the Data Science kernel." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "To run this notebook, you can simply execute each cell in order. To understand what's happening, you'll need:\n", "\n", "- Access to the SageMaker default S3 bucket.\n", "- Solid understanding of Amazon SageMaker and how to train deep learning models.\n", "- Familiarity with Python and NumPy.\n", "- Familiarity with the TensorFlow deep learning framework and distributed training concepts.\n", "- Basic familiarity with Amazon S3.\n", "- Basic familiarity with the AWS Command Line Interface (CLI). Ideally, you should have it set up with credentials to access the AWS account you're running this notebook from.\n", "- SageMaker Studio, which is preferred for the full UI integration." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "Set up the environment, load the libraries, and define the parameters used throughout the notebook.\n", "\n", "Run the cell below if the `smdebug` and `tensorflow` modules are missing from your kernel." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install smdebug\n", "!pip install tensorflow\n", "!pip install Jinja2==3.0 --force-reinstall" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import modules and initialize parameters for this notebook" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "from sagemaker import get_execution_role\n", "import time\n", "import logging\n", "import boto3\n", "\n", "role = get_execution_role()\n", "sess = sagemaker.Session()\n", "\n", "default_bucket = sess.default_bucket() # or use your own custom bucket name\n", "base_job_prefix = 
'bird-tf-sdp-debugger'\n", "region_name = sess.boto_region_name" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "from sagemaker.tensorflow import TensorFlow\n", "\n", "from sagemaker.debugger import (ProfilerConfig,\n", "    FrameworkProfile,\n", "    CollectionConfig,\n", "    DebuggerHookConfig,\n", "    DetailedProfilingConfig,\n", "    DataloaderProfilingConfig,\n", "    PythonProfilingConfig,\n", "    Rule,\n", "    PythonProfiler,\n", "    cProfileTimer,\n", "    ProfilerRule,\n", "    rule_configs)\n", "\n", "from sagemaker.inputs import TrainingInput" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset\n", "This lab uses the [Caltech Birds (CUB 200 2011)](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) dataset, which contains 11,788 images across 200 bird species (see the dataset page for the original technical report). Each species comes with around 60 images, with a typical size of about 350 pixels by 500 pixels. Bounding boxes are provided, as are annotations of bird parts. A recommended train/test split is given, but image size data is not." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Process the Data (Optional)\n", "If you kept the data from module 2, you can skip to the [next section](#Preview-The-Data). Otherwise, run the code blocks below to process the data again. This repeats module 2.\n", "\n", "Run the cell below to download the full dataset, or download it manually [here](https://course.fast.ai/datasets). Note that the file is around 1.2 GB and can take a while to download. If you plan to complete the entire workshop, keep the file so that you do not have to download and process the data again." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "!wget 'https://s3.amazonaws.com/fast-ai-imageclas/CUB_200_2011.tgz'\n", "!tar xopf CUB_200_2011.tgz\n", "!rm CUB_200_2011.tgz\n", "\n", "s3_raw_data = f's3://{default_bucket}/{base_job_prefix}/full/data'\n", "!aws s3 cp --recursive ./CUB_200_2011 $s3_raw_data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.sklearn.processing import SKLearnProcessor\n", "\n", "from sagemaker.processing import (\n", "    ProcessingInput,\n", "    ProcessingOutput,\n", ")\n", "import time\n", "\n", "# Whole-second Unix timestamp, used to keep each run's output prefix unique\n", "timestamp = str(time.time()).split('.')[0]\n", "# SKLearnProcessor for preprocessing\n", "output_prefix = f'{base_job_prefix}/outputs/{timestamp}'\n", "output_s3_uri = f's3://{default_bucket}/{output_prefix}'\n", "\n", "class_selection = '13, 17, 35, 36, 47, 68, 73, 87'\n", "input_annotation = 'classes.txt'\n", "processing_instance_type = \"ml.m5.xlarge\"\n", "processing_instance_count = 1\n", "\n", "sklearn_processor = SKLearnProcessor(base_job_name = f\"{base_job_prefix}-preprocess\", # choose any name\n", "    framework_version='0.20.0',\n", "    role=role,\n", "    instance_type=processing_instance_type,\n", "    instance_count=processing_instance_count)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sklearn_processor.run(\n", "    code='../../02_preprocessing/preprocessing.py',\n", "    arguments=[\"--classes\", class_selection,\n", "        \"--input-data\", input_annotation],\n", "    inputs=[ProcessingInput(source=s3_raw_data,\n", "        destination=\"/opt/ml/processing/input\")],\n", "    outputs=[\n", "        ProcessingOutput(source=\"/opt/ml/processing/output/train\", destination = output_s3_uri +'/train'),\n", "        ProcessingOutput(source=\"/opt/ml/processing/output/valid\", destination = output_s3_uri +'/valid'),\n", "        ProcessingOutput(source=\"/opt/ml/processing/output/test\", 
destination = output_s3_uri +'/test'),\n", "        ProcessingOutput(source=\"/opt/ml/processing/output/manifest\", destination = output_s3_uri +'/manifest'),\n", "    ],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preview The Data\n", "\n", "If you are using the data generated in the previous module, update the S3 location of the data in the cell below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s3_train = output_s3_uri +'/train'\n", "s3_valid = output_s3_uri +'/valid'\n", "print(s3_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Copy the validation data to the local disk." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "!aws s3 cp $s3_valid ./data/valid --recursive" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras.preprocessing.image import ImageDataGenerator\n", "from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2, preprocess_input\n", "\n", "import matplotlib.pyplot as plt\n", "\n", "import IPython.display as display\n", "\n", "from io import StringIO\n", "\n", "val_datagen = ImageDataGenerator(rescale=1./255)\n", "\n", "val_gen = val_datagen.flow_from_directory('./data/valid',\n", "    target_size=(224, 224),\n", "    batch_size=16)\n", "\n", "\n", "# Show the first image and its one-hot label from each of five batches\n", "for _ in range(5):\n", "    img, label = next(val_gen)\n", "    print(img.shape) # (batch_size, 224, 224, 3)\n", "    print(label[0])\n", "    plt.imshow(img[0])\n", "    plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build A Distributed Training Job" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Preview the Training script\n", "\n", "The training script provides the code you need for distributed data parallel training using SageMaker's distributed data parallel library (`smdistributed.dataparallel`). 
The training script is very similar to a TensorFlow training script you might run outside of SageMaker using Horovod, but modified to run with the `smdistributed.dataparallel` library. This library's TensorFlow client provides an alternative to Horovod and is optimized for AWS infrastructure.\n", "\n", "For details about how to use `smdistributed.dataparallel`'s DDP in your native TensorFlow script, see [Modify a TensorFlow Training Script Using SMD Data Parallel](https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-modify-sdp.html#data-parallel-modify-sdp-pt)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pygmentize code/train_param_server_debugger2.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure SageMaker Debugger rules\n", "We specify the following rules:\n", "* loss_not_decreasing: checks whether the loss is decreasing and triggers if the loss has not decreased by a certain percentage in the last few iterations.\n", "* LowGPUUtilization: checks whether the GPU is underutilized.\n", "* ProfilerReport: runs the entire set of performance rules and creates a final output report with further insights and recommendations.\n", "\n", "To learn more, see [SageMaker Debugger built-in rules](https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-built-in-rules.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Set the profile config for both system and framework metrics\n", "profiler_config = ProfilerConfig(\n", "    system_monitor_interval_millis = 500,\n", "    framework_profile_params = FrameworkProfile(\n", "        detailed_profiling_config = DetailedProfilingConfig(\n", "            start_step = 5,\n", "            num_steps = 10\n", "        ),\n", "        dataloader_profiling_config = DataloaderProfilingConfig(\n", "            start_step = 7,\n", "            num_steps = 10\n", "        ),\n", "        python_profiling_config = PythonProfilingConfig(\n", "            start_step = 9,\n", "            num_steps = 10,\n", "            
python_profiler = PythonProfiler.CPROFILE,\n", "            cprofile_timer = cProfileTimer.TOTAL_TIME\n", "        )\n", "    )\n", ")\n", "\n", "\n", "# Set the debugger hook config to save tensors\n", "debugger_hook_config = DebuggerHookConfig(\n", "    collection_configs = [\n", "        CollectionConfig(name = 'weights'),\n", "        CollectionConfig(name = 'gradients')\n", "    ]\n", ")\n", "\n", "# Set the rules to analyze tensors emitted during training\n", "# This specific set of rules inspects the overall training performance and progress of the model\n", "rules=[\n", "    ProfilerRule.sagemaker(rule_configs.ProfilerReport()),\n", "    ProfilerRule.sagemaker(rule_configs.LowGPUUtilization()),\n", "    Rule.sagemaker(rule_configs.loss_not_decreasing())\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure & Run Training Estimator\n", "\n", "### Estimator function options\n", "\n", "In the following code block, you can update the estimator function to use a different instance type, instance count, and distribution strategy. You also pass the training script you reviewed in the previous section to this estimator.\n", "\n", "By adjusting the **instance count** and **smdistributed** parameters, you can toggle between single-instance training with SageMaker Debugger, multi-instance parameter server distributed training, and SageMaker distributed data parallel training.\n", "\n", "**Note:** `smdistributed.dataparallel` supports model training on SageMaker with the following instance types only. For best performance, use an instance type that supports Amazon Elastic Fabric Adapter (ml.p3dn.24xlarge and ml.p4d.24xlarge).\n", "\n", "1. ml.p3.16xlarge\n", "1. ml.p3dn.24xlarge [Recommended]\n", "1. 
ml.p4d.24xlarge [Recommended]\n", "\n", "To get the best performance and the most out of `smdistributed.dataparallel`, you should use at least 2 instances, but you can also use 1 for testing this example.\n", "\n", "You may also run into an error like this:\n", "\n", "```\n", "ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling the CreateTrainingJob operation: The account-level service limit 'ml.p3.8xlarge for training job usage' is 0 Instances, with current utilization of 0 Instances and a request delta of 1 Instances. Please contact AWS support to request an increase for this limit.\n", "```\n", "To protect customers from unexpected bills for more powerful and more expensive instances, accounts are created by default with limited access to certain instance types. These are soft limits that can be raised by contacting AWS support. This lab defaults to a powerful GPU instance type, but you can run it on a lower-powered instance type. In that case, you will pay less, but your training jobs will take longer." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Training image parameters\n", "framework_version = '2.4.1'\n", "py_version = 'py37'\n", "cuda_version = 'cu110'\n", "image_os_version = 'ubuntu18.04'\n", "image_uri = '763104351884.dkr.ecr.{}.amazonaws.com/tensorflow-training:{}-gpu-{}-{}-{}'.format(region_name,\n", "    framework_version,\n", "    py_version,\n", "    cuda_version,\n", "    image_os_version)\n", "print(image_uri)\n", "\n", "# Set to True to use the SageMaker distributed data parallel library\n", "smdistributed = False\n", "\n", "timestamp = str(time.time()).split('.')[0]\n", "\n", "# Location where the model checkpoints will be stored locally in the container before being uploaded to S3\n", "## Note: It is recommended that you use the default location of /opt/ml/checkpoints/ for saving/loading checkpoints.\n", "model_checkpoint_local_dir = '/opt/ml/checkpoints/'\n", "\n", "checkpoint_s3_uri = f's3://{default_bucket}/{base_job_prefix}/checkpoints/'\n", "hook_s3_uri = f's3://{default_bucket}/{base_job_prefix}/hooks/'\n", "\n", "\n", "instance_count = 2\n", "hyperparameters = {'lr': 0.0001,\n", "    'batch_size': 16,\n", "    'epochs': 30,\n", "    'dropout': 0.5,\n", "    'checkpoint_path': model_checkpoint_local_dir\n", "}\n", "\n", "metric_definitions = [{'Name': 'loss', 'Regex': 'loss: ([0-9\\\\.]+)'},\n", "    {'Name': 'acc', 'Regex': 'accuracy: ([0-9\\\\.]+)'},\n", "    {'Name': 'val_loss', 'Regex': 'val_loss: ([0-9\\\\.]+)'},\n", "    {'Name': 'val_acc', 'Regex': 'val_accuracy: ([0-9\\\\.]+)'}]\n", "\n", "# Choose the distribution strategy, training script, and instance type\n", "if instance_count > 1:\n", "    if smdistributed:\n", "        distributions = {\n", "            \"smdistributed\": {\n", "                \"dataparallel\": {\n", "                    \"enabled\": True\n", "                }\n", "            }\n", "        }\n", "        train_script = 'train_sdp_debugger_3.py'\n", "        instance_type = 'ml.p3.16xlarge'\n", "    else:\n", "        distributions = {'parameter_server': {'enabled': True}}\n", "        train_script = 'train_param_server_debugger2.py'\n", "        instance_type = 'ml.p3.2xlarge'\n", "        # Disable the Debugger hook for parameter server training\n", "        debugger_hook_config = None\n", "\n", "    DISTRIBUTION_MODE = 'ShardedByS3Key'\n", "else:\n", "    distributions = {'parameter_server': {'enabled': False}}\n", "    DISTRIBUTION_MODE = 'FullyReplicated'\n", "    train_script = 'train_param_server_debugger.py'\n", "    instance_type = 'ml.p3.2xlarge'\n", "\n", "\n", "# Set the training script related parameters\n", "train_script_dir = 'code'\n", "container_log_level = logging.INFO\n", "\n", "# Location where the trained model will be stored locally in the container before being uploaded to S3\n", "model_local_dir = '/opt/ml/model'\n", "\n", "# s3_train and s3_valid point to the output of the earlier processing job\n", "\n", "train_in = TrainingInput(s3_data=s3_train, distribution=DISTRIBUTION_MODE)\n", "val_in = TrainingInput(s3_data=s3_valid, distribution=DISTRIBUTION_MODE)\n", "\n", "inputs = {'train':train_in, 'valid': val_in}" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Create the estimator\n", "estimator = TensorFlow(\n", "    entry_point=train_script,\n", "    source_dir=train_script_dir,\n", "    checkpoint_s3_uri=checkpoint_s3_uri,\n", "    distribution=distributions,\n", "    instance_type=instance_type,\n", "    instance_count=instance_count,\n", "    hyperparameters=hyperparameters,\n", "    metric_definitions=metric_definitions,\n", "    role=role,\n", "    base_job_name=base_job_prefix,\n", "    image_uri=image_uri,\n", "    container_log_level=container_log_level,\n", "    profiler_config=profiler_config,\n", "    debugger_hook_config=debugger_hook_config,\n", "    rules=rules,\n", "    script_mode=True)\n", "\n", "estimator.fit(inputs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analyze Debugger in SageMaker Studio\n", "\n", "SageMaker Debugger output is automatically integrated with SageMaker Studio, which provides analysis and visualizations. 
To access the output, go to \"Experiment and Trials\", right-click the training job, and choose \"Open Debugger for Insights\".\n", "\n", "![Debugger in Studio](static/debugger_output.png)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analyze Debugger Programmatically\n", "You can also access SageMaker Debugger output programmatically through the API. You will need the training job name and the region. The code below shows how to check the Debugger job status and how to visualize the profiler results in code." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from smdebug.profiler.analysis.notebook_utils.training_job import TrainingJob\n", "\n", "training_job_name = estimator.latest_training_job.name\n", "print(f\"Training jobname: {training_job_name}\")\n", "print(f\"Region: {region_name}\")\n", "\n", "tj = TrainingJob(training_job_name, region_name)\n", "tj.wait_for_sys_profiling_data_to_be_available()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from smdebug.profiler.analysis.notebook_utils.timeline_charts import TimelineCharts\n", "\n", "system_metrics_reader = tj.get_systems_metrics_reader()\n", "system_metrics_reader.refresh_event_file_list()\n", "\n", "view_timeline_charts = TimelineCharts(\n", "    system_metrics_reader,\n", "    framework_metrics_reader=None,\n", "    select_dimensions=[\"CPU\", \"GPU\"],\n", "    select_events=[\"total\"],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Next steps & Clean Up\n", "\n", "Now that you have a trained model, you can deploy an endpoint to host the model. After you deploy the endpoint, you can test it with inference requests. Don't forget to clean up the resources when you are done testing." 
] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 
11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", 
"vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": 
"ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 
256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": 
false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 } ], "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 4 }