{ "cells": [ { "cell_type": "markdown", "id": "878d65c2", "metadata": { "tags": [] }, "source": [ "# MLOps workshop with Amazon SageMaker" ] }, { "cell_type": "markdown", "id": "c5074715-59a1-4b2c-a67d-c01217b31106", "metadata": {}, "source": [ "## Module 01: Transform the data and train a model inside a Jupyter notebook." ] }, { "cell_type": "markdown", "id": "e4a3707f", "metadata": {}, "source": [ "In this workshop we demonstrate a journey to cloud-native machine learning. We start from a traditional approach, developing and training a model directly in a Jupyter notebook, then move to remote managed data transformation and training with Amazon SageMaker, and finish with fully automated pipelines built with SageMaker Pipelines.\n", "\n", "In this first notebook we will predict house prices from the well-known California housing dataset with a simple regression model in TensorFlow 2.\n", "\n", "To begin, we'll import the necessary packages and set up directories for the training and test data. In this notebook, SageMaker is used only to manage the notebook's compute; no SageMaker APIs are called."
] }, { "cell_type": "code", "execution_count": null, "id": "afc7b904-e7ae-4967-a83a-add06e751033", "metadata": { "tags": [] }, "outputs": [], "source": [ "%matplotlib inline\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")\n", "\n", "import glob\n", "import os\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import pandas as pd\n", "import sklearn.model_selection\n", "from sklearn.preprocessing import StandardScaler" ] }, { "cell_type": "code", "execution_count": null, "id": "aeb5bab8-6986-4df2-a353-c09a817af340", "metadata": { "tags": [] }, "outputs": [], "source": [ "import os\n", "\n", "# All working directories live under ./data\n", "data_dir = os.path.join(os.getcwd(), 'data')\n", "os.makedirs(data_dir, exist_ok=True)\n", "\n", "train_dir = os.path.join(data_dir, 'train')\n", "os.makedirs(train_dir, exist_ok=True)\n", "\n", "test_dir = os.path.join(data_dir, 'test')\n", "os.makedirs(test_dir, exist_ok=True)\n", "\n", "raw_dir = os.path.join(data_dir, 'raw')\n", "os.makedirs(raw_dir, exist_ok=True)\n", "\n", "batch_dir = os.path.join(data_dir, 'batch')\n", "os.makedirs(batch_dir, exist_ok=True)" ] }, { "cell_type": "markdown", "id": "c239d81a", "metadata": {}, "source": [ "## Exploratory Data Analysis (EDA)" ] }, { "cell_type": "markdown", "id": "945d65e5", "metadata": {}, "source": [ "According to The State of Data Science 2020 survey, data management, exploratory data analysis (EDA), feature selection, and feature engineering account for more than 66% of a data scientist’s time.\n", "\n", "Exploratory Data Analysis is an approach to analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods.
EDA assists data science professionals in several ways:\n", "\n", "- Getting a better understanding of the data.\n", "- Identifying patterns in the data.\n", "- Getting a better understanding of the problem statement.\n", "\n", "Numerical EDA gives you important basic information, such as the names and data types of the columns and the dimensions of the DataFrame. Visual EDA, on the other hand, gives you insight into the distributions of the features and their relationship to the target.\n", "\n", "First we'll load the California housing dataset and explore the data." ] }, { "cell_type": "markdown", "id": "2a372962", "metadata": {}, "source": [ "## Download the California housing dataset" ] }, { "cell_type": "markdown", "id": "a1f3cab1", "metadata": {}, "source": [ "We use the California housing dataset.\n", "\n", "More info on the dataset:\n", "\n", "This dataset was obtained from the StatLib repository (http://lib.stat.cmu.edu/datasets/).\n", "\n", "The target variable is the median house value for California districts.\n", "\n", "This dataset was derived from the 1990 U.S. census, using one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).\n", "\n", "We will use the AWS CLI to download the dataset from S3. You don't need to specify AWS credentials; they are taken from the notebook's IAM role. If you get an error in this step, check that the notebook was created with a proper IAM role." ] }, { "cell_type": "code", "execution_count": null, "id": "b508e3f5-3058-4bf0-935b-01a2fc0b8199", "metadata": { "tags": [] }, "outputs": [], "source": [ "!aws s3 cp s3://sagemaker-sample-files/datasets/tabular/california_housing/cal_housing.tgz ."
] }, { "cell_type": "code", "execution_count": null, "id": "7c50f97a-f5f0-4279-91a6-c7c7ecce7b6b", "metadata": { "tags": [] }, "outputs": [], "source": [ "!tar -zxf cal_housing.tgz 2>/dev/null" ] }, { "cell_type": "code", "execution_count": null, "id": "e8918a04-dc8a-479f-91b0-e0293c9eb0f1", "metadata": { "tags": [] }, "outputs": [], "source": [ "columns = [\n", "    \"longitude\",\n", "    \"latitude\",\n", "    \"housingMedianAge\",\n", "    \"totalRooms\",\n", "    \"totalBedrooms\",\n", "    \"population\",\n", "    \"households\",\n", "    \"medianIncome\",\n", "    \"medianHouseValue\",\n", "]\n", "df = pd.read_csv(\"CaliforniaHousing/cal_housing.data\", names=columns, header=None)" ] }, { "cell_type": "code", "execution_count": null, "id": "064da7f5-898f-471b-acf3-f70e00dd2d46", "metadata": { "tags": [] }, "outputs": [], "source": [ "df.head()" ] }, { "cell_type": "markdown", "id": "1243dd72", "metadata": {}, "source": [ "## Numerical EDA\n", "Check how big the dataset is, how many features it has and of what type, and what the target is." ] }, { "cell_type": "code", "execution_count": null, "id": "00771831-24bc-41ff-b4d9-756f82f0e15b", "metadata": { "tags": [] }, "outputs": [], "source": [ "df.info()" ] }, { "cell_type": "markdown", "id": "79c0c790", "metadata": {}, "source": [ "Each record in the dataset has 9 attributes.
They are:\n", "\n", "- longitude: block group longitude\n", "- latitude: block group latitude\n", "- housingMedianAge: median house age in the block group\n", "- totalRooms: total number of rooms in the block group\n", "- totalBedrooms: total number of bedrooms in the block group\n", "- population: block group population\n", "- households: number of households in the block group\n", "- medianIncome: median income in the block group\n", "- medianHouseValue: median value of owner-occupied homes\n", "\n", "It is important to notice that all data is numeric and there are no NULL values.\n", "\n", "Now, let's summarize the data to see how the values are distributed." ] }, { "cell_type": "code", "execution_count": null, "id": "e765b5d9-9030-4793-bc97-acf47825b665", "metadata": { "tags": [] }, "outputs": [], "source": [ "df.describe()" ] }, { "cell_type": "code", "execution_count": null, "id": "68493aa3-a6f3-401c-ba8a-c591388fc3dd", "metadata": { "tags": [] }, "outputs": [], "source": [ "df.value_counts(\"housingMedianAge\", sort=True)" ] }, { "cell_type": "markdown", "id": "379d11f4", "metadata": {}, "source": [ "Looking at the mean, we can see that the houses are rather old, around 28 years." ] }, { "cell_type": "markdown", "id": "313eac34", "metadata": {}, "source": [ "## Visual EDA" ] }, { "cell_type": "markdown", "id": "80a11179", "metadata": {}, "source": [ "Let's begin exploring the data visually. We will plot a histogram of each feature." ] }, { "cell_type": "code", "execution_count": null, "id": "36d13215-066c-4861-b7dd-6b4f64e1e1e4", "metadata": { "tags": [] }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "df.hist(bins=50, figsize=(20, 15))\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "ff86e145", "metadata": {}, "source": [ "We see that the data is skewed and not normalized for most of the columns. We will not touch the latitude and longitude for now. Let's apply a logarithmic transform (`log1p`) to the rest of the columns and check the result."
] }, { "cell_type": "code", "execution_count": null, "id": "bd77357f-86a7-407b-a112-3d0004ed8db0", "metadata": { "tags": [] }, "outputs": [], "source": [ "columns_to_normalize = [\n", "    'medianIncome', 'housingMedianAge', 'totalRooms',\n", "    'totalBedrooms', 'population', 'households', 'medianHouseValue'\n", "]\n", "\n", "# log1p(x) = log(1 + x) compresses the long right tails\n", "for column in columns_to_normalize:\n", "    df[column] = np.log1p(df[column])" ] }, { "cell_type": "code", "execution_count": null, "id": "bbdc33e6", "metadata": {}, "outputs": [], "source": [ "df.hist(figsize=(12, 10), bins=50, edgecolor=\"black\", grid=False)\n", "plt.subplots_adjust(hspace=0.7, wspace=0.4)" ] }, { "cell_type": "markdown", "id": "d53d29a1", "metadata": {}, "source": [ "The distributions look much less skewed. Now we will look at the coordinates: we plot them and color each point by the \"medianHouseValue\" column." ] }, { "cell_type": "code", "execution_count": null, "id": "11185d52-6233-4964-b1fd-3c7a010f69da", "metadata": { "tags": [] }, "outputs": [], "source": [ "from matplotlib.colors import LinearSegmentedColormap\n", "\n", "cmap = LinearSegmentedColormap.from_list(name='Pacific Ocean shore', colors=['green','yellow','red'])\n", "\n", "# Create one 10x10 figure; a separate plt.figure() call would leave an empty extra figure\n", "f, ax = plt.subplots(figsize=(10, 10))\n", "points = ax.scatter(df['longitude'], df['latitude'], c=df['medianHouseValue'], s=10, cmap=cmap)\n", "f.colorbar(points)" ] }, { "cell_type": "markdown", "id": "769a6a10", "metadata": {}, "source": [ "Our dataset covers California, and the outline we see in the plot is the Pacific coast. The color scale makes it clear that houses located near the ocean are more expensive. Using domain knowledge, we also notice that the most expensive houses are located near San Francisco (37.7749° N, 122.4194° W) and Los Angeles (34.0522° N, 118.2437° W). Another observation is the relationship between house prices and the distance to those cities.
We will engineer the data to produce a more linear dependency between the house price and the location, which is a better fit for regression problems. We remove the \"longitude\" and \"latitude\" columns and replace them with the Euclidean distances (measured in degrees of longitude and latitude) to San Francisco and Los Angeles." ] }, { "cell_type": "code", "execution_count": null, "id": "f800380d-dc7f-49c5-be7e-f23b96e22f09", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Coordinates are [longitude, latitude]\n", "sf_coord=[-122.4194, 37.7749]\n", "la_coord=[-118.2437, 34.0522]\n", "\n", "df['DistanceToSF']=np.sqrt((df['longitude']-sf_coord[0])**2+(df['latitude']-sf_coord[1])**2)\n", "df['DistanceToLA']=np.sqrt((df['longitude']-la_coord[0])**2+(df['latitude']-la_coord[1])**2)\n", "df.drop(columns=['longitude', 'latitude'],inplace=True)" ] }, { "cell_type": "markdown", "id": "99440d8c", "metadata": {}, "source": [ "Split the data to create training and test datasets." ] }, { "cell_type": "code", "execution_count": null, "id": "082e4d42-38a0-47c7-b523-807651c8b97d", "metadata": { "tags": [] }, "outputs": [], "source": [ "X = df.drop(\"medianHouseValue\", axis=1)\n", "Y = df[\"medianHouseValue\"].copy()" ] }, { "cell_type": "code", "execution_count": null, "id": "87bfe087-edda-4a03-a703-f65d2cfdd77a", "metadata": { "tags": [] }, "outputs": [], "source": [ "print(\"Features:\", list(X.columns))\n", "print(\"Dataset shape:\", X.shape)\n", "print(\"Dataset Type:\", type(X))\n", "print(\"Label set shape:\", Y.shape)\n", "print(\"Label set Type:\", type(Y))\n", "\n", "# We partition the dataset into 2/3 training and 1/3 test set.\n", "x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, Y, test_size=0.33)\n", "\n", "np.save(os.path.join(raw_dir, 'x_train.npy'), x_train)\n", "np.save(os.path.join(raw_dir, 'x_test.npy'), x_test)\n", "np.save(os.path.join(raw_dir, 'y_train.npy'), y_train)\n", "np.save(os.path.join(raw_dir, 'y_test.npy'), y_test)" ] }, { "cell_type": "code", "execution_count": null, "id":
"f4ffa299-7c11-4c96-b799-2674e1ce737b", "metadata": { "tags": [] }, "outputs": [], "source": [ "scaler = StandardScaler()\n", "x_train = np.load(os.path.join(raw_dir, 'x_train.npy'))\n", "scaler.fit(x_train)" ] }, { "cell_type": "markdown", "id": "2cbee8aa", "metadata": {}, "source": [ "We save the training and test data on the file system." ] }, { "cell_type": "code", "execution_count": null, "id": "9ce5b639-65a2-414d-adf2-0c4b138c0d3b", "metadata": { "tags": [] }, "outputs": [], "source": [ "input_files = glob.glob('{}/raw/*.npy'.format(data_dir))\n", "print('\\nINPUT FILE LIST: \\n{}\\n'.format(input_files))\n", "for file in input_files:\n", " raw = np.load(file)\n", " # only transform feature columns\n", " if 'y_' not in file:\n", " transformed = scaler.transform(raw)\n", " if 'train' in file:\n", " if 'y_' in file:\n", " output_path = os.path.join(train_dir, 'y_train.npy')\n", " np.save(output_path, raw)\n", " print('SAVED LABEL TRAINING DATA FILE\\n')\n", " else:\n", " output_path = os.path.join(train_dir, 'x_train.npy')\n", " np.save(output_path, transformed)\n", " print('SAVED TRANSFORMED TRAINING DATA FILE\\n')\n", " else:\n", " if 'y_' in file:\n", " output_path = os.path.join(test_dir, 'y_test.npy')\n", " np.save(output_path, raw)\n", " print('SAVED LABEL TEST DATA FILE\\n')\n", " else:\n", " output_path = os.path.join(test_dir, 'x_test.npy')\n", " np.save(output_path, transformed)\n", " print('SAVED TRANSFORMED TEST DATA FILE\\n')" ] }, { "cell_type": "code", "execution_count": null, "id": "787e17b4-ab37-40a2-8338-64fe9ee1f08d", "metadata": { "tags": [] }, "outputs": [], "source": [ "import numpy as np\n", "import os\n", "import tensorflow as tf\n", "tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)\n", "\n", "def get_train_data(train_dir):\n", " x_train = np.load(os.path.join(train_dir, 'x_train.npy'))\n", " y_train = np.load(os.path.join(train_dir, 'y_train.npy'))\n", " print('x train', x_train.shape,'y train', y_train.shape)\n", "\n", " 
return x_train, y_train\n", "\n", "\n", "def get_test_data(test_dir):\n", "    x_test = np.load(os.path.join(test_dir, 'x_test.npy'))\n", "    y_test = np.load(os.path.join(test_dir, 'y_test.npy'))\n", "    print('x test', x_test.shape,'y test', y_test.shape)\n", "\n", "    return x_test, y_test\n", "\n", "def get_model():\n", "    # 8 input features -> Dense(8, tanh) -> Dense(4, sigmoid) -> 1 regression output\n", "    inputs = tf.keras.Input(shape=(8,))\n", "    hidden_1 = tf.keras.layers.Dense(8, activation='tanh')(inputs)\n", "    hidden_2 = tf.keras.layers.Dense(4, activation='sigmoid')(hidden_1)\n", "    outputs = tf.keras.layers.Dense(1)(hidden_2)\n", "    return tf.keras.Model(inputs=inputs, outputs=outputs)" ] }, { "cell_type": "markdown", "id": "0df79a33", "metadata": {}, "source": [ "Now we will do the actual training. Feel free to change the hyperparameter values (epochs, batch_size, etc.) to see how they affect the training metric." ] }, { "cell_type": "code", "execution_count": null, "id": "a3fcb1a1-fc61-43b3-b183-9858c08f1098", "metadata": { "tags": [] }, "outputs": [], "source": [ "x_train, y_train = get_train_data(train_dir)\n", "x_test, y_test = get_test_data(test_dir)\n", "\n", "device = '/cpu:0'\n", "print(device)\n", "batch_size = 128\n", "epochs = 25\n", "learning_rate = 0.01\n", "print('batch_size = {}, epochs = {}, learning rate = {}'.format(batch_size, epochs, learning_rate))\n", "\n", "with tf.device(device):\n", "    model = get_model()\n", "    optimizer = tf.keras.optimizers.SGD(learning_rate)\n", "    model.compile(optimizer=optimizer, loss='mse')\n", "    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,\n", "              validation_data=(x_test, y_test))\n", "\n", "    # evaluate on test set\n", "    scores = model.evaluate(x_test, y_test, batch_size, verbose=2)\n", "    print(\"\\nTest MSE :\", scores)" ] }, { "cell_type": "markdown", "id": "e32b16c1", "metadata": {}, "source": [ "Mean Squared Error (MSE) is a commonly used metric in machine learning for evaluating the performance of regression models.
It measures the average squared difference between the predicted and actual values. MSE penalizes larger errors more heavily due to the squaring operation. By calculating the mean of these squared differences, MSE provides a single numerical value to assess the model's accuracy. A lower MSE indicates better model performance, with zero being the ideal value." ] }, { "cell_type": "code", "execution_count": null, "id": "40315459-3d06-470d-bfef-e9731153bf4e", "metadata": { "tags": [] }, "outputs": [], "source": [ "model.save('model/1')" ] }, { "cell_type": "code", "execution_count": null, "id": "17bcb84c-31b0-4cf0-bea4-51b941b63dd8", "metadata": { "tags": [] }, "outputs": [], "source": [ "!ls -R model" ] }, { "cell_type": "markdown", "id": "0b1cdc9a", "metadata": {}, "source": [ "Our model is now trained, and the metric looks good. We will load the saved model and evaluate it on the test dataset to see how close our predictions are to the actual values." ] }, { "cell_type": "code", "execution_count": null, "id": "4552dd33-f100-4c5d-9c34-62065b9ae342", "metadata": { "tags": [] }, "outputs": [], "source": [ "import os\n", "import numpy as np\n", "import tensorflow as tf\n", "\n", "model = tf.keras.models.load_model('model/1')\n", "\n", "x_test = np.load(os.path.join(test_dir, 'x_test.npy'))\n", "y_test = np.load(os.path.join(test_dir, 'y_test.npy'))\n", "scores = model.evaluate(x_test, y_test, verbose=2)\n", "print(\"\\nTest MSE :\", scores)" ] }, { "cell_type": "code", "execution_count": null, "id": "6ade50bb-b423-4c21-aef7-ef7ab0cf49e8", "metadata": { "tags": [] }, "outputs": [], "source": [ "y_pred = model.predict(x_test)\n", "# Flatten the (n, 1) predictions and round both series to one decimal place\n", "flat_list_pred = [float('%.1f'%(item)) for sublist in y_pred for item in sublist]\n", "flat_list_test = [float('%.1f'%(item)) for item in y_test]\n", "test_result = pd.DataFrame({'Predicted':flat_list_pred,'Actual':flat_list_test})\n", "fig= plt.figure(figsize=(16,8))\n", "test_result = test_result.reset_index(drop=True)\n",
"plt.plot(test_result[:50])\n", "plt.legend(['Actual','Predicted'])" ] }, { "cell_type": "markdown", "id": "fcca7cc6", "metadata": {}, "source": [ "The MSE metric suggested that our model would perform well, and indeed, we see in the visualization above a good correlation between actual and predicted values." ] }, { "cell_type": "code", "execution_count": null, "id": "1a264e6c", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 
128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": 
"ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, 
"hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory 
Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": 
"Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 }, { "_defaultOrder": 55, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 56, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4de.24xlarge", "vcpuNum": 96 } ], "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (TensorFlow 2.6 Python 3.8 CPU Optimized)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/tensorflow-2.6-cpu-py38-ubuntu20.04-v1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 5 }