{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# End-to-End NLP: News Headline Classifier (Local Version)\n", "\n", "_**Train a PyTorch-based model to classify news headlines between four domains**_\n", "\n", "This notebook works well with the `Python 3 (PyTorch 1.13 Python 3.9 CPU Optimized)` kernel on SageMaker Studio, or `conda_pytorch_p38` on classic SageMaker Notebook Instances.\n", "\n", "---\n", "\n", "In this version, the model is trained and evaluated here on the notebook instance itself. We'll show in the follow-on notebook how to take advantage of Amazon SageMaker to separate these infrastructure needs.\n", "\n", "Note that you can safely ignore the WARNING about the pip version.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "# First install some libraries which might not be available across all kernels (e.g. in Studio):\n", "!pip install \"ipywidgets<8\" torchtext==0.6" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download News Aggregator Dataset\n", "\n", "We will download **FastAi AG News** dataset from the [Registry of Open Data on AWS](https://registry.opendata.aws/fast-ai-nlp/) public repository. 
This dataset contains a table of news headlines and their corresponding classes.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "%%time\n", "local_dir = \"data\"\n", "# Download the AG News data from the Registry of Open Data on AWS.\n", "!mkdir -p {local_dir}\n", "!aws s3 cp s3://fast-ai-nlp/ag_news_csv.tgz {local_dir} --no-sign-request\n", "\n", "# Un-tar the AG News data.\n", "!tar zxf {local_dir}/ag_news_csv.tgz -C {local_dir}/ --strip-components=1 --no-same-owner\n", "print(\"Done!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's visualize the dataset\n", "\n", "We will load the extracted `train.csv` file into a pandas DataFrame for our data processing work." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2\n", "\n", "import os\n", "import re\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import util.preprocessing" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "column_names = [\"CATEGORY\", \"TITLE\", \"CONTENT\"]\n", "# we use the train.csv only\n", "df = pd.read_csv(f\"{local_dir}/train.csv\", names=column_names, header=None, delimiter=\",\")\n", "# shuffle the DataFrame rows\n", "df = df.sample(frac=1, random_state=1337)\n", "# make the category classes more readable\n", "mapping = {1: \"World\", 2: \"Sports\", 3: \"Business\", 4: \"Sci/Tech\"}\n", "df = df.replace({\"CATEGORY\": mapping})\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this exercise we'll **only use**:\n", "\n", "- The **title** (headline) of the news story, as our input\n", "- The **category**, as our target variable\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "df[\"CATEGORY\"].value_counts()" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "The dataset has **four article categories** with equal weighting:\n", "\n", "- Business\n", "- Sci/Tech\n", "- Sports\n", "- World\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Natural Language Pre-Processing\n", "\n", "We'll do some basic processing of the text data to convert it into numerical form that the algorithm will be able to consume to create a model.\n", "\n", "We will do typical pre processing for NLP workloads such as: dummy encoding the labels, tokenizing the documents and set fixed sequence lengths for input feature dimension, padding documents to have fixed length input vectors.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dummy Encode the Labels\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "encoded_y, labels = util.preprocessing.dummy_encode_labels(df, \"CATEGORY\")\n", "print(labels)\n", "print(encoded_y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, looking at the first record in our (shuffled) dataframe:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "df[\"CATEGORY\"].iloc[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "encoded_y[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tokenize and Set Fixed Sequence Lengths\n", "\n", "We want to describe our inputs at the more meaningful word level (rather than individual characters), and ensure a fixed length of the input feature dimension.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "7bcf422f-0e75-4d49-b3b1-12553fcaf4ff", "_uuid": "46b7fc9aef5a519f96a295e980ba15deee781e97", "tags": [] }, "outputs": [], "source": [ "processed_docs, tokenizer = util.preprocessing.tokenize_and_pad_docs(df, \"TITLE\")" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": { "tags": [] }, "outputs": [], "source": [ "df[\"TITLE\"].iloc[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "processed_docs[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import Word Embeddings\n", "\n", "To represent our words in numeric form, we'll use pre-trained vector representations for each word in the vocabulary: In this case we'll be using [pre-trained word embeddings from FastText](https://fasttext.cc/docs/en/crawl-vectors.html), which are also available for a broad range of languages other than English.\n", "\n", "You could also explore training custom, domain-specific word embeddings using SageMaker's built-in [BlazingText algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext.html). See the official [blazingtext_word2vec_text8 sample](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/introduction_to_amazon_algorithms/blazingtext_word2vec_text8) for an example notebook showing how.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "%%time\n", "embedding_matrix = util.preprocessing.get_word_embeddings(tokenizer, f\"{local_dir}/embeddings\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "np.save(\n", " file=f\"{local_dir}/embeddings/docs-embedding-matrix\",\n", " arr=embedding_matrix,\n", " allow_pickle=False,\n", ")\n", "vocab_size = embedding_matrix.shape[0]\n", "print(embedding_matrix.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Split Train and Test Sets\n", "\n", "Finally we need to divide our data into model training and evaluation sets:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "x_train, x_test, y_train, y_test = train_test_split(\n", " 
processed_docs, encoded_y, test_size=0.2, random_state=42\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Do you always remember to save your datasets for traceability when experimenting locally? ;-)\n", "os.makedirs(f\"{local_dir}/train\", exist_ok=True)\n", "np.save(f\"{local_dir}/train/train_X.npy\", x_train)\n", "np.save(f\"{local_dir}/train/train_Y.npy\", y_train)\n", "os.makedirs(f\"{local_dir}/test\", exist_ok=True)\n", "np.save(f\"{local_dir}/test/test_X.npy\", x_test)\n", "np.save(f\"{local_dir}/test/test_Y.npy\", y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define the Model\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "import torch.optim as optim\n", "from torch.utils.data import DataLoader\n", "\n", "seed = 42\n", "np.random.seed(seed)\n", "num_classes = len(labels)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "class Net(nn.Module):\n", "    def __init__(self, vocab_size=400000, emb_dim=300, num_classes=4):\n", "        super(Net, self).__init__()\n", "        self.embedding = nn.Embedding(vocab_size, emb_dim)\n", "        self.conv1 = nn.Conv1d(emb_dim, 128, kernel_size=3)\n", "        self.max_pool1d = nn.MaxPool1d(5)\n", "        self.flatten1 = nn.Flatten()\n", "        self.dropout1 = nn.Dropout(p=0.3)\n", "        # 896 = 128 conv channels x 7 pooled positions, assuming titles are\n", "        # padded to 40 tokens: conv -> 40-3+1 = 38, max-pool(5) -> 38//5 = 7\n", "        self.fc1 = nn.Linear(896, 128)\n", "        self.fc2 = nn.Linear(128, num_classes)\n", "\n", "    def forward(self, x):\n", "        x = self.embedding(x)\n", "        # (batch, seq, emb) -> (batch, emb, seq), as Conv1d expects channels first\n", "        x = torch.transpose(x, 1, 2)\n", "        x = self.flatten1(self.max_pool1d(self.conv1(x)))\n", "        x = self.dropout1(x)\n", "        x = F.relu(self.fc1(x))\n", "        x = self.fc2(x)\n", "        return F.softmax(x, dim=-1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define Train and Helper Functions\n" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def test(model, test_loader, device):\n", " model.eval()\n", " test_loss = 0.0\n", " correct = 0\n", " with torch.no_grad():\n", " for data, target in test_loader:\n", " data, target = data.to(device), target.to(device)\n", " output = model(data)\n", " test_loss += F.binary_cross_entropy(output, target, reduction=\"sum\").item()\n", " pred = output.max(1, keepdim=True)[1] # get the index of the max log-probability\n", " target_index = target.max(1, keepdim=True)[1]\n", " correct += pred.eq(target_index).sum().item()\n", "\n", " test_loss /= len(test_loader.dataset) # Average loss over dataset samples\n", " print(f\"val_loss: {test_loss:.4f}, val_acc: {correct/len(test_loader.dataset):.4f}\")\n", "\n", "\n", "def train(\n", " train_loader, test_loader, embedding_matrix, num_classes=4, epochs=12, learning_rate=0.001\n", "):\n", " ###### Setup model architecture ############\n", " model = Net(\n", " vocab_size=embedding_matrix.shape[0],\n", " emb_dim=embedding_matrix.shape[1],\n", " num_classes=num_classes,\n", " )\n", " model.embedding.weight = torch.nn.parameter.Parameter(\n", " torch.FloatTensor(embedding_matrix), False\n", " )\n", " device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", " model.to(device)\n", " optimizer = optim.RMSprop(model.parameters(), lr=learning_rate)\n", "\n", " for epoch in range(1, epochs + 1):\n", " model.train()\n", " running_loss = 0.0\n", " n_batches = 0\n", " for batch_idx, (X_train, y_train) in enumerate(train_loader, 1):\n", " data, target = X_train.to(device), y_train.to(device)\n", " optimizer.zero_grad()\n", " output = model(data)\n", " loss = F.binary_cross_entropy(output, target)\n", " loss.backward()\n", " optimizer.step()\n", " running_loss += loss.item()\n", " n_batches += 1\n", " print(f\"epoch: {epoch}, train_loss: {running_loss / n_batches:.6f}\") # (Avg over batches)\n", " print(\"Evaluating model\")\n", " test(model, 
test_loader, device)\n", " return model" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "class Dataset(torch.utils.data.Dataset):\n", " def __init__(self, data, labels):\n", " \"\"\"Initialization\"\"\"\n", " self.labels = labels\n", " self.data = data\n", "\n", " def __len__(self):\n", " \"\"\"Denotes the total number of samples\"\"\"\n", " return len(self.data)\n", "\n", " def __getitem__(self, index):\n", " # Load data and get label\n", " X = torch.as_tensor(self.data[index]).long()\n", " y = torch.as_tensor(self.labels[index])\n", " return X, y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fit (Train) and Evaluate the Model\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "%%time\n", "# fit the model here in the notebook:\n", "epochs = 5\n", "learning_rate = 0.001\n", "model_dir = \"model/\"\n", "trainloader = torch.utils.data.DataLoader(Dataset(x_train, y_train), batch_size=16, shuffle=True)\n", "testloader = torch.utils.data.DataLoader(Dataset(x_test, y_test), batch_size=32, shuffle=True)\n", "\n", "print(\"Training model\")\n", "model = train(\n", " trainloader,\n", " testloader,\n", " embedding_matrix,\n", " num_classes=num_classes,\n", " epochs=epochs,\n", " learning_rate=learning_rate,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use the Model (Locally)\n", "\n", "Let's evaluate our model with some example headlines...\n", "\n", "If you struggle with the widget, you can always simply call the `classify()` function from Python. 
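\n", "\n", "Under the hood, the `classify()` helper in the next cell maps each token to its vocabulary index via `tokenizer.vocab.stoi` before building the input tensor. A toy illustration of that lookup, using a made-up mini-vocabulary (not the real torchtext one):\n", "\n", "```python\n", "# Hypothetical mini-vocabulary standing in for tokenizer.vocab.stoi\n", "stoi = {\"<pad>\": 0, \"markets\": 5, \"bullish\": 9}\n", "tokens = [\"markets\", \"bullish\", \"<pad>\"]\n", "print([stoi.get(t, 0) for t in tokens])  # -> [5, 9, 0]\n", "```\n", "\n", "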
You can be creative with your headlines!\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import ipywidgets as widgets\n", "from IPython import display\n", "\n", "\n", "def classify(text):\n", "    \"\"\"Classify a headline and print the results\"\"\"\n", "    processed = tokenizer.preprocess(text)\n", "    padded = tokenizer.pad([processed])\n", "    final_text = []\n", "    for w in padded[0]:\n", "        final_text.append(tokenizer.vocab.stoi[w])\n", "    final_text = torch.tensor([final_text])\n", "    model.cpu()\n", "    model.eval()\n", "    with torch.no_grad():\n", "        result = model(final_text)\n", "    print(result)\n", "    ix = np.argmax(result.detach())\n", "    print(f\"Predicted class: '{labels[ix]}' with confidence {result[0][ix]:.2%}\")\n", "\n", "\n", "# Either try out the interactive widget:\n", "interaction = widgets.interact_manual(\n", "    classify,\n", "    text=widgets.Text(\n", "        value=\"The markets were bullish after news of the merger\",\n", "        placeholder=\"Type a news headline...\",\n", "        description=\"Headline:\",\n", "        layout=widgets.Layout(width=\"99%\"),\n", "    ),\n", ")\n", "interaction.widget.children[1].description = \"Classify!\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Or just use the function to classify your own headline:\n", "classify(\"Retailers are expanding after the recent economic growth\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Review\n", "\n", "In this notebook we pre-processed publicly downloadable data and trained a neural news headline classifier model, much as a data scientist might do when working on a local machine.\n", "\n", "...But can we use the cloud more effectively, to allocate high-performance resources on demand and easily deploy our trained models for use by other applications?\n", "\n", "Head on over to the next notebook, [Headline Classifier SageMaker.ipynb](Headline%20Classifier%20SageMaker.ipynb), 
where we'll show how the same model can be trained and then deployed on specific target infrastructure with Amazon SageMaker.\n" ] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General 
purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, 
"_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, 
"_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { 
"_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", 
"vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 } ], "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (PyTorch 1.13 Python 3.9 CPU Optimized)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/pytorch-1.13-cpu-py39" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 4 }