{ "cells": [ { "cell_type": "markdown", "id": "11ae81b2-4ac4-495f-9f9e-01c6af9d7b63", "metadata": {}, "source": [ "### Kernel and SageMaker Instance Setup\n", "**Please use the ml.g4dn.xlarge instance for this notebook. The Kernel is 'Data Science - Python3'**" ] }, { "cell_type": "markdown", "id": "1f6b8b81-aee6-4e38-9473-481ee76692e8", "metadata": { "tags": [] }, "source": [ "# Text Summarization of Consumer Health Questions\n", "## Part 1 Fine tuning Flan-t5 locally in the notebook\n", "\n", "In this notebook we will learn how to fine tune the Flan-t5 model for medical summarization task in the local notebook. We will use the MeQSum dataset for fine-tuning. The MeQSum dataset contains three columns : id, text and summary. We will first split the dataset into three parts - train, validation and test. For training, we use the text column as input and the summary column as the label (output). After training the model we use the test dataset to generate summary and then compare that with the human generated summary in the dataset. \n", "\n", "### MeQSum Dataset\n", "\"On the Summarization of Consumer Health Questions\". Asma Ben Abacha and Dina Demner-Fushman. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019. \n", "#### Citation Information\n", "@Inproceedings{MeQSum,\n", "author = {Asma {Ben Abacha} and Dina Demner-Fushman},\n", "title = {On the Summarization of Consumer Health Questions},\n", "booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28th - August 2},\n", "year = {2019},\n", "abstract = {Question understanding is one of the main challenges in question answering. In real world applications, users often submit natural language questions that are longer than needed and include peripheral information that increases the complexity of the question, leading to substantially more false positives in answer retrieval. In this paper, we study neural abstractive models for medical question summarization. We introduce the MeQSum corpus of 1,000 summarized consumer health questions. We explore data augmentation methods and evaluate state-of-the-art neural abstractive models on this new task. In particular, we show that semantic augmentation from question datasets improves the overall performance, and that pointer-generator networks outperform sequence-to-sequence attentional models on this task, with a ROUGE-1 score of 44.16%. We also present a detailed error analysis and discuss directions for improvement that are specific to question summarization. }}\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "36b2694b-9396-44ba-92bd-0ec9924ecbfa", "metadata": { "tags": [] }, "outputs": [], "source": [ "!pip install -q openpyxl==3.0.3 xlrd==1.2.0\n", "!pip install -q torch==1.13.1 datasets==2.12.0 transformers==4.28.0 rouge-score==0.1.2 nltk==3.8.1 sentencepiece==0.1.99 evaluate==0.4.0" ] }, { "cell_type": "markdown", "id": "3f9939f2-7868-4aa5-aa79-dc31a54dbb7b", "metadata": {}, "source": [ "## 1. Data Preparation" ] }, { "cell_type": "markdown", "id": "d5ce83e9-b7ae-4932-abe6-3ebea96a7f8a", "metadata": {}, "source": [ "#### Download the publicly available dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "42d062f8-b7fb-4e65-8ebb-724b5f84d61b", "metadata": { "tags": [] }, "outputs": [], "source": [ "import urllib.request\n", "urllib.request.urlretrieve('https://github.com/abachaa/MeQSum/raw/master/MeQSum_ACL2019_BenAbacha_Demner-Fushman.xlsx', 'MeQSum_ACL2019_BenAbacha_Demner-Fushman.xlsx')\n", "# please wait a few seconds till you see the excel file in your folder." ] }, { "cell_type": "markdown", "id": "7cd4cf01-1d14-4fa3-a09c-ca73b089c8e7", "metadata": {}, "source": [ "#### Prepare the data for ingestion e.g. making all text lowercase, and adding row IDs" ] }, { "cell_type": "code", "execution_count": null, "id": "8901bbb6-66bf-497c-a3c1-2ebe6db9f0ea", "metadata": { "tags": [] }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "# dataset from https://github.com/abachaa/MeQSum\n", "df = pd.read_excel('MeQSum_ACL2019_BenAbacha_Demner-Fushman.xlsx')\n", "df = df.drop('File', axis=1)\n", "df = df.rename(columns={'CHQ':'Text'})\n", "df = df.dropna()\n", "df['Text']= df['Text'].apply(lambda x: x.lower())\n", "df['Summary'] = df['Summary'].apply(lambda x: x.lower())\n", "df['Id'] = range(0, len(df.index))\n", "df = df[['Id', 'Text', 'Summary']]\n", "# df = df.sample(frac=1).reset_index(drop=True) # to shaffule\n", "df" ] }, { "cell_type": "markdown", "id": "b5fdf263-6589-402b-b271-15b1f1f9d80d", "metadata": {}, "source": [ "#### Import the tokenisation libraries & functions. \n", "##### Tokenisation is required to convert raw text into smaller subsets. It does this by creating vectorised versions of each word." ] }, { "cell_type": "code", "execution_count": null, "id": "60895532-1706-4bc5-bdf9-2c0513fcab5d", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Import libraries required for modelling\n", "\n", "import torch\n", "import datasets\n", "from datasets import Dataset\n", "from datasets import load_metric\n", "from datasets import concatenate_datasets\n", "\n", "import transformers\n", "from transformers import AutoTokenizer\n", "from transformers import AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq, Seq2SeqTrainingArguments, Seq2SeqTrainer\n", "\n", "import numpy as np\n", "import evaluate\n", "\n", "import nltk\n", "from nltk.tokenize import sent_tokenize\n", "nltk.download(\"punkt\")" ] }, { "cell_type": "code", "execution_count": null, "id": "2aea70ac-4fab-4368-9d8c-f530db178ef1", "metadata": { "tags": [] }, "outputs": [], "source": [ "model_checkpoint = 'google/flan-t5-small'\n", "tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)" ] }, { "cell_type": "markdown", "id": "169e7dca-ddc2-4ee7-8a11-cde168268b63", "metadata": {}, "source": [ "#### Outputs of the tokeniser:\n", "**input_ids**: the corresponding indicies of each word. Note that AWS wasn't in this vocabulary and has been split into A, W, S\n", "\n", "**attention_mask**: indicates to the LLM which tokens should be prioritised\n" ] }, { "cell_type": "code", "execution_count": null, "id": "994ddc3b-2915-46ac-bce2-df8d821bf9b1", "metadata": { "tags": [] }, "outputs": [], "source": [ "tokenizer(\"Hello, welcome to AWS!\")" ] }, { "cell_type": "code", "execution_count": null, "id": "3da11596-f097-4fc2-83a6-65779e01a4f3", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Similar to any ML model, we split the data into a train and test set\n", "\n", "train = df[:700]\n", "val = df[700:900]\n", "test = df[900:]\n", "print('train: {}, val: {}, test: {}'.format(train.shape, val.shape, test.shape))" ] }, { "cell_type": "code", "execution_count": null, "id": "84c09d05-7f3c-494a-a362-1e4a92e9eda5", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Metadata and dataset objects\n", "\n", "train_dataset = Dataset.from_pandas(train)\n", "val_dataset = Dataset.from_pandas(val)\n", "test_dataset = Dataset.from_pandas(test)" ] }, { "cell_type": "code", "execution_count": null, "id": "f6278893-356b-471a-a571-b487433e1462", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Deterimine the max input length and max target length based on the number of rows in the dataset\n", "\n", "tokenized_inputs = concatenate_datasets([train_dataset, val_dataset, test_dataset]).map(lambda x: tokenizer(x[\"Text\"], truncation=True), batched=True, remove_columns=[\"Text\", \"Summary\"])\n", "max_input_length = max([len(x) for x in tokenized_inputs[\"input_ids\"]])\n", "print(f\"Max input length: {max_input_length}\")\n", "\n", "tokenized_targets = concatenate_datasets([train_dataset, val_dataset, test_dataset]).map(lambda x: tokenizer(x[\"Summary\"], truncation=True), batched=True, remove_columns=[\"Text\", \"Summary\"])\n", "max_target_length = max([len(x) for x in tokenized_targets[\"input_ids\"]])\n", "print(f\"Max target length: {max_target_length}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "e4f415a0-99a0-429b-8ddd-7f2430d69fa9", "metadata": { "tags": [] }, "outputs": [], "source": [ "display(train_dataset)" ] }, { "cell_type": "markdown", "id": "9e247d20-f343-40de-82d5-166197af842b", "metadata": { "tags": [] }, "source": [ "#### Create a function to tokenise inputs to the model & ensure vectors are the same length" ] }, { "cell_type": "code", "execution_count": null, "id": "526692e1-9e6e-4ee0-9079-a52139035b40", "metadata": { "tags": [] }, "outputs": [], "source": [ "def preprocess_function(sample,padding=\"max_length\"):\n", " inputs = [\"summarize: \" + item for item in sample[\"Text\"]]\n", " model_inputs = tokenizer(inputs, max_length=max_input_length, padding=padding, truncation=True)\n", "\n", " labels = tokenizer(text_target=sample[\"Summary\"], max_length=max_target_length, padding=padding, truncation=True)\n", "\n", " if padding == \"max_length\":\n", " labels[\"input_ids\"] = [\n", " [(l if l != tokenizer.pad_token_id else -100) for l in label] for label in labels[\"input_ids\"]\n", " ]\n", "\n", " model_inputs[\"labels\"] = labels[\"input_ids\"]\n", " return model_inputs" ] }, { "cell_type": "code", "execution_count": null, "id": "54658c51-eb8d-4bc4-84f0-7cdd3fee58b8", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Apply the tokenisation function\n", "\n", "tokenized_train = train_dataset.map(preprocess_function, batched=True)\n", "tokenized_val = val_dataset.map(preprocess_function, batched=True)\n", "\n", "print(f\"Keys of tokenized dataset: {tokenized_train.features}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "687e4b1c-bda3-49a5-bfd1-edcf3bd12d68", "metadata": { "tags": [] }, "outputs": [], "source": [ "assert isinstance(tokenizer, transformers.PreTrainedTokenizerFast)" ] }, { "cell_type": "markdown", "id": "66a8f722-d3d7-437e-a708-bf1370ff99ee", "metadata": {}, "source": [ "# 2. Train the model using HuggingFace" ] }, { "cell_type": "code", "execution_count": null, "id": "3b9ecc1f-5ab6-4e24-abb4-f9a64439c8c9", "metadata": { "tags": [] }, "outputs": [], "source": [ "model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)" ] }, { "cell_type": "markdown", "id": "620299aa-9cb7-4426-ae79-22ee4a1956d4", "metadata": {}, "source": [ "#### Determine parameters \n", "**batch_size:** affects the efficiency of prediction\n", "\n", "**label_pad_token_id:** the id to use when padding the labels (-100 will be automatically ignored) \n", "\n", "**data_collator:** an object with the important parameters required for tokenisation" ] }, { "cell_type": "code", "execution_count": null, "id": "69a58a16-1395-4b77-8f7d-59f75db995a8", "metadata": { "tags": [] }, "outputs": [], "source": [ "batch_size = 4\n", "label_pad_token_id = -100\n", "data_collator = DataCollatorForSeq2Seq(tokenizer, model=model, label_pad_token_id=label_pad_token_id, pad_to_multiple_of=8)" ] }, { "cell_type": "markdown", "id": "dfdfe0bb-80e7-4e7f-bc75-99dbb97a0127", "metadata": {}, "source": [ "#### Define a function to evaluate the models performance\n", "**ROUGE**: Recall-Oriented Understudy for Gisting Evaluation. A set of metrics for evaluating automatic summarization algorithms" ] }, { "cell_type": "code", "execution_count": null, "id": "086f121e-3cf0-41d5-b9a3-2d8d08168ffd", "metadata": { "tags": [] }, "outputs": [], "source": [ "metric = evaluate.load(\"rouge\")\n", "\n", "# A function to post process the outputs of the model, and present them in an easy to read format\n", "\n", "def postprocess_text(preds, labels):\n", " preds = [pred.strip() for pred in preds]\n", " labels = [label.strip() for label in labels]\n", " preds = [\"\\n\".join(sent_tokenize(pred)) for pred in preds]\n", " labels = [\"\\n\".join(sent_tokenize(label)) for label in labels]\n", " return preds, labels\n", "\n", "# A function to generate the metrics pertaining to the prediction\n", "\n", "def compute_metrics(eval_preds):\n", " preds, labels = eval_preds\n", " if isinstance(preds, tuple):\n", " preds = preds[0]\n", " decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)\n", " labels = np.where(labels != -100, labels, tokenizer.pad_token_id)\n", " decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)\n", " decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)\n", "\n", " result = metric.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)\n", " result = {k: round(v * 100, 4) for k, v in result.items()}\n", " prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]\n", " result[\"gen_len\"] = np.mean(prediction_lens)\n", " return result" ] }, { "cell_type": "code", "execution_count": null, "id": "ce6e7ab9-010c-47ef-9609-18ffe25f8aaa", "metadata": { "tags": [] }, "outputs": [], "source": [ "# The gc library (garbage collector library) is used for memory management\n", "\n", "import gc\n", "gc.collect()\n", "\n", "DEVICE = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n", "if DEVICE == \"cuda\":\n", " print(\"[INFO] training using {}\".format(torch.cuda.get_device_name(0)))\n", "\n", "torch.cuda.empty_cache()\n", "%env WANDB_DISABLED=True" ] }, { "cell_type": "markdown", "id": "62f143db-89ca-4c43-9530-a970bf87d3bd", "metadata": {}, "source": [ "#### Define the objects & parameters for model training" ] }, { "cell_type": "code", "execution_count": null, "id": "c94e6461-f756-4dbc-ac78-e1b2c14785b8", "metadata": { "tags": [] }, "outputs": [], "source": [ "model_name = model_checkpoint.split(\"/\")[-1]\n", "print(f\"The name of the model is {model_name}\")\n", "\n", "# Arguments that will be included in the model training object\n", "\n", "args = Seq2SeqTrainingArguments(\n", " f\"{model_name}-finetuned-meqsum2019\",\n", " evaluation_strategy = \"epoch\",\n", " save_strategy=\"epoch\",\n", " load_best_model_at_end=True,\n", " learning_rate=2e-5,\n", " per_device_train_batch_size=batch_size,\n", " per_device_eval_batch_size=batch_size,\n", " weight_decay=0.01,\n", " save_total_limit=3,\n", " num_train_epochs=10,\n", " logging_strategy=\"steps\",\n", " logging_steps=100,\n", " predict_with_generate=True,\n", " fp16=False\n", ")" ] }, { "cell_type": "markdown", "id": "1d6dd71b-8369-40cf-8423-60baaa6450f4", "metadata": {}, "source": [ "#### The outputs of the training job below will show the models performance\n", "If the validation loss is lower than the training loss, then the model is generalising well" ] }, { "cell_type": "code", "execution_count": null, "id": "2c1f1e59-8078-48f5-bf47-59e35d067ef5", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Create the model training object & train it\n", "\n", "trainer = Seq2SeqTrainer(\n", " model,\n", " args,\n", " train_dataset=tokenized_train,\n", " eval_dataset=tokenized_val,\n", " data_collator=data_collator,\n", " tokenizer=tokenizer,\n", " compute_metrics=compute_metrics\n", ")\n", "\n", "trainer.train()" ] }, { "cell_type": "markdown", "id": "cd93016e-7d04-486b-b923-375e9ce59fc7", "metadata": {}, "source": [ "## 3. Perform inferencing on the test dataset" ] }, { "cell_type": "markdown", "id": "09e50071-f3cd-4102-856d-8e5e496e46dc", "metadata": {}, "source": [ "#### Perform the necessary transformations (functions we defined above) to understand the model's performance on the test dataset, and view it's output summaries" ] }, { "cell_type": "code", "execution_count": null, "id": "193fa260-fd02-4c2c-90da-7189146fa38b", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Tokenise the test dataset\n", "\n", "test_dataset = Dataset.from_pandas(test)\n", "tokenized_test = test_dataset.map(\n", " preprocess_function,\n", " batched=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "3089428f-6881-4370-894b-e7e4513aadc8", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Obtain metrics of the model's performance on the test set\n", "\n", "predict_results = trainer.predict(tokenized_test)\n", "predict_results.metrics" ] }, { "cell_type": "markdown", "id": "35234aa0-001b-433a-afc4-101e75ff6077", "metadata": {}, "source": [ "#### The `Predicted Summary` column is the model's output" ] }, { "cell_type": "code", "execution_count": null, "id": "9f3b2150-cc92-4289-a44e-505125a60734", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Decode the prediction result & view the predictions\n", "\n", "if args.predict_with_generate:\n", " predictions = tokenizer.batch_decode(predict_results.predictions, skip_special_tokens=True, clean_up_tokenization_spaces=True)\n", " predictions = [pred.strip() for pred in predictions]\n", "\n", "# Model summarisation performance can be inspected by evaluation metrics and spot checks\n", "\n", "test['Predicted Summary'] = predictions\n", "pd.set_option('display.max_colwidth', 1024)\n", "display(test)" ] }, { "cell_type": "markdown", "id": "1d27a41b-9825-41bb-88f9-22551e2e78a5", "metadata": {}, "source": [ "## 4. Stop the notebook instance\n", "This notebook uses ml.g4dn.2xlarge which we will need for other labs. Please stop the notebook kernel and instance once you are done. To stop the instance use the menu on the left. Look for the symbol that has a black square inside a circle. Click on the 'power' button next to the Jupyter Notebook instance. select 'Shutdown All'" ] }, { "cell_type": "code", "execution_count": null, "id": "bcdef9e6-5738-4972-ac22-319773301951", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 }, { "_defaultOrder": 55, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 56, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4de.24xlarge", "vcpuNum": 96 } ], "instance_type": "ml.g4dn.xlarge", "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 5 }