{ "cells": [ { "cell_type": "markdown", "id": "a84cc22c", "metadata": {}, "source": [ "# Deploying Dolly-12B on SageMaker using DeepSpeed Large Model Container DLC\n", "\n", "In this notebook, we explore how to host a large language model on SageMaker using the [Large Model Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-large-model-inference.html) container that is optimized for hosting large models using DJLServing. DJLServing is a high-performance universal model serving solution powered by the Deep Java Library (DJL) that is programming language agnostic. To learn more about DJL and DJLServing, you can refer to our recent [blog post](https://aws.amazon.com/blogs/machine-learning/deploy-large-models-on-amazon-sagemaker-using-djlserving-and-deepspeed-model-parallel-inference/).\n", "\n", "In this notebook, we deploy the [dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b).\n", "\n", "This notebook was tested on a `ml.t3.medium` instance using the `Python 3 (Data Science)` kernel on SageMaker Studio." ] }, { "cell_type": "markdown", "id": "e3e17815-054e-4242-9524-f71273417769", "metadata": {}, "source": [ "# License information\n", "\n", "Please view the license information of using this model [here](https://huggingface.co/databricks/dolly-v2-12b)" ] }, { "cell_type": "markdown", "id": "9f899bc2", "metadata": {}, "source": [ "## Create a SageMaker Model for Deployment\n", "As a first step, we'll import the relevant libraries and configure several global variables such as the hosting image that will be used nd the S3 location of our model artifacts" ] }, { "cell_type": "code", "execution_count": null, "id": "ae274590-2828-4592-8b01-219797b226a9", "metadata": { "tags": [] }, "outputs": [], "source": [ "!pip install sagemaker boto3 huggingface_hub --upgrade --quiet" ] }, { "cell_type": "code", "execution_count": null, "id": "dc9515a9", "metadata": { "tags": [] }, "outputs": [], "source": [ "import sagemaker\n", "from sagemaker.model import Model\n", "from sagemaker import serializers, deserializers\n", "from sagemaker import image_uris\n", "import boto3\n", "import os\n", "import time\n", "import json\n", "import jinja2\n", "from pathlib import Path" ] }, { "cell_type": "code", "execution_count": null, "id": "8ffef362", "metadata": { "tags": [] }, "outputs": [], "source": [ "role = sagemaker.get_execution_role() # execution role for the endpoint\n", "sess = sagemaker.session.Session() # sagemaker session for interacting with different AWS APIs\n", "bucket = sess.default_bucket() # bucket to house artifacts\n", "model_bucket = sess.default_bucket() # bucket to house artifacts\n", "s3_code_prefix = \"databricks-dolly-v2-12b/code\" # folder within bucket where code artifact will go\n", "s3_model_prefix = \"databricks-dolly-v2-12b/model\" # folder where model checkpoint will go\n", "\n", "region = sess._region_name # region name of the current SageMaker Studio environment\n", "account_id = sess.account_id() # account_id of the current SageMaker Studio environment\n", "\n", "s3_client = boto3.client(\"s3\") # client to intreract with S3 API\n", "sm_client = boto3.client(\"sagemaker\") # client to intreract with SageMaker\n", "smr_client = boto3.client(\"sagemaker-runtime\") # client to intreract with SageMaker Endpoints\n", "jinja_env = jinja2.Environment() # jinja environment to generate model configuration templates" ] }, { "cell_type": "code", "execution_count": null, "id": "7c88a9b1", "metadata": { "tags": [] }, "outputs": [], "source": [ "# lookup the inference image uri based on our current region\n", "inference_image_uri = (\n", " f\"763104351884.dkr.ecr.{region}.amazonaws.com/djl-inference:0.21.0-deepspeed0.8.3-cu117\"\n", ")\n", "print(f\"Image going to be used is ---- > {inference_image_uri}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "cbca6d02-1142-482b-80ce-5521b267dd05", "metadata": { "tags": [] }, "outputs": [], "source": [ "from huggingface_hub import snapshot_download\n", "from pathlib import Path\n", "import os\n", "\n", "# - This will download the model into the current directory where ever the jupyter notebook is running\n", "local_model_path = Path(\".\")\n", "local_model_path.mkdir(exist_ok=True)\n", "model_name = \"databricks/dolly-v2-12b\"\n", "# Only download pytorch checkpoint files\n", "allow_patterns = [\"*.json\", \"*.pt\", \"*.bin\", \"*.txt\", \"*.model\"]\n", "\n", "# - Leverage the snapshot library to donload the model since the model is stored in repository using LFS\n", "model_download_path = snapshot_download(\n", " repo_id=model_name,\n", " cache_dir=local_model_path,\n", " allow_patterns=allow_patterns,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "ded6e587-f7fe-4d11-a479-2e09127e6811", "metadata": { "tags": [] }, "outputs": [], "source": [ "model_artifact = sess.upload_data(path=model_download_path, key_prefix=s3_model_prefix)\n", "print(f\"Model uploaded to --- > {model_artifact}\")\n", "print(f\"We will set option.s3url={model_artifact}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "326c19dd-681d-404e-8ca8-d293b75b2d87", "metadata": { "tags": [] }, "outputs": [], "source": [ "!rm -rf {model_download_path}" ] }, { "cell_type": "markdown", "id": "d73f2b49", "metadata": {}, "source": [ "## Deploying a Large Language Model using Hugging Face Accelerate\n", "The DJL Inference Image which we will be utilizing ships with a number of built-in inference handlers for a wide variety of tasks including:\n", "- `text-generation`\n", "- `question-answering`\n", "- `text-classification`\n", "- `token-classification`\n", "\n", "You can refer to this [GitRepo](https://github.com/deepjavalibrary/djl-serving/tree/master/engines/python/setup/djl_python) for a list of additional handlers and available NLP Tasks. <br>\n", "These handlers can be utilized as is without having to write any custom inference code. We simply need to create a `serving.properties` text file with our desired hosting options and package it up into a `tar.gz` artifact.\n", "\n", "Lets take a look at the `serving.properties` file that we'll be using for our first example" ] }, { "cell_type": "code", "execution_count": null, "id": "4b4f2f1b-6287-434c-a1f1-752e297a9162", "metadata": { "tags": [] }, "outputs": [], "source": [ "!mkdir -p code_dolly-12b" ] }, { "cell_type": "code", "execution_count": null, "id": "54487280-9623-4298-8ec7-225d1c5e22e1", "metadata": { "tags": [] }, "outputs": [], "source": [ "%%writefile ./code_dolly-12b/serving.properties\n", "option.s3url={{s3url}}\n", "engine=DeepSpeed\n", "option.tensor_parallel_degree=4\n", "option.dtype=fp16\n", "option.task=text-generation\n", "option.entryPoint=djl_python.deepspeed" ] }, { "cell_type": "code", "execution_count": null, "id": "52de9bc2", "metadata": { "tags": [] }, "outputs": [], "source": [ "# we plug in the appropriate model location into our `serving.properties` file based on the region in which this notebook is running\n", "template = jinja_env.from_string(Path(\"code_dolly-12b/serving.properties\").open().read())\n", "Path(\"code_dolly-12b/serving.properties\").open(\"w\").write(template.render(s3url=model_artifact))\n", "!pygmentize code_dolly-12b/serving.properties | cat -n" ] }, { "cell_type": "markdown", "id": "3f8d0b1c", "metadata": {}, "source": [ "There are a few options specified here. Lets go through them in turn<br>\n", "1. `engine` - specifies the engine that will be used for this workload. In this case we'll be hosting a model using the [DJL Python Engine](https://github.com/deepjavalibrary/djl-serving/tree/master/engines/python)\n", "2. `option.entryPoint` - specifies the entrypoint code that will be used to host the model. djl_python.huggingface refers to the `huggingface.py` module from [djl_python repo](https://github.com/deepjavalibrary/djl-serving/tree/master/engines/python/setup/djl_python). \n", "3. `option.s3url` - specifies the location of the model files. Alternativelly an `option.model_id` option can be used instead to specifiy a model from Hugging Face Hub (e.g. `EleutherAI/gpt-j-6B`) and the model will be automatically downloaded from the Hub. The s3url approach is recommended as it allows you to host the model artifact within your own environment and enables faster deployments by utilizing optimized approach within the DJL inference container to transfer the model from S3 into the hosting instance \n", "4. `option.task` - This is specific to the `huggingface.py` inference handler and specifies for which task this model will be used\n", "5. `option.device_map` - Enables layer-wise model partitioning through [Hugging Face Accelerate](https://huggingface.co/docs/accelerate/usage_guides/big_modeling#designing-a-device-map). With `option.device_map=auto`, Accelerate will determine where to put each **layer** to maximize the use of your fastest devices (GPUs) and offload the rest on the CPU, or even the hard drive if you don’t have enough GPU RAM (or CPU RAM). Even if the model is split across several devices, it will run as you would normally expect.\n", "6. `option.load_in_8bit` - Quantizes the model weights to int8 thereby greatly reducing the memory footprint of the model from the initial FP32. See this [blog post](https://huggingface.co/blog/hf-bitsandbytes-integration) from Hugging Face for additional information \n", "\n", "For more information on the available options, please refer to the [SageMaker Large Model Inference Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-large-model-configuration.html)\n", "\n", "Notice that the engine parameter is now set to DeepSpeed and the option.entryPoint has been modified to use the deepspeed.py module. Python scripts that use DeepSpeed can not be launched as traditional python scripts (i.e. python deepspeed.py would not work.) Setting engine=DeepSpeed will automatically configure the environment and launch the inference script appropriatelly.\n", "The only other new parameter here is option.tensor_parallel_degree where we have to specify the number of GPU devices to which the model will be sharded.\n", "\n", "Unlike Accelerate where the model was partitioned along the layers, DeepSpeed uses TensorParallelism where individual layers (Tensors) are sharded accross devices. For example each GPU can have a slice of each layer. \n", "\n", "Where with the layer-wise approach, the data flows through each GPU device sequeantially, here data is sent to all GPU devices where a partial result is compute on each GPU. The partial results are then collected though an All-Gather operation to compute the final result. \n", "TensorParallelism generally provides higher GPU utilization and better performance." ] }, { "cell_type": "markdown", "id": "4c2d0302", "metadata": {}, "source": [ "We place the `serving.properties` file into a tarball and upload it to S3" ] }, { "cell_type": "code", "execution_count": null, "id": "2a9ac570", "metadata": { "tags": [] }, "outputs": [], "source": [ "!rm -f model.tar.gz\n", "!tar czvf model.tar.gz -C code_dolly-12b ." ] }, { "cell_type": "code", "execution_count": null, "id": "dfd0ce74", "metadata": { "tags": [] }, "outputs": [], "source": [ "s3_code_artifact = sess.upload_data(\"model.tar.gz\", bucket, s3_code_prefix)\n", "print(f\"S3 Code or Model tar ball uploaded to --- > {s3_code_artifact}\")" ] }, { "cell_type": "markdown", "id": "5e4bb2e7", "metadata": {}, "source": [ "## Deploy Model to a SageMaker Endpoint\n", "With a helper function we can now deploy our endpoint and invoke it with some sample inputs" ] }, { "cell_type": "code", "execution_count": null, "id": "30c4991b", "metadata": { "tags": [] }, "outputs": [], "source": [ "from sagemaker.utils import name_from_base\n", "\n", "model_name = name_from_base(f\"dolly-12b\")\n", "print(model_name)\n", "\n", "create_model_response = sm_client.create_model(\n", " ModelName=model_name,\n", " ExecutionRoleArn=role,\n", " PrimaryContainer={\"Image\": inference_image_uri, \"ModelDataUrl\": s3_code_artifact},\n", ")\n", "model_arn = create_model_response[\"ModelArn\"]\n", "\n", "print(f\"Created Model: {model_arn}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "f3631412", "metadata": { "tags": [] }, "outputs": [], "source": [ "endpoint_config_name = f\"{model_name}-config\"\n", "endpoint_name = f\"{model_name}-endpoint\"\n", "\n", "endpoint_config_response = sm_client.create_endpoint_config(\n", " EndpointConfigName=endpoint_config_name,\n", " ProductionVariants=[\n", " {\n", " \"VariantName\": \"variant1\",\n", " \"ModelName\": model_name,\n", " \"InstanceType\": \"ml.g5.12xlarge\",\n", " \"InitialInstanceCount\": 1,\n", " # \"ModelDataDownloadTimeoutInSeconds\": 2400,\n", " \"ContainerStartupHealthCheckTimeoutInSeconds\": 600,\n", " },\n", " ],\n", ")\n", "endpoint_config_response" ] }, { "cell_type": "code", "execution_count": null, "id": "dc206d70", "metadata": { "tags": [] }, "outputs": [], "source": [ "create_endpoint_response = sm_client.create_endpoint(\n", " EndpointName=f\"{endpoint_name}\", EndpointConfigName=endpoint_config_name\n", ")\n", "print(f\"Created Endpoint: {create_endpoint_response['EndpointArn']}\")" ] }, { "cell_type": "markdown", "id": "287d7f09-209b-4c39-9f40-cead808dac81", "metadata": {}, "source": [ "Let's run an example with a basic text generation prompt `Large model inference is`" ] }, { "cell_type": "code", "execution_count": null, "id": "8ec49948-7ad2-4dac-8db5-35dbd9a32240", "metadata": { "tags": [] }, "outputs": [], "source": [ "import time\n", "\n", "resp = sm_client.describe_endpoint(EndpointName=endpoint_name)\n", "status = resp[\"EndpointStatus\"]\n", "print(\"Status: \" + status)\n", "\n", "while status == \"Creating\":\n", " time.sleep(60)\n", " resp = sm_client.describe_endpoint(EndpointName=endpoint_name)\n", " status = resp[\"EndpointStatus\"]\n", " print(\"Status: \" + status)\n", "\n", "print(\"Arn: \" + resp[\"EndpointArn\"])\n", "print(\"Status: \" + status)" ] }, { "cell_type": "markdown", "id": "042080bf-0726-4092-9a4d-76cf49796fad", "metadata": {}, "source": [ "Now let's try the model" ] }, { "cell_type": "code", "execution_count": null, "id": "b6cc88fc-2e0c-4b09-b7f9-1522a43117da", "metadata": { "tags": [] }, "outputs": [], "source": [ "%%time\n", "prompts = [\"Hi, How are you?\"]\n", "response_model = smr_client.invoke_endpoint(\n", " EndpointName=endpoint_name,\n", " Body=json.dumps(\n", " {\n", " \"inputs\": prompts,\n", " \"parameters\": {\n", " \"early_stopping\": True,\n", " \"no_repeat_ngram_size\": 4,\n", " \"max_new_tokens\": 200,\n", " \"do_sample\": True,\n", " \"temperature\": 0.1,\n", " \"top_p\": 0.95,\n", " },\n", " }\n", " ),\n", " ContentType=\"application/json\",\n", ")\n", "\n", "response_model[\"Body\"].read().decode(\"utf8\")" ] }, { "cell_type": "markdown", "id": "f609e370-0d55-4b9e-bde3-1570da32f185", "metadata": {}, "source": [ "### Clean Up" ] }, { "cell_type": "code", "execution_count": null, "id": "479e5991", "metadata": {}, "outputs": [], "source": [ "# - Delete the end point\n", "sm_client.delete_endpoint(EndpointName=endpoint_name)" ] }, { "cell_type": "code", "execution_count": null, "id": "432657ac", "metadata": {}, "outputs": [], "source": [ "# - In case the end point failed we still want to delete the model\n", "sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)\n", "sm_client.delete_model(ModelName=model_name)" ] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 21, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 28, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 29, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 } ], "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science 3.0)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/sagemaker-data-science-310-v1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }