{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Amazon SageMaker DICOM Training Overview\n", "\n", "In this example we demonstrate how to integrate the [MONAI](http://monai.io) framework into Amazon SageMaker, use data labelled with SageMaker Ground Truth, and provide example code for the MONAI pre-processing transforms and neural network (DenseNet) that you can use to train a medical image classification model directly on DICOM images. See also [Build a medical image analysis pipeline on Amazon SageMaker using the MONAI framework](https://aws.amazon.com/blogs/industries/build-a-medical-image-analysis-pipeline-on-amazon-sagemaker-using-the-monai-framework/) for additional details on how to deploy the MONAI model, pipe input data from S3, and perform batch inference using SageMaker batch transform.\n", "\n", "For more information about PyTorch in SageMaker, please visit the [sagemaker-pytorch-containers](https://github.com/aws/sagemaker-pytorch-containers) and [sagemaker-python-sdk](https://github.com/aws/sagemaker-python-sdk) GitHub repositories." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Copyright 2020 Amazon.com, Inc. or its affiliates. 
All Rights Reserved.\n", "# SPDX-License-Identifier: MIT-0\n", "\n", "!pip install -r ./source/requirements.txt\n", "!mkdir -p data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "from pathlib import Path\n", "from dotenv import load_dotenv\n", "\n", "# Load the bucket and DICOM server settings from set.env\n", "env_path = Path('.') / 'set.env'\n", "load_dotenv(dotenv_path=env_path)\n", "\n", "bucket = os.environ.get('BUCKET')\n", "bucket_path = os.environ.get('BUCKET_PATH')\n", "user = os.environ.get('DICOM_USER')\n", "password = os.environ.get('DICOM_PASSWORD')\n", "\n", "print('Bucket: '+bucket)\n", "print('Bucket Prefix: '+bucket_path)\n", "print('User: '+user)\n", "# Do not echo the password into the notebook output; only confirm it was loaded\n", "print('Password loaded: '+str(bool(password)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Amazon SageMaker Ground Truth Labeling Metadata\n", "\n", "Download the Ground Truth JSON annotation file and parse it for the labelled data, class names, and DICOM image URLs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "import boto3\n", "\n", "image_url_list = []\n", "image_label_list = []\n", "\n", "# Get SageMaker Ground Truth labeling data from the annotation file\n", "datadir = './data'\n", "metadata = datadir+'/meta-data.json'\n", "s3 = boto3.client('s3')\n", "s3.download_file(bucket, bucket_path, metadata)\n", "\n", "# Load labels\n", "with open(metadata) as f:\n", "    manifest = json.load(f)\n", "\n", "# Class names are the keys of the 'disease' mapping in the first annotation\n", "class_names = list(json.loads(manifest[0]['annotations'][0]['annotationData']['content'])['disease'].keys())\n", "\n", "# Collect the image URL and the class index for every labelled entry\n", "for entry in manifest:\n", "    content = json.loads(entry['annotations'][0]['annotationData']['content'])\n", "    label_dict = json.loads(content['labels'])\n", "    image_url_list.append(label_dict['imageurl'])\n", "    image_label_list.append(class_names.index(label_dict['label'][0]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## DICOM Upload Files to S3\n", "\n", "Download the DICOM files from the Orthanc DICOM server and upload them to S3 for SageMaker 
training." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import urllib3\n", "import requests\n", "import os\n", "from io import BytesIO\n", "import contextlib\n", "\n", "# Suppress the TLS warnings triggered by verify=False below\n", "urllib3.disable_warnings()\n", "\n", "image_file_list = []\n", "\n", "# Copy each DICOM instance from the Orthanc server to S3\n", "for image_url in image_url_list:\n", "    # Use the Orthanc instance ID from the URL as the file name\n", "    file_name = image_url.split(\"/file\")[0].split(\"instances/\")[1] + '.dcm'\n", "    response = requests.get(image_url, auth=(user, password), stream=True, verify=False)\n", "    # Fail early rather than upload an error page as a .dcm file\n", "    response.raise_for_status()\n", "    fp = BytesIO(response.content)\n", "    s3.upload_fileobj(fp, bucket, file_name)\n", "    image_file_list.append(file_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## DICOM Display Sample Set\n", "\n", "Download a sample of DICOM images from the S3 bucket and display each with its label from the annotation file." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import monai\n", "from monai.transforms import Compose, LoadImage, Resize, ScaleIntensity, ToTensor, SqueezeDim\n", "import matplotlib.pyplot as plt\n", "\n", "# Display a sample of DICOM images\n", "trans = Compose([LoadImage(image_only=True), Resize(spatial_size=(108,96)), SqueezeDim()])\n", "plt.subplots(1, 3, figsize=(8, 8))\n", "for i in range(0,3):\n", "    # Download the image locally, then load and resize it for display\n", "    s3.download_file(bucket, image_file_list[i], datadir+'/'+image_file_list[i])\n", "    img = trans(datadir+'/'+image_file_list[i])\n", "    plt.subplot(1, 3, i + 1)\n", "    plt.xlabel(class_names[image_label_list[i]])\n", "    plt.imshow(img, cmap='gray')\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data\n", "\n", "### Create SageMaker session and S3 location for the DICOM dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "\n", "sagemaker_session = sagemaker.Session()\n", "role = sagemaker.get_execution_role()\n", "\n", "inputs = 
sagemaker_session.upload_data(path=datadir, bucket=bucket)\n", "print('input spec (in this case, just an S3 path): {}'.format(inputs))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train Model\n", "### Training\n", "\n", "The ```monai_dicom.py``` script provides all the code we need for training and hosting a SageMaker model, including the `model_fn` function that loads a model. The training script is very similar to a training script you might run outside of SageMaker, but you can access useful properties of the training environment through environment variables such as:\n", "\n", "* SM_MODEL_DIR: A string representing the path to the directory to write model artifacts to. These artifacts are uploaded to S3 for model hosting.\n", "* SM_NUM_GPUS: The number of GPUs available in the current container.\n", "* SM_CURRENT_HOST: The name of the current container on the container network.\n", "* SM_HOSTS: A JSON-encoded list containing all the hosts.\n", "\n", "Supposing one input channel, 'training', was used in the call to the PyTorch estimator's fit() method, the following variable will be set, following the format SM_CHANNEL_[channel_name]:\n", "\n", "* SM_CHANNEL_TRAINING: A string representing the path to the directory containing data in the 'training' channel.\n", "\n", "For more information about training environment variables, please visit [SageMaker Containers](https://github.com/aws/sagemaker-containers).\n", "\n", "A typical training script loads data from the input channels, configures training with hyperparameters, trains a model, and saves the model to model_dir so that it can be hosted later. Hyperparameters are passed to your script as command-line arguments and can be retrieved with an argparse.ArgumentParser instance." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pygmentize source/monai_dicom.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run training in SageMaker\n", "\n", "The `PyTorch` class allows us to run our training function as a training job on SageMaker infrastructure. We need to configure it with our training script, an IAM role, the number of training instances, the training instance type, and hyperparameters. In this case we are going to run our training job on a single ```ml.m5.xlarge``` instance, but this example can be run on one or multiple CPU or GPU instances ([full list of available instances](https://aws.amazon.com/sagemaker/pricing/instance-types/)). The hyperparameters parameter is a dict of values that will be passed to your training script. You can see how to access these values in the ```monai_dicom.py``` script above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.pytorch import PyTorch\n", "\n", "estimator = PyTorch(entry_point='monai_dicom.py',\n", " source_dir='source',\n", " role=role,\n", " framework_version='1.5.0',\n", " py_version='py3',\n", " instance_count=1,\n", " instance_type='ml.m5.xlarge',\n", " hyperparameters={\n", " 'backend': 'gloo',\n", " 'epochs': 5\n", " })" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After we've constructed our PyTorch object, we can fit it using the DICOM dataset we uploaded to S3." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "estimator.fit({'train': inputs})" ] } ], "metadata": { "kernelspec": { "display_name": "conda_pytorch_latest_p36", "language": "python", "name": "conda_pytorch_latest_p36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" } }, "nbformat": 4, "nbformat_minor": 4 }