{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Object Detection in Chest Xrays\n", "\n", "This workshop uses a portion of the NIH Chest Xray dataset. Specifically, we will use about 1,000 images where we will predict the location of the trachea and throat of the patient.\n", "\n", "In addition, you'll learn about a variety of SageMaker features for training. In this lab we will:\n", "1. Download and prepare the result of a Ground Truth labelling job for xray image classification\n", "2. Visualize this dataset locally\n", "3. Train an object detection model using the built-in object detection algorithm from SageMaker\n", "4. Leverage GPU's and spot instances for running the training job.\n", "5. Set up our own model using script mode, leveraging GluonCV\n", "6. Leverage debugger for this job\n", "7. Visualize the network for our model locally.\n", "8. View and track all of this progress using Experiments.\n", "\n", "---" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install --upgrade pip\n", "!pip install matplotlib\n", "!pip install imageio\n", "!pip install --upgrade awscli\n", "!pip install --upgrade boto3\n", "!pip install sagemaker-experiments\n", "!pip install --upgrade sagemaker" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Enable SageMaker Experiments\n", "First, let's create an experiment so we can track this job and all of our assets." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import time\n", "from smexperiments.experiment import Experiment\n", "\n", "sm = boto3.client('sagemaker')\n", "\n", "experiment_name = f\"xray-object-detection-{int(time.time())}\"\n", "experiment = Experiment.create(experiment_name=experiment_name, \n", " description=\"Training an object detection model on XRay data\", \n", " sagemaker_boto_client=sm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now you can open the Experiments tab on the lefthand side, and you should see a new experiment!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "# Download and Prepare NIH Images\n", "Next, we are going to access those 1000 images from the NIH dataset. Please ask your AWS SA for a link to the dataset. This is a timed presign url, so make sure to use it quickly!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Change to URL sent by Workshop leader\n", "DATA_SOURCE = 'https://nih-xray-data.s3.amazonaws.com/compressed-image-file/images.tar.gz?AWSAccessKeyId=ASIASUWHP42B3EFFCQIY&Signature=hPONsLLap26VOe6WNnBe0NGea6I%3D&x-amz-security-token=IQoJb3JpZ2luX2VjEMX%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJGMEQCIAq%2Bx85gM9%2BMmWykkKE5qEJFeQOm5EDYUddVPwyrFtbtAiBqn3Sh%2FuRX3%2FS24Z7mRL670ouRjUrHqPJYmzQ1WULeuCrFAwgdEAAaDDE4MTg4MDc0MzU1NSIMPZ72cjbIbsy9QqicKqIDWl75J2dA07YuBGAJ6dUNuonWmfujluzIHeuWtIVL4GWi%2FnRESP3MNj1ddB2RbSdrpIKcE7PNhaPU99Ih3ld96hP%2BVYJ1DQbsh12zpvaPVrE2oYnG2TcTSAWqsaX5yIhUtG2VHyZui48aK8MihEVFhXtXHYLkRIE60hUskjMOrjbG%2Bm5MxFIkGyX%2BT69aI9wRlKs%2BN82XtstmsqJcMdgcdKls6KtGNJ5uY8poRWtEqNAVRAWcePMJbZnhmJm%2Bleanr52CXc6sp9QNdWhBrUZbi9rgIYYDR2nsoJVyhP0H%2FE0Yqc1bYaYgRRJhciXmlMSq%2BXdArh5awwnI%2Fpk8XhIQr4ouzzxV8nRTk4yK54JuyvoeviF7utmOz96eXCk%2BkNqqhHAlf%2FtR%2FkaOX5c%2BkHMa23Qr0X2BTxqFGwbste2ANPjRZc9DHmFP0hSDiWv0MgRtfMZxaLEkrpkYURE2Fy749TjdxzrxLPvdxi5R2UP4hCImAcwQvnF0VMW22B9oxJIvvfFBwVwpYN2Am2AmqJ7TTDxuLxnaUvyX0N4u8UL%2FtgcSjzDFgOL8BTrHAe7nptBN3%2BitZjpZ4dIGMC0bhHJ1rP2y4WZv6zGLcjGmRhIWllldzkrK7Ij2I2O8zPVbORGKB9wUdJZ31JaWiuO5o0cFyOBopLV1VjyN%2B4BwvLExUJWUXlL3QbNfaQD8GxfxZDVCTpK5T2BRJsID0s2dIcZvjlNiiKZYzSWh%2BZx52Ixs6IPcy9Gh1EfFMgpeAKK7rl1AaeWFF4VCEW1ixMyoa16U0CP1KfA2eXlhw3SsJrtQ25gx7OaXT3wywviml1lLT6Fu9pM%3D&Expires=1604434685'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "\n", "BUCKET = sagemaker.Session().default_bucket()\n", "\n", "PREFIX='hcls-xray' #Change to your directory/prefix\n", "\n", "IMAGE_FILE = 'image_data.tar.gz' #do not edit\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#first we download the compressed data from the bucket\n", "!wget \"$DATA_SOURCE\" -O image_data.tar.gz\n", "#then we will decompress the images to be used for training and validation\n", "!tar -xf image_data.tar.gz\n", "#now copy the data to S3\n", "!aws s3 cp --recursive --quiet images s3://$BUCKET/$PREFIX/image_data/\n", "print('Files uploaded to S3')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def create_new_manifest_file(input_file, output_file):\n", "\n", " template_manifest=open(input_file).readlines()\n", " output_manifest=[]\n", " for i in template_manifest:\n", " i=i.strip()\n", " i=i.replace('BUCKET',BUCKET) #have the manifest point to the actual bucket each individual is using for the workshop\n", " i=i.replace('PREFIX',PREFIX) #have the manifest point to the actual bucket each individual is using for the workshop\n", "\n", " output_manifest.append(i)\n", " f_out=open(output_file,'w')\n", " print(*output_manifest,file=f_out,sep=\"\\n\")\n", " f_out.close()\n", " \n", " return output_manifest\n", " \n", "output_manifest = create_new_manifest_file('template.manifest', 'output.manifest')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's inspect the contents of this labeled manfiest file. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "json.loads(output_manifest[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's copy the manifest file out to S3." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!aws s3 cp output.manifest s3://$BUCKET/$PREFIX/output.manifest" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "# Analyze Local Data \n", "Next, let's open up a few of those image files to make sure we know what we're dealing with. Remember, these are picking up after someone has finished labelling them with SageMaker Ground Truth! " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "%load_ext autoreload\n", "%autoreload 2\n", "import os\n", "from collections import namedtuple\n", "from collections import defaultdict\n", "from collections import Counter\n", "import itertools\n", "import json\n", "import random\n", "import time\n", "import imageio\n", "import numpy as np\n", "import matplotlib\n", "import matplotlib.pyplot as plt\n", "from matplotlib.backends.backend_pdf import PdfPages\n", "from sklearn.metrics import confusion_matrix\n", "import boto3\n", "import sagemaker\n", "from urllib.parse import urlparse\n", "\n", "fids2bbs = defaultdict(list)\n", "\n", "from ground_truth_od import group_miou\n", "from ground_truth_od import BoundingBox, WorkerBoundingBox, \\\n", " GroundTruthBox, BoxedImage" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "EXP_NAME = 'nih-chest-xrays' #where to put experiment data\n", "\n", "OUTPUT_MANIFEST_S3=f'{BUCKET}/{PREFIX}/output.manifest' #location of the manifest file in S3\n", "IMAGE_DATA_S3=f'{BUCKET}/{PREFIX}' #location of image data in s3\n", "\n", "print('S3 Location of Manifest File:')\n", "print(OUTPUT_MANIFEST_S3)\n", "\n", "print('S3 Location of Image Data:')\n", "print(IMAGE_DATA_S3)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we will load and preprocess the manifest file. 
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def read_data(file_name):\n", "    !mkdir -p data # make a data directory if it does not exist\n", "    with open(file_name, 'r') as f:\n", "        output = [json.loads(line.strip()) for line in f.readlines()]\n", "\n", "    return output\n", "\n", "def write_manifest(file_name, records):\n", "    with open(file_name, 'w') as f_out:\n", "        for i in records:\n", "            print(json.dumps(i), file=f_out)\n", "\n", "def filter_manifest(file_name):\n", "    'Remove any images that are not labeled.'\n", "\n", "    output = read_data(file_name)\n", "\n", "    output_clean = []\n", "\n", "    metadata_info = 'xray-labeling-job-clone-clone-full-clone-metadata' # change depending on the job\n", "\n", "    for the_sample in output:\n", "        try:\n", "            the_sample[metadata_info]['creation-date'] # raises KeyError when the record was never labeled\n", "        except KeyError:\n", "            continue\n", "        output_clean.append(the_sample)\n", "\n", "    print(f'Number of images without errors {len(output_clean)}')\n", "\n", "    return output_clean\n", "\n", "\n", "output_clean = filter_manifest('output.manifest')\n", "write_manifest('data/output_manifest_clean.manifest', output_clean)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_groundtruth_labels(output):\n", "    # Create data arrays.\n", "    img_uris = [None] * len(output)\n", "    confidences = [None] * len(output)\n", "    groundtruth_labels = [None] * len(output)\n", "    human = np.zeros(len(output))\n", "\n", "    # Find the job name the manifest corresponds to.\n", "    keys = list(output[0].keys())\n", "    metakey = keys[np.where([('-metadata' in k) for k in keys])[0][0]]\n", "    jobname = metakey[:-9]\n", "\n", "    # Extract the data.\n", "    for datum_id, datum in enumerate(output):\n", "        img_uris[datum_id] = datum['source-ref']\n", "        groundtruth_labels[datum_id] = str(datum[metakey]['class-map'])\n", "        confidences[datum_id] = datum[metakey]['objects']\n", "        human[datum_id] = int(datum[metakey]['human-annotated'] == 'yes')\n", "    groundtruth_labels = np.array(groundtruth_labels)\n", "\n", "    return groundtruth_labels\n", "\n", "groundtruth_labels = get_groundtruth_labels(output_clean)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "groundtruth_labels[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def map_images_to_labels(output):\n", "\n", "    # Create data arrays.\n", "    confidences = np.zeros(len(output))\n", "\n", "    # Find the job name the manifest corresponds to.\n", "    keys = list(output[0].keys())\n", "    metakey = keys[np.where([('-metadata' in k) for k in keys])[0][0]]\n", "    jobname = metakey[:-9]\n", "    output_images = []\n", "    consolidated_boxes = []\n", "\n", "    # Extract the data.\n", "    for datum_id, datum in enumerate(output):\n", "        image_size = datum[jobname]['image_size'][0]\n", "        box_annotations = datum[jobname]['annotations']\n", "        uri = datum['source-ref']\n", "        box_confidences = datum[metakey]['objects']\n", "        human = int(datum[metakey]['human-annotated'] == 'yes')\n", "\n", "        # Make image object.\n", "        image = BoxedImage(id=datum_id, size=image_size, uri=uri)\n", "\n", "        # Create bounding boxes for image.\n", "        boxes = []\n", "        for i, annotation in enumerate(box_annotations):\n", "            box = BoundingBox(image_id=datum_id, boxdata=annotation)\n", "            box.confidence = box_confidences[i]['confidence']\n",
"            box.image = image\n", "            box.human = human\n", "            boxes.append(box)\n", "            consolidated_boxes.append(box)\n", "        image.consolidated_boxes = boxes\n", "\n", "        # Store if the image is human labeled.\n", "        image.human = human\n", "\n", "        # Retrieve ground truth boxes for the image.\n", "        oid_boxes_data = fids2bbs[image.oid_id]\n", "        gt_boxes = []\n", "        for data in oid_boxes_data:\n", "            gt_box = GroundTruthBox(image_id=datum_id, oiddata=data, image=image)\n", "            gt_boxes.append(gt_box)\n", "        image.gt_boxes = gt_boxes\n", "\n", "        output_images.append(image)\n", "\n", "    return output_images, jobname\n", "\n", "output_images, jobname = map_images_to_labels(output_clean)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "len(output_clean)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def create_bounding_boxes(output_clean, output_images):\n", "    # Iterate through the records, creating bounding box objects.\n", "\n", "    output_with_answers = [] # only include images with answers in them\n", "    output_images_with_answers = []\n", "\n", "    output_with_no_answers = []\n", "    output_images_with_no_answers = []\n", "\n", "    for i in range(0, len(output_clean)):\n", "        try:\n", "            # records with a class_id have an answer (a labeled box) in them\n", "            output_clean[i][jobname]['annotations'][0]['class_id']\n", "            output_with_answers.append(output_clean[i])\n", "            output_images_with_answers.append(output_images[i])\n", "        except (KeyError, IndexError):\n", "            output_with_no_answers.append(output_clean[i])\n", "            output_images_with_no_answers.append(output_images[i])\n", "\n", "    # add the box to the image\n", "    for i in range(0, len(output_with_answers)):\n", "        the_output = output_with_answers[i]\n", "        the_image = output_images_with_answers[i]\n", "        answers = the_output[jobname]['annotations']\n", "        box = WorkerBoundingBox(image_id=i, boxdata=answers[0], worker_id='anon-worker')\n", "        box.image = the_image\n", "        the_image.worker_boxes.append(box)\n", "\n", "    print(f\"Number of images with labeled trachea/throat: {len(output_images_with_answers)}\")\n", "    print(f\"Number of images without labeled trachea/throat: {len(output_with_no_answers)}\")\n", "\n", "    return output_with_answers, output_images_with_answers\n", "\n", "output_with_answers, output_images_with_answers = create_bounding_boxes(output_clean, output_images)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def download_images(output_images_with_answers, image_dir='data', dataset_size=5):\n", "    image_subset = np.random.choice(output_images_with_answers, dataset_size, replace=False)\n", "\n", "    for img in image_subset:\n", "        target_fname = os.path.join(image_dir, img.uri.split('/')[-1])\n", "        if not os.path.isfile(target_fname):\n", "            !aws s3 cp {img.uri} {target_fname}\n", "\n", "    return image_subset\n", "\n", "image_subset = download_images(output_images_with_answers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we're going to plot the bounding boxes on the XRay data. Your plot should look something like this!\n", "\n", "![](images/gt_label_output.png)\n",
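"\n", "As an aside: agreement between two boxes is usually quantified with intersection-over-union (IoU). A minimal, self-contained sketch of the idea (a hypothetical helper, not used elsewhere in this lab):\n", "\n", "```python\n", "def iou(box_a, box_b):\n", "    'Intersection-over-union of two boxes given as (left, top, width, height).'\n", "    ax, ay, aw, ah = box_a\n", "    bx, by, bw, bh = box_b\n", "    # Intersection rectangle, clipped to zero when the boxes do not overlap\n", "    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))\n", "    ih = max(0, min(ay + ah, by + bh) - max(ay, by))\n", "    inter = iw * ih\n", "    union = aw * ah + bw * bh - inter\n", "    return inter / union if union else 0.0\n", "\n", "print(iou((0, 0, 10, 10), (5, 5, 10, 10))) # ~0.143\n", "```"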
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def visualize_images(image_subset, image_dir='data', n_show=5):\n", "\n", "    # Find the human-labeled images in the subset.\n", "    human_labeled_subset = [img for img in image_subset if img.human]\n", "\n", "    # Show examples of each\n", "    fig, axes = plt.subplots(n_show, 2, figsize=(9, 2*n_show), facecolor='white', dpi=100)\n", "    fig.suptitle('Human-labeled examples', fontsize=24)\n", "    axes[0, 0].set_title('Worker labels', fontsize=14)\n", "    axes[0, 1].set_title('Consolidated label', fontsize=14)\n", "    for row, img in enumerate(np.random.choice(human_labeled_subset, size=n_show, replace=False)):\n", "        img.download(image_dir)\n", "        img.plot_worker_bbs(axes[row, 0])\n", "        img.plot_consolidated_bbs(axes[row, 1])\n", "\n", "visualize_images(image_subset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(Note that in this context we only had one labeler, so the consolidated label will be identical to the worker label.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Split Data and Copy to S3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def split_data(output):\n", "\n", "    # Shuffle output in place.\n", "    np.random.shuffle(output)\n", "\n", "    dataset_size = len(output)\n", "    train_test_split_index = round(dataset_size*0.9)\n", "\n", "    train_data = output[:train_test_split_index]\n", "    test_data = output[train_test_split_index:]\n", "\n", "    train_test_split_index_2 = round(len(test_data)*0.5)\n", "    validation_data = test_data[:train_test_split_index_2]\n", "    hold_out = test_data[train_test_split_index_2:]\n", "\n", "    return train_data, validation_data, hold_out\n", "\n", "train_data, validation_data, hold_out = split_data(output_with_answers)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "num_training_samples = 0\n", "with open('data/train.manifest', 'w') as f:\n", "    for line in train_data:\n", "        f.write(json.dumps(line))\n", "        f.write('\\n')\n", "        num_training_samples += 1\n", "\n", "with open('data/validation.manifest', 'w') as f:\n", "    for line in validation_data:\n", "        f.write(json.dumps(line))\n", "        f.write('\\n')\n", "with open('data/hold_out.manifest', 'w') as f:\n", "    for line in hold_out:\n", "        f.write(json.dumps(line))\n", "        f.write('\\n')\n", "\n", "print(f'Training Data Set Size: {len(train_data)}')\n", "print(f'Validation Data Set Size: {len(validation_data)}')\n", "print(f'Hold Out Data Set Size: {len(hold_out)}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def copy_to_s3(bucket, prefix, expr_name):\n", "    !aws s3 cp data/train.manifest s3://{bucket}/{prefix}/{expr_name}/train.manifest\n", "    !aws s3 cp data/validation.manifest s3://{bucket}/{prefix}/{expr_name}/validation.manifest\n", "    !aws s3 cp data/hold_out.manifest s3://{bucket}/{prefix}/{expr_name}/hold_out.manifest\n", "\n", "copy_to_s3(BUCKET, PREFIX, EXP_NAME)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Train on SageMaker & Track with Experiments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a trial within the experiment that we can associate this job with.\n",
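"\n", "Once the jobs tied to this experiment start running, you can also pull everything into a dataframe rather than clicking through the UI. A minimal sketch using the `experiment_name` defined earlier:\n", "\n", "```python\n", "from sagemaker.analytics import ExperimentAnalytics\n", "\n", "# Collects every trial component in the experiment, with its\n", "# parameters and metrics, into a single pandas dataframe\n", "analytics = ExperimentAnalytics(experiment_name=experiment_name)\n", "analytics.dataframe()\n", "```"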
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from smexperiments.trial import Trial\n", "\n", "trial_name = f\"built-in-object-detection-{int(time.time())}\"\n", "\n", "trial = Trial.create(trial_name = trial_name,\n", " experiment_name = experiment_name,\n", " sagemaker_boto_client = sm)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import re\n", "from sagemaker import get_execution_role\n", "from time import gmtime, strftime\n", "\n", "role = get_execution_role()\n", "sess = sagemaker.Session()\n", "s3 = boto3.resource('s3')\n", "\n", "training_image = sagemaker.image_uris.retrieve('object-detection', boto3.Session().region_name, version='latest')\n", "augmented_manifest_filename_train = 'train.manifest'\n", "augmented_manifest_filename_validation = 'validation.manifest'\n", "bucket_name = BUCKET\n", "s3_prefix = EXP_NAME\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Defines paths for use in the training job request.\n", "s3_train_data_path = 's3://{}/{}/{}/train.manifest'.format(BUCKET, PREFIX, EXP_NAME)\n", "s3_validation_data_path = 's3://{}/{}/{}/validation.manifest'.format(BUCKET, PREFIX, EXP_NAME )\n", "s3_debug_path = \"s3://{}/{}/{}/debug-hook-data\".format(BUCKET, PREFIX, EXP_NAME)\n", "s3_output_path = f's3://{BUCKET}/{PREFIX}/{EXP_NAME}/output'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "augmented_manifest_s3_key = s3_train_data_path.split(bucket_name)[1][1:]\n", "s3_obj = s3.Object(bucket_name, augmented_manifest_s3_key)\n", "augmented_manifest = s3_obj.get()['Body'].read().decode('utf-8')\n", "augmented_manifest_lines = augmented_manifest.split('\\n')\n", "num_training_samples = len(augmented_manifest_lines) # Compute number of training samples for use in training job request.\n", "\n", "# Determine the keys in the training manifest and exclude the meta data from the labling job.\n", "attribute_names = list(json.loads(augmented_manifest_lines[0]).keys())\n", "attribute_names = [attrib for attrib in attribute_names if 'meta' not in attrib]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create unique job name\n", "job_name_prefix = EXP_NAME\n", "timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())\n", "model_job_name = job_name_prefix + timestamp" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create unique job name\n", "job_name_prefix = EXP_NAME\n", "timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())\n", "model_job_name = job_name_prefix + timestamp\n", "\n", "# set up your training job using boto3 API syntax\n", "training_params = \\\n", " {\n", " \"AlgorithmSpecification\": {\n", " # NB. This is one of the named constants defined in the first cell.\n", " \"TrainingImage\": training_image,\n", " \"TrainingInputMode\": \"Pipe\"\n", " },\n", " \"RoleArn\": role,\n", " \"OutputDataConfig\": {\n", " \"S3OutputPath\": s3_output_path\n", " },\n", " \"ResourceConfig\": {\n", " \"InstanceCount\": 1,\n", " \"InstanceType\": \"ml.p3.2xlarge\", #Use a GPU backed instance\n", " \"VolumeSizeInGB\": 50\n", " },\n", " \"TrainingJobName\": model_job_name,\n", " \"HyperParameters\": { # NB. 
"            \"base_network\": \"resnet-50\",\n", "            \"use_pretrained_model\": \"1\",\n", "            \"num_classes\": \"1\",\n", "            \"mini_batch_size\": \"10\",\n", "            \"epochs\": \"30\",\n", "            \"learning_rate\": \"0.001\",\n", "            \"lr_scheduler_step\": \"\",\n", "            \"lr_scheduler_factor\": \"0.1\",\n", "            \"optimizer\": \"sgd\",\n", "            \"momentum\": \"0.9\",\n", "            \"weight_decay\": \"0.0005\",\n", "            \"overlap_threshold\": \"0.5\",\n", "            \"nms_threshold\": \"0.45\",\n", "            \"image_shape\": \"300\",\n", "            \"label_width\": \"350\",\n", "            \"num_training_samples\": str(num_training_samples)\n", "        },\n", "        \"StoppingCondition\": {\n", "            \"MaxRuntimeInSeconds\": 86400,\n", "            \"MaxWaitTimeInSeconds\": 259200\n", "        },\n", "        \"EnableManagedSpotTraining\": True,\n", "        \"InputDataConfig\": [\n", "            {\n", "                \"ChannelName\": \"train\",\n", "                \"DataSource\": {\n", "                    \"S3DataSource\": {\n", "                        \"S3DataType\": \"AugmentedManifestFile\", # NB. Augmented Manifest\n", "                        \"S3Uri\": s3_train_data_path,\n", "                        \"S3DataDistributionType\": \"FullyReplicated\",\n", "                        # NB. This must correspond to the JSON field names in your augmented manifest.\n", "                        \"AttributeNames\": attribute_names\n", "                    }\n", "                },\n", "                \"ContentType\": \"application/x-recordio\",\n", "                \"RecordWrapperType\": \"RecordIO\",\n", "                \"CompressionType\": \"None\"\n", "            },\n", "            {\n", "                \"ChannelName\": \"validation\",\n", "                \"DataSource\": {\n", "                    \"S3DataSource\": {\n", "                        \"S3DataType\": \"AugmentedManifestFile\", # NB. Augmented Manifest\n", "                        \"S3Uri\": s3_validation_data_path,\n", "                        \"S3DataDistributionType\": \"FullyReplicated\",\n", "                        # NB. This must correspond to the JSON field names in your augmented manifest.\n", "                        \"AttributeNames\": attribute_names\n", "                    }\n", "                },\n", "                \"ContentType\": \"application/x-recordio\",\n", "                \"RecordWrapperType\": \"RecordIO\",\n", "                \"CompressionType\": \"None\"\n", "            }\n", "        ],\n", "        \"ExperimentConfig\": {\n", "            'ExperimentName': experiment_name,\n", "            'TrialName': trial_name,\n", "            'TrialComponentDisplayName': 'Training'\n", "        },\n", "        \"DebugHookConfig\": {\n", "            'S3OutputPath': s3_debug_path,\n", "            'CollectionConfigurations': [\n", "                {\n", "                    'CollectionName': 'all_tensors',\n", "                    'CollectionParameters': {\n", "                        'include_regex': '.*',\n", "                        \"save_steps\": \"1, 2, 3\"\n", "                    }\n", "                },\n", "            ]\n", "        },\n", "    }\n", "\n", "print('Training job name: {}'.format(model_job_name))\n", "print('\\nInput Data Location: {}'.format(\n", "    training_params['InputDataConfig'][0]['DataSource']['S3DataSource']))\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "client = boto3.client(service_name='sagemaker')\n", "client.create_training_job(**training_params)\n", "\n", "# Confirm that the training job has started\n", "status = client.describe_training_job(TrainingJobName=model_job_name)['TrainingJobStatus']\n", "print(f'Training job name: {model_job_name}')\n", "print(f'Training job current status: {status}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the default p3.2xlarge as noted here, this job should take about an hour to train. While that's happening, circle back and step through the code again, and make sure you really understand how everything comes together.\n", "\n", "### Monitor Job Progress using Experiments\n", "If you are running on Studio, you should be able to open up the Experiments tab and see the status of your job.\n",
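"\n", "If you would rather block in the notebook until the job finishes, boto3 also ships a waiter for this; a minimal sketch (it simply polls `describe_training_job` under the hood):\n", "\n", "```python\n", "# Wait until the training job reaches a terminal state, then print the outcome\n", "waiter = client.get_waiter('training_job_completed_or_stopped')\n", "waiter.wait(TrainingJobName=model_job_name)\n", "\n", "final = client.describe_training_job(TrainingJobName=model_job_name)\n", "print(final['TrainingJobStatus'], final.get('FailureReason', ''))\n", "```"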
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Convert Images into RecordIO\n", "As is well documented, training deep learning models can take a long time. One way to speed this up is by using an optimized file format, such as recordIO. Let's convert our pngs into recordIO for the next step. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install mxnet\n", "!pip install opencv-python-headless" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# point the first argument to the location of your local png image folder\n", "# running the script with this command will create a lst file, listing all of your images for the train set\n", "!python im2rec.py --root \"/root/images\" --prefix \"train\" --exts '.png' --chunks 1 --create_list 'Yes'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# running this file will create a train.idx and train.rec file\n", "!python im2rec.py --root \"/root/images\" --prefix '/root/amazon-sagemaker-architecting-for-ml-hcls/Starter Notebooks/Advanced Data Science - XRay Analysis/' --exts '.png' --chunks 1 --create_list 'no'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!aws s3 cp train.idx s3://$BUCKET/$PREFIX/recio-files/\n", "!aws s3 cp train.rec s3://$BUCKET/$PREFIX/recio-files/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "# Bring your own Model and Train on SageMaker with Script Mode\n", "Once your job finishes, you are welcome to explore bringing your own script into SageMaker. Below we're demonstrating using GluonCV to bring a custom ssd model using the MXNet container. This is nice because it's coming with it's own pre-trained model! \n", "- https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker_neo_compilation_jobs/gluoncv_ssd_mobilenet/gluoncv_ssd_mobilenet_neo.ipynb\n", "\n", "Notice that by using script mode we automatically get access to debugger, which will give us the ability to visualize our neural network locally. 
"\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%writefile src/requirements.txt\n", "\n", "gluoncv" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.mxnet import MXNet\n", "import sagemaker\n", "from sagemaker.debugger import DebuggerHookConfig, CollectionConfig\n", "from smexperiments.trial import Trial\n", "\n", "role = sagemaker.get_execution_role()\n", "\n", "# The trial must exist before we can associate the training job with it\n", "gluon_trial_name = f\"xray-recordio-gluoncv-{int(time.time())}\"\n", "gluon_trial = Trial.create(trial_name=gluon_trial_name,\n", "                           experiment_name=experiment_name,\n", "                           sagemaker_boto_client=sm)\n", "\n", "ssd_estimator = MXNet(entry_point='ssd_entry_point.py',\n", "                      source_dir='src',\n", "                      role=role,\n", "                      output_path=s3_output_path,\n", "                      instance_count=1,\n", "                      instance_type='ml.p3.8xlarge',\n", "                      framework_version='1.6',\n", "                      py_version='py3',\n", "                      use_spot_instances=True,\n", "                      max_wait=(8600*3),\n", "                      max_run=8600,\n", "                      distribution={'parameter_server': {'enabled': True}},\n", "                      hyperparameters={'epochs': 1, 'data-shape': 350},\n", "                      debugger_hook_config=DebuggerHookConfig(\n", "                          s3_output_path=s3_debug_path,\n", "                          collection_configs=[CollectionConfig(name='all_tensors',\n", "                                                               parameters={'include_regex': '.*', 'save_steps': '1,2,3'})]))\n", "\n", "ssd_estimator.fit(inputs={'train': f's3://{BUCKET}/{PREFIX}/recio-files'},\n", "                  experiment_config={'ExperimentName': experiment_name,\n", "                                     'TrialName': gluon_trial_name,\n", "                                     'TrialComponentDisplayName': 'Training'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You might discover an issue here - GluonCV is struggling to find the labels from our bounding boxes. Can you figure out how to supply them correctly?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "# Visualize Model with SageMaker Debugger\n", "Now, we're going to use SageMaker Debugger to build a TensorPlot of our model!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](images/tensorplot.gif)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!aws s3 sync {s3_debug_path} ." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import tensor_plot\n", "\n", "# Assumption: after the sync above, the debug tensors live in a local folder named after the training job\n", "folder_name = ssd_estimator.latest_training_job.name\n", "\n", "visualization = tensor_plot.TensorPlot(\n", "    regex=\".*relu_output\",\n", "    path=folder_name,\n", "    steps=10,\n", "    batch_sample_id=0,\n", "    color_channel=1,\n", "    title=\"Relu outputs\",\n", "    label=\".*sequential0_input_0\",\n", "    prediction=\".*sequential0_output_0\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we plot too many layers, it can crash the notebook. If you encounter performance or out-of-memory issues, either reduce the layers to plot by changing the regex, or run this notebook in JupyterLab instead of Jupyter.\n", "\n", "In the cell below we visualize the outputs of all layers, including the final classification. Note that because the training job ran for only a few epochs, classification accuracy is not high." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "visualization.fig.show(renderer=\"iframe\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "# Extensions\n", "If you make it here with spare time, why not try to bring another model into SageMaker? Or set up the automatic model tuner on your own script file? Or optimize your model for deployment using SageMaker Neo?\n", "\n", "You can also deploy some of these models and start to get predictions from them using `model.deploy()`.\n",
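"\n", "For example, deploying the GluonCV estimator trained above behind a real-time endpoint might look like the sketch below (instance type and inference code are up to you):\n", "\n", "```python\n", "# Deploy the trained estimator to a real-time endpoint\n", "predictor = ssd_estimator.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge')\n", "\n", "# ... send an image payload with predictor.predict(...) ...\n", "\n", "# Remember to tear the endpoint down when you are done\n", "predictor.delete_endpoint()\n", "```\n",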
\n", "\n", "You can also deploy some of these models and start to get predictions from them using `model.deploy()`.\n", "\n", "Feel free to use the rest of your time to build something awesome. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }