{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Classify Skin Lesion Images" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: This notebook was developed using the `Data Science 3.0` image." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Background" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "According to [The Skin Cancer Foundation](https://www.skincancer.org/skin-cancer-information/skin-cancer-facts/), 1 in 5 Americans will develop skin cancer by the age of 70 and more than 2 people die of skin cancer in the U.S. every hour. Worldwide, [melanoma of skin is the 17th most common form of cancer](https://www.wcrf.org/cancer-trends/skin-cancer-statistics/). However, because non-melanoma skin cancer is often excluded from official statistics, it is significantly under-reported. In Brazil, [according to the Brazilian Cancer Institute (INCA)](https://www.sciencedirect.com/science/article/pii/S0010482519304019?via%3Dihub#b6), skin cancer accounts for 33% of all cancer diagnoses in the country. Developing an algorithm to distinguish skin cancer from other lesions would allow communities to make the best use of limited healthcare resources.\n", "\n", "In this lab, you will use Amazon SageMaker to build, train, and deploy a classification model based on the [PAD-UFES-20 skin lesion dataset](https://data.mendeley.com/datasets/zr7vgbcyr2/1).\n", "\n", "```\n", "Pacheco, Andre G. C.; Lima, Gustavo R.; Salomão, Amanda S.; Krohling, Breno; Biral, Igor P.; de Angelo, Gabriel G. ; Alves Jr, Fábio C. R. ; Esgario, José G. M.; Simora, Alana C. ; Castro, Pedro B. C. ; Rodrigues, Felipe B.; Frasson, Patricia H. L. ; Krohling, Renato A.; Knidel, Helder ; Santos, Maria C. S. ; Espírito Santo, Rachel B.; Macedo, Telma L. S. G.; Canuto, Tania R. P. ; de Barros, Luíz F. S. 
(2020), “PAD-UFES-20: a skin lesion dataset composed of patient data and clinical images collected from smartphones”, Mendeley Data, V1, doi: 10.17632/zr7vgbcyr2.1\n", "```\n", "\n", "The model will distinguish 6 skin lesion classes from cell phone images.\n", "\n", "- Skin Cancers \n", " - BCC: Basal Cell Carcinoma \n", " - MEL: Melanoma \n", " - SCC: Squamous Cell Carcinoma and Bowen’s disease \n", "- Skin Diseases \n", " - ACK: Actinic Keratosis \n", " - NEV: Nevus \n", " - SEK: Seborrheic Keratosis \n", " \n", " ![Skin Lesions](img/lesions.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Build" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1. Import Python packages and create clients" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install -U pip -q -q\n", "%pip install -U sagemaker sagemaker-experiments -q -q" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import json\n", "from matplotlib import pyplot as plt\n", "import matplotlib.image as mpimg\n", "import numpy as np\n", "import os\n", "import pandas as pd\n", "import sagemaker\n", "from sagemaker import image_uris, model_uris, script_uris, hyperparameters\n", "from sagemaker.sklearn.estimator import SKLearn\n", "from sklearn.metrics import precision_recall_fscore_support\n", "import smexperiments.experiment\n", "from time import strftime, sleep\n", "\n", "boto_session = boto3.session.Session()\n", "sagemaker_session = sagemaker.session.Session(boto_session)\n", "sagemaker_boto_client = boto_session.client(\"sagemaker\")\n", "\n", "REGION = boto_session.region_name\n", "print(f\"AWS Region is {REGION}\")\n", "\n", "ACCOUNT_ID = boto_session.client(\"sts\").get_caller_identity().get(\"Account\")\n", "print(f\"AWS Account ID is {ACCOUNT_ID}\")\n", "\n", "SAGEMAKER_EXECUTION_ROLE = sagemaker.session.get_execution_role(sagemaker_session)\n", 
"print(f\"Assumed SageMaker role is {SAGEMAKER_EXECUTION_ROLE}\")\n", "\n", "S3_BUCKET = sagemaker_session.default_bucket()\n", "S3_PREFIX = 'skin-lesion-classification-lab'\n", "print(f\"Default S3 location is s3://{S3_BUCKET}/{S3_PREFIX}\")\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create a new experiment\n", "skin_lesion_experiment = smexperiments.experiment.Experiment.create(\n", " description=\"Classify skin lesions\",\n", " experiment_name=f\"Classify-skin-lesions-{strftime('%Y-%m-%d-%H-%M-%S')}\",\n", " sagemaker_boto_client=sagemaker_boto_client,\n", " tags=[{\"Key\": \"Creator\", \"Value\": \"arosalez\"}],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2. Create Processing Script" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%writefile scripts/processing/preprocess.py\n", "import boto3\n", "import logging\n", "import os\n", "import pandas as pd\n", "import shutil\n", "from sklearn.model_selection import train_test_split\n", "\n", "logging.getLogger().setLevel(logging.INFO)\n", "\n", "# Define data source and other parameters.\n", "SRC_BUCKET = 'prod-dcd-datasets-cache-zipfiles'\n", "SRC_KEY = 'zr7vgbcyr2-1.zip'\n", "DATA_DIR = '/opt/ml/processing'\n", "\n", "# Download raw data zip from https://data.mendeley.com/datasets/zr7vgbcyr2/1\n", "logging.info(f'Downloading {SRC_KEY}')\n", "s3_boto_client = boto3.client(\"s3\")\n", "os.makedirs(f'{DATA_DIR}/input', exist_ok=True)\n", "s3_boto_client.download_file(SRC_BUCKET, SRC_KEY, f'{DATA_DIR}/input/raw.zip')\n", "\n", "# Unzip data\n", "logging.info(f'Unpacking {SRC_KEY}')\n", "shutil.unpack_archive(f'{DATA_DIR}/input/raw.zip', f'{DATA_DIR}/input')\n", "for i in range(1,4): \n", " logging.info(f'Unpacking imgs_part_{i}.zip')\n", " shutil.unpack_archive(f'{DATA_DIR}/input/images/imgs_part_{i}.zip', f'{DATA_DIR}/input/images')\n", " logging.info(f'Copying 
{DATA_DIR}/input/images/imgs_part_{i} to {DATA_DIR}/input/images/all_imgs')\n", " shutil.copytree(f'{DATA_DIR}/input/images/imgs_part_{i}', f'{DATA_DIR}/input/images/all_imgs', dirs_exist_ok=True)\n", "\n", "# Split data into training, validation, and test sets\n", "logging.info('Creating training, validation, and test data splits')\n", "metadata = pd.read_csv(f'{DATA_DIR}/input/metadata.csv')\n", "train_df, test_df = train_test_split(metadata, test_size=0.2, stratify=metadata['diagnostic'])\n", "train_df, val_df = train_test_split(train_df, test_size=0.05, stratify=train_df['diagnostic'])\n", "\n", "# Copy training data into per-class folders for training\n", "logging.info(f'Copying training data to {DATA_DIR}/output/train')\n", "os.makedirs(f\"{DATA_DIR}/output/train\", exist_ok=True)\n", "train_df.to_csv(f'{DATA_DIR}/output/train/metadata.csv', index=False)\n", "for _,row in train_df.iterrows():\n", " src = f\"{DATA_DIR}/input/images/all_imgs/{row['img_id']}\"\n", " os.makedirs(f\"{DATA_DIR}/output/train/{row['diagnostic']}\", exist_ok=True)\n", " dest = f\"{DATA_DIR}/output/train/{row['diagnostic']}/{row['img_id']}\"\n", " shutil.copy2(src, dest)\n", "\n", "# Copy validation data into per-class folders for training\n", "logging.info(f'Copying validation data to {DATA_DIR}/output/val')\n", "os.makedirs(f\"{DATA_DIR}/output/val\", exist_ok=True)\n", "val_df.to_csv(f'{DATA_DIR}/output/val/metadata.csv', index=False)\n", "for _,row in val_df.iterrows():\n", " src = f\"{DATA_DIR}/input/images/all_imgs/{row['img_id']}\"\n", " os.makedirs(f\"{DATA_DIR}/output/val/{row['diagnostic']}\", exist_ok=True)\n", " dest = f\"{DATA_DIR}/output/val/{row['diagnostic']}/{row['img_id']}\"\n", " shutil.copy2(src, dest)\n", "\n", "# Copy test data into per-class folders for evaluation\n", "logging.info(f'Copying test data to {DATA_DIR}/output/test')\n", "os.makedirs(f\"{DATA_DIR}/output/test\", exist_ok=True)\n", "test_df.to_csv(f'{DATA_DIR}/output/test/metadata.csv', index=False)\n", "for _,row in 
test_df.iterrows():\n", " src = f\"{DATA_DIR}/input/images/all_imgs/{row['img_id']}\"\n", " os.makedirs(f\"{DATA_DIR}/output/test/{row['diagnostic']}\", exist_ok=True)\n", " dest = f\"{DATA_DIR}/output/test/{row['diagnostic']}/{row['img_id']}\"\n", " shutil.copy2(src, dest)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.4. Define a Processor" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pre_processor = sagemaker.processing.FrameworkProcessor(\n", " estimator_cls=SKLearn,\n", " framework_version='1.0-1',\n", " command=['python'],\n", " role=SAGEMAKER_EXECUTION_ROLE,\n", " instance_count=1,\n", " instance_type='ml.m5.xlarge'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.5. Submit a Processing Run" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pre_processor.run(\n", " job_name=f\"skin-lesion-image-processing-job-{strftime('%Y-%m-%d-%H-%M-%S')}\",\n", " code=\"scripts/processing/preprocess.py\",\n", " outputs=[\n", " sagemaker.processing.ProcessingOutput(\n", " output_name=\"train\",\n", " source=\"/opt/ml/processing/output/train\",\n", " destination=f\"s3://{S3_BUCKET}/{S3_PREFIX}/data/train/\",\n", " ),\n", " sagemaker.processing.ProcessingOutput(\n", " output_name=\"validation\",\n", " source=\"/opt/ml/processing/output/val\",\n", " destination=f\"s3://{S3_BUCKET}/{S3_PREFIX}/data/val/\",\n", " ),\n", " sagemaker.processing.ProcessingOutput(\n", " output_name=\"test\",\n", " source=\"/opt/ml/processing/output/test\",\n", " destination=f\"s3://{S3_BUCKET}/{S3_PREFIX}/data/test/\",\n", " ),\n", " ],\n", " experiment_config={\n", " \"ExperimentName\": skin_lesion_experiment.experiment_name,\n", " }\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.6. 
Download and Examine Metadata" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if pre_processor.latest_job.describe().get('ProcessingJobStatus') == 'Completed':\n", " sagemaker_session.download_data(\n", " 'data',\n", " bucket=S3_BUCKET,\n", " key_prefix=f\"{S3_PREFIX}/data/train/metadata.csv\"\n", " )\n", " metadata = pd.read_csv('data/metadata.csv')\n", " \n", " display(metadata.head())\n", " display(metadata.describe(include=['object']))\n", " display(metadata.describe(include=[np.number]))\n", "\n", " plt.style.use('default')\n", " metadata['diagnostic'].value_counts().plot(kind='bar', color=\"#003181\")\n", " plt.title(\"Diagnosis Value Counts\")\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Train" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1. Create a New Experiment Trial" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mobilenet_trial = smexperiments.trial.Trial.create(\n", " experiment_name=skin_lesion_experiment.experiment_name,\n", " sagemaker_boto_client=sagemaker_boto_client,\n", " trial_name=f\"mobilenet-Trial-{strftime('%Y-%m-%d-%H-%M-%S')}\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2. 
Define an Estimator" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_id, model_version = \"tensorflow-ic-imagenet-mobilenet-v2-100-224-classification-4\", \"*\"\n", "training_instance_type = \"ml.p3.2xlarge\"\n", "mobilenet_job_name = \"mobilenet-Training-Job\"\n", "\n", "# Retrieve the Docker image uri\n", "train_image_uri = image_uris.retrieve(\n", " model_id=model_id,\n", " model_version=model_version,\n", " image_scope=\"training\",\n", " instance_type=training_instance_type,\n", " region=None,\n", " framework=None)\n", "\n", "# Retrieve the training script uri\n", "train_source_uri = script_uris.retrieve(\n", " model_id=model_id,\n", " model_version=model_version,\n", " script_scope=\"training\")\n", "\n", "# Retrieve the pretrained model artifact uri for transfer learning\n", "train_model_uri = model_uris.retrieve(\n", " model_id=model_id,\n", " model_version=model_version,\n", " model_scope=\"training\")\n", "\n", "# Retrieve the default hyperparameter values for fine-tuning the model.\n", "# Use a separate variable name so the imported `hyperparameters` module is not\n", "# overwritten and this cell can be re-run.\n", "training_hyperparameters = hyperparameters.retrieve_default(\n", " model_id=model_id,\n", " model_version=model_version\n", ")\n", "\n", "# Override default hyperparameters with custom values\n", "training_hyperparameters[\"epochs\"] = \"3\"\n", "training_hyperparameters[\"batch_size\"] = \"70\"\n", "training_hyperparameters[\"learning_rate\"] = 0.00010804583232953079\n", "training_hyperparameters[\"optimizer\"] = 'rmsprop'\n", "\n", "# Specify S3 urls for input data and output artifact\n", "training_dataset_s3_path = f\"s3://{S3_BUCKET}/{S3_PREFIX}/data/train\"\n", "validation_dataset_s3_path = f\"s3://{S3_BUCKET}/{S3_PREFIX}/data/val\"\n", "s3_output_location = f\"s3://{S3_BUCKET}/{S3_PREFIX}/output\"\n", "\n", "# Specify what metrics to look for in the logs\n", "training_metric_definitions = [\n", " {\"Name\": \"val_accuracy\", \"Regex\": \"- val_accuracy: ([0-9\\\\.]+)\"},\n", " {\"Name\": \"val_loss\", \"Regex\": \"- val_loss: ([0-9\\\\.]+)\"},\n", 
 {\"Name\": \"train_accuracy\", \"Regex\": \"- accuracy: ([0-9\\\\.]+)\"},\n", " {\"Name\": \"train_loss\", \"Regex\": \"- loss: ([0-9\\\\.]+)\"},\n", "]\n", "\n", "# Create estimator\n", "tf_ic_estimator = sagemaker.estimator.Estimator(\n", " base_job_name=mobilenet_job_name,\n", " role=SAGEMAKER_EXECUTION_ROLE,\n", " image_uri=train_image_uri,\n", " source_dir=train_source_uri,\n", " model_uri=train_model_uri,\n", " entry_point=\"transfer_learning.py\",\n", " instance_count=1,\n", " instance_type=training_instance_type,\n", " hyperparameters=training_hyperparameters,\n", " output_path=s3_output_location,\n", " enable_sagemaker_metrics=True,\n", " metric_definitions=training_metric_definitions,\n", " rules=[sagemaker.debugger.ProfilerRule.sagemaker(sagemaker.debugger.rule_configs.ProfilerReport())]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.3. Submit a Training Job" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf_ic_estimator.fit(\n", " inputs = {\n", " \"training\": training_dataset_s3_path,\n", " \"validation\": validation_dataset_s3_path\n", " }, \n", " experiment_config={\n", " \"TrialName\": mobilenet_trial.trial_name,\n", " }\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.4. View Trial Details" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.analytics import ExperimentAnalytics\n", "\n", "trial_component_analytics = ExperimentAnalytics(\n", " sagemaker_session=sagemaker_session,\n", " experiment_name=skin_lesion_experiment.experiment_name,\n", " parameter_names=[\"SageMaker.InstanceType\"],\n", ")\n", "\n", "trial_component_analytics.dataframe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. 
Deploy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![SageMaker Model Deployment Options](img/deployment_options.png \"SageMaker Model Deployment Options\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.1. Define a Model Using the Training Artifact" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endpoint_name = sagemaker.utils.name_from_base(f\"lesion-classifier-{model_id}\")\n", "inference_instance_type = \"ml.g4dn.xlarge\"\n", "\n", "# Get the inference docker container uri.\n", "deploy_image_uri = image_uris.retrieve(\n", " region=None,\n", " framework=None,\n", " image_scope=\"inference\",\n", " model_id=model_id,\n", " model_version=model_version,\n", " instance_type=inference_instance_type,\n", ")\n", "\n", "# Get the inference script uri\n", "deploy_source_uri = script_uris.retrieve(\n", " model_id=model_id, model_version=model_version, script_scope=\"inference\"\n", ")\n", "\n", "# Get the model artifact created by the training job\n", "model_data_uri = tf_ic_estimator.model_data\n", "\n", "# Define a SageMaker model using the training artifact\n", "model = sagemaker.model.Model(\n", " image_uri=deploy_image_uri,\n", " source_dir=deploy_source_uri,\n", " model_data=model_data_uri,\n", " entry_point=\"inference.py\",\n", " role=SAGEMAKER_EXECUTION_ROLE,\n", " predictor_cls=sagemaker.predictor.Predictor,\n", " name=endpoint_name,\n", ")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.2. Submit a Deployment Job" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_predictor = model.deploy(\n", " initial_instance_count=1,\n", " instance_type=inference_instance_type,\n", " endpoint_name=endpoint_name,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.3. 
Download Test Images" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sagemaker_session.download_data(\n", "    \"data/test\",\n", "    bucket=S3_BUCKET,\n", "    key_prefix=f\"{S3_PREFIX}/data/test\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.4. Generate Test Predictions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "truth = []\n", "pred = []\n", "images = []\n", "\n", "diagnostic_codes = {\n", "    'BCC': 'Cancer: Basal Cell Carcinoma',\n", "    'MEL': 'Cancer: Melanoma',\n", "    'SCC': 'Cancer: Squamous Cell Carcinoma and Bowen’s disease',\n", "    'ACK': 'Disease: Actinic Keratosis',\n", "    'NEV': 'Disease: Nevus',\n", "    'SEK': 'Disease: Seborrheic Keratosis'\n", "}\n", "\n", "for true_diagnostic in ['ACK', 'BCC', 'MEL', 'NEV', 'SCC', 'SEK']:\n", "    print(diagnostic_codes[true_diagnostic])\n", "    # Use at most 25 PNG images per class to limit the number of endpoint calls\n", "    filenames = [name for name in os.listdir(f'data/test/{true_diagnostic}') if name.endswith('.png')][:25]\n", "    for filename in filenames:\n", "        filename = f'data/test/{true_diagnostic}/{filename}'\n", "        print(f'Predicting {filename}')\n", "        with open(filename, \"rb\") as file:\n", "            img = file.read()\n", "        # Send the raw image bytes to the endpoint and parse the JSON response\n", "        query_response = model_predictor.predict(\n", "            img, {\"ContentType\": \"application/x-image\", \"Accept\": \"application/json;verbose\"}\n", "        )\n", "        model_predictions = json.loads(query_response)\n", "        predicted_label = model_predictions[\"predicted_label\"]\n", "        truth.append(true_diagnostic)\n", "        pred.append(predicted_label)\n", "        images.append(mpimg.imread(filename))\n", "        sleep(0.1)\n", "\n", "precision, recall, f1, _ = precision_recall_fscore_support(truth, pred, average='weighted', zero_division=0)\n", "\n", "print(f'Inference Precision: {precision}')\n", "print(f'Inference Recall: {recall}')\n", "print(f'Inference F1-Score: {f1}')" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "### 4.5. View Sample Results" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import random\n", "\n", "k = 12\n", "samples = random.sample(range(len(images)), 12)\n", "plt.figure()\n", "for i, sample in enumerate(samples):\n", " plt.subplot(3,4,i+1)\n", " plt.imshow(images[sample])\n", " plt.title(f'{truth[sample]}:{pred[sample]}')\n", " plt.axis('off')\n", " \n", "_ = plt.suptitle(\"Actual:Prediction\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.6. Delete Endpoint" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_predictor.delete_endpoint()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Workflow Automation with SageMaker Pipelines" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.workflow import pipeline\n", "from sagemaker.workflow.pipeline import Pipeline\n", "from sagemaker.workflow.pipeline_context import PipelineSession\n", "from sagemaker.workflow.steps import ProcessingStep, TrainingStep\n", "from sagemaker.inputs import TrainingInput" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.1. 
Define a Processing Step" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Update the pre_processor from above to use PipelineSession()\n", "pre_processor.sagemaker_session = PipelineSession()\n", "\n", "# Define the processing run lazily; it is deferred until the pipeline executes\n", "processor_args = pre_processor.run(\n", " job_name=f\"skin-lesion-image-processing-job-{strftime('%Y-%m-%d-%H-%M-%S')}\",\n", " code=\"scripts/processing/preprocess.py\",\n", " outputs=[\n", " sagemaker.processing.ProcessingOutput(\n", " output_name=\"train\",\n", " source=\"/opt/ml/processing/output/train\",\n", " destination=f\"s3://{S3_BUCKET}/{S3_PREFIX}/data/train/\",\n", " ),\n", " sagemaker.processing.ProcessingOutput(\n", " output_name=\"validation\",\n", " source=\"/opt/ml/processing/output/val\",\n", " destination=f\"s3://{S3_BUCKET}/{S3_PREFIX}/data/val/\",\n", " ),\n", " sagemaker.processing.ProcessingOutput(\n", " output_name=\"test\",\n", " source=\"/opt/ml/processing/output/test\",\n", " destination=f\"s3://{S3_BUCKET}/{S3_PREFIX}/data/test/\",\n", " ),\n", " ]\n", ")\n", "\n", "# Use the lazy processing run to define a ProcessingStep\n", "processing_step = ProcessingStep(\n", " name=\"LesionImageProcessingStep\",\n", " step_args=processor_args,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.2. 
Define a Training Step" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Update the tf_ic_estimator from above to use PipelineSession()\n", "tf_ic_estimator.sagemaker_session = PipelineSession()\n", "\n", "# Define the training run lazily; it is deferred until the pipeline executes.\n", "# The inputs reference the processing step's outputs, so training runs after processing.\n", "training_args = tf_ic_estimator.fit(\n", " inputs = {\n", " \"training\": TrainingInput(\n", " s3_data=processing_step.properties.ProcessingOutputConfig.Outputs[\"train\"].S3Output.S3Uri, content_type=\"text/csv\"\n", " ),\n", " \"validation\": TrainingInput(\n", " s3_data=processing_step.properties.ProcessingOutputConfig.Outputs[\"validation\"].S3Output.S3Uri, content_type=\"text/csv\"\n", " )\n", " }\n", ")\n", "\n", "# Use the lazy training run to define a TrainingStep\n", "training_step = TrainingStep(\n", " name=\"LesionClassifierTrainingStep\",\n", " step_args=training_args\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.3. 
Create and Execute a Pipeline" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create a pipeline with the processing and training steps\n", "pipeline = Pipeline(\n", " name=f\"lesion-classifier-pipeline-{strftime('%Y-%m-%d-%H-%M-%S')}\",\n", " steps=[\n", " processing_step,\n", " training_step\n", " ],\n", " sagemaker_session=sagemaker_session,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create or update the pipeline definition, then start an execution.\n", "# upsert() only registers the pipeline; start() is what runs it.\n", "pipeline.upsert(role_arn=SAGEMAKER_EXECUTION_ROLE)\n", "execution = pipeline.start()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3.9.8 64-bit ('3.9.8')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.8" }, "vscode": { "interpreter": { "hash": "a8534c14445fc6cdc3039d8140510d6736e5b4960d89f445a45d8db6afd8452b" } } }, "nbformat": 4, "nbformat_minor": 4 }