{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Coswara Audio Classification\n", "\n", "In this notebook, we will demonstrate using a custom SagemMaker PyTorch container to train an acoustic classification model in SageMaker script mode.\n", "\n", "In this example, the model take reference to the paper VERY DEEP CONVOLUTIONAL NEURAL NETWORKS FOR RAW WAVEFORMS by Wei Dai et al., you can get more information by reading the paper." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/aws-samples/applying-voice-classification-in-amazon-connect-contact-flow/blob/main/sagemaker-voice-classification/notebook/coswara-audio-classification.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dataset\n", "\n", "We will use the Coswara dataset to train our network. It is available for free here The data set distribution is here \n", "\n", "The following are the class labels:\n", "```\n", "0 = healthy \n", "1 = resp_illness_not_identified\n", "1 = no_resp_illness_exposed \n", "1 = recovered_full\n", "1 = positive_mild\n", "1 = positive_asymp \n", "1 = positive_moderate\n", "```\n", "\n", "The expected directory structure is as follows with respect to this notebook:\n", "\n", "```\n", "/home/ec2-user/SageMaker/Coswara-Data/\n", "|-- 20200413\n", "| |-- 20200413.csv\n", "| |-- 20200413.tar.gz.aa\n", "| |-- 20200413.tar.gz.ab\n", "| |-- 20200413.tar.gz.ac\n", "| |-- 20200413.tar.gz.ad\n", "...\n", "| \n", "`-- combined_data.csv\n", "```\n", "\n", "Let's take a look at a sample file to ensure dataset is downloaded to the correct location." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### first process the raw Coswara data\n", "uncompress audio recordings and generate metadata file for each type of recording, including: \n", "- breathing-deep-metadata.csv \n", "- breathing-shallow-metadata.csv \n", "- cough-heavy-metadata.csv \n", "- cough-shallow-metadata.csv \n", "- counting-fast-metadata.csv \n", "- counting-normal-metadata.csv \n", "- vowel-a-metadata.csv \n", "- vowel-e-metadata.csv \n", "- vowel-o-metadata.csv " ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "!chmod u+x ../preprocess.sh\n", "!../preprocess.sh" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import sagemaker\n", "from sagemaker import get_execution_role\n", "from sagemaker.pytorch import PyTorch\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "role = get_execution_role()\n", "ecr_repository_name = 'coswara-audio-classification'\n", "account_id = role.split(':')[4]\n", "region = boto3.Session().region_name\n", "sagemaker_session = sagemaker.Session(default_bucket='sagemaker-audio-classification-{}'.format(account_id)) ## this S3 bucket was created by the same CloudFormation stack for creating this notebook instance\n", "bucket = sagemaker_session.default_bucket()\n", "\n", "\n", "print('Account: {}'.format(account_id))\n", "print('Region: {}'.format(region))\n", "print('Role: {}'.format(role))\n", "print('S3 Bucket: {}'.format(bucket))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "with open('Dockerfile', 'w') as f:\n", " f.write(\"FROM 763104351884.dkr.ecr.{}.amazonaws.com/pytorch-training:1.5.1-gpu-py3\\n\".format(region))\n", " f.write(\"RUN apt-get update && apt-get install -y --allow-downgrades --allow-change-held-packages --no-install-recommends libsndfile1\")" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "%%writefile build_and_push.sh\n", "\n", "ACCOUNT_ID=$1\n", "REGION=$2\n", "REPO_NAME=$3\n", "DOCKERFILE=$4\n", "SERVER=\"${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com\"\n", "\n", "echo \"ACCOUNT_ID: ${ACCOUNT_ID}\"\n", "echo \"REPO_NAME: ${REPO_NAME}\"\n", "echo \"REGION: ${REGION}\"\n", "echo \"DOCKERFILE: ${DOCKERFILE}\"\n", "\n", "# Login to retrieve base container\n", "aws ecr get-login-password | docker login --username AWS --password-stdin 763104351884.dkr.ecr.${REGION}.amazonaws.com\n", "\n", "docker build -q -f ${DOCKERFILE} -t ${REPO_NAME} .\n", "\n", "docker tag ${REPO_NAME} ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO_NAME}:latest\n", "\n", "aws ecr get-login-password | docker login --username AWS --password-stdin ${SERVER}\n", "aws ecr describe-repositories --repository-names ${REPO_NAME} || aws ecr create-repository --repository-name ${REPO_NAME}\n", "\n", "docker push ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO_NAME}:latest" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "!bash build_and_push.sh $account_id $region $ecr_repository_name Dockerfile" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "## first time run this to upload data to S3\n", "train_data = sagemaker_session.upload_data(\n", " \"/home/ec2-user/SageMaker/Coswara-Data/\",\n", " bucket=bucket,\n", " key_prefix=\"Coswara-Data\",\n", ")" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "## following run this to avoid upload\n", "train_data = \"s3://sagemaker-audio-classification-{}/Coswara-Data\".format(account_id)\n", "\n", "train_input = sagemaker.session.TrainingInput(train_data,\n", " distribution='FullyReplicated',\n", " content_type='csv',\n", " s3_data_type='S3Prefix')\n", "\n", "train_image_uri = '{0}.dkr.ecr.{1}.amazonaws.com/{2}:latest'.format(account_id, region, ecr_repository_name)\n", "print('ECR training container ARN: {}'.format(train_image_uri))\n", "\n", "hyperparams = {'lr': 0.0001388900761687841, # learning rate\n", " 'gamma': 0.6165182113724552, # Learning rate step gamma\n", " 'weight-decay': 0.001, # Optimizer regularization\n", " 'stepsize': 5, # Optimizer stepsize\n", " 'epochs': 30, # iterations to stablize\n", " 'batch-size': 256, # train batch size\n", " 'num-workers': 30,\n", " 'csv-file': 'counting-normal-metadata.csv' ## breathing-deep-metadata.csv, breathing-shallow-metadata.csv, cough-heavy-metadata.csv, cough-shallow-metadata.csv, counting-fast-metadata.csv, counting-normal-metadata.csv, vowel-a-metadata.csv, vowel-e-metadata.csv, vowel-o-metadata.csv\n", " }\n", "\n", "pytorch_estimator = PyTorch(image_uri=train_image_uri,\n", " entry_point='train.py',\n", " source_dir='./',\n", " role=role,\n", " instance_type='ml.c5.2xlarge',\n", " instance_count=1,\n", " output_path = \"s3://{}/\".format(bucket),\n", " hyperparameters = hyperparams,\n", " metric_definitions = [\n", " {'Name': 'test:loss', 'Regex': 'Average loss: ([0-9\\\\.]+)'},\n", " {'Name': 'test:f1', 'Regex': 'F1: ([0-9\\\\.]+)'},\n", " {'Name': 'test:f2', 'Regex': 'F2: ([0-9\\\\.]+)'},\n", " {'Name': 'test:precision', 'Regex': 'Precision: ([0-9\\\\.]+)'},\n", " {'Name': 'test:recall', 'Regex': 'Recall: ([0-9\\\\.]+)'},\n", " {'Name': 'test:accuracy', 'Regex': 'Accuracy: ([0-9\\\\.]+)'}\n", " ]\n", " )\n", "\n", "\n", "pytorch_estimator.fit({'training': train_input}, wait=True)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "## hyperparameter tuning (optional to run)\n", "\n", "objective_metric_name = 'test:f2'\n", "objective_type = 'Maximize'\n", "metric_definitions = [\n", " {'Name': 'test:loss', 'Regex': 'Average loss: ([0-9\\\\.]+)'},\n", " {'Name': 'test:f1', 'Regex': 'F1: ([0-9\\\\.]+)'},\n", " {'Name': 'test:f2', 'Regex': 'F2: ([0-9\\\\.]+)'},\n", " {'Name': 'test:precision', 'Regex': 'Precision: ([0-9\\\\.]+)'},\n", " {'Name': 'test:recall', 'Regex': 'Recall: ([0-9\\\\.]+)'},\n", " {'Name': 'test:accuracy', 'Regex': 'Accuracy: ([0-9\\\\.]+)'}\n", "]\n", "\n", "hyperparameter_ranges = {\n", " 'lr': sagemaker.tuner.ContinuousParameter(0.0001, 0.1),\n", " 'gamma': sagemaker.tuner.ContinuousParameter(0.001, 1),\n", " 'weight-decay': sagemaker.tuner.CategoricalParameter([0.000001, 0.00001, 0.001]), \n", " 'stepsize': sagemaker.tuner.CategoricalParameter([1,5,10])\n", "}\n", "\n", "\n", "tuner = sagemaker.tuner.HyperparameterTuner(pytorch_estimator,\n", " objective_metric_name,\n", " hyperparameter_ranges,\n", " metric_definitions,\n", " max_jobs=2,\n", " max_parallel_jobs=2,\n", " objective_type=objective_type)\n", "\n", "tuner.fit({'training': train_input})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.pytorch import PyTorchModel\n", "\n", "pytorch_model = PyTorchModel(model_data=pytorch_estimator.model_data, \n", " role=role, \n", " entry_point='inference.py',\n", " source_dir='./',\n", " py_version='py3',\n", " framework_version='1.6.0',\n", " )\n", "predictor = pytorch_model.deploy(initial_instance_count=1, instance_type='ml.c5.2xlarge', wait=True)\n", "## The inference endpoint name will be used in SageMaker Client\n", "print(\"Inference endpoint name: {}\".format(pytorch_model.endpoint_name))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The voice classification model has been deoployed as a SageMaker inference endpoint. \n", "We will test it below. \n", "First, we will install the dependency: " ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "!pip install torchaudio" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "from coswara_dataset import CoswareDataset\n", "from pathlib import Path\n", "import torch\n", "\n", "datapath = Path(\"/home/ec2-user/SageMaker/Coswara-Data\")\n", "csvpath = datapath / \"breathing-deep-metadata.csv\"\n", "\n", "test_set = CoswareDataset(\n", " csv_path=csvpath,\n", " file_path=datapath,\n", " new_sr=8000,\n", " audio_len=20,\n", " sampling_ratio=5,\n", ")\n", "test_loader = torch.utils.data.DataLoader(test_set, batch_size=5)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "X, y = next(iter(test_loader))\n", "print(X.shape, y)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import numpy as np\n", "prediction = predictor.predict(X.numpy())\n", "print(\"Prediction array: {}\".format(prediction))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Here is a case study for positive asymptomatic COVID-19 voice recording" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## play a sample audio recording for positive_asymp\n", "from IPython.display import Audio\n", "\n", "coswarapath = '/home/ec2-user/SageMaker/Coswara-Data/20200820/20200820'\n", "audioid = 'kBFDtvAVY9QYbi7YHYgd7tNpsWx1'\n", "audiotype = 'counting-normal.wav'\n", "filename = '/'.join([coswarapath, audioid, audiotype])\n", "Audio(filename, autoplay=False)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The probability of positive label is [0.8074303865432739]\n" ] } ], "source": [ "## make a prediction\n", "\n", "import boto3\n", "\n", "client = boto3.client('sagemaker-runtime')\n", "response = client.invoke_endpoint(\n", " EndpointName=pytorch_model.endpoint_name,\n", " Body='s3://sagemaker-audio-classification-{}/Coswara-Data/20200820/20200820/kBFDtvAVY9QYbi7YHYgd7tNpsWx1/counting-normal.wav'.format(account_id),\n", " ContentType='text/csv',\n", ")\n", "\n", "print(\"The probability of positive label is {}\".format(response['Body'].read().decode(\"utf-8\")))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "default:Python", "language": "python", "name": "conda-env-default-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 4 }