# Coswara Audio Classification

In this notebook, we demonstrate how to use a custom SageMaker PyTorch container to train an acoustic classification model in SageMaker script mode.

In this example, the model is based on the paper *Very Deep Convolutional Neural Networks for Raw Waveforms* by Wei Dai et al.; see the paper for details of the architecture.

[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/aws-samples/applying-voice-classification-in-amazon-connect-contact-flow/blob/main/sagemaker-voice-classification/notebook/coswara-audio-classification.ipynb)

### Dataset

We will use the Coswara dataset to train our network. It is freely available here, and the dataset distribution is shown here.

The task is framed as binary classification: `healthy` maps to class 0 and every other status maps to class 1. The class labels are:
```
0 = healthy 
1 = resp_illness_not_identified
1 = no_resp_illness_exposed 
1 = recovered_full
1 = positive_mild
1 = positive_asymp 
1 = positive_moderate
```
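The mapping above collapses every non-healthy status into the positive class. A minimal Python sketch of that mapping follows; `STATUS_TO_LABEL` and `status_to_label` are hypothetical names, not taken from the project's `coswara_dataset` module.

```python
# Hypothetical sketch of the status-to-label mapping listed above.
# Every status other than "healthy" is collapsed into the positive class (1).
STATUS_TO_LABEL = {
    "healthy": 0,
    "resp_illness_not_identified": 1,
    "no_resp_illness_exposed": 1,
    "recovered_full": 1,
    "positive_mild": 1,
    "positive_asymp": 1,
    "positive_moderate": 1,
}


def status_to_label(status: str) -> int:
    """Return the binary class label for a Coswara status string."""
    key = status.strip()
    if key not in STATUS_TO_LABEL:
        # Fail loudly instead of guessing a label for an unlisted status.
        raise ValueError(f"unknown status: {status!r}")
    return STATUS_TO_LABEL[key]


print(status_to_label("positive_asymp"))  # → 1
```

Raising on unknown values makes the conversion fail early if the metadata contains a status that is not in the list.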

The notebook expects the dataset in the following directory structure:

```
/home/ec2-user/SageMaker/Coswara-Data/
|-- 20200413
| |-- 20200413.csv
| |-- 20200413.tar.gz.aa
| |-- 20200413.tar.gz.ab
| |-- 20200413.tar.gz.ac
| |-- 20200413.tar.gz.ad
...
| 
`-- combined_data.csv
```

Let's take a look at a sample file to make sure the dataset has been downloaded to the correct location.
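A quick standard-library check can confirm that the layout matches the tree above before preprocessing. This is a minimal sketch; `check_coswara_layout` is a hypothetical helper, and it only checks for `combined_data.csv` and for date folders containing a CSV named after the folder.

```python
from pathlib import Path


def check_coswara_layout(root):
    """Hypothetical helper: inspect a local Coswara-Data checkout.

    Returns (has_combined_csv, date_dirs), where date_dirs lists the folders
    that contain a metadata CSV named after the folder itself
    (for example 20200413/20200413.csv).
    """
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"dataset root not found: {root}")
    has_combined_csv = (root / "combined_data.csv").is_file()
    date_dirs = sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and (p / f"{p.name}.csv").is_file()
    )
    return has_combined_csv, date_dirs


# Example usage with the path from the layout above:
# has_csv, dates = check_coswara_layout("/home/ec2-user/SageMaker/Coswara-Data/")
# print(has_csv, dates[:3])
```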

### First, process the raw Coswara data
Uncompress the audio recordings and generate a metadata file for each type of recording:
- breathing-deep-metadata.csv 
- breathing-shallow-metadata.csv 
- cough-heavy-metadata.csv 
- cough-shallow-metadata.csv 
- counting-fast-metadata.csv 
- counting-normal-metadata.csv 
- vowel-a-metadata.csv 
- vowel-e-metadata.csv 
- vowel-o-metadata.csv 
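The contents of `preprocess.sh` are not shown here. Assuming the `.tar.gz.aa`, `.tar.gz.ab` (and so on) files are byte-wise splits of a single gzipped tarball, the usual output of the `split` utility, the uncompression step could be sketched in Python as below. `extract_split_archive` is a hypothetical name, and generating the per-recording metadata CSVs is not covered.

```python
import tarfile
from pathlib import Path


def extract_split_archive(date_dir, dest=None):
    """Hypothetical helper: rebuild <date>.tar.gz from its parts and extract it.

    Assumes the parts are named <date>.tar.gz.aa, <date>.tar.gz.ab, ... and
    that concatenating them in lexical order restores the original archive.
    """
    date_dir = Path(date_dir)
    dest = Path(dest) if dest is not None else date_dir
    parts = sorted(date_dir.glob(f"{date_dir.name}.tar.gz.*"))
    if not parts:
        raise FileNotFoundError(f"no archive parts found in {date_dir}")

    combined = date_dir / f"{date_dir.name}.tar.gz"
    with open(combined, "wb") as out:
        for part in parts:  # "aa" < "ab" < "ac" ... preserves the byte order
            out.write(part.read_bytes())

    with tarfile.open(combined, "r:gz") as tar:
        tar.extractall(dest)
    return combined
```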

In [28]:
!chmod u+x ../preprocess.sh
!../preprocess.sh

In [29]:
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorch
import warnings
warnings.filterwarnings('ignore')

role = get_execution_role()
ecr_repository_name = 'coswara-audio-classification'
account_id = role.split(':')[4]
region = boto3.Session().region_name
sagemaker_session = sagemaker.Session(default_bucket='sagemaker-audio-classification-{}'.format(account_id)) ## this S3 bucket was created by the same CloudFormation stack for creating this notebook instance
bucket = sagemaker_session.default_bucket()


print('Account: {}'.format(account_id))
print('Region: {}'.format(region))
print('Role: {}'.format(role))
print('S3 Bucket: {}'.format(bucket))

In [4]:
# Extend the AWS Deep Learning Container for PyTorch with the libsndfile1 system library
with open('Dockerfile', 'w') as f:
    f.write("FROM 763104351884.dkr.ecr.{}.amazonaws.com/pytorch-training:1.5.1-gpu-py3\n".format(region))
    f.write("RUN apt-get update && apt-get install -y --allow-downgrades --allow-change-held-packages --no-install-recommends libsndfile1\n")

In [30]:
%%writefile build_and_push.sh

ACCOUNT_ID=$1
REGION=$2
REPO_NAME=$3
DOCKERFILE=$4
SERVER="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"

echo "ACCOUNT_ID: ${ACCOUNT_ID}"
echo "REPO_NAME: ${REPO_NAME}"
echo "REGION: ${REGION}"
echo "DOCKERFILE: ${DOCKERFILE}"

# Log in to the AWS Deep Learning Containers registry to pull the base image
aws ecr get-login-password --region ${REGION} | docker login --username AWS --password-stdin 763104351884.dkr.ecr.${REGION}.amazonaws.com

docker build -q -f ${DOCKERFILE} -t ${REPO_NAME} .

docker tag ${REPO_NAME} ${SERVER}/${REPO_NAME}:latest

# Log in to this account's registry and create the repository if it does not exist yet
aws ecr get-login-password --region ${REGION} | docker login --username AWS --password-stdin ${SERVER}
aws ecr describe-repositories --region ${REGION} --repository-names ${REPO_NAME} > /dev/null 2>&1 || aws ecr create-repository --region ${REGION} --repository-name ${REPO_NAME}

docker push ${SERVER}/${REPO_NAME}:latest

In [31]:
!bash build_and_push.sh $account_id $region $ecr_repository_name Dockerfile

In [7]:
## Run this the first time to upload the data to S3
train_data = sagemaker_session.upload_data(
 "/home/ec2-user/SageMaker/Coswara-Data/",
 bucket=bucket,
 key_prefix="Coswara-Data",
)

In [32]:
## On later runs, point to the data already in S3 instead of uploading it again
train_data = "s3://sagemaker-audio-classification-{}/Coswara-Data".format(account_id)

train_input = sagemaker.session.TrainingInput(train_data,
 distribution='FullyReplicated',
 content_type='csv',
 s3_data_type='S3Prefix')

train_image_uri = '{0}.dkr.ecr.{1}.amazonaws.com/{2}:latest'.format(account_id, region, ecr_repository_name)
print('ECR training container URI: {}'.format(train_image_uri))

hyperparams = {'lr': 0.0001388900761687841,  # learning rate
 'gamma': 0.6165182113724552,  # learning-rate step decay factor
 'weight-decay': 0.001,  # optimizer regularization (weight decay)
 'stepsize': 5,  # learning-rate step size
 'epochs': 30,  # number of training epochs
 'batch-size': 256,  # training batch size
 'num-workers': 30,  # data-loading worker processes
 # Metadata file for the recording type to train on; one of breathing-deep, breathing-shallow,
 # cough-heavy, cough-shallow, counting-fast, counting-normal, vowel-a, vowel-e, vowel-o
 # (each followed by -metadata.csv)
 'csv-file': 'counting-normal-metadata.csv',
 }

pytorch_estimator = PyTorch(image_uri=train_image_uri,
 entry_point='train.py',
 source_dir='./',
 role=role,
 instance_type='ml.c5.2xlarge',
 instance_count=1,
 output_path = "s3://{}/".format(bucket),
 hyperparameters = hyperparams,
 metric_definitions = [
 {'Name': 'test:loss', 'Regex': 'Average loss: ([0-9\\.]+)'},
 {'Name': 'test:f1', 'Regex': 'F1: ([0-9\\.]+)'},
 {'Name': 'test:f2', 'Regex': 'F2: ([0-9\\.]+)'},
 {'Name': 'test:precision', 'Regex': 'Precision: ([0-9\\.]+)'},
 {'Name': 'test:recall', 'Regex': 'Recall: ([0-9\\.]+)'},
 {'Name': 'test:accuracy', 'Regex': 'Accuracy: ([0-9\\.]+)'}
 ]
 )


pytorch_estimator.fit({'training': train_input}, wait=True)
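`train.py` itself is not reproduced here. In SageMaker script mode, the estimator's `hyperparameters` reach the entry point as command-line arguments (for example `--lr 0.0001388900761687841 --batch-size 256`), and the `training` channel is mounted at the path in the `SM_CHANNEL_TRAINING` environment variable. The sketch below shows how such a script could parse those arguments and print a log line that the `metric_definitions` regexes would match. The log format and the helper names (`parse_args`, `format_metrics`, `extract_metrics`) are assumptions, not the project's actual code.

```python
import argparse
import os
import re


def parse_args(argv=None):
    """Parse the hyperparameters that SageMaker passes as CLI arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=1e-4)
    parser.add_argument("--gamma", type=float, default=0.5)
    parser.add_argument("--weight-decay", type=float, default=1e-3)
    parser.add_argument("--stepsize", type=int, default=5)
    parser.add_argument("--epochs", type=int, default=30)
    parser.add_argument("--batch-size", type=int, default=256)
    parser.add_argument("--num-workers", type=int, default=0)
    parser.add_argument("--csv-file", type=str, default="counting-normal-metadata.csv")
    # SageMaker mounts the "training" channel at $SM_CHANNEL_TRAINING.
    parser.add_argument("--data-dir", type=str,
                        default=os.environ.get("SM_CHANNEL_TRAINING", "."))
    parser.add_argument("--model-dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "."))
    return parser.parse_args(argv)


def format_metrics(loss, f1, f2, precision, recall, accuracy):
    """Hypothetical log line containing every pattern in metric_definitions."""
    return (f"Average loss: {loss:.4f}, F1: {f1:.4f}, F2: {f2:.4f}, "
            f"Precision: {precision:.4f}, Recall: {recall:.4f}, "
            f"Accuracy: {accuracy:.4f}")


# Same patterns as metric_definitions, used to approximate how SageMaker
# scans the training log for metric values.
METRIC_REGEXES = {
    "test:loss": r"Average loss: ([0-9\.]+)",
    "test:f1": r"F1: ([0-9\.]+)",
    "test:f2": r"F2: ([0-9\.]+)",
    "test:precision": r"Precision: ([0-9\.]+)",
    "test:recall": r"Recall: ([0-9\.]+)",
    "test:accuracy": r"Accuracy: ([0-9\.]+)",
}


def extract_metrics(log_line):
    """Return {metric_name: value} for every regex that matches the line."""
    found = {}
    for name, pattern in METRIC_REGEXES.items():
        match = re.search(pattern, log_line)
        if match:
            found[name] = float(match.group(1))
    return found
```

Printing all six metrics on one line keeps each regex anchored to its own label, so none of them captures another metric's value.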

In [7]:
## Hyperparameter tuning (optional)

objective_metric_name = 'test:f2'
objective_type = 'Maximize'
metric_definitions = [
 {'Name': 'test:loss', 'Regex': 'Average loss: ([0-9\\.]+)'},
 {'Name': 'test:f1', 'Regex': 'F1: ([0-9\\.]+)'},
 {'Name': 'test:f2', 'Regex': 'F2: ([0-9\\.]+)'},
 {'Name': 'test:precision', 'Regex': 'Precision: ([0-9\\.]+)'},
 {'Name': 'test:recall', 'Regex': 'Recall: ([0-9\\.]+)'},
 {'Name': 'test:accuracy', 'Regex': 'Accuracy: ([0-9\\.]+)'}
]

hyperparameter_ranges = {
 'lr': sagemaker.tuner.ContinuousParameter(0.0001, 0.1),
 'gamma': sagemaker.tuner.ContinuousParameter(0.001, 1),
 'weight-decay': sagemaker.tuner.CategoricalParameter([0.000001, 0.00001, 0.001]), 
 'stepsize': sagemaker.tuner.CategoricalParameter([1,5,10])
}


tuner = sagemaker.tuner.HyperparameterTuner(pytorch_estimator,
 objective_metric_name,
 hyperparameter_ranges,
 metric_definitions,
 max_jobs=2,
 max_parallel_jobs=2,
 objective_type=objective_type)

tuner.fit({'training': train_input})
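The tuner maximizes `test:f2`. Assuming `train.py` reports the standard F-beta score, F2 weights recall more heavily than precision, which suits a screening task where a missed positive costs more than a false alarm. A minimal sketch of the formula:

```python
def f_beta(precision, recall, beta):
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R).

    beta > 1 weights recall more than precision; beta = 1 gives the usual F1.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta ** 2) * precision * recall / denom


# With precision 0.5 and recall 0.9, F2 is pulled toward recall while F1 is not.
print(round(f_beta(0.5, 0.9, 1), 4))  # → 0.6429
print(round(f_beta(0.5, 0.9, 2), 4))  # → 0.7759
```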

In [None]:
from sagemaker.pytorch import PyTorchModel

pytorch_model = PyTorchModel(model_data=pytorch_estimator.model_data, 
 role=role, 
 entry_point='inference.py',
 source_dir='./',
 py_version='py3',
 framework_version='1.6.0',
 )
predictor = pytorch_model.deploy(initial_instance_count=1, instance_type='ml.c5.2xlarge', wait=True)
## The endpoint name is needed later to invoke the endpoint through the SageMaker runtime client
print("Inference endpoint name: {}".format(pytorch_model.endpoint_name))

The voice classification model has been deployed as a SageMaker inference endpoint. 
We will test it below. 
First, install the dependency: 

In [33]:
!pip install torchaudio

In [14]:
from coswara_dataset import CoswareDataset
from pathlib import Path
import torch

datapath = Path("/home/ec2-user/SageMaker/Coswara-Data")
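# Use the metadata file for the recording type the model was trained on (the 'csv-file' hyperparameter)
csvpath = datapath / "counting-normal-metadata.csv"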

test_set = CoswareDataset(
 csv_path=csvpath,
 file_path=datapath,
 new_sr=8000,
 audio_len=20,
 sampling_ratio=5,
)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=5)
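The internals of `CoswareDataset` are not shown here. If `new_sr` is the target sample rate in Hz and `audio_len` is the clip length in seconds (an assumption), every example holds 8000 × 20 = 160,000 samples, so recordings must be cropped or zero-padded to that length. A minimal NumPy sketch of that step, using the hypothetical helper `fix_length`; how the dataset uses `sampling_ratio` is not modeled.

```python
import numpy as np


def fix_length(waveform, sample_rate=8000, seconds=20):
    """Crop or zero-pad a 1-D waveform to exactly sample_rate * seconds samples."""
    target = sample_rate * seconds
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.shape[0] >= target:
        return waveform[:target]  # keep only the first `seconds` of audio
    padding = target - waveform.shape[0]
    return np.pad(waveform, (0, padding))  # zero-pad at the end


clip = fix_length(np.ones(5 * 8000))  # a 5-second recording
print(clip.shape)  # → (160000,)
```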

In [34]:
X, y = next(iter(test_loader))
print(X.shape, y)

In [35]:
import numpy as np
prediction = predictor.predict(X.numpy())
print("Prediction array: {}".format(prediction))
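`predictor.predict(X.numpy())` relies on the predictor's serializer to turn the NumPy batch into a request body. In SageMaker Python SDK v2, `PyTorchPredictor` defaults to `NumpySerializer` and `NumpyDeserializer`, which use the NPY format (`application/x-npy`). The round trip can be approximated locally with `np.save`/`np.load`; this is a sketch of the wire format, not the SDK's own code, and it assumes `inference.py` accepts NPY input. The batch shape below is illustrative only.

```python
import io

import numpy as np


def to_npy_bytes(array):
    """Serialize an array to NPY bytes, the format sent as application/x-npy."""
    buffer = io.BytesIO()
    np.save(buffer, array)
    return buffer.getvalue()


def from_npy_bytes(payload):
    """Deserialize NPY bytes back into an array, as an endpoint handler would."""
    return np.load(io.BytesIO(payload), allow_pickle=False)


batch = np.random.rand(5, 1, 160000).astype(np.float32)  # illustrative shape
restored = from_npy_bytes(to_npy_bytes(batch))
print(restored.shape, restored.dtype)  # → (5, 1, 160000) float32
```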

### Case study: a voice recording from a COVID-19-positive, asymptomatic participant

In [26]:
## play a sample audio recording for positive_asymp
from IPython.display import Audio

coswarapath = '/home/ec2-user/SageMaker/Coswara-Data/20200820/20200820'
audioid = 'kBFDtvAVY9QYbi7YHYgd7tNpsWx1'
audiotype = 'counting-normal.wav'
filename = '/'.join([coswarapath, audioid, audiotype])
Audio(filename, autoplay=False)

In [27]:
## make a prediction

import boto3

client = boto3.client('sagemaker-runtime')
response = client.invoke_endpoint(
 EndpointName=pytorch_model.endpoint_name,
 Body='s3://sagemaker-audio-classification-{}/Coswara-Data/20200820/20200820/kBFDtvAVY9QYbi7YHYgd7tNpsWx1/counting-normal.wav'.format(account_id),
 ContentType='text/csv',
)

print("The probability of positive label is {}".format(response['Body'].read().decode("utf-8")))

The probability of positive label is [0.8074303865432739]
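In this request the body is not audio but the S3 URI of the recording, sent as `text/csv`, so `inference.py` (not shown) presumably downloads the object itself; the response body is a list holding the positive-class probability. A minimal standard-library sketch of both ends follows; `parse_s3_uri`, `parse_probability`, and the 0.5 decision threshold are illustrative assumptions.

```python
import json
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key), as an input handler might."""
    parsed = urlparse(uri.strip())
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.lstrip("/"):
        raise ValueError(f"not an S3 object URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def parse_probability(body, threshold=0.5):
    """Decode a response body such as b'[0.8]' into (probability, label)."""
    text = body.decode("utf-8") if isinstance(body, bytes) else body
    probability = float(json.loads(text)[0])
    return probability, int(probability >= threshold)


print(parse_s3_uri("s3://my-bucket/Coswara-Data/20200820/20200820/abc/counting-normal.wav"))
# → ('my-bucket', 'Coswara-Data/20200820/20200820/abc/counting-normal.wav')
print(parse_probability(b"[0.8074303865432739]")[1])  # → 1
```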
