{ "cells": [ { "cell_type": "markdown", "id": "every-incidence", "metadata": {}, "source": [ "Rekognition custom label training is a customized mechanism that allows customers to train Rekognition for business case specific datasets. \n", "\n", "The notebook assumes that you have already run your custom label training and created a model using Rekognition custom labels. If you have not, please refer the README file to get started with our sample dataset.\n", "\n", "This notebook is aimed at generating model training metrics for this custom label training to enrich the available metrics for the benefit of the data scientist community to be better informed of the model performance.\n", "\n", "The metrics that are generated from this notebook are\n", "\n", "Confusion Matrix\n", "Heat map\n", "Classification report\n", "Micro / Macro / Weighted precision\n", "Micro / Macro / Weighted recall\n", "Micro / Macro / Weighted F1 score\n", "Raw accuracy\n", "Balanced accuracy\n", "Hamming loss\n", "Jaccard score\n", "Matthew's correlation coefficient\n", "\n", "The intent here is to showcase a mechanism that can be taken as a baseline and customized to generate any ML metrics of interest to the data scientists and ML engineers." ] }, { "cell_type": "code", "execution_count": null, "id": "defined-leader", "metadata": {}, "outputs": [], "source": [ "## Import all the python libraries to be used \n", "import boto3\n", "import botocore\n", "import json\n", "import os\n", "from requests import request\n", "import pandas as pd\n", "from io import BytesIO\n", "import seaborn as sn\n", "import matplotlib.pyplot as plt\n", "from sklearn.metrics import accuracy_score\n", "from sklearn.metrics import precision_score \n", "from sklearn.metrics import average_precision_score\n", "from sklearn.metrics import recall_score\n", "from sklearn.metrics import f1_score\n", "from sklearn.metrics import balanced_accuracy_score\n", "from sklearn.metrics import multilabel_confusion_matrix\n", "import numpy as np\n", "from sklearn.metrics import classification_report\n", "from sklearn.metrics import hamming_loss\n", "from sklearn.metrics import matthews_corrcoef\n", "from sklearn.metrics import jaccard_score\n" ] }, { "cell_type": "markdown", "id": "recognized-attack", "metadata": {}, "source": [ "Rekognition custom label training input and output datasets are stored in specific s3 buckets. In order to derive the model metrics, we need the below summary datasets to be accessed.\n", "1. GroundTruthManifest\n", "2. 
EvaluationResult\n", "\n", "Please refer the README file to know how to get project arn and version name if you are not sure" ] }, { "cell_type": "code", "execution_count": null, "id": "aquatic-substitute", "metadata": { "scrolled": true }, "outputs": [], "source": [ "client=boto3.client('rekognition')\n", "\n", "##User input needed in this section\n", "## Replace the project arn and version name here\n", "project_arn = 'Add your project arn here in the format : arn:aws:rekognition:region:acc-id:project/'\n", "version_name = 'Add your version name here in the format : version-name.timestamp'\n", "\n" ] }, { "cell_type": "markdown", "id": "sudden-preliminary", "metadata": {}, "source": [ "Using the project arn and version, we can get the s3 path of the testing output and evaluation dataset as below" ] }, { "cell_type": "code", "execution_count": null, "id": "loved-miami", "metadata": {}, "outputs": [], "source": [ "response=client.describe_project_versions(ProjectArn=project_arn, VersionNames=[version_name])\n", "\n", "## Here we retrieve the dataset that is generated as the output from testing\n", "for test_ds in response['ProjectVersionDescriptions'] :\n", " test_s3_path = test_ds['TestingDataResult']['Output']['Assets'][0]['GroundTruthManifest']['S3Object']\n", " test_ds_bucket = test_s3_path.get('Bucket')\n", " test_ds_key = test_s3_path.get('Name')\n", "\n", " print(test_ds_bucket)\n", " print(test_ds_key)\n", "\n", "\n", "## Here we retrieve the evaluation dataset details\n", "for eval_ds in response['ProjectVersionDescriptions'] :\n", " eval_s3_path = eval_ds['EvaluationResult']['Summary']['S3Object']\n", " eval_ds_bucket = eval_s3_path.get('Bucket')\n", " eval_ds_key = eval_s3_path.get('Name')\n", "\n", " print(eval_ds_bucket)\n", " print(eval_ds_key)" ] }, { "cell_type": "markdown", "id": "internal-wonder", "metadata": {}, "source": [ "Now that we have the summary JSON documents, we have to loop thru the json to derive the values for true class and predicted class for each label. The JSON key names are hard coded here to make it easy for the audience." ] }, { "cell_type": "code", "execution_count": null, "id": "statutory-september", "metadata": {}, "outputs": [], "source": [ "##read the evaluation dataset to get the list of labels defined as part of custom training\n", "s3_client=boto3.client('s3')\n", "eval_result = s3_client.get_object(Bucket=eval_ds_bucket, Key=eval_ds_key) \n", "\n", "eval_content = eval_result['Body'].read().decode('utf-8')\n", "eval_dict = json.loads(eval_content)\n", "label_list = eval_dict['EvaluationDetails']['Labels']\n", "\n", "## read test summary dataset\n", "test_result = s3_client.get_object(Bucket=test_ds_bucket, Key=test_ds_key) \n", "\n", "## For scikit learn libraries, we need a list of true classes and predicted classes. This is available in the test dataset\n", "true_class = []\n", "pred_class = []\n", "\n", "for line in test_result[\"Body\"].read().splitlines():\n", " \n", " json_dict = json.loads(line)\n", " \n", " for index in range(len(label_list)) :\n", " #Json record has keys that get appended with numbers to indicate that the evaluation is for a specific class. This number corresponds to the position of the class in the label list. 
{ "cell_type": "code", "execution_count": null, "id": "cloudy-flash", "metadata": {}, "outputs": [], "source": [ "## Precision\n", "\n", "micro_precision = precision_score(true_class, pred_class, labels=label_list, average='micro')\n", "macro_precision = precision_score(true_class, pred_class, labels=label_list, average='macro')\n", "weighted_precision = precision_score(true_class, pred_class, labels=label_list, average='weighted')\n", "\n", "print(\"=========\")\n", "print(\"Precision\")\n", "print(\"=========\")\n", "print(\"micro precision = \" + str(micro_precision))\n", "print(\"macro precision = \" + str(macro_precision))\n", "print(\"weighted precision = \" + str(weighted_precision))\n", "\n", "## Recall\n", "\n", "micro_recall = recall_score(true_class, pred_class, labels=label_list, average='micro')\n", "macro_recall = recall_score(true_class, pred_class, labels=label_list, average='macro')\n", "weighted_recall = recall_score(true_class, pred_class, labels=label_list, average='weighted')\n", "\n", "print(\"=========\")\n", "print(\"Recall\")\n", "print(\"=========\")\n", "print(\"micro recall = \" + str(micro_recall))\n", "print(\"macro recall = \" + str(macro_recall))\n", "print(\"weighted recall = \" + str(weighted_recall))\n", "\n", "\n", "## F1 score\n", "\n", "micro_f1 = f1_score(true_class, pred_class, labels=label_list, average='micro')\n", "macro_f1 = f1_score(true_class, pred_class, labels=label_list, average='macro')\n", "weighted_f1 = f1_score(true_class, pred_class, labels=label_list, average='weighted')\n", "\n", "print(\"=========\")\n", "print(\"F1 score\")\n", "print(\"=========\")\n", "print(\"micro f1 = \" + str(micro_f1))\n", "print(\"macro f1 = \" + str(macro_f1))\n", "print(\"weighted f1 = \" + str(weighted_f1))\n", "\n" ] },
{ "cell_type": "code", "execution_count": null, "id": "african-federation", "metadata": {}, "outputs": [], "source": [ "## Raw and balanced accuracy\n", "\n", "print(\"=========\")\n", "print(\"ACCURACY\")\n", "print(\"=========\")\n", "\n", "accuracy = accuracy_score(true_class, pred_class)\n", "print(\"raw accuracy = \" + str(accuracy))\n", "\n", "balanced_accuracy = balanced_accuracy_score(true_class, pred_class)\n", "print(\"balanced accuracy = \" + str(balanced_accuracy))\n", "\n", "## Hamming loss\n", "hamming = hamming_loss(true_class, pred_class)\n", "\n", "print(\"============\")\n", "print(\"Hamming loss\")\n", "print(\"============\")\n", "print(str(hamming))\n", "\n", "## Jaccard score\n", "print(\"==============\")\n", "print(\"Jaccard score\")\n", "print(\"==============\")\n", "jaccard_micro = jaccard_score(true_class, pred_class, average='micro')\n", "print(\"Jaccard score micro average = \" + str(jaccard_micro))\n", "\n", "jaccard_macro = jaccard_score(true_class, pred_class, average='macro')\n", "print(\"Jaccard score macro average = \" + str(jaccard_macro))\n", "\n",
average = \" + str(jaccard_macro))\n", "\n", "jaccard_weighted = jaccard_score(true_class, pred_class, average='weighted')\n", "print(\"Jaccard score weighted average = \" + str(jaccard_weighted))\n", "\n", "## Matthews Correlation coefficient\n", "print(\"================================\")\n", "print(\"Matthews Correlation coefficient\")\n", "print(\"================================\")\n", "matthews_corr = matthews_corrcoef(true_class, pred_class)\n", "print(matthews_corr)\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "offensive-export", "metadata": {}, "outputs": [], "source": [ "##Classification report\n", "target = label_list\n", "print(\"=====================\")\n", "print(\"Classification Report\")\n", "print(\"=====================\")\n", "print(classification_report(true_class, pred_class, target_names=target))\n" ] }, { "cell_type": "code", "execution_count": null, "id": "prospective-immunology", "metadata": {}, "outputs": [], "source": [ "##Confusion matrix\n", "\n", "cf_matrix = confusion_matrix(true_class, pred_class,labels=label_list)\n", "\n", "print(\"================\")\n", "print(\"Confusion matrix \")\n", "print(\"================\")\n", "print(cf_matrix)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "advised-notice", "metadata": {}, "outputs": [], "source": [ "##Heat map\n", "\n", "counts = [\"{0:0.0f}\".format(value) for value in\n", " cf_matrix.flatten()]\n", "percentages = [\"{0:.2%}\".format(value) for value in cf_matrix.flatten()/np.sum(cf_matrix)]\n", "labels = [f\"{val1}\\n{val2}\" for val1, val2 in\n", " zip(counts,percentages)]\n", " \n", "labels = np.asarray(labels).reshape(len(label_list),len(label_list))\n", "\n", "\n", "print(\"========\")\n", "print(\"Heat map\")\n", "print(\"========\")\n", "\n", "sn.heatmap(cf_matrix, annot=labels, xticklabels=label_list, yticklabels=label_list,fmt='', cmap='coolwarm')\n", "\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 5 }