{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Inference on the Simplified Human Activity Tracking Dataset\n", "Simplified dataset (the one we will use): https://www.kaggle.com/mboaglio/simplifiedhuarus\n", "\n", "We will classify the activity being performed: WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, and LAYING\n", "\n", "Each record in the dataset provides:\n", "- Triaxial acceleration from the accelerometer (total acceleration) and the estimated body acceleration.\n", "- Triaxial angular velocity from the gyroscope.\n", "- A 561-feature vector with time- and frequency-domain variables.\n", "- Its activity label.\n", "- An identifier of the subject who carried out the experiment.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Name of the endpoint hosting the model built by Autopilot" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "ep_name = 'AutoML-Autopilot--notebook-run-01-15-10-53'" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "sm_rt = boto3.Session().client('runtime.sagemaker')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Loop to run inference on each item in the dataset and compute model metrics.\n", "### Ideally this would be a BATCH TRANSFORM over the entire dataset ... but that is left for another demo :D" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Running inference and building the confusion matrix:\n", ".....................................................................................................................................................................................\n", "Done!\n" ] } ], "source": [ "import numpy\n", "\n", "activities = ['SITTING', 'WALKING', 'LAYING', 'WALKING_UPSTAIRS', 'WALKING_DOWNSTAIRS', 'STANDING']\n", "cm = numpy.zeros(shape=(6, 6))\n", "\n", "print(\"Running inference and building the confusion matrix:\")\n", "\n", "with open('had-autopilot-test.csv') as f:\n", "    lines = f.readlines()\n", "    for line in lines[1:]:\n", "\n", "        # Split off the \"activity\" field and send the remaining comma-separated fields to the endpoint.\n", "        # The \"activity\" field is the label the endpoint will predict.\n", "        fields = line.split(',')\n", "        label = fields[0]\n", "        payload = ','.join(fields[1:])\n", "\n", "        response = sm_rt.invoke_endpoint(EndpointName=ep_name, ContentType='text/csv', Accept='text/csv', Body=payload)\n", "\n", "        # Strip the trailing newline from the response body\n", "        predicted = response['Body'].read().decode(\"utf-8\").rstrip(\"\\n\")\n", "\n", "        # Build the confusion matrix: rows are true labels, columns are labels predicted by the model\n", "        cm[activities.index(label), activities.index(predicted)] += 1\n", "\n", "        print(\".\", end=\"\")\n", "\n", "print(\"\\nDone!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Build and display the confusion matrix\n", "## Displayed as a heatmap" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import seaborn as sn\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "# Plot the confusion matrix as a heatmap (rows: true labels, columns: predicted labels)\n", "df_cm = pd.DataFrame(cm, index=activities, columns=activities)\n", "plt.figure(figsize=(10, 7))\n", "sn.heatmap(df_cm, annot=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Model accuracy" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 0.9779005524861878\n" ] } ], "source": [ "accuracy = cm.diagonal().sum() / cm.sum()\n", "print(\"Accuracy: {}\".format(accuracy))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-2:429704687514:environment/datascience" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }