{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Invoking SageMarker-Spark Trained XGBoost Multi-Class Classification Model With Boto3 \n", "_**Invoking your sagemaker-spark SDK trained model from boto3 to show how it can be leveraged in a web or mobile app**_\n", "\n", "---\n", "\n", "## Introduction\n", "\n", "\n", "This notebook demonstrates how you can invoke your SageMaker-Spark trained XGBoost model deployed in the `MNIST-xgboost-train.ipynb` notebook.\n", "\n", "---\n", "\n", "## Download Test Record\n", "\n", "For the purposes of this example we are downloading an existing test record that has already been converted to libsvm format." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import urllib.request\n", "\n", "url = 'https://raw.githubusercontent.com/aws/sagemaker-spark/master/examples/notebooks/jupyter/xgboost/test.data'\n", "response = urllib.request.urlopen(url)\n", "data = response.read().decode('utf-8') # This record's true label is 7.0\n", "print(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prep Data\n", "\n", "As you can see this is a very sparse dataset, meaning most of the features are `0.0`. The model endpoint expects the sparse entries to be removed before invocation. A simple approach to this data preparation problem is to treat the features as strings and remove any feature that ends with '0.0'." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'203:0.328125 204:0.72265625 205:0.62109375 206:0.58984375 207:0.234375 208:0.140625 231:0.8671875 232:0.9921875 233:0.9921875 234:0.9921875 235:0.9921875 236:0.94140625 237:0.7734375 238:0.7734375 239:0.7734375 240:0.7734375 241:0.7734375 242:0.7734375 243:0.7734375 244:0.7734375 245:0.6640625 246:0.203125 259:0.26171875 260:0.4453125 261:0.28125 262:0.4453125 263:0.63671875 264:0.88671875 265:0.9921875 266:0.87890625 267:0.9921875 268:0.9921875 269:0.9921875 270:0.9765625 271:0.89453125 272:0.9921875 273:0.9921875 274:0.546875 292:0.06640625 293:0.2578125 294:0.0546875 295:0.26171875 296:0.26171875 297:0.26171875 298:0.23046875 299:0.08203125 300:0.921875 301:0.9921875 302:0.4140625 327:0.32421875 328:0.98828125 329:0.81640625 330:0.0703125 354:0.0859375 355:0.91015625 356:0.99609375 357:0.32421875 382:0.50390625 383:0.9921875 384:0.9296875 385:0.171875 409:0.23046875 410:0.97265625 411:0.9921875 412:0.2421875 437:0.51953125 438:0.9921875 439:0.73046875 440:0.01953125 464:0.03515625 465:0.80078125 466:0.96875 467:0.2265625 492:0.4921875 493:0.9921875 494:0.7109375 519:0.29296875 520:0.98046875 521:0.9375 522:0.22265625 546:0.07421875 547:0.86328125 548:0.9921875 549:0.6484375 573:0.01171875 574:0.79296875 575:0.9921875 576:0.85546875 577:0.13671875 601:0.1484375 602:0.9921875 603:0.9921875 604:0.30078125 628:0.12109375 629:0.875 630:0.9921875 631:0.44921875 632:0.00390625 656:0.51953125 657:0.9921875 658:0.9921875 659:0.203125 683:0.23828125 684:0.9453125 685:0.9921875 686:0.9921875 687:0.203125 711:0.47265625 712:0.9921875 713:0.9921875 714:0.85546875 715:0.15625 739:0.47265625 740:0.9921875 741:0.80859375 742:0.0703125'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Naive way of removing sparse elements of\n", "\n", "sparse_data = ' '.join([i for i in data.split(' ') if not i.endswith('0.0')])\n", "sparse_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Invoke Endpoint\n", "\n", "Now 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Invoke Endpoint\n", "\n", "Now that our data has been prepared, we can invoke the endpoint. Be sure to replace `endpoint_name` with the name of the endpoint created in the previous `MNIST-xgboost-train.ipynb` notebook (or use any other XGBoost MNIST classification endpoint you've created)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Predicted label is 7.0.\n" ] } ], "source": [ "import boto3\n", "\n", "# The SageMaker runtime client is used to invoke deployed endpoints\n", "runtime_client = boto3.client('runtime.sagemaker')\n", "\n", "# Replace with the name of the endpoint created in MNIST-xgboost-train.ipynb\n", "endpoint_name = 'name_of_endpoint_created_in_training_notebook'\n", "\n", "payload = sparse_data.strip().encode('utf-8')\n", "\n", "response = runtime_client.invoke_endpoint(EndpointName=endpoint_name,\n", "                                          ContentType='text/x-libsvm',\n", "                                          Body=payload)\n", "result = response['Body'].read().decode('ascii')\n", "print('Predicted label is {}.'.format(result))" ] },
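{ "cell_type": "markdown", "metadata": {}, "source": [ "Since the goal is to show how the endpoint can be leveraged from a web or mobile app, the cell below sketches how the invocation could be wrapped in a small reusable function. The function name and return type are illustrative assumptions rather than part of the original example." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "\n", "# Sketch: wrap the endpoint invocation for reuse from an application backend.\n", "# The function name is illustrative; it expects a LIBSVM-formatted record with\n", "# zero-valued features already removed, as prepared above.\n", "def predict_digit(libsvm_record, endpoint_name):\n", "    client = boto3.client('runtime.sagemaker')\n", "    response = client.invoke_endpoint(EndpointName=endpoint_name,\n", "                                      ContentType='text/x-libsvm',\n", "                                      Body=libsvm_record.strip().encode('utf-8'))\n", "    # The endpoint returns the predicted label as text (e.g. '7.0')\n", "    return float(response['Body'].read().decode('ascii'))\n", "\n", "# Example usage with the endpoint and record used above:\n", "# predict_digit(sparse_data, endpoint_name)" ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 2 }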