{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Invoking SageMarker-Spark Trained XGBoost Multi-Class Classification Model With Boto3 \n", "_**Invoking your sagemaker-spark SDK trained model from boto3 to show how it can be leveraged in a web or mobile app**_\n", "\n", "---\n", "\n", "## Introduction\n", "\n", "\n", "This notebook demonstrates how you can invoke your SageMaker-Spark trained XGBoost model deployed in the `MNIST-xgboost-train.ipynb` notebook.\n", "\n", "---\n", "\n", "## Download Test Record\n", "\n", "For the purposes of this example we are downloading an existing test record that has already been converted to libsvm format." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import urllib.request\n", "\n", "url = 'https://raw.githubusercontent.com/aws/sagemaker-spark/master/examples/notebooks/jupyter/xgboost/test.data'\n", "response = urllib.request.urlopen(url)\n", "data = response.read().decode('utf-8') # This record's true label is 7.0\n", "print(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prep Data\n", "\n", "As you can see this is a very sparse dataset, meaning most of the features are `0.0`. The model endpoint expects the sparse entries to be removed before invocation. A simple approach to this data preparation problem is to treat the features as strings and remove any feature that ends with '0.0'." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'203:0.328125 204:0.72265625 205:0.62109375 206:0.58984375 207:0.234375 208:0.140625 231:0.8671875 232:0.9921875 233:0.9921875 234:0.9921875 235:0.9921875 236:0.94140625 237:0.7734375 238:0.7734375 239:0.7734375 240:0.7734375 241:0.7734375 242:0.7734375 243:0.7734375 244:0.7734375 245:0.6640625 246:0.203125 259:0.26171875 260:0.4453125 261:0.28125 262:0.4453125 263:0.63671875 264:0.88671875 265:0.9921875 266:0.87890625 267:0.9921875 268:0.9921875 269:0.9921875 270:0.9765625 271:0.89453125 272:0.9921875 273:0.9921875 274:0.546875 292:0.06640625 293:0.2578125 294:0.0546875 295:0.26171875 296:0.26171875 297:0.26171875 298:0.23046875 299:0.08203125 300:0.921875 301:0.9921875 302:0.4140625 327:0.32421875 328:0.98828125 329:0.81640625 330:0.0703125 354:0.0859375 355:0.91015625 356:0.99609375 357:0.32421875 382:0.50390625 383:0.9921875 384:0.9296875 385:0.171875 409:0.23046875 410:0.97265625 411:0.9921875 412:0.2421875 437:0.51953125 438:0.9921875 439:0.73046875 440:0.01953125 464:0.03515625 465:0.80078125 466:0.96875 467:0.2265625 492:0.4921875 493:0.9921875 494:0.7109375 519:0.29296875 520:0.98046875 521:0.9375 522:0.22265625 546:0.07421875 547:0.86328125 548:0.9921875 549:0.6484375 573:0.01171875 574:0.79296875 575:0.9921875 576:0.85546875 577:0.13671875 601:0.1484375 602:0.9921875 603:0.9921875 604:0.30078125 628:0.12109375 629:0.875 630:0.9921875 631:0.44921875 632:0.00390625 656:0.51953125 657:0.9921875 658:0.9921875 659:0.203125 683:0.23828125 684:0.9453125 685:0.9921875 686:0.9921875 687:0.203125 711:0.47265625 712:0.9921875 713:0.9921875 714:0.85546875 715:0.15625 739:0.47265625 740:0.9921875 741:0.80859375 742:0.0703125'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Naive way of removing sparse elements of\n", "\n", "sparse_data = ' '.join([i for i in data.split(' ') if not i.endswith('0.0')])\n", "sparse_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Invoke Endpoint\n", "\n", "Now 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Invoke Endpoint\n", "\n", "Now that our data has been prepared, we can invoke the endpoint. Be sure to replace `endpoint_name` with the name of the endpoint created in the previous `MNIST-xgboost-train.ipynb` notebook (or use any other XGBoost MNIST classification endpoint you've created)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Predicted label is 7.0.\n" ] } ], "source": [ "import boto3\n", "\n", "# The SageMaker runtime client is used to invoke deployed endpoints\n", "runtime_client = boto3.client('runtime.sagemaker')\n", "\n", "# Replace with the name of the endpoint created in MNIST-xgboost-train.ipynb\n", "endpoint_name = 'name_of_endpoint_created_in_training_notebook'\n", "\n", "payload = sparse_data.strip().encode('utf-8')\n", "\n", "response = runtime_client.invoke_endpoint(EndpointName=endpoint_name,\n", "                                          ContentType='text/x-libsvm',\n", "                                          Body=payload)\n", "result = response['Body'].read().decode('ascii')\n", "print('Predicted label is {}.'.format(result))" ] },
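{ "cell_type": "markdown", "metadata": {}, "source": [ "Since the goal is to show how the endpoint can be leveraged from a web or mobile app, the cell below sketches how the invocation could be wrapped in a small reusable function. The function name and return type are illustrative assumptions rather than part of the original example." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "\n", "# Sketch: wrap the endpoint invocation for reuse from an application backend.\n", "# The function name is illustrative; it expects a LIBSVM-formatted record with\n", "# zero-valued features already removed, as prepared above.\n", "def predict_digit(libsvm_record, endpoint_name):\n", "    client = boto3.client('runtime.sagemaker')\n", "    response = client.invoke_endpoint(EndpointName=endpoint_name,\n", "                                      ContentType='text/x-libsvm',\n", "                                      Body=libsvm_record.strip().encode('utf-8'))\n", "    # The endpoint returns the predicted label as text (e.g. '7.0')\n", "    return float(response['Body'].read().decode('ascii'))\n", "\n", "# Example usage with the endpoint and record used above:\n", "# predict_digit(sparse_data, endpoint_name)" ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 2 }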