{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Serve a TensorFlow hub model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The model for this example was trained using this sample notebook on sagemaker - https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/pytorch_mnist/pytorch_mnist.ipynb\n", "\n", "It is certainly easiler to do estimator.deploy() using the standard Sagemaker SDK if you are following that example, but cinsider this one if you have a pytorch model (or two) on S3 and you are looking for an easy way to test and deploy this model. Using tensorflow-gpu==2.0.0 instead of normal tf because of a live issue regarding libinfer.so" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install --upgrade pip\n", "!pip install wrapt --upgrade --ignore-installed\n", "!pip install --upgrade tensorflow-gpu==2.0.0 tensorflow-hub" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "inputs = \"The quick brown fox jumps over the lazy dog.\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import tensorflow\n", "import tensorflow_hub as hub\n", "\n", "embed = hub.load(\"https://tfhub.dev/google/universal-sentence-encoder/4\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "embeddings = embed([inputs])\n", "print(embeddings)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1 : Write a model transform script\n", "\n", "#### Make sure you have a ...\n", "\n", "- \"load_model\" function\n", " - input args are model path\n", " - returns loaded model object\n", " - model name is the same as what you saved the model file as (see above step)\n", "

\n", "- \"predict\" function\n", " - input args are the loaded model object and a payload\n", " - returns the result of model.predict\n", " - make sure you format it as a single (or multiple) string return inside a list for real time (for mini batch)\n", " - from a client, a list or string or np.array that is sent for prediction is interpreted as bytes. Do what you have to for converting back to list or string or np.array\n", " - return the error for debugging\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%writefile modelscript_tensorflow.py\n", "import tensorflow as tf\n", "import numpy as np\n", "import tensorflow_hub as hub\n", "import json\n", "\n", "#Return loaded model\n", "def load_model(modelpath):\n", " model = hub.load(\"https://tfhub.dev/google/universal-sentence-encoder/4\") \n", " return model\n", "\n", "# return prediction based on loaded model (from the step above) and an input payload\n", "def predict(model, payload):\n", " try:\n", " if(type(payload) == str):\n", " data = [payload]\n", " else:\n", " data = [payload.decode()]# Multi model endpoints -> [payload[0]['body'].decode()]\n", " \n", " out = np.asarray(model(data)).tolist()\n", " except Exception as e:\n", " out = str(e)\n", " return [json.dumps({'output':[out],'tfeager': tf.executing_eagerly()})]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Does this work locally? (not \"_in a container locally_\", but _actually_ in local)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from modelscript_tensorflow import *\n", "model = load_model('./') # path doesn't matter here since we're loading the model directly in the script" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predict(model,inputs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ok great! Now let's install ezsmdeploy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_[To Do]_: currently local; replace with pip version!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip uninstall -y ezsmdeploy" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install ezsmdeploy" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import ezsmdeploy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### If you have been running other inference containers in local mode, stop existing containers to avoid conflict" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!docker container stop $(docker container ls -aq) >/dev/null" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deploy locally" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Large models take longer to download and deploy (check TF hub source code to check. Also, keep in mind that hub models are downloaded in each worker; TF hub will recognize that all workers are set to download the same model and will not repeat the download; it will instead give you a _already being downloaded by \"worker id\"_ " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ez = ezsmdeploy.Deploy(model = None, #Since we are loading a model from TF hub\n", " script = 'modelscript_tensorflow.py',\n", " requirements = ['numpy','tensorflow-gpu==2.0.0','tensorflow_hub'], #or pass in the path to requirements.txt\n", " instance_type = 'local_gpu',\n", " wait = True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Test containerized version locally" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since you are downloading this model from a hub, the first time you invoke it will be slow, so invoke again to get an inference without all of the container logs. Prediction will especially be slow if your model is still downloading!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "out = ez.predictor.predict(inputs.encode()).decode()\n", "out" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!docker container stop $(docker container ls -aq) >/dev/null" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deploy on SageMaker" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ezonsm = ezsmdeploy.Deploy(model = None, #Since we are loading a model from TF hub,\n", " script = 'modelscript_tensorflow.py',\n", " requirements = ['numpy','tensorflow-gpu==2.0.0','tensorflow_hub'],\n", " wait = True,\n", " instance_type = 'ml.p3.2xlarge',\n", " monitor = True) # turn on model monitoring " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "out = ezonsm.predictor.predict(inputs).decode()\n", "out" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ezonsm.predictor.delete_endpoint()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 4 }