# Serve a TensorFlow Hub model

The model for this example was trained using this sample notebook on SageMaker: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/pytorch_mnist/pytorch_mnist.ipynb

It is certainly easier to call `estimator.deploy()` with the standard SageMaker SDK if you are following that example, but consider this one if you have a PyTorch model (or two) on S3 and are looking for an easy way to test and deploy it. This notebook uses `tensorflow-gpu==2.0.0` instead of the regular `tensorflow` package because of an open issue regarding libinfer.so.

In [None]:
!pip install --upgrade pip
!pip install wrapt --upgrade --ignore-installed
!pip install --upgrade tensorflow-gpu==2.0.0 tensorflow-hub

In [None]:
inputs = "The quick brown fox jumps over the lazy dog."

In [None]:
import tensorflow
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

In [None]:
embeddings = embed([inputs])
print(embeddings)

## Step 1 : Write a model transform script

#### Make sure you have a ...

- "load_model" function
    - input args are model path
    - returns loaded model object
    - model name is the same as what you saved the model file as (see above step)
<br><br>
- "predict" function
    - input args are the loaded model object and a payload
    - returns the result of model.predict
    - format the result as one string (for real-time inference) or multiple strings (for mini-batch) inside a list
    - a list, string, or np.array sent from a client for prediction arrives as bytes; convert it back to a list, string, or np.array as needed
    - return the error for debugging
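
The bytes-to-object conversion described above can be sketched as follows. This is a minimal illustration, not part of ezsmdeploy: `decode_payload` is a hypothetical helper name, and it assumes the client sent either plain UTF-8 text or a JSON-serialized list.

```python
import json

def decode_payload(payload):
    """Convert a raw request payload back into a Python object.

    Hypothetical helper for illustration; adapt it to what your client sends.
    """
    # Payloads sent by a client arrive as bytes, so decode them to text first.
    if isinstance(payload, (bytes, bytearray)):
        payload = payload.decode("utf-8")
    # If the text is valid JSON (for example a serialized list), parse it.
    # Otherwise fall back to returning the value unchanged.
    # Caveat: a purely numeric string such as "123" is also valid JSON.
    try:
        return json.loads(payload)
    except (TypeError, ValueError):
        return payload

print(decode_payload(b"The quick brown fox"))  # plain text stays a string
print(decode_payload(b"[1, 2, 3]"))            # JSON text becomes a list
```

If the client sent an array as a JSON list, wrapping the result in `np.asarray(...)` inside `predict` would restore the array form.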


In [None]:
%%writefile modelscript_tensorflow.py
import tensorflow as tf
import numpy as np
import tensorflow_hub as hub
import json

# Return the loaded model (modelpath is unused because the model is loaded from TF Hub)
def load_model(modelpath):
    model = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4") 
    return model

# return prediction based on loaded model (from the step above) and an input payload
def predict(model, payload):
    try:
        if isinstance(payload, str):
            data = [payload]
        else:
            # Payload arrives as bytes; for multi-model endpoints use [payload[0]['body'].decode()]
            data = [payload.decode()]
            
        out = np.asarray(model(data)).tolist()
    except Exception as e:
        out = str(e)
    return [json.dumps({'output':[out],'tfeager': tf.executing_eagerly()})]

## Does this work locally? (not "_in a container locally_", but _actually_ in local)

In [None]:
from modelscript_tensorflow import *
model = load_model('./') # path doesn't matter here since we're loading the model directly in the script

In [None]:
predict(model,inputs)

### OK, great! Now let's install ezsmdeploy

_[To Do]_: currently local; replace with pip version!

In [None]:
!pip uninstall -y ezsmdeploy

In [None]:
!pip install ezsmdeploy

In [None]:
import ezsmdeploy

#### If you have been running other inference containers in local mode, stop existing containers to avoid conflict

In [None]:
!docker container stop $(docker container ls -aq) >/dev/null

## Deploy locally

Large models take longer to download and deploy (check the TF Hub source code for details). Also, keep in mind that hub models are downloaded in each worker; TF Hub recognizes when all workers are set to download the same model and does not repeat the download. Instead, it reports that the model is _already being downloaded by "worker id"_.

In [None]:
ez = ezsmdeploy.Deploy(model = None, #Since we are loading a model from TF hub
                  script = 'modelscript_tensorflow.py',
                  requirements = ['numpy','tensorflow-gpu==2.0.0','tensorflow_hub'], #or pass in the path to requirements.txt
                  instance_type = 'local_gpu',
                  wait = True)

## Test containerized version locally

Since you are downloading this model from a hub, the first invocation will be slow, so invoke again to get an inference without all of the container logs. Prediction will be especially slow if your model is still downloading!
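
To see the warm-up effect, one option is to time a few consecutive calls. The sketch below is an illustration only: `time_invocations` and `fake_predict` are hypothetical names, and the stand-in function replaces a real predictor so the code runs without an endpoint.

```python
import time

def time_invocations(predict_fn, payload, repeats=2):
    """Call predict_fn(payload) `repeats` times; return the last result and all timings."""
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = predict_fn(payload)
        timings.append(time.perf_counter() - start)
    return result, timings

# Stand-in for a real predictor (hypothetical), so the sketch is self-contained.
def fake_predict(payload):
    return payload.upper()

result, timings = time_invocations(fake_predict, b"hello")
print(result)
print(["%.6f s" % t for t in timings])
```

With a deployed endpoint, something like `time_invocations(ez.predictor.predict, inputs.encode())` should show the first call taking noticeably longer than the second.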

In [None]:
out = ez.predictor.predict(inputs.encode()).decode()
out
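
The response is the JSON string built in `predict`, so it can be parsed back into Python objects. The sketch below assumes the `{'output': [...], 'tfeager': ...}` layout from `modelscript_tensorflow.py`; `parse_response` is a hypothetical helper, and the sample string uses made-up numbers rather than real embeddings.

```python
import json

def parse_response(raw):
    """Parse the JSON string produced by predict() in modelscript_tensorflow.py."""
    # Calling predict() locally returns [json_string]; unwrap the list.
    if isinstance(raw, (list, tuple)):
        raw = raw[0]
    # An undecoded endpoint response is bytes.
    if isinstance(raw, (bytes, bytearray)):
        raw = raw.decode("utf-8")
    parsed = json.loads(raw)
    output = parsed["output"][0]
    # On failure, predict() puts the error message (a string) in 'output'.
    if isinstance(output, str):
        raise RuntimeError("Model returned an error: " + output)
    return output, parsed["tfeager"]

# Made-up sample with a 3-dimensional "embedding" for one sentence.
sample = json.dumps({"output": [[[0.1, -0.2, 0.3]]], "tfeager": True})
embeddings, eager = parse_response(sample)
print(len(embeddings), len(embeddings[0]), eager)
```

With the real endpoint, `parse_response(out)` would return one embedding vector per input sentence, which `np.asarray` can turn into an array.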

In [None]:
!docker container stop $(docker container ls -aq) >/dev/null

## Deploy on SageMaker

In [None]:
ezonsm = ezsmdeploy.Deploy(model = None, # Since we are loading a model from TF Hub
                  script = 'modelscript_tensorflow.py',
                  requirements = ['numpy','tensorflow-gpu==2.0.0','tensorflow_hub'],
                  wait = True,
                  instance_type = 'ml.p3.2xlarge',
                  monitor = True) # turn on model monitoring 

In [None]:
out = ezonsm.predictor.predict(inputs).decode()
out

In [None]:
ezonsm.predictor.delete_endpoint()