# Bring-your-own-container for NGC TF & TRT BERT
## Overview
You can build a Docker image from NGC assets to take a "bring your own container" approach to running an NGC BERT model on AWS SageMaker. 
- With launch arg `train`: the container fine-tunes a pre-trained TensorFlow (TF) BERT model and converts the fine-tuned model to a TensorRT (TRT) engine.
- With launch arg `serve`: the container serves inference requests.

The NGC assets used can be seen in the `Dockerfile` and `entrypoint.sh`. 

> Note: the container used in this notebook assumes that the instance has V100 GPUs, and it runs the training job on all of the V100 GPUs available. Please select your instance type accordingly.

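As a rough guide for picking an instance type, the sketch below maps a few SageMaker P3 instance types to their V100 GPU counts. The counts follow the published P3 specifications as far as we know; the `V100_GPUS` table and the `v100_count` helper are illustrative names that are not part of this repo, so check the current AWS documentation before relying on them.

```python
# Illustrative lookup table (not part of this repo):
# number of V100 GPUs per SageMaker P3 instance type.
V100_GPUS = {
    "ml.p3.2xlarge": 1,
    "ml.p3.8xlarge": 4,
    "ml.p3.16xlarge": 8,
    "ml.p3dn.24xlarge": 8,
}


def v100_count(instance_type):
    """Return the number of V100 GPUs for a known P3 instance type."""
    try:
        return V100_GPUS[instance_type]
    except KeyError:
        raise ValueError(
            "unknown or non-V100 instance type: {}".format(instance_type))
```

With the training instance type used later in this notebook, `v100_count('ml.p3.16xlarge')` returns 8, which is the number of GPUs the training job would run on.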
## Details
### About the Docker image

The Docker image is built from the NGC base image `nvcr.io/nvidia/tensorflow:19.12-tf1-py3`. Additional packages are installed for TensorRT, as indicated in the Dockerfile. This example uses TensorRT 6, the same version as the public NVIDIA TF BERT repo; see the [trt directory](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT/trt).

### About the NGC model
The NGC model is a BERT large model. You can explore the model [here on NGC](https://ngc.nvidia.com/catalog/models/nvidia:bert_tf_pretraining_lamb_16n).

### About the entrypoint script
- For `train`, `entrypoint.sh` downloads a pre-trained BERT TF model from NGC, clones the [GitHub repo](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT/) (equivalent to the [NGC model scripts](https://ngc.nvidia.com/catalog/model-scripts/nvidia:bert_for_tensorflow)), makes small modifications to some of the scripts for this use case, runs fine-tuning, and then converts the fine-tuned model into a TensorRT engine.

  - Note: the command on line 41 of `entrypoint.sh`, ```bash scripts/run_bert_squad.sh 5 5e-6 fp16 true $num_V100 384 128 large 1.1 $pretrained_modeldir/model.ckpt-1564 0.2 $finetuned_modeldir true true```, fine-tunes for only 0.2 epochs to save time for demo purposes. To fine-tune fully, change `0.2` to `1.5` or more.

- For `serve`, `entrypoint.sh` calls `serve.py`, which configures and starts a server for the model. Request handling is defined in `predictor.py`, which is imported into `wsgi.py` and invoked through `serve.py`. The serving scripts are modified from the [trt directory](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT/trt) in the GitHub repo.

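For orientation, a SageMaker hosting container is expected to answer `GET /ping` health checks and `POST /invocations` inference requests on port 8080. The sketch below shows a minimal WSGI app with those two routes using only the standard library. It is not the repo's `predictor.py`: the `answer_question` stub and the `{"answer": ...}` response shape are assumptions made for illustration, and only the request keys are taken from the inference example in this notebook.

```python
import json


def answer_question(paragraph, question):
    """Hypothetical stand-in for the TensorRT inference call."""
    return "<answer placeholder>"


def app(environ, start_response):
    """Minimal WSGI app exposing the two routes SageMaker hosting calls."""
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "")

    if method == "GET" and path == "/ping":
        # Health check: an empty 200 response signals the container is up.
        start_response("200 OK", [("Content-Type", "application/json")])
        return [b""]

    if method == "POST" and path == "/invocations":
        # Read and parse the JSON request body.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = environ["wsgi.input"].read(length)
        payload = json.loads(raw.decode("utf-8"))
        answer = answer_question(payload["short_paragraph_text"],
                                 payload["question_text"])
        # The response shape here is an assumption, not the repo's format.
        body = json.dumps({"answer": answer}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```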
As a first step, let's build the Docker image and push it to ECR.

In [None]:
# build the image and push it to ECR
# build-and-push.sh takes in one arg: the tag. Here we tag the image with 0.1, but feel free to change the tag
# see docker/Dockerfile.sagemaker.gpu for details about the image
!cd docker && bash build-and-push.sh 0.1

## Environment Imports

For this notebook, you can use the `conda_pytorch_p36` kernel.

In [None]:
import numpy as np
import sagemaker as sage

from sagemaker import get_execution_role

## SageMaker Configuration

In [None]:
role = get_execution_role()
session = sage.Session()

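# Training instance: ml.p3.16xlarge provides 8 V100 GPUs
TRAIN_INSTANCE_TYPE_ID = 'ml.p3.16xlarge'
TRAIN_INSTANCE_COUNT = 1

# Inference instance: ml.p3.2xlarge provides 1 V100 GPU
INFERENCE_INSTANCE_TYPE_ID = 'ml.p3.2xlarge'
INFERENCE_INSTANCE_COUNT = 1

# Training output is written under the session's default S3 bucket
OUTPUT_BUCKET = 's3://{bucket}/output'.format(bucket=session.default_bucket())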


## Create Estimator and Train

In [None]:
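# Resolve the AWS account and region of the current session to locate the image in ECR
account = session.boto_session.client('sts').get_caller_identity()['Account']
region = session.boto_session.region_name
# The tag (0.1) must match the one passed to build-and-push.sh
image_name = '{acct}.dkr.ecr.{region}.amazonaws.com/ngc-tf-bert-sagemaker-demo:0.1'.format(acct=account, region=region)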

estimator = sage.estimator.Estimator(image_name=image_name,
                                     role=role,
                                     train_instance_count=TRAIN_INSTANCE_COUNT,
                                     train_instance_type=TRAIN_INSTANCE_TYPE_ID,
                                     output_path=OUTPUT_BUCKET,
                                     sagemaker_session=session)

estimator.fit(inputs=None)

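If you change the tag passed to `build-and-push.sh`, the hard-coded `:0.1` in `image_name` has to change with it. One way to keep the two in sync is to build the URI from its parts, as in the sketch below. The `ecr_image_uri` helper is a hypothetical name, the account ID and region are placeholders, and the repository name is assumed to be the one used in the cell above.

```python
def ecr_image_uri(account, region, repository, tag):
    """Build an ECR image URI: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    return "{acct}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}".format(
        acct=account, region=region, repo=repository, tag=tag)


# Placeholder account ID and region, for illustration only.
example_uri = ecr_image_uri("123456789012", "us-west-2",
                            "ngc-tf-bert-sagemaker-demo", "0.1")
print(example_uri)
```

In the notebook, the `account` and `region` values resolved from the session would take the place of the placeholders.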
## Deploy the Estimator

In [None]:
predictor = estimator.deploy(initial_instance_count=INFERENCE_INSTANCE_COUNT,
                             instance_type=INFERENCE_INSTANCE_TYPE_ID)

## Run one inference
Define the context and question, and run inference for one query.

In [None]:
from sagemaker.predictor import json_serializer
from sagemaker.content_types import CONTENT_TYPE_JSON
short_paragraph_text = "The Apollo program was the third United States human spaceflight program. First conceived as a three-man spacecraft to follow the one-man Project Mercury which put the first Americans in space, Apollo was dedicated to President John F. Kennedy's national goal of landing a man on the Moon. The first manned flight of Apollo was in 1968. Apollo ran from 1961 to 1972 followed by the Apollo-Soyuz Test Project a joint Earth orbit mission with the Soviet Union in 1975."
question_text = "What project put the first Americans into space?"
qa_test_sample = {'short_paragraph_text': short_paragraph_text, 'question_text': question_text}

# Send the request as JSON
predictor.content_type = CONTENT_TYPE_JSON
predictor.serializer = json_serializer

# The endpoint returns bytes; decode them to a string
predictor.predict(qa_test_sample).decode("utf-8")

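To make the request format concrete, the sketch below builds the JSON body that a JSON serializer would produce for a query like the one above. The `build_qa_request` helper is a hypothetical name used only for illustration; the two keys are the ones used in `qa_test_sample`, and the shortened paragraph is a stand-in for the full text.

```python
import json


def build_qa_request(paragraph, question):
    """Serialize a question-answering request using the notebook's two keys."""
    return json.dumps({"short_paragraph_text": paragraph,
                       "question_text": question})


# Shortened stand-in paragraph, for illustration only.
request_body = build_qa_request(
    "Project Mercury put the first Americans in space.",
    "What project put the first Americans into space?")
print(request_body)
```

A body like this could also be sent to the endpoint by any other client that posts JSON, assuming the serving code reads the same two keys.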
## Clean Up the Endpoint

In [None]:
# Delete the endpoint so the inference instance stops running
session.delete_endpoint(predictor.endpoint)