### Instructions to build the Hugging Face model inference container with Neuron SDK

#### Step 1: Launch an EC2 instance to build and run the container

1. Open https://console.aws.amazon.com/ec2/v2/home
2. Select "Launch instance".
3. In "Choose an Amazon Machine Image (AMI)", search for DLAMI.
4. Select images "in AWS Marketplace".
5. Select AWS Deep Learning AMI (Amazon Linux 2).
6. In "Choose an instance type", choose "inf1.2xlarge", select "Review and Launch", and then "Launch" to launch the instance.

It may take 2-10 minutes for the instance to launch. Once the instance is in the "Running" state, follow the instructions in "Connect to your instance" to connect to it.

#### Step 2: Download and unzip the file, extracting its contents to a local folder (e.g. "Huggingface_Transformers")

#### Step 3: Open the Dockerfile, then edit and replace code as below

1. Replace the model store path with the location of the model_store folder from Step 2 (e.g. `./model_store`) of `torchserve/model_store`.
2. Replace the config.properties path with the location of config.properties in the folder from Step 2 (e.g. `./`).
3. Replace dockerd-entrypoint.sh with the location of dockerd-entrypoint.sh from Step 2 (e.g. `./`).
4. Build the image: `docker build . -f Dockerfile -t torch-neuron-rtd`

Prior to running the container, make sure that the Neuron runtime on the instance is turned off by running the command:

```
sudo service neuron-rtd stop
```

Run the container:

```
docker run --device=/dev/neuron0 -p 8443:8443 -p 8081:8081 -p 8082:8082 --cap-add IPC_LOCK -it torch-neuron-rtd
```

Test TorchServe health:

```
curl http://127.0.0.1:8443/ping
```

The output should be "Healthy":

```
{
  "status": "Healthy"
}
```

Load the model:

```
MAX_LENGTH=$(jq '.max_length' config.json)
BATCH_SIZE=$(jq '.batch_size' config.json)
MODEL_NAME=bert-max_length$MAX_LENGTH-batch_size$BATCH_SIZE
MAX_BATCH_DELAY=5000 # ms timeout before a partial batch is processed
INITIAL_WORKERS=4 # number of models that will be loaded at launch
curl -X POST "http://127.0.0.1:8081/models?url=$MODEL_NAME.mar&batch_size=$BATCH_SIZE&initial_workers=$INITIAL_WORKERS&max_batch_delay=$MAX_BATCH_DELAY"
```

The output will be similar to the following:

```
{
  "status": "Model \"bert-max_length128-batch_size6\" Version: 1.0 registered with 4 initial workers"
}
```

Test inference using infer_bert.py, which is included in the zip:

```
python3 infer_bert.py
```

Output:

```
0 ['paraphrase']
2 ['paraphrase']
3 ['not paraphrase']
5 ['not paraphrase']
4 ['not paraphrase']
1 ['paraphrase']
```

The order of the result lines may vary between runs.

## Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

## License

This library is licensed under the MIT-0 License. See the LICENSE file.
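
## Example: scripting the model registration

The "Load model" shell commands in the instructions can also be expressed in Python, which may be convenient if `jq` is not installed. The following is a minimal sketch, not part of the original package: the function names `build_register_url` and `register_model` are hypothetical, and it assumes that `config.json` contains the `max_length` and `batch_size` keys read by the shell version and that the management API is reachable at `http://127.0.0.1:8081` as in the `curl` example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Management address taken from the curl example; adjust if your port mapping differs.
MANAGEMENT_URL = "http://127.0.0.1:8081"


def build_register_url(config, initial_workers=4, max_batch_delay=5000,
                       base_url=MANAGEMENT_URL):
    """Build the registration URL used by the curl command (hypothetical helper)."""
    max_length = config["max_length"]
    batch_size = config["batch_size"]
    # Same naming scheme as MODEL_NAME in the shell snippet.
    model_name = f"bert-max_length{max_length}-batch_size{batch_size}"
    query = urlencode({
        "url": f"{model_name}.mar",
        "batch_size": batch_size,
        "initial_workers": initial_workers,   # number of models loaded at launch
        "max_batch_delay": max_batch_delay,   # ms timeout before a partial batch is processed
    })
    return f"{base_url}/models?{query}"


def register_model(config_path="config.json"):
    """POST the registration request and return the parsed JSON reply (hypothetical helper)."""
    with open(config_path) as f:
        config = json.load(f)
    request = Request(build_register_url(config), method="POST")
    with urlopen(request) as response:
        return json.load(response)
```

Calling `register_model()` from the folder that contains `config.json`, while the container is running, should return a status message similar to the one shown for the `curl` command.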