With the model serving framework (released in 2.4 as an experimental feature), users can upload deep learning NLP models (currently only text embedding models are supported) to an OpenSearch cluster and run them on an [ML node](https://opensearch.org/docs/latest/ml-commons-plugin/index/#ml-node). To get better performance, we need GPU acceleration; GPU ML nodes are supported starting with 2.5. This doc explains how to prepare a GPU ML node to run the model serving framework (the setup is a one-time effort). It focuses on two types of device: NVIDIA GPU and AWS Inferentia.

# 1. NVIDIA GPU

Tested on AWS EC2 `g5.xlarge`, 64-bit (x86):

- Ubuntu AMI: `Deep Learning AMI GPU PyTorch 1.12.1 (Ubuntu 20.04) 20221114`
- Amazon Linux AMI: `Deep Learning AMI GPU PyTorch 1.12.1 (Amazon Linux 2) 20221114`
- PyTorch: 1.12.1
- CUDA: 11.6

## 1.1 Mount the nvidia-uvm device

Check whether `nvidia-uvm` and `nvidia-uvm-tools` are present under `/dev` by running

```
ls -al /dev | grep nvidia-uvm
```

If they are not found, run the script `nvidia-uvm-init.sh` (you may need to run it with sudo). Content of `nvidia-uvm-init.sh` (refer to the [NVIDIA doc](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-verifications)):

```
#!/bin/bash
## Script to initialize nvidia device nodes.
## https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-verifications

/sbin/modprobe nvidia

if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`

  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done

  mknod -m 666 /dev/nvidiactl c 195 255
else
  exit 1
fi

/sbin/modprobe nvidia-uvm

if [ "$?" -eq 0 ]; then
  # Find out the major device number used by the nvidia-uvm driver
  D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`

  mknod -m 666 /dev/nvidia-uvm c $D 0
  mknod -m 666 /dev/nvidia-uvm-tools c $D 0
else
  exit 1
fi
```

Once `nvidia-uvm` and `nvidia-uvm-tools` are visible under `/dev`, you can start OpenSearch.
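If you want to automate the check above, the two steps can be combined into one small wrapper. A minimal sketch, assuming `nvidia-uvm-init.sh` from the previous section is saved in the current directory:

```
#!/bin/bash
# Create the nvidia-uvm device nodes only if they are missing,
# then show what is present under /dev.
if ! ls /dev | grep -q '^nvidia-uvm'; then
  echo "nvidia-uvm not found, running nvidia-uvm-init.sh"
  sudo bash ./nvidia-uvm-init.sh
fi
ls -al /dev | grep nvidia-uvm
```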
# 2. AWS Inferentia

Tested on AWS EC2 `inf1.xlarge`, 64-bit (x86):

- Ubuntu AMI: `Deep Learning AMI GPU PyTorch 1.12.1 (Ubuntu 20.04) 20221114`
- Amazon Linux AMI: `Deep Learning AMI GPU PyTorch 1.12.1 (Amazon Linux 2) 20221114`
- PyTorch: 1.12.1
- CUDA: 11.6

## 2.1 Fresh setup script

You can use these scripts to set up a new ML node. You can also check [2.2 Manual way](#22-manual-way) for more details.

### 2.1.1 Ubuntu 20.04

Tested on AWS EC2 `inf1.xlarge`, 64-bit (x86), Ubuntu AMI: `Deep Learning AMI GPU PyTorch 1.12.1 (Ubuntu 20.04) 20221114`.

Download OpenSearch and set `OS_HOME` first. In this example, we install OpenSearch in the home folder.

```
cd ~; wget https://artifacts.opensearch.org/releases/bundle/opensearch/2.5.0/opensearch-2.5.0-linux-x64.tar.gz
tar -xvf opensearch-2.5.0-linux-x64.tar.gz
echo "export OS_HOME=~/opensearch-2.5.0" | tee -a ~/.bash_profile
echo "export PYTORCH_VERSION=1.12.1" | tee -a ~/.bash_profile
source ~/.bash_profile
```

Create a shell script file `prepare_torch_neuron.sh` and run it. Content of `prepare_torch_neuron.sh` (the Neuron repository and driver setup follows the guide linked in [2.2 Manual way](#22-manual-way)):

```
# Configure Linux for Neuron repository updates
. /etc/os-release
sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF
deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main
EOF
wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -

# Install the Neuron driver and torch-neuron, then copy the torch_neuron libraries
# into $OS_HOME (see 2.2 Manual way for the individual commands)

# Increase JVM stack size to >=2MB
echo "-Xss2m" | tee -a $OS_HOME/config/jvm.options

# Increase max file descriptors to 65535
echo "$(whoami) - nofile 65535" | sudo tee -a /etc/security/limits.conf

# max virtual memory areas vm.max_map_count to 262144
sudo sysctl -w vm.max_map_count=262144
```

Exit the current terminal or open a new one, then start OpenSearch.

### 2.1.2 Amazon Linux 2

Tested on AWS EC2 `inf1.xlarge`, 64-bit (x86), Amazon Linux AMI: `Deep Learning AMI GPU PyTorch 1.12.1 (Amazon Linux 2) 20221114`.

Download OpenSearch and set `OS_HOME` first. In this example, we install OpenSearch in the home folder.

```
cd ~; wget https://artifacts.opensearch.org/releases/bundle/opensearch/2.5.0/opensearch-2.5.0-linux-x64.tar.gz
tar -xvf opensearch-2.5.0-linux-x64.tar.gz
echo "export OS_HOME=~/opensearch-2.5.0" | tee -a ~/.bash_profile
echo "export PYTORCH_VERSION=1.12.1" | tee -a ~/.bash_profile
source ~/.bash_profile
```

Create a shell script file `prepare_torch_neuron.sh` and run it. Content of `prepare_torch_neuron.sh` (the Neuron repository and driver setup follows the guide linked in [2.2 Manual way](#22-manual-way)):

```
# Configure Linux for Neuron repository updates
sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF
[neuron]
name=Neuron YUM Repository
baseurl=https://yum.repos.neuron.amazonaws.com
enabled=1
metadata_expire=0
EOF
sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB

# Install the Neuron driver and torch-neuron, then copy the torch_neuron libraries
# into $OS_HOME (see 2.2 Manual way for the individual commands)

# Increase JVM stack size to >=2MB
echo "-Xss2m" | tee -a $OS_HOME/config/jvm.options

# Increase max file descriptors to 65535
echo "$(whoami) - nofile 65535" | sudo tee -a /etc/security/limits.conf

# max virtual memory areas vm.max_map_count to 262144
sudo sysctl -w vm.max_map_count=262144
```

Exit the current terminal or open a new one, then start OpenSearch.

## 2.2 Manual way

### 2.2.1 Install Driver

Refer to [Deploy on AWS ML accelerator instance](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuron/setup/pytorch-install.html#deploy-on-aws-ml-accelerator-instance) and choose the "**Ubuntu 18 AMI/Ubuntu 20 AMI**" tab; if you are using a different operating system, choose the corresponding tab. The beginning of the content is copied here for easy reference:

```
# Configure Linux for Neuron repository updates
. /etc/os-release
sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF
deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main
EOF
wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -

# Then update OS packages, install the Neuron driver, create and activate the
# pytorch_venv virtual environment, and install torch-neuron, following the
# remaining steps in the guide linked above
```

After the driver and `torch-neuron` are installed, copy the `torch_neuron` libraries into the OpenSearch installation and point `PYTORCH_EXTRA_LIBRARY_PATH` at `libtorchneuron.so`:

```
# Set OS_HOME to your OpenSearch installation folder first.
# For example, if you install OpenSearch in your home folder, it will be
# OS_HOME=~/opensearch-2.5.0

# Activate pytorch_venv first if you haven't. Refer to the "Install Driver" part
source pytorch_venv/bin/activate

# Set pytorch neuron lib path. In this example, we create pytorch_venv in the home folder, so
PYTORCH_NEURON_LIB_PATH=~/pytorch_venv/lib/python3.7/site-packages/torch_neuron/lib/

mkdir -p $OS_HOME/lib/torch_neuron; cp -r $PYTORCH_NEURON_LIB_PATH/ $OS_HOME/lib/torch_neuron
export PYTORCH_EXTRA_LIBRARY_PATH=$OS_HOME/lib/torch_neuron/lib/libtorchneuron.so
```

Increase the JVM stack size to >= 2 MB:

```
echo "-Xss2m" | sudo tee -a $OS_HOME/config/jvm.options
```

Then you can start OpenSearch and upload/load a traced Neuron model. You may see errors like these when starting OpenSearch:

```
[1]: max file descriptors [8192] for opensearch process is too low, increase to at least [65535]
[2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
```

For the first one, run this command (log in to a new terminal for it to take effect):

```
echo "$(whoami) - nofile 65535" | sudo tee -a /etc/security/limits.conf
```

For the second one, run:

```
sudo sysctl -w vm.max_map_count=262144
```
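Before starting OpenSearch on the Inferentia node, it can be worth sanity-checking the setup. A minimal sketch, assuming the layout used above (`OS_HOME`, `pytorch_venv`, and the copied `torch_neuron` libraries); the exact Neuron device names under `/dev` can vary with the driver version:

```
#!/bin/bash
# Quick checks before starting OpenSearch on an inf1 node.

# Neuron devices should be visible once the driver is loaded
ls /dev/neuron* 2>/dev/null || echo "no Neuron devices found; re-check the driver install"

# The copied torch_neuron library should be where PYTORCH_EXTRA_LIBRARY_PATH points
test -f "$OS_HOME/lib/torch_neuron/lib/libtorchneuron.so" \
  && echo "libtorchneuron.so found" \
  || echo "libtorchneuron.so missing; re-run the copy step"

# JVM stack size and kernel limits required by OpenSearch
grep -q '^-Xss2m' "$OS_HOME/config/jvm.options" || echo "-Xss2m not set in jvm.options"
echo "open files limit: $(ulimit -n) (need >= 65535)"
echo "vm.max_map_count: $(sysctl -n vm.max_map_count) (need >= 262144)"
```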
# 3. Docker

Tested on AWS EC2 `g5.xlarge`, 64-bit (x86):

- AMI: `AWS Deep Learning Base AMI GPU CUDA 11 (Ubuntu 20.04) 20221104`
- PyTorch: 1.12.1
- Docker: 20.10.21
- CUDA: 11.6
- CUDA Driver: 510.47.03
- Docker Image: `nvidia/cuda:11.6.2-cudnn8-devel-ubuntu20.04`

Some example commands.

Start the nvidia/cuda Docker container:

```
sudo sysctl -w vm.max_map_count=262144
docker run -it --runtime=nvidia --gpus all -p 9200:9200 nvidia/cuda:11.6.2-cudnn8-devel-ubuntu20.04 /bin/bash
```

Start OpenSearch inside the nvidia/cuda Docker container:

```
wget https://artifacts.opensearch.org/releases/bundle/opensearch/2.5.0/opensearch-2.5.0-linux-x64.tar.gz
tar -xvf opensearch-2.5.0-linux-x64.tar.gz
cd opensearch-2.5.0
bash opensearch-tar-install.sh
```
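Before starting OpenSearch in the container, you can confirm that Docker actually exposes the GPU. A minimal check, assuming the NVIDIA Container Toolkit is installed on the host, which the `--runtime=nvidia` flag above already relies on:

```
# The GPU should show up in nvidia-smi when run inside the same CUDA image
docker run --rm --runtime=nvidia --gpus all \
  nvidia/cuda:11.6.2-cudnn8-devel-ubuntu20.04 nvidia-smi
```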