# Inference workload deployment sample with optional bin-packing

The aws-do-inference repository contains an end-to-end example for running model inference locally on Docker or at scale on EKS. It supports CPU, GPU, and Inferentia processors and can pack multiple models into a single processor core for improved cost efficiency. While this example focuses on one processor target at a time, iterating over the steps below for CPU/GPU and Inferentia enables hybrid deployments in which each model is served by the processor or accelerator best suited to its resource consumption profile. In this sample repository we use a [bert-base](https://huggingface.co/distilbert-base-multilingual-cased) NLP model from [huggingface.co](https://huggingface.co/); however, the project structure and workflow are generic and can be adapted for use with other models.
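In Kubernetes terms, such a hybrid deployment comes down to scheduling each model's pods onto the matching nodegroup. The sketch below shows one common way to do that with a node selector; it is illustrative only, and the deployment name, label key, and instance type are assumptions rather than values from this project:

```sh
# Illustrative sketch: pin a model deployment to nodes of a specific instance type
# via a node selector (deployment name and instance type are assumed, not from this repo)
kubectl patch deployment bert-base-cpu --patch '
spec:
  template:
    spec:
      nodeSelector:
        node.kubernetes.io/instance-type: c5.4xlarge'
```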

Fig. 1 - Sample EKS infrastructure for inference workloads

The inference workloads in this sample project are deployed on CPU, GPU, or Inferentia nodes as shown in Fig. 1. The control scripts run from any location that has access to the cluster API. To eliminate latency concerns related to cluster ingress, load tests run in a pod within the cluster and send requests to the models directly over the cluster pod network.
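To illustrate this direct, in-cluster access pattern (this is not the project's own test script), a request can be sent straight to a model pod's IP from a temporary pod; the label selector, port, and request path below are assumptions:

```sh
# Illustrative only: look up a model pod's cluster IP and call it from a throwaway
# pod, bypassing any ingress or load balancer (label, port, and path are assumptions)
POD_IP=$(kubectl get pods -l app=bert-base -o jsonpath='{.items[0].status.podIP}')
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s "http://${POD_IP}:8080/predictions/model0"
```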

Fig. 2 - aws-do-inference video walkthrough

See an end-to-end accelerated [video walkthrough](https://bit.ly/aws-do-inference-video) (7 min) or follow the instructions below to build and run your own inference solution.

## Prerequisites

It is assumed that an [EKS cluster exists](https://github.com/aws-samples/aws-do-eks) and contains nodegroups of the desired target instance types. In addition, it is assumed that the following basic tools are present: [docker](https://docs.docker.com/get-docker/), [kubectl](https://kubernetes.io/docs/tasks/tools/), [envsubst](https://command-not-found.com/envsubst), [kubetail](https://github.com/johanhaleby/kubetail), and [bc](https://howtoinstall.co/en/bc).

## Operation

The project is operated through a set of action scripts as described below. To complete a full cycle from beginning to end, first configure the project, then follow steps 1 through 5, executing the corresponding action scripts. Each action script has a help screen, which can be invoked by passing "help" as an argument:
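For example, assuming a configuration script named `config.sh` (the name is illustrative; any of the project's action scripts accepts the same argument):

```sh
# Illustrative invocation: print an action script's help screen
# (config.sh is an assumed name; substitute any of the project's action scripts)
./config.sh help
```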