{ "cells": [ { "cell_type": "markdown", "id": "1b2aebda", "metadata": {}, "source": [ "# BART Large model deployment on Amazon SageMaker Multi-model endpoints (MME) with GPU\n", "\n", "\n", "\n", "Amazon SageMaker multi-model endpoints (MME) provide a scalable and cost-effective way to deploy a large number of deep learning models. Previously, customers had limited options for deploying hundreds of deep learning models that need accelerated compute with GPUs. Customers can now deploy thousands of deep learning models behind a single SageMaker endpoint. MME runs multiple models on a GPU, shares the GPU instances behind an endpoint across those models, and dynamically loads and unloads models based on incoming traffic. This lets customers reduce cost significantly and achieve better price performance.\n", "\n", "\n", "\n", "