{ "cells": [ { "cell_type": "markdown", "id": "dab554c6", "metadata": {}, "source": [ "# Deploy Stable Diffusion on SageMaker with Triton Business Logic Scripting (BLS)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "5640df9b", "metadata": {}, "source": [ "In this notebook we will take most of the [example](https://github.com/triton-inference-server/server/tree/main/docs/examples/stable_diffusion) to host Stable Diffusion on Triton Inference Server provided by NVIDIA and adapt it to SageMaker.\n", "\n", "[Business Logic Scripting (BLS)](https://github.com/triton-inference-server/python_backend#business-logic-scripting) is a Triton Inference Server feature that allows you to create complex inference logic, where loops, conditionals, data-dependent control flow and other custom logic needs to be intertwined with model execution. From within a Python script that runs on Triton's [Python backend](https://github.com/triton-inference-server/python_backend), you can run some of the required inference steps (light processing, even ML models that are not fit to be run on framework-specific backends), but also call other models hosted indepedently in the same server. This enables you to optimize some of the model component's execution performance (using TensorRT for example), while orchestrating the end-to-end inference flow with a comfortable Python interface.\n", "\n", "
You can also place the pipeline's artifacts in the `model_repository/pipeline` directory in this example and access their path using the `args['model_repository']` argument that Triton passes to the `initialize()` method. An example of retrieving the path for a saved model artifact: `f\"{args['model_repository']}/my_saved_model_dir/model.pt\"`\n",
"