{ "cells": [ { "cell_type": "markdown", "id": "9ff4b41e-c9bc-4c15-9536-3f3e6fa783fb", "metadata": {}, "source": [ "# PyTorch Inference using KServe on AWS Kubeflow Distribution\n", "\n", "\n", "In this example, we deploy a trained PyTorch CNN image classification model to predict CIFAR 10 images by running an InferenceService with TorchServe runtime which is the default installed serving runtime for PyTorch models.\n", "\n", "We will create KServe InferenceService yaml, pipeline, and will run the inference all from this notebook " ] }, { "cell_type": "markdown", "id": "baaf7442-969a-4cae-87c0-76b790c2d606", "metadata": {}, "source": [ "The KServe/TorchServe integration expects following model store layout on the storage with TorchServe Model Archive and Model Configuration.\n" ] }, { "cell_type": "raw", "id": "1d92be2b-02e0-4ea8-8033-39c721631a29", "metadata": {}, "source": [ "├── config\n", "│   ├── config.properties\n", "├── model-store\n", "│   ├── densenet_161.mar\n", "│   ├── mnist.mar\n" ] }, { "cell_type": "markdown", "id": "4dc7ba1d-8dd6-43eb-a4db-82211b8d4f96", "metadata": {}, "source": [ "TorchServe provides a utility to package all the model artifacts into a single TorchServe Model Archive Files (MAR), after model artifacts are packaged into MAR file you then upload to the model-store under model storage path which will be S3 in this example." ] }, { "cell_type": "code", "execution_count": null, "id": "e03112ae-20e0-41a7-bfba-7f793902a849", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Restart Kernel after you have installed the utility\n", "\n", "# Install kserve python sdk to create and run InferenceService yaml\n", "!pip install kserve==0.7.0\n", "\n", "# Install torch-model-archiver, a TorchServe utility to package all the model artifacts into a single TorchServe Model Archive Files (MAR). \n", "!pip install torch-model-archiver" ] }, { "cell_type": "code", "execution_count": null, "id": "0dc9c965-6d11-4de2-b19d-64ad3544d12a", "metadata": {}, "outputs": [], "source": [ "import argparse\n", "import json\n", "import logging\n", "import os\n", "import sys\n", "import torch\n", "import torch.distributed as dist\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "import torch.optim as optim\n", "import torch.utils.data\n", "import torch.utils.data.distributed\n", "import torchvision\n", "from torchvision import datasets, transforms\n", "import json\n", "#import subprocess\n", "from model import Net\n", "\n", "from kubernetes import client \n", "from kserve import KServeClient\n", "from kserve import constants\n", "from kserve import utils\n", "from kserve import V1beta1InferenceService\n", "from kserve import V1beta1InferenceServiceSpec\n", "from kserve import V1beta1PredictorSpec\n", "from kserve import V1beta1SKLearnSpec\n", "from kserve import V1beta1TorchServeSpec\n", "from kserve import V1beta1TorchServeSpec\n", "from kubernetes.client import models as k8s\n", "\n", "import kfp\n", "from kfp import components\n", "from kubeflow.training.utils import utils\n", "from kfp import dsl\n", "from kfp import compiler\n", "\n", "logger = logging.getLogger(__name__)\n", "logger.setLevel(logging.DEBUG)\n", "logger.addHandler(logging.StreamHandler(sys.stdout))" ] }, { "cell_type": "code", "execution_count": null, "id": "325ea145-e904-4787-a031-c909dc5264ee", "metadata": {}, "outputs": [], "source": [ "# Load the model created by training step \n", "model = Net()" ] }, { "cell_type": "markdown", "id": "171e3a50-bb00-41e2-af02-1f19a3d5ae45", "metadata": {}, "source": [ "## Load the model " ] }, { "cell_type": "code", "execution_count": null, "id": "d04d870e-01d5-4fb4-a447-321252023796", "metadata": {}, "outputs": [], "source": [ "# Same bucket name that we used in notebook 1_submit_pytorchdist_k8s.ipynb\n", "s3_bucket_name='kserve-model-20230527042622'\n", "model_name='model-kserve.pth'\n", "!aws s3 cp s3://{s3_bucket_name}/{model_name} ./kserve_inference" ] }, { "cell_type": "markdown", "id": "e769232f-d8f6-490f-a434-38e378f37452", "metadata": {}, "source": [ "## Create model store required by KServe/TorchServe integration" ] }, { "cell_type": "code", "execution_count": null, "id": "d93563dc-ca8e-4c88-965f-096bdd3f2e6e", "metadata": {}, "outputs": [], "source": [ "# Generate the MAR file with torch-model-archiver\n", "!$HOME/.local/bin/torch-model-archiver -f --model-name cifar --version 1.0 --model-file ./kserve_inference/model.py --serialized-file ./kserve_inference/model-kserve.pth --handler ./kserve_inference/model_handler.py && mv cifar.mar ./kserve_inference/" ] }, { "cell_type": "code", "execution_count": null, "id": "1b55a3b6-6699-4450-9ba9-a49cedb8cb66", "metadata": {}, "outputs": [], "source": [ "#Clean up the madel_layout package if it is alread there\n", "#!rm -r model_layout" ] }, { "cell_type": "code", "execution_count": null, "id": "02d8693f-9832-49c5-b5f2-4e71256369c4", "metadata": {}, "outputs": [], "source": [ "#Create a model store layout required by the PyTorch Serve\n", "!mkdir -p kserve_inference/model_layout/config \n", "!mkdir -p kserve_inference/model_layout/model-store" ] }, { "cell_type": "code", "execution_count": null, "id": "6cce085a-bbb4-437e-aa27-289180b78747", "metadata": {}, "outputs": [], "source": [ "# Copy the model archive\n", "!cp kserve_inference/cifar.mar kserve_inference/model_layout/model-store/\n", "!cp kserve_inference/config.properties kserve_inference/model_layout/config/" ] }, { "cell_type": "code", "execution_count": null, "id": "d78eed34-e351-43d9-8880-ca2873a6459b", "metadata": {}, "outputs": [], "source": [ "# Load the model package to S3 bucket \n", "!aws s3 cp ./kserve_inference/model_layout s3://{s3_bucket_name}/ --recursive" ] }, { "cell_type": "code", "execution_count": null, "id": "2134bd98-f213-4ed6-95fb-999cca795d78", "metadata": {}, "outputs": [], "source": [ "namespace = utils.get_default_target_namespace()\n", "namespace" ] }, { "cell_type": "markdown", "id": "5a7388d6-a7e9-4a40-bbcd-530e9ab6f692", "metadata": {}, "source": [ "## Create InferenceService yaml" ] }, { "cell_type": "code", "execution_count": null, "id": "4cbf38c9-4175-4247-995c-75a70c4707e4", "metadata": { "tags": [] }, "outputs": [], "source": [ "resource_req=k8s.V1ResourceRequirements(requests={'cpu':'100m'})\n", "predictor_spec=V1beta1PredictorSpec(service_account_name=\"aws-sa\",\n", " pytorch=(V1beta1TorchServeSpec(\n", " resources=resource_req,\n", " storage_uri=f's3://{s3_bucket_name}')))\n", " \n", "#predictor_spec" ] }, { "cell_type": "code", "execution_count": null, "id": "0a6009bd-f151-4fad-8ec3-c6221dd1e257", "metadata": {}, "outputs": [], "source": [ "inference_endpoint='image-classify'\n", "kserve_version='v1beta1'\n", "api_version = constants.KSERVE_GROUP + '/' + kserve_version\n", "\n", "isvc = V1beta1InferenceService(api_version=api_version,\n", " kind=constants.KSERVE_KIND,\n", " metadata=client.V1ObjectMeta(\n", " name=inference_endpoint, namespace=namespace, annotations={'sidecar.istio.io/inject':'false'}),\n", " spec=V1beta1InferenceServiceSpec(\n", " predictor=predictor_spec)\n", ")\n", "\n", "#isvc" ] }, { "cell_type": "code", "execution_count": null, "id": "39c1d531-7ac9-49c3-a6c2-daa0b90ba0a5", "metadata": { "tags": [] }, "outputs": [], "source": [ "#Create KServe Inference EndPoint\n", "KServe = KServeClient()\n", "KServe.create(isvc)" ] }, { "cell_type": "code", "execution_count": null, "id": "7780d061-ebcc-4afb-9d47-7b03847405f5", "metadata": { "tags": [] }, "outputs": [], "source": [ "#Delete the KServe Inference EndPoint \n", "#name=\"image-classify\"\n", "#KServe.delete(name, namespace=namespace)" ] }, { "cell_type": "code", "execution_count": null, "id": "f6e84477-c825-48f6-a4ba-fd9f90b5f17e", "metadata": { "tags": [] }, "outputs": [], "source": [ "#KServe.get(inference_endpoint, namespace=namespace)" ] }, { "cell_type": "code", "execution_count": null, "id": "a05ec6b9-110e-4504-9307-829d36d428fe", "metadata": { "tags": [] }, "outputs": [], "source": [ "KServe.get(inference_endpoint, namespace=namespace, timeout_seconds=120)" ] }, { "cell_type": "code", "execution_count": null, "id": "2055f186-ecfb-44f1-b618-24788be101ad", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 5 }