{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# KFServing Sample \n", "\n", "In this notebook, we provide two samples demonstrating the KFServing SDK and YAML workflows.\n", "\n", "### Setup\n", "1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/blob/master/docs/DEVELOPER_GUIDE.md#deploy-kfserving).\n", "2. Your cluster's Istio Ingress gateway must be network accessible. If it is not exposed, you can port-forward it locally: \n", " `kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80`. \n", "\n", "## 1. KFServing SDK sample\n", "\n", "Below is a sample that uses the KFServing SDK. \n", "\n", "It demonstrates the SDK's `create`, `get`, `rollout_canary`, `promote`, and `delete` operations on an InferenceService." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prerequisites" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install kfserving kubernetes --user" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from kubernetes import client\n", "\n", "from kfserving import KFServingClient\n", "from kfserving import constants\n", "from kfserving import utils\n", "from kfserving import V1alpha2EndpointSpec\n", "from kfserving import V1alpha2PredictorSpec\n", "from kfserving import V1alpha2TensorflowSpec\n", "from kfserving import V1alpha2InferenceServiceSpec\n", "from kfserving import V1alpha2InferenceService\n", "from kubernetes.client import V1ResourceRequirements" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define the namespace where the InferenceService should be deployed. The function below returns the current namespace when the SDK is running inside the cluster; otherwise it falls back to the `default` namespace."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "namespace = utils.get_default_target_namespace()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Label the namespace so you can run inference tasks in it" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!kubectl label namespace $namespace serving.kubeflow.org/inferenceservice=enabled" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define InferenceService\n", "First define the default endpoint spec, and then define the InferenceService based on that endpoint spec." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "api_version = constants.KFSERVING_GROUP + '/' + constants.KFSERVING_VERSION\n", "default_endpoint_spec = V1alpha2EndpointSpec(\n", " predictor=V1alpha2PredictorSpec(\n", " tensorflow=V1alpha2TensorflowSpec(\n", " storage_uri='gs://kfserving-samples/models/tensorflow/flowers',\n", " resources=V1ResourceRequirements(\n", " requests={'cpu':'100m','memory':'1Gi'},\n", " limits={'cpu':'100m', 'memory':'1Gi'}\n", " )\n", " )\n", " )\n", " )\n", " \n", "isvc = V1alpha2InferenceService(\n", " api_version=api_version,\n", " kind=constants.KFSERVING_KIND,\n", " metadata=client.V1ObjectMeta(name='flower-sample', namespace=namespace),\n", " spec=V1alpha2InferenceServiceSpec(default=default_endpoint_spec)\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create InferenceService\n", "Call `KFServingClient` to create the InferenceService."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "KFServing = KFServingClient()\n", "KFServing.create(isvc)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check the InferenceService" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "KFServing.get('flower-sample', namespace=namespace, watch=True, timeout_seconds=120)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Invoke Endpoint\n", "\n", "If you want to invoke the endpoint yourself, copy the code block below and execute it in your local environment. Make sure the `kfserving-flowers-input.json` file is in the same directory when you run it. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "\n", "MODEL_NAME=flower-sample\n", "INPUT_PATH=@./kfserving-flowers-input.json\n", "INGRESS_GATEWAY=istio-ingressgateway\n", "NAMESPACE=default # %%bash does not expand notebook variables; change this if you deployed elsewhere\n", "SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -n ${NAMESPACE} -o jsonpath='{.status.url}' | cut -d \"/\" -f 3)\n", "\n", "curl -v -H \"Host: ${SERVICE_HOSTNAME}\" http://localhost:8080/v1/models/$MODEL_NAME:predict -d $INPUT_PATH" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Expected Output\n", "```\n", "* Trying 34.83.190.188...\n", "* TCP_NODELAY set\n", "* Connected to 34.83.190.188 (34.83.190.188) port 80 (#0)\n", "> POST /v1/models/flower-sample:predict HTTP/1.1\n", "> Host: flower-sample.default.svc.cluster.local\n", "> User-Agent: curl/7.60.0\n", "> Accept: */*\n", "> Content-Length: 16201\n", "> Content-Type: application/x-www-form-urlencoded\n", "> Expect: 100-continue\n", "> \n", "< HTTP/1.1 100 Continue\n", "* We are completely uploaded and fine\n", "< HTTP/1.1 200 OK\n", "< content-length: 204\n", "< content-type: application/json\n", "< date: Fri, 10 May 2019 23:22:04 GMT\n", "< server: envoy\n", "< x-envoy-upstream-service-time: 19162\n", "< \n",
"{\n", " \"predictions\": [\n", " {\n", " \"scores\": [0.999115, 9.20988e-05, 0.000136786, 0.000337257, 0.000300533, 1.84814e-05],\n", " \"prediction\": 0,\n", " \"key\": \" 1\"\n", " }\n", " ]\n", "* Connection #0 to host 34.83.190.188 left intact\n", "}%\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add Canary to InferenceService\n", "First define the canary endpoint spec, then roll out 10% of the traffic to the canary version and watch the rollout process." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "canary_endpoint_spec = V1alpha2EndpointSpec(\n", " predictor=V1alpha2PredictorSpec(\n", " tensorflow=V1alpha2TensorflowSpec(\n", " storage_uri='gs://kfserving-samples/models/tensorflow/flowers-2',\n", " resources=V1ResourceRequirements(\n", " requests={'cpu':'100m','memory':'1Gi'},\n", " limits={'cpu':'100m', 'memory':'1Gi'}\n", " )\n", " )\n", " )\n", " )\n", "\n", "KFServing.rollout_canary('flower-sample', canary=canary_endpoint_spec, percent=10,\n", " namespace=namespace, watch=True, timeout_seconds=120)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Rollout more traffic to canary of the InferenceService\n", "Increase the traffic to the canary version to 50%." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "KFServing.rollout_canary('flower-sample', percent=50, namespace=namespace,\n", " watch=True, timeout_seconds=120)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Send 100 requests to the service."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "\n", "MODEL_NAME=flower-sample\n", "INPUT_PATH=@./kfserving-flowers-input.json\n", "INGRESS_GATEWAY=istio-ingressgateway\n", "NAMESPACE=default # %%bash does not expand notebook variables; change this if you deployed elsewhere\n", "SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -n ${NAMESPACE} -o jsonpath='{.status.url}' | cut -d \"/\" -f 3)\n", "\n", "for i in {1..100};\n", "do\n", " curl -v -H \"Host: ${SERVICE_HOSTNAME}\" http://localhost:8080/v1/models/$MODEL_NAME:predict -d $INPUT_PATH;\n", "done" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check that both the default and canary predictors have ready replicas serving the split traffic:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "\n", "NAMESPACE=default # change this if you deployed to a different namespace\n", "default_count=$(kubectl get replicaset -n ${NAMESPACE} -l serving.knative.dev/configuration=flower-sample-predictor-default -o jsonpath='{.items[0].status.readyReplicas}')\n", "canary_count=$(kubectl get replicaset -n ${NAMESPACE} -l serving.knative.dev/configuration=flower-sample-predictor-canary -o jsonpath='{.items[0].status.readyReplicas}')\n", "\n", "echo \"Ready replicas serving the default version: $default_count\"\n", "echo \"Ready replicas serving the canary version: $canary_count\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Promote Canary to Default" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "KFServing.promote('flower-sample', namespace=namespace, watch=True, timeout_seconds=120)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Delete the InferenceService" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "KFServing.delete('flower-sample', namespace=namespace)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. KFServing YAML sample\n", "\n", "Note: you should execute all of the following code blocks in your local environment."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create the InferenceService\n", "Apply the InferenceService resource" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!kubectl apply -n $namespace -f kfserving-flowers.yaml " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Expected Output\n", "```\n", "inferenceservice.serving.kubeflow.org/flowers-sample configured\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run a prediction\n", "\n", "Use `istio-ingressgateway` as your `INGRESS_GATEWAY` if you are deploying KFServing as part of a Kubeflow install, rather than independently.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "\n", "MODEL_NAME=flowers-sample\n", "INPUT_PATH=@./kfserving-flowers-input.json\n", "INGRESS_GATEWAY=istio-ingressgateway\n", "NAMESPACE=default # %%bash does not expand notebook variables; change this if you deployed elsewhere\n", "SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -n ${NAMESPACE} -o jsonpath='{.status.url}' | cut -d \"/\" -f 3)\n", "\n", "curl -v -H \"Host: ${SERVICE_HOSTNAME}\" http://localhost:8080/v1/models/$MODEL_NAME:predict -d $INPUT_PATH" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you stop making requests to the application, you should eventually see that your application scales itself back down to zero. Watch the pod until you see that it is `Terminating`. This should take approximately 90 seconds." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!kubectl get pods --watch -n $namespace" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: To exit the watch, use `ctrl + c`."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Canary Rollout\n", "\n", "To test a canary rollout, you can use the `kfserving-flowers-canary.yaml` file.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Apply the updated InferenceService resource" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!kubectl apply -n $namespace -f kfserving-flowers-canary.yaml " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To verify that the traffic split percentage is applied correctly, you can use the following command:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!kubectl get inferenceservices -n $namespace" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output should look similar to the following:\n", "```\n", "NAME             READY   URL                                         DEFAULT TRAFFIC   CANARY TRAFFIC   AGE\n", "flowers-sample   True    http://flowers-sample.default.example.com   90                10               48s\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "\n", "MODEL_NAME=flowers-sample\n", "INPUT_PATH=@./kfserving-flowers-input.json\n", "INGRESS_GATEWAY=istio-ingressgateway\n", "NAMESPACE=default # %%bash does not expand notebook variables; change this if you deployed elsewhere\n", "SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -n ${NAMESPACE} -o jsonpath='{.status.url}' | cut -d \"/\" -f 3)\n", "\n", "for i in {1..100};\n", "do\n", " curl -v -H \"Host: ${SERVICE_HOSTNAME}\" http://localhost:8080/v1/models/$MODEL_NAME:predict -d $INPUT_PATH;\n", "done" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Verify that both the default and canary predictors have ready replicas serving the split traffic:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "\n", "NAMESPACE=default # change this if you deployed to a different namespace\n", "default_count=$(kubectl get replicaset -n ${NAMESPACE} -l serving.knative.dev/configuration=flowers-sample-predictor-default -o jsonpath='{.items[0].status.readyReplicas}')\n", "canary_count=$(kubectl get replicaset -n ${NAMESPACE} -l serving.knative.dev/configuration=flowers-sample-predictor-canary -o jsonpath='{.items[0].status.readyReplicas}')\n", "\n", "echo \"Ready replicas serving the default version: $default_count\"\n", "echo \"Ready replicas serving the canary version: $canary_count\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Clean Up Resources" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!kubectl delete inferenceservices flowers-sample -n $namespace" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 4 }