# KFServing Sample 

This notebook provides two samples that demonstrate KFServing: one that uses the KFServing Python SDK and one that uses YAML manifests.

### Setup
1. Your `~/.kube/config` should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/blob/master/docs/DEVELOPER_GUIDE.md#deploy-kfserving).
2. Your cluster's Istio ingress gateway must be network accessible. For example, forward it to `localhost:8080`, which the requests in this notebook use:
 `kubectl port-forward svc/istio-ingressgateway -n istio-system 8080:80`

## 1. KFServing SDK sample

Below is a sample for the KFServing SDK.

It shows how to use the SDK to create an InferenceService, get its status, roll out a canary (`rollout_canary`), promote the canary, and delete the InferenceService.

### Prerequisites

In [None]:
!pip install kfserving kubernetes --user

In [None]:
from kubernetes import client

from kfserving import KFServingClient
from kfserving import constants
from kfserving import utils
from kfserving import V1alpha2EndpointSpec
from kfserving import V1alpha2PredictorSpec
from kfserving import V1alpha2TensorflowSpec
from kfserving import V1alpha2InferenceServiceSpec
from kfserving import V1alpha2InferenceService
from kubernetes.client import V1ResourceRequirements

Define the namespace where the InferenceService will be deployed. When the SDK runs inside the cluster, `utils.get_default_target_namespace()` returns the namespace it is running in; otherwise it returns the `default` namespace.

In [None]:
namespace = utils.get_default_target_namespace()
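For readers curious how the namespace is chosen, here is a minimal sketch of that selection logic. It is an illustrative re-implementation, not the SDK's source, and it assumes in-cluster detection works by looking for the service-account namespace file that Kubernetes mounts into every pod.

```python
import os

# Path where Kubernetes mounts the pod's own namespace (assumed detection method).
SA_NAMESPACE_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"


def default_target_namespace(namespace_file=SA_NAMESPACE_FILE):
    """Return the pod's namespace when running in a cluster, else "default"."""
    if not os.path.isfile(namespace_file):
        # Not inside a pod (for example, a laptop using ~/.kube/config).
        return "default"
    with open(namespace_file) as f:
        return f.read().strip()


print(default_target_namespace())
```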

### Label namespace so you can run inference tasks in it

In [None]:
!kubectl label namespace $namespace serving.kubeflow.org/inferenceservice=enabled
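The same label can also be applied from Python with the Kubernetes client instead of shelling out to `kubectl`. This is a sketch: `namespace_label_patch` and `label_namespace` are hypothetical helper names, and it assumes the `kubernetes` package installed above, a working `~/.kube/config`, and that the client sends a dict body as a merge-style patch.

```python
# Label that marks a namespace as enabled for InferenceServices (from the command above).
INFERENCESERVICE_LABEL = "serving.kubeflow.org/inferenceservice"


def namespace_label_patch():
    """Merge-style patch body: adds the label and leaves other labels untouched."""
    return {"metadata": {"labels": {INFERENCESERVICE_LABEL: "enabled"}}}


def label_namespace(namespace):
    """Apply the label with CoreV1Api.patch_namespace (hypothetical helper)."""
    # Imported lazily so namespace_label_patch() works without the client installed.
    from kubernetes import client, config

    config.load_kube_config()  # reads ~/.kube/config, as in the Setup section
    return client.CoreV1Api().patch_namespace(namespace, namespace_label_patch())
```

In the notebook you would call `label_namespace(namespace)`; inside a cluster, `config.load_incluster_config()` would replace `load_kube_config()`.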

### Define InferenceService
First define the default endpoint spec, then define the InferenceService based on that endpoint spec.

In [None]:
api_version = constants.KFSERVING_GROUP + '/' + constants.KFSERVING_VERSION
default_endpoint_spec = V1alpha2EndpointSpec(
    predictor=V1alpha2PredictorSpec(
        tensorflow=V1alpha2TensorflowSpec(
            storage_uri='gs://kfserving-samples/models/tensorflow/flowers',
            resources=V1ResourceRequirements(
                requests={'cpu': '100m', 'memory': '1Gi'},
                limits={'cpu': '100m', 'memory': '1Gi'}
            )
        )
    )
)

isvc = V1alpha2InferenceService(
    api_version=api_version,
    kind=constants.KFSERVING_KIND,
    metadata=client.V1ObjectMeta(name='flower-sample', namespace=namespace),
    spec=V1alpha2InferenceServiceSpec(default=default_endpoint_spec)
)
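The SDK objects above describe an ordinary Kubernetes resource. As a rough sketch, the same InferenceService written as a plain Python dict looks like this. The camelCase field names (`storageUri`) and the `serving.kubeflow.org/v1alpha2` API version are assumptions about what the v1alpha2 objects serialize to, and the namespace is hard-coded to `default` for illustration. Since `kubectl apply -f` also accepts JSON, the printed output is another way to see how the SDK sample relates to the YAML sample later in this notebook.

```python
import json

# Assumed to match constants.KFSERVING_GROUP/KFSERVING_VERSION and KFSERVING_KIND.
API_VERSION = "serving.kubeflow.org/v1alpha2"
KIND = "InferenceService"


def tensorflow_endpoint(storage_uri, cpu="100m", memory="1Gi"):
    """Endpoint spec with a TensorFlow predictor; requests equal limits, as above."""
    resources = {"cpu": cpu, "memory": memory}
    return {
        "predictor": {
            "tensorflow": {
                "storageUri": storage_uri,
                "resources": {"requests": dict(resources), "limits": dict(resources)},
            }
        }
    }


def inference_service(name, namespace, default_endpoint):
    """Top-level InferenceService object with only a default endpoint."""
    return {
        "apiVersion": API_VERSION,
        "kind": KIND,
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"default": default_endpoint},
    }


manifest = inference_service(
    "flower-sample",
    "default",  # illustrative; the notebook uses the `namespace` variable
    tensorflow_endpoint("gs://kfserving-samples/models/tensorflow/flowers"),
)
print(json.dumps(manifest, indent=2))
```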

### Create InferenceService
Create a `KFServingClient` and use it to create the InferenceService.

In [None]:
KFServing = KFServingClient()
KFServing.create(isvc)

### Check the InferenceService

In [None]:
KFServing.get('flower-sample', namespace=namespace, watch=True, timeout_seconds=120)

### Invoke Endpoint

To invoke the endpoint yourself, copy the code block below and run it in your local environment. Make sure a `kfserving-flowers-input.json` file is in the current directory when you run it.

In [None]:
%%bash

# Python variables are not visible in %%bash cells (or in a local shell); set the namespace used above.
NAMESPACE=${NAMESPACE:-default}
MODEL_NAME=flower-sample
INPUT_PATH=@./kfserving-flowers-input.json
INGRESS_GATEWAY=istio-ingressgateway
SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -n ${NAMESPACE} -o jsonpath='{.status.url}' | cut -d "/" -f 3)

curl -v -H "Host: ${SERVICE_HOSTNAME}" http://localhost:8080/v1/models/${MODEL_NAME}:predict -d ${INPUT_PATH}

Expected Output (IP addresses and host names will differ in your cluster)
```
* Trying 34.83.190.188...
* TCP_NODELAY set
* Connected to 34.83.190.188 (34.83.190.188) port 80 (#0)
> POST /v1/models/flowers-sample:predict HTTP/1.1
> Host: flowers-sample.default.svc.cluster.local
> User-Agent: curl/7.60.0
> Accept: */*
> Content-Length: 16201
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< content-length: 204
< content-type: application/json
< date: Fri, 10 May 2019 23:22:04 GMT
< server: envoy
< x-envoy-upstream-service-time: 19162
< 
{
 "predictions": [
 {
 "scores": [0.999115, 9.20988e-05, 0.000136786, 0.000337257, 0.000300533, 1.84814e-05],
 "prediction": 0,
 "key": " 1"
 }
 ]
* Connection #0 to host 34.83.190.188 left intact
}%
```
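If you would rather send the request from Python, the standard library is enough. This is a minimal sketch, assuming the port-forward from the Setup section (`localhost:8080`) and the same input file; the helper names are hypothetical.

```python
import json
import urllib.request


def service_hostname(status_url):
    """Host part of .status.url; same as `cut -d "/" -f 3` in the bash cell."""
    return status_url.split("/")[2]


def build_predict_request(model_name, hostname, payload, base_url="http://localhost:8080"):
    """POST to /v1/models/<name>:predict via the port-forwarded ingress gateway."""
    return urllib.request.Request(
        f"{base_url}/v1/models/{model_name}:predict",
        data=payload,
        # The Host header lets the gateway route the request to the right service.
        headers={"Host": hostname, "Content-Type": "application/json"},
        method="POST",
    )


def predict(model_name, hostname, input_path="kfserving-flowers-input.json"):
    """Send the JSON input file and return the decoded response body."""
    with open(input_path, "rb") as f:
        request = build_predict_request(model_name, hostname, f.read())
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())
```

For example, `predict("flower-sample", service_hostname(url))`, where `url` is the value of `.status.url`.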

### Add Canary to InferenceService
First define the canary endpoint spec, then roll out 10% of the traffic to the canary version and watch the rollout.

In [None]:
canary_endpoint_spec = V1alpha2EndpointSpec(
    predictor=V1alpha2PredictorSpec(
        tensorflow=V1alpha2TensorflowSpec(
            storage_uri='gs://kfserving-samples/models/tensorflow/flowers-2',
            resources=V1ResourceRequirements(
                requests={'cpu': '100m', 'memory': '1Gi'},
                limits={'cpu': '100m', 'memory': '1Gi'}
            )
        )
    )
)

KFServing.rollout_canary('flower-sample', canary=canary_endpoint_spec, percent=10,
                         namespace=namespace, watch=True, timeout_seconds=120)
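Conceptually, a canary rollout is an update to the InferenceService spec: it adds a `canary` endpoint and a `canaryTrafficPercent`, and the default endpoint keeps the remaining traffic. The sketch below builds such an update as a plain dict. The field names are assumed from the v1alpha2 schema, `canary_patch` and `traffic_split` are hypothetical helpers, and the SDK's `rollout_canary` may construct its request differently.

```python
def canary_patch(percent, canary_endpoint=None):
    """Merge-patch body for a canary rollout (assumed v1alpha2 field names).

    When canary_endpoint is None only the traffic percentage changes, as in
    the 50% rollout step that follows.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    spec = {"canaryTrafficPercent": percent}
    if canary_endpoint is not None:
        spec["canary"] = canary_endpoint
    return {"spec": spec}


def traffic_split(percent):
    """(default %, canary %) for a given canary percentage."""
    return 100 - percent, percent
```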

### Roll out more traffic to the InferenceService canary
Increase the share of traffic routed to the canary version to 50%.

In [None]:
KFServing.rollout_canary('flower-sample', percent=50, namespace=namespace,
                         watch=True, timeout_seconds=120)

Send 100 requests to the service.

In [None]:
%%bash

# Python variables are not visible in %%bash cells; set the namespace used above.
NAMESPACE=${NAMESPACE:-default}
MODEL_NAME=flower-sample
INPUT_PATH=@./kfserving-flowers-input.json
INGRESS_GATEWAY=istio-ingressgateway
SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -n ${NAMESPACE} -o jsonpath='{.status.url}' | cut -d "/" -f 3)

for i in {1..100}; do
    curl -v -H "Host: ${SERVICE_HOSTNAME}" http://localhost:8080/v1/models/${MODEL_NAME}:predict -d ${INPUT_PATH}
done
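Weighted routing is decided per request, so 100 requests at 50% will rarely split exactly 50/50. The toy simulation below assumes each request is routed to the canary independently with probability `percent / 100`; it only illustrates the expected spread and is not how Istio or Knative implement routing internally.

```python
import random
from collections import Counter


def simulate_split(canary_percent, n_requests=100, seed=0):
    """Return (default_count, canary_count) for n independently routed requests."""
    rng = random.Random(seed)
    counts = Counter(
        "canary" if rng.random() < canary_percent / 100 else "default"
        for _ in range(n_requests)
    )
    return counts["default"], counts["canary"]


print(simulate_split(50))
```

With `seed` fixed the result is reproducible; change the seed to see how much the counts vary between runs of 100 requests.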

Check how the traffic is split between the default and canary versions. The `DEFAULT TRAFFIC` and `CANARY TRAFFIC` columns show the configured percentages.

In [None]:
%%bash

# Python variables are not visible in %%bash cells; set the namespace used above.
NAMESPACE=${NAMESPACE:-default}
kubectl get inferenceservice flower-sample -n ${NAMESPACE}

### Promote Canary to Default

In [None]:
KFServing.promote('flower-sample', namespace=namespace, watch=True, timeout_seconds=120)
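Promotion makes the canary the new default and removes the canary. The sketch below models this as a transformation of the spec dict, assuming v1alpha2 field names; `promote_spec` is a hypothetical helper, and the SDK's `promote` call reaches the result by updating the live resource, so its exact request may differ.

```python
import copy


def promote_spec(spec):
    """Return a copy of an InferenceService spec with the canary promoted.

    The canary endpoint becomes the default, and the canary and its traffic
    percentage are removed. The input dict is not modified.
    """
    if not spec.get("canary"):
        raise ValueError("InferenceService has no canary to promote")
    promoted = copy.deepcopy(spec)
    promoted["default"] = promoted.pop("canary")
    promoted.pop("canaryTrafficPercent", None)
    return promoted
```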

### Delete the InferenceService

In [None]:
KFServing.delete('flower-sample', namespace=namespace)

## 2. KFServing YAML sample

Note: You should execute all the code blocks in your local environment.

### Create the InferenceService
Apply the InferenceService manifest.

In [None]:
!kubectl apply -n $namespace -f kfserving-flowers.yaml 

Expected Output
```
inferenceservice.serving.kubeflow.org/flowers-sample configured
```

### Run a prediction

Use `istio-ingressgateway` as your `INGRESS_GATEWAY` if you deployed KFServing as part of a Kubeflow installation rather than on its own.


In [None]:
%%bash

# Set the namespace the InferenceService was applied to.
NAMESPACE=${NAMESPACE:-default}
MODEL_NAME=flowers-sample
INPUT_PATH=@./kfserving-flowers-input.json
INGRESS_GATEWAY=istio-ingressgateway
SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -n ${NAMESPACE} -o jsonpath='{.status.url}' | cut -d "/" -f 3)

curl -v -H "Host: ${SERVICE_HOSTNAME}" http://localhost:8080/v1/models/${MODEL_NAME}:predict -d ${INPUT_PATH}

If you stop making requests to the application, you should eventually see that your application scales itself back down to zero. Watch the pod until you see that it is `Terminating`. This should take approximately 90 seconds.

In [None]:
!kubectl get pods --watch -n $namespace

Note: To exit the watch, use `ctrl + c`.

### Canary Rollout

To test a canary rollout, apply the canary manifest `kfserving-flowers-canary.yaml`.

In [None]:
!kubectl apply -n $namespace -f kfserving-flowers-canary.yaml 

To verify that your traffic split percentage is applied correctly, run the following command:

In [None]:
!kubectl get inferenceservices -n $namespace

The output should look similar to the following:
```
NAME             READY   URL                                         DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
flowers-sample   True    http://flowers-sample.default.example.com   90                10               48s
```
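To check the split programmatically, you can parse this table. The sketch below assumes kubectl's default aligned-column output, where columns are separated by at least two spaces so multi-word headers such as `DEFAULT TRAFFIC` stay intact; for anything beyond a quick check, `-o json` output is sturdier. The function names are hypothetical.

```python
import re
import subprocess


def kubectl_get_inferenceservices(namespace):
    """Run the command above and return its table output (requires kubectl)."""
    result = subprocess.run(
        ["kubectl", "get", "inferenceservices", "-n", namespace],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_kubectl_table(text):
    """Parse aligned kubectl output into a list of {column: value} dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0]
    # Column names are words joined by single spaces, e.g. "DEFAULT TRAFFIC".
    columns = [(m.start(), m.group()) for m in re.finditer(r"\S+(?: \S+)*", header)]
    rows = []
    for line in lines[1:]:
        row = {}
        for i, (start, name) in enumerate(columns):
            end = columns[i + 1][0] if i + 1 < len(columns) else None
            row[name] = line[start:end].strip()
        rows.append(row)
    return rows


def traffic_for(rows, name):
    """Return (default %, canary %) for the named InferenceService."""
    for row in rows:
        if row["NAME"] == name:
            return int(row["DEFAULT TRAFFIC"] or 0), int(row["CANARY TRAFFIC"] or 0)
    raise KeyError(name)
```

For example, `traffic_for(parse_kubectl_table(kubectl_get_inferenceservices("default")), "flowers-sample")` would return `(90, 10)` for the output shown above.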

Send 100 prediction requests to the service:

In [None]:
%%bash

# Set the namespace the InferenceService was applied to.
NAMESPACE=${NAMESPACE:-default}
MODEL_NAME=flowers-sample
INPUT_PATH=@./kfserving-flowers-input.json
INGRESS_GATEWAY=istio-ingressgateway
SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -n ${NAMESPACE} -o jsonpath='{.status.url}' | cut -d "/" -f 3)

for i in {1..100}; do
    curl -v -H "Host: ${SERVICE_HOSTNAME}" http://localhost:8080/v1/models/${MODEL_NAME}:predict -d ${INPUT_PATH}
done

Verify how the traffic is split between the default and canary versions:

In [None]:
%%bash

# Set the namespace the InferenceService was applied to.
NAMESPACE=${NAMESPACE:-default}
kubectl get inferenceservice flowers-sample -n ${NAMESPACE}

### Clean Up Resources

In [None]:
!kubectl delete inferenceservices flowers-sample -n $namespace