## Introduction to the Pipelines SDK

The [Kubeflow Pipelines SDK](https://github.com/kubeflow/pipelines/tree/master/sdk) provides a set of Python packages that you can use to specify and run your machine learning (ML) workflows. A pipeline is a description of an ML workflow, including all of the components that make up the steps in the workflow and how the components interact with each other.


The Kubeflow website explains pipeline components in detail; see [Introduction to the Pipelines SDK](https://www.kubeflow.org/docs/pipelines/sdk/sdk-overview/).

This guide tells you how to use the [Kubeflow Pipelines SDK](https://github.com/kubeflow/pipelines/tree/master/sdk) to build machine learning pipelines. You can use the SDK to execute your pipeline, or alternatively you can upload the pipeline to the Kubeflow Pipelines UI for execution.

All of the SDK’s classes and methods are described in the auto-generated [SDK reference docs](https://kubeflow-pipelines.readthedocs.io/en/latest/).


Run the following command to install the Kubeflow Pipelines SDK:


In [None]:
!pip install kfp --upgrade --user

After a successful installation, the `dsl-compile` command should be available. Run the following command to verify that it is on your path:

In [None]:
!which dsl-compile
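The same check can be made from Python, which helps when the notebook kernel and the shell see different environments. The following is a minimal sketch that uses only the standard library; the helper name `check_kfp_install` is hypothetical and not part of the SDK.

```python
import importlib.util
import shutil


def check_kfp_install():
    """Report whether the kfp package and the dsl-compile command are visible."""
    return {
        # True when Python can locate an installed package named "kfp".
        "kfp_importable": importlib.util.find_spec("kfp") is not None,
        # Path of the dsl-compile executable, or None if it is not on PATH.
        "dsl_compile_path": shutil.which("dsl-compile"),
    }


print(check_kfp_install())
```

If `dsl_compile_path` is `None` after a `--user` install, the user-level scripts directory (commonly `~/.local/bin` on Linux) may be missing from `PATH`.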

> Note: Read the official documentation to understand pipeline concepts before you move forward: [Introduction to the Pipelines SDK](https://www.kubeflow.org/docs/pipelines/sdk/sdk-overview/)

## Build simple components and pipelines

In this example, we calculate the sum of four numbers: `a`, `b`, `c`, and `d`.

1. Assume we have a Python image to use. Each arithmetic step accepts two arguments and returns their sum.

2. The sum of `a` and `b` is added to the sum of `c` and `d` to produce the final result. In total, the pipeline has three arithmetic operators, followed by an echo operator that prints the result.
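Before any containers are involved, the computation can be sketched in plain Python. This is only an illustration of the arithmetic, not SDK code; the function names `add_two_numbers_local` and `calculate_sum_local` are hypothetical, and the default values match the pipeline parameters used in this guide.

```python
def add_two_numbers_local(a, b):
    """Local stand-in for one arithmetic step: return the sum of two numbers."""
    return a + b


def calculate_sum_local(a=7, b=10, c=4, d=7):
    """Mirror the four steps: two independent sums, a final sum, then an echo."""
    sum1 = add_two_numbers_local(a, b)         # first arithmetic step
    sum2 = add_two_numbers_local(c, d)         # second step, independent of the first
    total = add_two_numbers_local(sum1, sum2)  # third step, needs both results
    return "Result: {}".format(total)          # what the echo step would print


print(calculate_sum_local())  # prints "Result: 28"
```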

### 1. Create a container image for each component

This step assumes that you have already created a program that performs the task required in a particular step of your ML workflow. For example, if the task is to train an ML model, then you must have a program that does the training.

Your component can create `outputs` that downstream components use as `inputs`. These dependencies are used to build the job's directed acyclic graph (DAG).
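To illustrate how input and output dependencies determine execution order, here is a minimal sketch using the standard library's `graphlib` module (Python 3.9 or later). It does not use the SDK; the step labels `sum1`, `sum2`, `total`, and `echo` and the helper `execution_waves` are hypothetical names chosen for this example.

```python
from graphlib import TopologicalSorter

# Each key is a step; each value is the set of steps whose outputs it consumes.
dependencies = {
    "sum1": set(),
    "sum2": set(),
    "total": {"sum1", "sum2"},
    "echo": {"total"},
}


def execution_waves(graph):
    """Group steps into waves; all steps in a wave have their inputs available."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps with no unmet dependencies
        waves.append(ready)
        sorter.done(*ready)                 # mark them finished to unblock others
    return waves


print(execution_waves(dependencies))
# prints [['sum1', 'sum2'], ['total'], ['echo']]
```

Steps that share a wave do not depend on each other, so a scheduler is free to run them in parallel.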


> In this case, we use a Python base image to do the calculation and skip building our own image.

### 2. Create a Python function to wrap your component

Define a Python function to describe the interactions with the Docker container image that contains your pipeline component.

To keep the example simple, the calculation is passed to the container as an inline Python command. For real workloads, you would typically build a new container image whenever your code changes.

In [None]:
import kfp
from kfp import dsl

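def add_two_numbers(a, b):
    """Return a step that writes a + b to a file exposed as the output 'data'."""
    return dsl.ContainerOp(
        name='calculate_sum',
        image='python:3.6.8',
        command=['python', '-c'],
        # The inline script writes the sum to /tmp/results.txt.
        arguments=['with open("/tmp/results.txt", "a") as file: file.write(str({} + {}))'.format(a, b)],
        # Map the file to an output that downstream steps can consume.
        file_outputs={
            'data': '/tmp/results.txt',
        }
    )

def echo_op(text):
    """Return a step that prints the given text."""
    return dsl.ContainerOp(
        name='echo',
        image='library/bash:4.4.23',
        command=['sh', '-c'],
        arguments=['echo "Result: {}"'.format(text)]
    )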
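
To see what the inline command does without a cluster, the same kind of one-liner can be run with the local Python interpreter. This is a minimal sketch under stated assumptions: the helper `run_sum_step_locally` is hypothetical, and it writes to a temporary file rather than `/tmp/results.txt` so that repeated runs do not append to an old result.

```python
import os
import subprocess
import sys
import tempfile


def run_sum_step_locally(a, b):
    """Run the component's one-liner with the local interpreter and return the file content."""
    with tempfile.TemporaryDirectory() as workdir:
        results_path = os.path.join(workdir, "results.txt")
        # Same template as the component, except for the output path.
        code = 'with open({!r}, "a") as file: file.write(str({} + {}))'.format(
            results_path, a, b
        )
        subprocess.run([sys.executable, "-c", code], check=True)
        with open(results_path) as file:
            return file.read()


print(run_sum_step_locally(7, 10))  # prints "17"
```

The result comes back as text because the step's output is the content of the file it wrote.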

### 3. Define your pipeline as a Python function

Describe each pipeline as a Python function.

In [None]:
@dsl.pipeline(
    name='Calculate sum pipeline',
    description='Calculates the sum of four numbers and prints the result.'
)
def calculate_sum(
    a=7,
    b=10,
    c=4,
    d=7
):
    """A four-step pipeline with first two running in parallel."""

    # These two steps do not depend on each other.
    sum1 = add_two_numbers(a, b)
    sum2 = add_two_numbers(c, d)
    # Named `total` to avoid shadowing the built-in `sum`.
    total = add_two_numbers(sum1.output, sum2.output)

    echo_task = echo_op(total.output)

### 4. Compile the pipeline

Compile the pipeline to generate a compressed YAML definition of the pipeline. The Kubeflow Pipelines service converts the static configuration into a set of Kubernetes resources for execution.

There are two ways to compile the pipeline: use the Python method `kfp.compiler.Compiler.compile`, or use the `dsl-compile` command-line tool.

In [None]:
kfp.compiler.Compiler().compile(calculate_sum, 'calculate-sum-pipeline.zip')

In [None]:
# If your pipeline is defined in a Python file, you can also compile it with the `dsl-compile` command:
# dsl-compile --py [path/to/python/file] --output my-pipeline.zip
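
To inspect what the compiler produced, the package can be opened with the standard library. This is a minimal sketch that assumes the `.zip` output is an ordinary zip archive; the helper names `list_package_files` and `read_package_file` are hypothetical, and the names of the files inside the archive depend on the SDK version.

```python
import os
import zipfile


def list_package_files(package_path):
    """Return the names of the files stored in a compiled pipeline archive."""
    with zipfile.ZipFile(package_path) as archive:
        return archive.namelist()


def read_package_file(package_path, member_name):
    """Return the text content of one file inside the archive."""
    with zipfile.ZipFile(package_path) as archive:
        return archive.read(member_name).decode("utf-8")


# Only inspect the package if it has already been compiled.
if os.path.exists("calculate-sum-pipeline.zip"):
    print(list_package_files("calculate-sum-pipeline.zip"))
```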

### 5. Deploy pipeline

There are two ways to deploy the pipeline: upload the generated `.zip` file through the Kubeflow Pipelines UI, or use the Kubeflow Pipelines SDK to deploy it.

This guide shows only the SDK approach.

In [None]:
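client = kfp.Client()
# Create an experiment named 'aws' to group runs.
aws_experiment = client.create_experiment(name='aws')
# Submit the compiled package as a run named 'calculate-sum-pipeline'.
my_run = client.run_pipeline(
    aws_experiment.id,
    'calculate-sum-pipeline',
    'calculate-sum-pipeline.zip'
)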
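
After submitting a run, you may want to block until it finishes. The following is a generic polling sketch that does not call the SDK directly: `wait_for_status` is a hypothetical helper, `get_status` is any callable you supply, and the set of terminal state names is an assumption to check against your Kubeflow Pipelines version.

```python
import time

# Assumed terminal states; verify these against your Kubeflow Pipelines version.
TERMINAL_STATES = {"succeeded", "failed", "error", "skipped"}


def wait_for_status(get_status, timeout_seconds=600, poll_seconds=5, sleep=time.sleep):
    """Poll get_status() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        # A status of None means the run has not reported a state yet.
        if status is not None and status.lower() in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "run did not finish within {} seconds".format(timeout_seconds)
            )
        sleep(poll_seconds)
```

With the client from this section, `get_status` might be `lambda: client.get_run(my_run.id).run.status`; that attribute path is an assumption about the SDK's response object and should be confirmed in the SDK reference docs.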