{ "cells": [ { "cell_type": "markdown", "metadata": { "button": false, "deletable": true, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "# Tutorial: CTR Modeling on EKS" ] }, { "cell_type": "markdown", "metadata": { "button": false, "deletable": true, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "In this tutorial, we will use the classic [Multilayer Perceptron (MLP)](https://spark.apache.org/docs/latest/ml-classification-regression.html#multilayer-perceptron-classifier) model to illustrate the workflow of training a deep learning model with Spark on Kubernetes. We'll see how to create a Jupyter notebook server Pod in an EKS cluster and how to connect to the notebook server from a local browser. Then, inside the notebook, we will complete end-to-end feature engineering and deep learning model training using [Spark MLlib](https://spark.apache.org/mllib/)." ] }, { "cell_type": "markdown", "metadata": { "button": false, "deletable": true, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "### Get current namespace" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "button": false, "collapsed": false, "deletable": true, "execution": { "iopub.execute_input": "2021-01-13T06:44:44.329657Z", "iopub.status.busy": "2021-01-13T06:44:44.329351Z", "iopub.status.idle": "2021-01-13T06:44:45.037958Z", "shell.execute_reply": "2021-01-13T06:44:45.037042Z", "shell.execute_reply.started": "2021-01-13T06:44:44.329620Z" }, "jupyter": { "outputs_hidden": false }, "new_sheet": false, "run_control": { "read_only": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "notebook" ] } ], "source": [ "cat /var/run/secrets/kubernetes.io/serviceaccount/namespace" ] }, { "cell_type": "markdown", "metadata": { "button": false, "deletable": true, "new_sheet": false, "run_control": { "read_only": false } }, "source": [ "### Spark Docker\n", "Please refer to the ```README``` for how to build the Spark image from the ```docker/spark/Dockerfile``` available in the GitHub repository.\n", "
SparkSession - in-memory
SparkContext
Version: v2.4.4
Master: k8s://https://kubernetes.default.svc:443
AppName: spark-on-k8s
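
A session like the one displayed above could be configured roughly as follows. This is a minimal sketch, not the notebook's actual setup: the helper names (`current_namespace`, `build_spark_conf`, `create_session`), the executor count, and the image name `example.com/spark:2.4.4` are hypothetical placeholders, while the master URL, application name, and namespace file path are taken from this tutorial. The configuration keys are standard Spark-on-Kubernetes properties.

```python
from pathlib import Path

# File where Kubernetes mounts the Pod's namespace (the same path read earlier in this tutorial).
NAMESPACE_FILE = Path('/var/run/secrets/kubernetes.io/serviceaccount/namespace')


def current_namespace(default='default'):
    # Return the Pod's namespace, or a fallback when not running inside a Pod.
    try:
        return NAMESPACE_FILE.read_text().strip()
    except OSError:
        return default


def build_spark_conf(namespace, image, executors=2):
    # Assemble Spark-on-Kubernetes settings as a plain dict of strings.
    return {
        'spark.master': 'k8s://https://kubernetes.default.svc:443',
        'spark.app.name': 'spark-on-k8s',
        'spark.kubernetes.namespace': namespace,
        'spark.kubernetes.container.image': image,  # hypothetical image name is passed in
        'spark.executor.instances': str(executors),  # assumed value, tune for your cluster
    }


def create_session(conf):
    # pyspark is imported lazily so the helpers above work without it installed.
    from pyspark.sql import SparkSession
    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = build_spark_conf(current_namespace(), image='example.com/spark:2.4.4')
```

Calling `create_session(conf)` inside the notebook Pod would then attempt to start the session. A real deployment typically needs further settings (for example a service account and driver host) that are not shown here.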

|  | categoric_0 |
|---|---|
| 0 | A |
| 1 | B |
| 2 | A |
| 3 | B |
| 4 | B |
| ... | ... |
| 99995 | B |
| 99996 | A |
| 99997 | B |
| 99998 | B |
| 99999 | B |

100000 rows × 1 columns
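
A categorical column such as `categoric_0` has to be turned into numbers before it can feed an MLP. The sketch below illustrates the idea in plain Python, using only the ten rows visible in the preview above. It is not the Spark API: the function names are hypothetical, and the alphabetical tie-breaking is an assumption. It mirrors the default behaviour of Spark's `StringIndexer`, which assigns index 0 to the most frequent label, followed by a one-hot step.

```python
from collections import Counter


def fit_string_index(values):
    # Most frequent label gets index 0; ties are broken alphabetically (an assumption).
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: i for i, label in enumerate(ordered)}


def one_hot(index, size):
    # Dense one-hot vector with a single 1.0 at the given position.
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


# The ten rows shown in the table preview (rows 0-4 and 99995-99999).
sample = ['A', 'B', 'A', 'B', 'B', 'B', 'A', 'B', 'B', 'B']

mapping = fit_string_index(sample)
encoded = [one_hot(mapping[v], len(mapping)) for v in sample]
```

Here `B` is more frequent in the sample, so it maps to index 0 and `A` to index 1. Note that Spark's one-hot encoder drops the last category by default, whereas this sketch keeps one column per label for readability.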