{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Develop a TensorFlow Model Locally\n",
    "\n",
    "This notebook was tested using the `TensorFlow 2.6 Python 3.8 CPU Optimized - Python 3 Kernel` running on an `ml.t3.medium` instance. Please ensure that you see `Python 3 (TensorFlow 2.6 Python 3.8 CPU Optimized)` in the top right of your notebook.\n",
    "\n",
    "------------------------------\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "![img](https://user-images.githubusercontent.com/18154355/216501180-3d5b258b-b856-4900-b352-47d129dac43e.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview\n",
    "\n",
    "In this notebook, we'll use a Studio notebook to prototype our data loading and model architecture.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading stored variables\n",
    "Run the cell below to load any previously created variables from the prior notebook in this lab. You should see a print-out of the existing variables. If nothing is printed, go back and run the final cell of the previous notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%store -r\n",
    "%store"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Ensure updated SageMaker SDK version\n",
    "%pip install -U -q sagemaker"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Important: You must run the previous notebooks in this lab, in order, before this one so that their variables can be retrieved with the `%store -r` magic command above.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download a Sample of Data for Local Model Building"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "\n",
    "# `data_bucket` is restored by the `%store -r` cell above\n",
    "data_bucket_s3_uri = \"s3://\" + data_bucket\n",
    "\n",
    "# Keep only the CSV files in the bucket listing\n",
    "csv_files = [\n",
    "    x for x in sagemaker.s3.S3Downloader.list(data_bucket_s3_uri) if x.endswith(\".csv\")\n",
    "]\n",
    "\n",
    "# Download one csv file\n",
    "sagemaker.s3.S3Downloader.download(csv_files[0], \"demo_data\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import glob\n",
    "import pandas as pd\n",
    "\n",
    "csv_file = glob.glob(\"demo_data/*.csv\")[0]\n",
    "\n",
    "column_headers = [\n",
    "    \"day_of_week\",\n",
    "    \"month\",\n",
    "    \"hour\",\n",
    "    \"pickup_location_id\",\n",
    "    \"dropoff_location_id\",\n",
    "    \"trip_distance\",\n",
    "    \"fare_amount\",\n",
    "]\n",
    "\n",
    "raw_dataset = pd.read_csv(csv_file, names=column_headers)\n",
    "raw_dataset.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "linear_input = raw_dataset[[\"day_of_week\", \"month\", \"hour\", \"trip_distance\"]]\n",
    "dnn_input = raw_dataset[\n",
    "    [\n",
    "        \"pickup_location_id\",\n",
    "        \"dropoff_location_id\",\n",
    "        \"trip_distance\",\n",
    "    ]\n",
    "]\n",
    "y = raw_dataset[[\"fare_amount\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Architecture Prototyping\n",
    "![image](https://1.bp.blogspot.com/-Dw1mB9am1l8/V3MgtOzp3uI/AAAAAAAABGs/mP-3nZQCjWwdk6qCa5WraSpK8A7rSPj3ACLcB/s1600/image04.png)\n",
    "\n",
    "Source: [Wide & Deep Learning: Better Together with TensorFlow](https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "from tensorflow.keras.experimental import LinearModel, WideDeepModel\n",
    "from tensorflow import keras"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# TF Native File Reader\n",
    "Once we have prototyped against our pandas dataset, we need to think about how the data will be read when we scale up to the entire dataset in a submitted SageMaker Training Job. Reading files natively with `tf.data` is a notoriously tricky process, so we prototype it right here in our local notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def pack(features, label):\n",
    "    \"\"\"Split a batch of CSV columns into (linear, dnn) float32 feature tensors.\"\"\"\n",
    "    linear_features = [\n",
    "        tf.cast(features[\"day_of_week\"], tf.float32),\n",
    "        tf.cast(features[\"month\"], tf.float32),\n",
    "        tf.cast(features[\"hour\"], tf.float32),\n",
    "        features[\"trip_distance\"],\n",
    "    ]\n",
    "\n",
    "    dnn_features = [\n",
    "        tf.cast(features[\"pickup_location_id\"], tf.float32),\n",
    "        tf.cast(features[\"dropoff_location_id\"], tf.float32),\n",
    "        features[\"trip_distance\"],\n",
    "    ]\n",
    "\n",
    "    return (tf.stack(linear_features, axis=-1), tf.stack(dnn_features, axis=-1)), label\n",
    "\n",
    "\n",
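    "# The CSV is read with explicit column names above, i.e. it has no header row,\n",
    "# so disable header parsing to avoid silently skipping the first record.\n",
    "ds = tf.data.experimental.make_csv_dataset(\n",
    "    csv_file,\n",
    "    batch_size=1,\n",
    "    column_names=column_headers,\n",
    "    header=False,\n",
    "    num_epochs=5,\n",
    "    shuffle=False,\n",
    "    label_name=\"fare_amount\",\n",
    ")\n",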
    "ds = ds.map(pack)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "iterator = iter(ds)\n",
    "(x1, x2), y = next(iterator)\n",
    "\n",
    "print(x1)\n",
    "print(x2)\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Build Regression Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
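    "# Rebuild the dataset with a larger batch size for training.\n",
    "# header=False because the CSV has no header row (see the pandas read above).\n",
    "ds = tf.data.experimental.make_csv_dataset(\n",
    "    csv_file,\n",
    "    batch_size=128,\n",
    "    column_names=column_headers,\n",
    "    header=False,\n",
    "    num_epochs=1,\n",
    "    shuffle=False,\n",
    "    label_name=\"fare_amount\",\n",
    ")\n",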
    "ds = ds.map(pack)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "class SageMakerExperimentCallback(keras.callbacks.Callback):\n",
    "    \"\"\"Log Keras training metrics to a SageMaker Experiments run after each epoch.\"\"\"\n",
    "\n",
    "    def __init__(self, run):\n",
    "        super().__init__()\n",
    "        self.run = run\n",
    "\n",
    "    def on_epoch_end(self, epoch, logs=None):\n",
    "        logs = logs or {}  # Keras may pass None\n",
    "        for name in (\"loss\", \"mse\"):\n",
    "            if name in logs:\n",
    "                self.run.log_metric(name=name, value=logs[name], step=epoch)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.experiments import Run\n",
    "\n",
    "experiment_name = \"TaxiFare-Experiment\"\n",
    "run_name = \"Local-Notebook-Run\"\n",
    "optimizer = \"Adam\"\n",
    "epochs = 5\n",
    "\n",
    "with Run(experiment_name=experiment_name, run_name=run_name) as run:\n",
    "    run.log_parameters({\"optimizer\": optimizer, \"epochs\": epochs})\n",
    "\n",
    "    linear_model = LinearModel()\n",
    "    dnn_model = keras.Sequential(\n",
    "        [\n",
    "            keras.layers.Flatten(),\n",
    "            keras.layers.Dense(128, activation=\"elu\"),\n",
    "            keras.layers.Dense(64, activation=\"elu\"),\n",
    "            keras.layers.Dense(32, activation=\"elu\"),\n",
    "            # Linear output: a sigmoid would bound this branch to (0, 1), which does not suit a fare regression\n",
    "            keras.layers.Dense(1),\n",
    "        ]\n",
    "    )\n",
    "    combined_model = WideDeepModel(linear_model, dnn_model)\n",
    "    combined_model.compile(optimizer=optimizer, loss=\"mse\", metrics=[\"mse\"])\n",
    "\n",
    "    combined_model.fit(ds, epochs=epochs, callbacks=[SageMakerExperimentCallback(run)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Let's scale it out in the next notebook!"
   ]
  }
 ],
 "metadata": {
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3 (TensorFlow 2.6 Python 3.8 CPU Optimized)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/tensorflow-2.6-cpu-py38-ubuntu20.04-v1"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}