# Develop a TensorFlow Model Locally

This notebook was tested using the `TensorFlow 2.6 Python 3.8 CPU Optimized - Python 3 Kernel` running on an `ml.t3.medium` instance. Please ensure that you see `Python 3 (TensorFlow 2.6 Python 3.8 CPU Optimized)` in the top-right corner of your notebook.

------------------------------


![img](https://user-images.githubusercontent.com/18154355/216501180-3d5b258b-b856-4900-b352-47d129dac43e.png)

## Overview

In this notebook, we'll use a Studio notebook to prototype our data loading and model architecture.


## Loading stored variables
Run the cell below to load any previously created variables from the prior notebook in this lab. You should see a printout of the existing variables. If nothing is printed, you likely skipped the final cell of the previous notebook.

In [None]:
%store -r
%store

In [None]:
# Ensure updated SageMaker SDK version
%pip install -U -q sagemaker

Important: You must have run the previous sequential notebooks to retrieve variables using the StoreMagic command.



## Download a sample of data for local model building

In [None]:
import sagemaker

data_bucket_s3_uri = "s3://" + data_bucket

# Keep only the CSV files found in the bucket
csv_files = [
    x for x in sagemaker.s3.S3Downloader.list(data_bucket_s3_uri) if x.endswith(".csv")
]

# Download one csv file
sagemaker.s3.S3Downloader.download(csv_files[0], "demo_data")

In [None]:
import glob
import pandas as pd

csv_file = glob.glob("demo_data/*.csv")[0]

column_headers = [
 "day_of_week",
 "month",
 "hour",
 "pickup_location_id",
 "dropoff_location_id",
 "trip_distance",
 "fare_amount",
]

raw_dataset = pd.read_csv(csv_file, names=column_headers)
raw_dataset.head()

In [None]:
linear_input = raw_dataset[["day_of_week", "month", "hour", "trip_distance"]]
dnn_input = raw_dataset[
    [
        "pickup_location_id",
        "dropoff_location_id",
        "trip_distance",
    ]
]
y = raw_dataset[["fare_amount"]]

# Architecture Prototyping
![image](https://1.bp.blogspot.com/-Dw1mB9am1l8/V3MgtOzp3uI/AAAAAAAABGs/mP-3nZQCjWwdk6qCa5WraSpK8A7rSPj3ACLcB/s1600/image04.png)

Image source: [Wide & Deep Learning: Better Together with TensorFlow](https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html)
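
The diagram shows a wide (linear) branch and a deep (feed-forward) branch whose outputs are combined into a single prediction. As a rough, framework-free sketch of that idea, the snippet below adds a linear model's output to the output of a tiny one-hidden-layer network. The helper names (`linear_branch`, `deep_branch`, `wide_deep_predict`), the weights, and the input values are all made up for illustration, and the sketch uses ReLU where the model later in this notebook uses ELU. The Keras `WideDeepModel` performs this combination itself.

```python
def dot(weights, values):
    """Dot product of two equal-length sequences."""
    return sum(w * x for w, x in zip(weights, values))


def relu(values):
    """Element-wise ReLU, used here only to keep the sketch short."""
    return [max(0.0, v) for v in values]


def linear_branch(x, weights, bias):
    """Wide branch: a single linear combination of the inputs."""
    return dot(weights, x) + bias


def deep_branch(x, hidden_weights, hidden_bias, out_weights, out_bias):
    """Deep branch: one hidden layer followed by a linear output."""
    hidden = relu([dot(row, x) + b for row, b in zip(hidden_weights, hidden_bias)])
    return dot(out_weights, hidden) + out_bias


def wide_deep_predict(wide_x, deep_x, wide_params, deep_params):
    """Combine the two branches by summing their outputs."""
    return linear_branch(wide_x, *wide_params) + deep_branch(deep_x, *deep_params)


# Hypothetical inputs: (day_of_week, month, hour, trip_distance) for the wide
# branch and (pickup_location_id, dropoff_location_id, trip_distance) for the
# deep branch.
wide_x = [2.0, 6.0, 14.0, 3.5]
deep_x = [142.0, 236.0, 3.5]

# Hypothetical weights chosen so the arithmetic is easy to follow by hand.
wide_params = ([0.0, 0.0, 0.0, 2.0], 1.0)
deep_params = ([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]], [0.0, 0.0], [0.5, 0.5], 0.0)

prediction = wide_deep_predict(wide_x, deep_x, wide_params, deep_params)
print(prediction)
```

Each branch receives its own feature set, which is why the later cells keep the linear and DNN inputs separate.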

In [None]:
import tensorflow as tf
from tensorflow.keras.experimental import LinearModel, WideDeepModel
from tensorflow import keras

# TF Native File Reader
Once we have tested an acceptable model on our pandas dataset, we need to think about how the data will be read when we scale up to the entire dataset in a submitted SageMaker Training Job. Reading training data with TensorFlow's native file readers is notoriously tricky, so we prototype it right here in our local notebook.

In [None]:
def pack(features, label):
    # Features for the wide (linear) branch, cast to a common float dtype
    linear_features = [
        tf.cast(features["day_of_week"], tf.float32),
        tf.cast(features["month"], tf.float32),
        tf.cast(features["hour"], tf.float32),
        features["trip_distance"],
    ]

    # Features for the deep (DNN) branch
    dnn_features = [
        tf.cast(features["pickup_location_id"], tf.float32),
        tf.cast(features["dropoff_location_id"], tf.float32),
        features["trip_distance"],
    ]

    # Stack each group into one tensor per branch: ((linear, dnn), label)
    return (tf.stack(linear_features, axis=-1), tf.stack(dnn_features, axis=-1)), label


ds = tf.data.experimental.make_csv_dataset(
    csv_file,
    batch_size=1,
    column_names=column_headers,
    num_epochs=5,
    shuffle=False,
    label_name="fare_amount",
)
ds = ds.map(pack)

In [None]:
iterator = iter(ds)
(x1, x2), y = next(iterator)

print(x1)
print(x2)
print(y)
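
To make the `((linear_features, dnn_features), label)` structure printed above easier to reason about, here is a plain-Python sketch of the same packing step applied to rows read with the standard `csv` module. The two sample rows are invented, and `pack_row` is a hypothetical helper that mirrors `pack` for a single row rather than for a TensorFlow batch.

```python
import csv
import io

COLUMN_HEADERS = [
    "day_of_week",
    "month",
    "hour",
    "pickup_location_id",
    "dropoff_location_id",
    "trip_distance",
    "fare_amount",
]
LINEAR_COLUMNS = ["day_of_week", "month", "hour", "trip_distance"]
DNN_COLUMNS = ["pickup_location_id", "dropoff_location_id", "trip_distance"]


def pack_row(row):
    """Split one CSV row into ((linear features, DNN features), label)."""
    linear = [float(row[name]) for name in LINEAR_COLUMNS]
    dnn = [float(row[name]) for name in DNN_COLUMNS]
    label = float(row["fare_amount"])
    return (linear, dnn), label


# Invented, headerless sample data in the same column order as the real file.
sample_csv = "2,6,14,142,236,3.5,12.5\n5,6,9,48,68,1.2,7.0\n"

# fieldnames plays the same role as column_names in make_csv_dataset.
reader = csv.DictReader(io.StringIO(sample_csv), fieldnames=COLUMN_HEADERS)
packed = [pack_row(row) for row in reader]

(first_linear, first_dnn), first_label = packed[0]
print(first_linear)
print(first_dnn)
print(first_label)
```

Note that `trip_distance` appears in both feature groups, just as it does in `pack`.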

## Build Regression Model

In [None]:
# Increase Batch Size
ds = tf.data.experimental.make_csv_dataset(
    csv_file,
    batch_size=128,
    column_names=column_headers,
    num_epochs=1,
    shuffle=False,
    label_name="fare_amount",
)
ds = ds.map(pack)
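
With `batch_size=128` and `num_epochs=1`, one pass over the file yields roughly the row count divided by 128 batches. The sketch below does that arithmetic, assuming the final partial batch is kept rather than dropped. `count_batches` and the row count of 1,000 are hypothetical and only meant to show the calculation.

```python
import math


def count_batches(num_rows, batch_size, drop_remainder=False):
    """Number of batches produced by one pass over num_rows rows."""
    if drop_remainder:
        # Only full batches are emitted
        return num_rows // batch_size
    # The last batch may be smaller than batch_size
    return math.ceil(num_rows / batch_size)


print(count_batches(1000, 128))
print(count_batches(1000, 128, drop_remainder=True))
```

This is a quick way to sanity-check how many training steps to expect per epoch before submitting a larger job.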

In [None]:
class SageMakerExperimentCallback(keras.callbacks.Callback):
    """Log per-epoch training metrics to a SageMaker Experiments run."""

    def __init__(self, run):
        super().__init__()
        self.run = run

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # Log only the metrics Keras actually reported for this epoch
        for name in ("loss", "mse"):
            if name in logs:
                self.run.log_metric(name=name, value=logs[name], step=epoch)

In [None]:
from sagemaker.experiments import Run

experiment_name = "TaxiFare-Experiment"
run_name = "Local-Notebook-Run"
optimizer = "Adam"
epochs = 5

with Run(experiment_name=experiment_name, run_name=run_name) as run:
    run.log_parameters({"optimizer": optimizer, "epochs": epochs})

    # Wide branch (linear) and deep branch (feed-forward network)
    linear_model = LinearModel()
    dnn_model = keras.Sequential(
        [
            keras.layers.Flatten(),
            keras.layers.Dense(128, activation="elu"),
            keras.layers.Dense(64, activation="elu"),
            keras.layers.Dense(32, activation="elu"),
            keras.layers.Dense(1, activation="sigmoid"),
        ]
    )
    combined_model = WideDeepModel(linear_model, dnn_model)
    combined_model.compile(optimizer=optimizer, loss="mse", metrics=["mse"])

    # Pass callbacks as a list, which is the form Keras documents
    combined_model.fit(ds, epochs=epochs, callbacks=[SageMakerExperimentCallback(run)])

## Let's scale it out in the next notebook!