# Getting the dataset prepared in Lab `1-DataPrep`

Let's load the dataset with the features we engineered in the previous lab, `1-DataPrep`.

(To run everything at once, choose `Run -> Run All Cells` from the top toolbar.)

In [None]:
import pandas as pd
import boto3
import sagemaker

sess = boto3.Session()
sm = sess.client('sagemaker')
role = sagemaker.get_execution_role()

In [None]:
# Path to the training dataset saved locally in the previous lab
local_train_path = 'train.csv'
train_df = pd.read_csv(local_train_path, header=None)   # The file has no header row

pd.set_option('display.max_columns', 500)     # Make sure we can see all of the columns
pd.set_option('display.max_rows', 10)         # Keep the output on one page
train_df.head()

In [None]:
# Let's check the validation dataset
local_validation_path = 'validation.csv'
validation_df = pd.read_csv(local_validation_path, header=None)
validation_df.head()
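Because both files are stored without headers, a column mismatch between them is easy to miss. The following is a minimal sketch of a width check, assuming both files are plain comma-separated CSVs. The helper names `count_columns` and `check_same_width` are hypothetical and not part of the lab code.

```python
import csv


def count_columns(csv_path):
    """Return the number of fields in the first row of a CSV file (0 if empty)."""
    with open(csv_path, newline="") as f:
        first_row = next(csv.reader(f), None)
    return 0 if first_row is None else len(first_row)


def check_same_width(train_path, validation_path):
    """Raise ValueError if the two files differ in column count; else return the count."""
    n_train = count_columns(train_path)
    n_val = count_columns(validation_path)
    if n_train != n_val:
        raise ValueError(
            f"Column mismatch: {train_path} has {n_train}, "
            f"{validation_path} has {n_val}"
        )
    return n_train
```

In this notebook it could be called as `check_same_width(local_train_path, local_validation_path)`. Only the first row of each file is read, so the check stays cheap even for large datasets.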

As you may remember from the previous lab, we saved the CSV files without headers. A copy with headers is stored in `config/training-dataset-with-header.csv`.

To see the training set with headers:

In [None]:
pd.read_csv("training-dataset-with-header.csv").head()

Now we'll upload the files to S3 for training.

In [None]:
%store -r bucket
%store -r prefix

In [None]:
train_dir = f"{prefix}/data/train"
val_dir = f"{prefix}/data/validation"

In [None]:
# S3Uploader.upload returns the S3 URI of each uploaded file, so it can be reviewed or used elsewhere
s3uri_train = sagemaker.s3.S3Uploader.upload(local_train_path, f's3://{bucket}/{train_dir}')
s3uri_validation = sagemaker.s3.S3Uploader.upload(local_validation_path, f's3://{bucket}/{val_dir}')
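The uploads can also be confirmed from code rather than the console. The sketch below assumes the returned values are `s3://bucket/key` URIs and that the caller passes in a boto3 S3 client. `split_s3_uri` and `s3_object_exists` are hypothetical helper names, not part of the SageMaker SDK.

```python
def split_s3_uri(s3_uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    scheme = "s3://"
    if not s3_uri.startswith(scheme):
        raise ValueError(f"Not an S3 URI: {s3_uri!r}")
    bucket, _, key = s3_uri[len(scheme):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both a bucket and a key: {s3_uri!r}")
    return bucket, key


def s3_object_exists(s3_client, s3_uri):
    """Return True if an object with exactly this key exists in the bucket."""
    bucket, key = split_s3_uri(s3_uri)
    # An exact key sorts before any longer key sharing its prefix,
    # so one result is enough to decide.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    return any(obj["Key"] == key for obj in response.get("Contents", []))
```

With the session created earlier, a call might look like `s3_object_exists(sess.client('s3'), s3uri_train)`. Listing is used here instead of fetching the object, so nothing is downloaded.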

To check in the console, open S3 and verify that the two CSV files are there:

In [None]:
from IPython.display import display, HTML   # IPython.core.display is deprecated for these imports
s3_url_placeholder = "https://s3.console.aws.amazon.com/s3/buckets/{}?prefix={}/"

In [None]:
display(HTML(f'<a href="{s3_url_placeholder.format(bucket, train_dir)}">S3 Train object</a>'))

In [None]:
display(HTML(f'<a href="{s3_url_placeholder.format(bucket, val_dir)}">S3 Validation object</a>'))
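If a bucket name or prefix ever contains spaces or other special characters, an unquoted link like the ones above can break. Here is a minimal sketch that builds the same console link with URL and HTML escaping. It assumes the console URL pattern used in this notebook, and `s3_console_link` is a hypothetical helper name.

```python
from html import escape
from urllib.parse import quote


def s3_console_link(bucket, key_prefix, label):
    """Return an HTML anchor pointing at an S3 prefix in the AWS console."""
    # Normalize to exactly one trailing slash so the console opens the "folder".
    prefix = key_prefix.rstrip("/") + "/"
    url = (
        "https://s3.console.aws.amazon.com/s3/buckets/"
        f"{quote(bucket, safe='')}?prefix={quote(prefix)}"
    )
    # Quote the attribute value and escape the label so the markup stays valid.
    return f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'
```

The result can be passed straight to `display(HTML(...))`, for example `display(HTML(s3_console_link(bucket, train_dir, "S3 Train object")))`.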

### Saving variables to use in the main notebook for this lab

In [None]:
%store train_dir
%store val_dir
%store s3uri_train
%store s3uri_validation

[You can now go back to modeling.ipynb](../modeling.ipynb)