Explore and Preprocess Data

Before using a dataset to train a model, data scientists typically explore and preprocess it. For example, in one of the exercises in this guide, you use the MNIST dataset, a commonly available dataset of handwritten numbers, for model training. Before you begin training, you transform the data into a format that is more efficient for training. For more information, see Step 4.3: Transform the Training Dataset and Upload It to Amazon S3.
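As a rough illustration of what "transforming the data into a more efficient format" can mean, the following minimal sketch scales 8-bit pixel values to the range [0, 1] and packs labeled images into a single compact binary buffer. It uses only the Python standard library. The record layout, the function names `normalize_pixels` and `pack_records`, and the tiny 2x2 sample images are all hypothetical and chosen for illustration. This is not the format or code used in Step 4.3.

```python
import io
import struct


def normalize_pixels(image):
    """Scale 8-bit pixel values (0-255) to floats in [0.0, 1.0]."""
    return [p / 255.0 for p in image]


def pack_records(images, labels):
    """Serialize (label, pixels) pairs into one in-memory binary buffer.

    Hypothetical layout (an assumption for this sketch): each record is a
    1-byte unsigned label followed by len(image) little-endian float32 values.
    """
    buf = io.BytesIO()
    for image, label in zip(images, labels):
        pixels = normalize_pixels(image)
        buf.write(struct.pack("<B", label))
        buf.write(struct.pack(f"<{len(pixels)}f", *pixels))
    buf.seek(0)  # rewind so the buffer can be read or uploaded from the start
    return buf


# Two tiny 2x2 "images" stand in for 28x28 MNIST digits.
images = [[0, 255, 128, 64], [255, 255, 0, 0]]
labels = [7, 1]

payload = pack_records(images, labels)
data = payload.getvalue()
print(len(data))  # 2 records * (1 label byte + 4 floats * 4 bytes) = 34
```

A single binary buffer like this is typically cheaper to store and read than many small image files, which is the general motivation for converting a dataset before uploading it for training.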

To preprocess data, use one of the following methods:

+ Use a Jupyter notebook on an Amazon SageMaker notebook instance. You can also use the notebook instance to do the following:
  + Write code to create model training jobs
  + Deploy models to Amazon SageMaker hosting
  + Test or validate your models

  For more information, see Use Notebook Instances.
+ You can use a model to transform data by using Amazon SageMaker batch transform. For more information, see Step 6.2: Deploy the Model with Batch Transform.

Train a Model with Amazon SageMaker