# SageMaker Studio + EMR Workshop

Workshop Link: [https://catalog.workshops.aws/sagemaker-studio-emr/](https://catalog.workshops.aws/sagemaker-studio-emr/)

## Welcome to the SageMaker Studio + EMR Workshop!

Analyzing, transforming, and preparing large amounts of data is a foundational step of any data
science and ML workflow. Data workers such as data scientists and data engineers use Apache
Spark, Hive, and Presto running on Amazon EMR to do this work at scale.

## Workshop Content
In this workshop, we'll learn how to utilize distributed processing at scale to prepare data and
subsequently train machine learning models. We'll cover:

1. How to instantiate and terminate EMR clusters from SageMaker Studio using predefined, configurable templates
2. How to seamlessly interact with remote EMR clusters from SageMaker Studio notebooks
3. How to do exploratory data analysis and build datasets to be used with built-in SageMaker algorithms. Specifically:
    * Natural Language Processing (NLP) sentiment analysis
    * DeepAR time series analysis
4. How to build datasets using EMR, prototype TensorFlow models and data loaders in Studio Notebooks, and scale training on SageMaker ephemeral training jobs
5. How to train models on EMR and then host them on SageMaker
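
As a small illustration of the dataset-building step for DeepAR, the sketch below converts in-memory time series into the JSON Lines layout that the built-in DeepAR algorithm reads, where each line is a JSON object with a `start` timestamp and a `target` array. It is not taken from the workshop notebooks: the helper names `to_deepar_record` and `to_json_lines` and the sample values are hypothetical, and in the workshop itself the series would more likely come from Spark on EMR than from a Python list.

```python
import json
from datetime import datetime


def to_deepar_record(start, values):
    """Build one DeepAR record from a start timestamp and a list of values."""
    return {
        "start": start.strftime("%Y-%m-%d %H:%M:%S"),
        "target": [float(v) for v in values],
    }


def to_json_lines(series):
    """Serialize an iterable of (start, values) pairs as JSON Lines text."""
    return "\n".join(json.dumps(to_deepar_record(s, v)) for s, v in series)


# Hypothetical sample data: two short series with made-up values.
sample_series = [
    (datetime(2023, 1, 1), [5, 7, 6]),
    (datetime(2023, 1, 2), [1, 0, 2]),
]

json_lines = to_json_lines(sample_series)
print(json_lines)
```

The resulting text would typically be written to a file and uploaded to Amazon S3 as a training channel; that step is omitted here because bucket names and paths depend on your account setup.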