Optional: Creating a Development Endpoint for Glue

This part is OPTIONAL for our workshop. You don’t need it if you already have a local Docker setup and your data files are available on your local machine.

Development endpoints incur costs whether or not you are using them. Delete both the endpoints AND the notebooks as soon as you are done with them.

In AWS Glue, you can create an environment, known as a development endpoint, in which you iteratively develop and test your extract, transform, and load (ETL) scripts. For more information, see Developing Scripts Using Development Endpoints.

The advantages of having a development endpoint compared to the local Docker method are:

  • Multiple people can use the same endpoint and notebook for development.
  • You have access to other AWS resources such as S3 from this endpoint, whereas the Docker option requires your files to be available locally.

Creating a Development Endpoint and Notebook - Step 1

Once the endpoint exists, you can create a notebook that connects to it and use that notebook to author and test your ETL script. When you’re satisfied with the results, you can create an ETL job that runs your script. This process lets you add functions and debug your scripts interactively.

It is also possible to connect your local IDE to this endpoint; see Tutorial: Set Up PyCharm Professional with a Development Endpoint.

How to create an endpoint and use it from a notebook:

Open the AWS Glue console at https://console.aws.amazon.com/glue/

  1. In the left menu, click Dev endpoints, then click Add endpoint.
  2. Development endpoint name: byod
  3. IAM role: glue-processor-role
  4. Click Next
  5. Select Skip networking information
  6. Click Next
  7. Click Next (there is no need to add an SSH public key for now)
  8. Click Finish
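The console steps above can also be scripted. The following is a minimal sketch, not the workshop's official method: it assumes boto3 is installed and configured with credentials, that glue-processor-role has no IAM path, and it uses a placeholder account ID. The helper names build_endpoint_request and create_endpoint are hypothetical. Defaults applied by the API (for example capacity or Glue version) may differ from what the console wizard selects.

```python
def build_endpoint_request(account_id, endpoint_name="byod",
                           role_name="glue-processor-role"):
    """Build keyword arguments for the Glue create_dev_endpoint call."""
    return {
        "EndpointName": endpoint_name,
        # The API expects a full role ARN, not just the role name.
        # Assumption: the role was created without an IAM path.
        "RoleArn": f"arn:aws:iam::{account_id}:role/{role_name}",
    }


def create_endpoint(glue_client, account_id):
    """Create the endpoint without networking details or an SSH key,
    mirroring the 'skip' choices made in the console steps."""
    return glue_client.create_dev_endpoint(**build_endpoint_request(account_id))


# Example usage (placeholder account ID, replace with your own):
#   import boto3
#   glue = boto3.client("glue")
#   response = create_endpoint(glue, account_id="123456789012")
#   print(response["EndpointName"], response["Status"])
```

Passing the client in as a parameter keeps the helpers easy to try out with a stub before pointing them at a real account.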

Provisioning the endpoint takes several minutes. Wait until its status shows READY before moving on to Step 2.
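If you prefer not to refresh the console while waiting, the status can be polled from a script. This is a rough sketch that assumes a boto3 Glue client and that the endpoint reports the status strings READY and FAILED; the function name wait_for_endpoint, the polling interval, and the retry limit are illustrative choices, not workshop requirements.

```python
import time


def wait_for_endpoint(glue_client, endpoint_name="byod",
                      poll_seconds=30, max_polls=40, sleep=time.sleep):
    """Poll get_dev_endpoint until the endpoint is READY.

    Raises RuntimeError if provisioning fails and TimeoutError if the
    endpoint is still not ready after max_polls attempts.
    """
    for _ in range(max_polls):
        response = glue_client.get_dev_endpoint(EndpointName=endpoint_name)
        status = response["DevEndpoint"]["Status"]
        if status == "READY":
            return status
        if status == "FAILED":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to provision")
        sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} not ready after {max_polls} polls")


# Example usage:
#   import boto3
#   wait_for_endpoint(boto3.client("glue"))
```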

Creating a Development Endpoint and Notebook - Step 2

  1. In the Glue console, go to Notebooks and click Create notebook
  2. Notebook name: aws-glue-byod
  3. Attach to development endpoint: choose the byod endpoint created in Step 1
  4. Create a new IAM Role.
  5. Create notebook
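Because development endpoints are billed even when idle, as noted at the top of this section, it can help to script the endpoint cleanup for when you are finished. The sketch below assumes a boto3 Glue client; the helper names are hypothetical. It covers only the endpoint, so the notebook still has to be deleted separately. Deletion may not be instantaneous, so an endpoint can still appear in the listing for a short time after the delete call.

```python
def list_endpoint_names(glue_client):
    """Return the names of all development endpoints in the current region."""
    names, kwargs = [], {}
    while True:
        page = glue_client.get_dev_endpoints(**kwargs)
        names.extend(ep["EndpointName"] for ep in page.get("DevEndpoints", []))
        token = page.get("NextToken")
        if not token:
            return names
        # Request the next page of results.
        kwargs = {"NextToken": token}


def delete_endpoint(glue_client, endpoint_name="byod"):
    """Request deletion of a single development endpoint."""
    glue_client.delete_dev_endpoint(EndpointName=endpoint_name)


# Example usage:
#   import boto3
#   glue = boto3.client("glue")
#   delete_endpoint(glue)
#   print("Endpoints still listed:", list_endpoint_names(glue))
```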