{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Part 4-0: Create a custom container" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use [SageMaker Studio Pipelines](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html) to automate all of the steps that we have done so far. SageMaker Pipelines uses purpose-built Docker containers behind the scenes to run jobs (aka [Steps](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html)) in a sequence that you define (much like a DevOps CI/CD pipeline). You can build your own Docker container with Python 3, the [Boto3 SDK](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html), and the [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk) installed, so that your custom data processing scripts can call Amazon Fraud Detector APIs via the Boto3 library and access SageMaker constructs such as Feature Store.\n", "\n", "To achieve that, you first have to build a Docker image and push it to an [ECR (Elastic Container Registry)](https://aws.amazon.com/ecr/) repo in your account. Typically this is done with the `docker` CLI and the `aws` CLI on your local machine. However, SageMaker makes it even easier: a purpose-built tool known as `sagemaker-studio-image-build` lets you build and push a custom container image to your ECR repository from this Studio environment, and then use that image in your notebooks for your ML projects. \n", "\n", "For more information on this, refer to [this blog post](https://aws.amazon.com/blogs/machine-learning/using-the-amazon-sagemaker-studio-image-build-cli-to-build-container-images-from-your-studio-notebooks/).\n", "\n", "Next, install this required CLI tool into our SageMaker environment." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install sagemaker-studio-image-build CLI tool\n", "!pip install sagemaker-studio-image-build" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Grant appropriate permissions to SageMaker\n", "---\n", "To use `sagemaker-studio-image-build`, first grant permissions to SageMaker's IAM role so that it can perform actions on your behalf. Specifically, it needs Amazon ECR and AWS CodeBuild permissions: attach the `AmazonEC2ContainerRegistryFullAccess` and `AWSCodeBuildAdminAccess` policies to your SageMaker default role.\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to this, you also have to add the `iam:PassRole` permission to the SageMaker Studio execution role. Add the policy document below as an inline policy to the SageMaker Studio execution role in the IAM console." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```json\n", "{\n", " \"Version\": \"2012-10-17\",\n", " \"Statement\": [\n", " {\n", " \"Effect\": \"Allow\",\n", " \"Action\": \"iam:PassRole\",\n", " \"Resource\": \"arn:aws:iam::*:role/*\",\n", " \"Condition\": {\n", " \"StringLikeIfExists\": {\n", " \"iam:PassedToService\": \"codebuild.amazonaws.com\"\n", " }\n", " }\n", " }\n", " ]\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a final step, you must also add a trust relationship to the SageMaker Studio execution role that allows CodeBuild to assume this role. To add a trust relationship:\n", "* Navigate to the IAM console\n", "* Search for your SageMaker execution role. 
(You can find your SageMaker execution role name in the SageMaker Studio console)\n", "* Click on the \"Trust Relationships\" tab > Click the \"Edit Trust relationship\" button\n", "* Add the following statement alongside any pre-existing statements in the trust relationship\n", "\n", "```json\n", "{\n", " \"Effect\": \"Allow\",\n", " \"Principal\": {\n", " \"Service\": \"codebuild.amazonaws.com\"\n", " },\n", " \"Action\": \"sts:AssumeRole\"\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Typically, your final trust relationship should look something like this:\n", "\n", "```json\n", "{\n", " \"Version\": \"2012-10-17\",\n", " \"Statement\": [\n", " {\n", " \"Effect\": \"Allow\",\n", " \"Principal\": {\n", " \"Service\": \"sagemaker.amazonaws.com\"\n", " },\n", " \"Action\": \"sts:AssumeRole\"\n", " },\n", " {\n", " \"Effect\": \"Allow\",\n", " \"Principal\": {\n", " \"Service\": \"codebuild.amazonaws.com\"\n", " },\n", " \"Action\": \"sts:AssumeRole\"\n", " }\n", " ]\n", "}\n", "```\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

💡 NOTE

\n", "The IAM policies described in this notebook can be overly permissive. Please exercise caution when setting up IAM roles with them. For fine-grained permissions for the sagemaker-studio-image-build tool, please refer to this post. For best practices on SageMaker security, IAM roles, and policies, refer to this document.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Build a custom Docker image\n", "---\n", "\n", "We will now create a custom [Dockerfile](https://docs.docker.com/engine/reference/builder/) and use the CLI tool to build the image from the Dockerfile. Our Docker image is simple: it is based on the open source [python:3.7-slim-buster](https://github.com/docker-library/python/blob/117d4e375b86cdbe1853930478d0d07d7d5701f7/3.7/buster/slim/Dockerfile) image and adds the [Boto3 SDK](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html), the [SageMaker SDK](https://github.com/aws/sagemaker-python-sdk), Pandas, NumPy, and s3fs." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting Dockerfile\n" ] } ], "source": [ "%%writefile Dockerfile\n", "FROM python:3.7-slim-buster\n", "\n", "# Quote the version specifier so the shell does not treat '>' as an output redirect\n", "RUN pip3 install 'boto3>=1.15.0' sagemaker pandas numpy s3fs\n", "ENV PYTHONUNBUFFERED=TRUE\n", "\n", "ENTRYPOINT [ \"python3\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code cell above creates a `Dockerfile` in the local project's directory. We can then run the `sm-docker build` command to build and publish our image. This single command takes care of building the Docker image and publishing it to a [private ECR Repository](https://docs.aws.amazon.com/AmazonECR/latest/userguide/Repositories.html) in your current region (i.e. your SageMaker Studio's default Region). \n", "\n", " NOTE: You must execute the code cell above in order to be able to run the following cell. `sm-docker build` reads the `Dockerfile` to create the Docker image. To ensure that the code above ran successfully, please verify that you have a file named `Dockerfile` under the project's root directory in the \"File Browser\" in the left panel of Studio. 
This project already includes the Dockerfile; however, if you modify the code cell above, verify that the contents of the Dockerfile were updated correctly.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!sm-docker build ." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Running the command in the above code cell prints log lines in the notebook, ending with three lines that look like this:\n", "\n", "```sh\n", "[Container] 2021/05/15 03:19:43 Phase complete: POST_BUILD State: SUCCEEDED\n", "[Container] 2021/05/15 03:19:43 Phase context status code: Message:\n", "Image URI: <account-id>.dkr.ecr.<region>.amazonaws.com/sagemaker-studio-d-xxxxxxxxx:default-\n", "```\n", "We will need the `Image URI` for our SageMaker pipeline setup. You can also find this image URI in the [ECR Console](https://console.aws.amazon.com/ecr/repositories) (make sure the correct region is selected in the ECR console).\n", "\n", "Initialize a variable with the URI of the Docker image that we built in the previous step. We will also store this variable in cache with `%store` for later use." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Stored 'CONTAINER_IMAGE_URI' (str)\n" ] } ], "source": [ "# Copy the Image URI from the log output above\n", "#------------------------------------------------\n", "\n", "CONTAINER_IMAGE_URI = \"<account-id>.dkr.ecr.<region>.amazonaws.com/sagemaker-studio-d-xxxxxxxxx:default-\"\n", "%store CONTAINER_IMAGE_URI\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3. 
Conclusion\n", "---\n", "\n", "In this notebook, we\n", "\n", "* Installed the sagemaker-studio-image-build CLI tool, which helps us build and publish custom Docker images for our ML workstream\n", "* Set up IAM permissions for the CLI tool\n", "* Built a Docker image that includes the Boto3 and SageMaker SDK libraries using the `sm-docker build` command\n", "* Initialized the variable `CONTAINER_IMAGE_URI` with the resulting Image URI and stored it in cache for later use" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-2:429704687514:image/datascience-1.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 4 }