{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Generating Docker images" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pre-requisites \n", "\n", "### Imports\n", "\n", "To get started, we'll import the Python libraries we need and set up the environment with a few prerequisites for permissions and configurations." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import sagemaker\n", "import boto3\n", "import sys\n", "import os\n", "import glob\n", "import re\n", "import subprocess\n", "from IPython.display import HTML\n", "import time\n", "from time import gmtime, strftime\n", "\n", "sys.path.append(\"common\")\n", "from misc import get_execution_role, wait_for_s3_object, wait_for_training_job_to_complete\n", "from sagemaker.rl import RLEstimator, RLToolkit, RLFramework\n", "from docker_utils import build_and_push_docker_image" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup S3 bucket\n", "\n", "Set up the linkage and authentication to the S3 bucket that you want to use for checkpoints and metadata. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "sage_session = sagemaker.session.Session()\n", "s3_bucket = sage_session.default_bucket() \n", "s3_output_path = 's3://{}/'.format(s3_bucket)\n", "print(\"S3 bucket path: {}\".format(s3_output_path))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define Variables \n", "\n", "We define variables such as the job prefix for the training jobs *and the image path for the container (only when this is BYOC).*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create a descriptive job name \n", "job_name_prefix = 'sagemaker-rl-ray-container'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Configure where training happens\n", "\n", "You can train your RL training jobs using the SageMaker notebook instance or local notebook instance. In both of these scenarios, you can run the following in either local or SageMaker modes. The local mode uses the SageMaker Python SDK to run your code in a local container before deploying to SageMaker. This can speed up iterative testing and debugging while using the same familiar Python SDK interface. You just need to set `local_mode = True`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "parameters" ] }, "outputs": [], "source": [ "# run in local_mode on this machine, or as a SageMaker TrainingJob?\n", "local_mode = False\n", "\n", "if local_mode:\n", "    instance_type = 'local'\n", "else:\n", "    # If on SageMaker, pick the instance type\n", "    instance_type = \"ml.c5.18xlarge\" # CPU\n", "# instance_type = \"ml.p3.8xlarge\" # GPU" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create an IAM role\n", "\n", "When running from a SageMaker notebook instance, get the execution role with `role = sagemaker.get_execution_role()`. When running from a local notebook instance, use the utility method `role = get_execution_role()` to create an execution role." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "try:\n", "    role = sagemaker.get_execution_role()\n", "except Exception:\n", "    # fall back to the utility method when not on a SageMaker notebook instance\n", "    role = get_execution_role()\n", "\n", "print(\"Using IAM role arn: {}\".format(role))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Install docker for `local` mode\n", "\n", "To work in `local` mode, you need to have docker installed. When running from your local machine, make sure that you have docker and docker-compose (for local CPU machines) or nvidia-docker (for local GPU machines) installed. Alternatively, when running from a SageMaker notebook instance, you can run the following script to install the dependencies.\n", "\n", "Note that you can only run a single local notebook at a time." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# only run from SageMaker notebook instance\n", "if local_mode:\n", "    !/bin/bash ./common/setup.sh" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build docker container" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "framework = \"tf\"\n", "# framework = \"torch\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# default as tensorflow\n", "if framework == 'tf':\n", "    framework_fullname = 'tensorflow'\n", "    framework_version = \"2.5.0\" # for training\n", "    python_version = \"py37\"\n", "elif framework == 'torch':\n", "    framework_fullname = 'pytorch'\n", "    framework_version = \"1.8.1\" # PyTorch \"1.8.1\"\n", "    python_version = \"py36\"\n", "\n", "\n", "aws_region = boto3.Session().region_name\n", "suffix = python_version\n", "\n", "if 'ml.p' in instance_type:\n", "    CPU_OR_GPU = \"gpu\"\n", "    if framework == \"tf\" and framework_version.startswith(\"1.15\"):\n", "        suffix += \"-cu100-ubuntu18.04\"\n", "    if framework == \"tf\" and framework_version.startswith(\"2.3\"):\n", "        suffix += \"-cu102-ubuntu18.04\"\n", "    if framework == \"tf\" and framework_version.startswith(\"2.5\"):\n", "        suffix += \"-cu112-ubuntu18.04\"\n", "    if framework == \"tf\" and framework_version.startswith(\"2.6\"):\n", "        suffix += \"-cu112-ubuntu20.04\"\n", "    if framework == \"torch\" and framework_version.startswith(\"1.7\"):\n", "        suffix += \"-cu110-ubuntu18.04\"\n", "else:\n", "    CPU_OR_GPU = \"cpu\"\n", "\n", "repository_short_name = \"{}:ray-1.6.0-{}-{}-{}\".format(job_name_prefix, framework, CPU_OR_GPU, python_version)\n", "print('repository_short_name is : {}'.format(repository_short_name))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# !docker stop $(docker ps -aq)\n", "# !docker rm $(docker ps -aq)\n", "# 
!docker rmi -f $(docker images -a -q)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(CPU_OR_GPU, aws_region, framework_fullname, framework_version, suffix)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "docker_build_args = {\n", " 'CPU_OR_GPU': CPU_OR_GPU, \n", " 'AWS_REGION': aws_region,\n", " 'FRAMEWORK': framework_fullname,\n", " 'VERSION': framework_version, \n", " 'SUFFIX': suffix,\n", "}\n", "custom_image_name = build_and_push_docker_image(repository_short_name, build_args=docker_build_args)\n", "print(\"Using ECR image %s\" % custom_image_name)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" }, "notice": "Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License." }, "nbformat": 4, "nbformat_minor": 4 }