{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "# Quick Start - Run local code as a SageMaker training job\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/sagemaker-remote-function|quick_start|quick_start.ipynb)\n", "\n", "---" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "\n", "We're introducing a new capability in the Amazon SageMaker Python SDK that lets data scientists run their Python functions as SageMaker jobs to take advantage of the compute power that SageMaker offers. This sample notebook is a quick introduction to the capability, using simple placeholder Python functions.\n", "\n", "For more details about the feature, see the [Amazon SageMaker Developer Guide](https://docs.aws.amazon.com/sagemaker/latest/dg/train-remote-decorator.html).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install the dependencies" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install -r ./requirements.txt" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Set up the configuration file path\n", "Set the directory in which the `config.yaml` file resides so that the remote decorator can use its settings."
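] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Optionally, check that `config.yaml` is actually present before pointing the SDK at it. The cell below is a small sketch that uses only the standard library; the helper `find_config_file` is a hypothetical name for illustration and is not part of the SageMaker Python SDK." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "\n", "def find_config_file(directory):\n", "    \"\"\"Return the path to config.yaml in `directory`, or None if it is missing.\"\"\"\n", "    # Hypothetical helper for this notebook, not a SageMaker Python SDK function\n", "    candidate = Path(directory) / \"config.yaml\"\n", "    return candidate if candidate.is_file() else None\n", "\n", "\n", "# Check the directory that the next cell assigns to SAGEMAKER_USER_CONFIG_OVERRIDE\n", "config_file = find_config_file(Path.cwd())\n", "if config_file is None:\n", "    print(f\"No config.yaml found in {Path.cwd()}\")\n", "else:\n", "    print(f\"Found SageMaker config file: {config_file}\")"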
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "# Point the SDK at the directory that contains config.yaml\n", "os.environ[\"SAGEMAKER_USER_CONFIG_OVERRIDE\"] = os.getcwd()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "## Invoke a Python function as a SageMaker job\n", "\n", "There are two ways to invoke a function as a job:\n", "\n", "* Using the `@remote` decorator. When the decorated function is invoked, it runs synchronously as a SageMaker job and returns the function's result." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sagemaker.remote_function import remote\n", "\n", "\n", "@remote\n", "def divide(x, y):\n", " print(f\"Calculating {x}/{y}\")\n", " return x / y\n", "\n", "\n", "# Blocks until the job completes, then returns 1.5\n", "divide(3, 2)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "* Using the `RemoteExecutor` API, which follows the pattern of the Python [concurrent.futures](https://docs.python.org/3/library/concurrent.futures.html) API: `submit()` returns a future immediately, and `result()` blocks until the corresponding job finishes."
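] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For comparison, the cell below shows the same `submit()`/`result()` pattern with the standard library's `ThreadPoolExecutor`. It is a local sketch only: the work runs in threads on this machine rather than as SageMaker jobs, and `divide_locally` is a hypothetical name chosen so it doesn't clash with `divide`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from concurrent.futures import ThreadPoolExecutor\n", "\n", "\n", "def divide_locally(x, y):\n", "    # Hypothetical local stand-in for divide; nothing here runs on SageMaker\n", "    return x / y\n", "\n", "\n", "# submit() returns a Future right away; the work runs in a local thread pool\n", "with ThreadPoolExecutor(max_workers=1) as local_executor:\n", "    local_futures = [local_executor.submit(divide_locally, x, 2) for x in [1, 2, 3]]\n", "\n", "# result() blocks until each call finishes and returns its value\n", "local_results = [future.result() for future in local_futures]\n", "print(local_results)"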
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sagemaker.remote_function import RemoteExecutor\n", "\n", "\n", "def divide(x, y):\n", " print(f\"Calculating {x}/{y}\")\n", " return x / y\n", "\n", "\n", "# Run at most one job at a time; keep the instance warm for 30 seconds after each job so the next job can reuse it\n", "with RemoteExecutor(max_parallel_jobs=1, keep_alive_period_in_seconds=30) as executor:\n", " futures = [executor.submit(divide, x, 2) for x in [1, 2, 3]]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Wait for each job to finish and collect the return values\n", "[future.result() for future in futures]" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "## Customize the decorator settings\n", "\n", "In the examples above, the `@remote` decorator and `RemoteExecutor` read their configuration from the path set in the environment variable `SAGEMAKER_USER_CONFIG_OVERRIDE`.\n", "We set that variable to the current directory (`./`), so the settings come from the local configuration file `config.yaml`.\n", "\n", "You can override these settings by passing arguments to the decorator directly. 
In the following example, instead of launching the job on `ml.m5.xlarge` as specified in `./config.yaml`,\n", "the function runs on a more powerful `ml.m5.4xlarge` instance." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# instance_type passed here takes precedence over the value in config.yaml\n", "@remote(instance_type=\"ml.m5.4xlarge\")\n", "def divide(x, y):\n", " print(f\"Calculating {x}/{y}\")\n", " return x / y\n", "\n", "\n", "divide(3, 2)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "## Add extra dependencies using a conda environment YAML file\n", "\n", "In the example below, the function runs in a new conda environment in which pandas and sagemaker are installed.\n", "(Note that this cell does not run if executed in SageMaker Studio.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "\n", "# The job runs the function in a new conda environment built from environment.yml\n", "@remote(dependencies=\"./environment.yml\")\n", "def multiply(dataframe: pd.DataFrame, factor: float):\n", " return dataframe * factor\n", "\n", "\n", "df = pd.DataFrame(\n", " {\n", " \"A\": [14, 4, 5, 4, 1],\n", " \"B\": [5, 2, 54, 3, 2],\n", " \"C\": [20, 20, 7, 3, 8],\n", " \"D\": [14, 3, 6, 2, 6],\n", " }\n", ")\n", "\n", "multiply(df, 10.0)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "## Common errors" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "### SerializationError and DeserializationError\n", "\n", "Behind the scenes, the function, its arguments, and its return value are serialized and deserialized using pickle.\n", "\n", "A SerializationError occurs when you pass an object that can't be pickled as a function argument, such as an XGBoost `DMatrix` or an open file object. A DeserializationError typically happens when the dependencies in the local environment differ from those in\n", "the job environment. 
In the following example, the local environment uses the latest version of pandas to create a dataframe, and the dataframe is passed to the function\n", "call. The job environment installs an older version of pandas, so the dataframe can't be deserialized." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "\n", "# The job installs an older pandas from this file, so it can't deserialize the dataframe\n", "@remote(dependencies=\"./incompatible_requirements.txt\")\n", "def multiply(dataframe: pd.DataFrame, factor: float):\n", " return dataframe * factor\n", "\n", "\n", "df = pd.DataFrame(\n", " {\n", " \"A\": [14, 4, 5, 4, 1],\n", " \"B\": [5, 2, 54, 3, 2],\n", " \"C\": [20, 20, 7, 3, 8],\n", " \"D\": [14, 3, 6, 2, 6],\n", " }\n", ")\n", "\n", "# Expected to fail with a DeserializationError\n", "multiply(df, 10.0)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2, which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/sagemaker-remote-function|quick_start|quick_start.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/sagemaker-remote-function|quick_start|quick_start.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/sagemaker-remote-function|quick_start|quick_start.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/sagemaker-remote-function|quick_start|quick_start.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/sagemaker-remote-function|quick_start|quick_start.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/sagemaker-remote-function|quick_start|quick_start.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/sagemaker-remote-function|quick_start|quick_start.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/sagemaker-remote-function|quick_start|quick_start.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/sagemaker-remote-function|quick_start|quick_start.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/sagemaker-remote-function|quick_start|quick_start.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/sagemaker-remote-function|quick_start|quick_start.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/sagemaker-remote-function|quick_start|quick_start.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/sagemaker-remote-function|quick_start|quick_start.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/sagemaker-remote-function|quick_start|quick_start.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/sagemaker-remote-function|quick_start|quick_start.ipynb)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (Base Python 3.0)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/sagemaker-base-python-310-v1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.10" }, "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } } }, "nbformat": 4, "nbformat_minor": 0 }