{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# TensorFlow Distributed Training Options \n", "\n", "Sometimes it makes sense to perform training on a single machine. For large datasets, however, it may be necessary to perform distributed training on a cluster of multiple machines. Cluster managment often is a pain point in the machine learning pipeline. Fortunately, Amazon SageMaker makes it easy to run distributed training without having to manage cluster setup and tear down. In this notebook, we'll examine some different options for performing distributed training with TensorFlow in Amazon SageMaker. In particular, we'll look at the following options:\n", "\n", "- **Parameter Servers**: processes that receive asynchronous updates from worker nodes and distribute updated gradients to all workers.\n", "\n", "- **Horovod**: a framework based on Ring-AllReduce wherein worker nodes synchronously exchange gradient updates only with two other workers at a time. \n", "\n", "The model used for this notebook is a basic Convolutional Neural Network (CNN) based on [the Keras examples](https://github.com/keras-team/keras/blob/master/examples/cifar10_cnn.py), although we will be using the tf.keras implementation of Keras rather than the separate Keras reference implementation. We'll train the CNN to classify images using the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html), a well-known computer vision dataset. It consists of 60,000 32x32 images belonging to 10 different classes, with 6,000 images per class. Here is a graphic of the classes in the dataset, as well as 10 random images from each:\n", "\n", "![cifar10](https://maet3608.github.io/nuts-ml/_images/cifar10.png)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup \n", "\n", "We'll begin with some necessary imports, and get an Amazon SageMaker session to help perform certain tasks, as well as an IAM role with the necessary permissions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import os\n", "import sagemaker\n", "from sagemaker import get_execution_role\n", "\n", "sagemaker_session = sagemaker.Session()\n", "role = get_execution_role()\n", "\n", "bucket = sagemaker_session.default_bucket()\n", "prefix = 'sagemaker/DEMO-tf-distribution-options'\n", "print('Bucket:\\n{}'.format(bucket))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we'll run a script that fetches the dataset and converts it to the TFRecord format, which provides several conveniences for training models in TensorFlow." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!wget -q https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-script-mode/master/tf-distribution-options/generate_cifar10_tfrecords.py\n", "!python generate_cifar10_tfrecords.py --data-dir ./data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For Amazon SageMaker hosted training on a cluster separate from this notebook instance, training data must be stored in Amazon S3, so we'll upload the data to S3 now." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "inputs = sagemaker_session.upload_data(path='data', key_prefix='data/tf-distribution-options')\n", "display(inputs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Distributed Training with Parameter Servers\n", "\n", "A common pattern in distributed training is to use dedicated processes to collect gradients computed by “worker” processes, then aggregate them and distribute the updated gradients back to the workers. These processes are known as parameter servers. In general, they can be run either on their own machines or co-located on the same machines as the workers. In a parameter server cluster, each parameter server communicates with all workers (“all-to-all”). The Amazon SageMaker prebuilt TensorFlow container comes with a built-in option to use parameter servers for distributed training. The container runs a parameter server thread in each training instance, so there is a 1:1 ratio of parameter servers to workers. With this built-in option, gradient updates are made asynchronously (though some other versions of parameters servers use synchronous updates). \n", "\n", "Script Mode requires a training script, which in this case is the `train_ps.py` file in the */code* subdirectory of the [related distributed training example GitHub repository](https://github.com/aws-samples/amazon-sagemaker-script-mode/tree/master/tf-distribution-options). Once a training script is ready, the next step is to set up an Amazon SageMaker TensorFlow Estimator object with the details of the training job. It is very similar to an Estimator for training on a single machine, except we specify a `distributions` parameter to enable starting a parameter server on each training instance. We'll reference the GitHub repository so we can keep our training code version controlled and avoid downloading it locally." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.tensorflow import TensorFlow\n", "\n", "git_config = {'repo': 'https://github.com/aws-samples/amazon-sagemaker-script-mode', \n", " 'branch': 'master'}\n", "\n", "ps_instance_type = 'ml.p3.2xlarge'\n", "ps_instance_count = 2\n", "\n", "model_dir = \"/opt/ml/model\"\n", "\n", "distributions = {'parameter_server': {\n", " 'enabled': True}\n", " }\n", "hyperparameters = {'epochs': 60, 'batch-size' : 256}\n", "\n", "estimator_ps = TensorFlow(\n", " git_config=git_config,\n", " source_dir='tf-distribution-options/code',\n", " entry_point='train_ps.py', \n", " base_job_name='ps-cifar10-tf',\n", " role=role,\n", " framework_version='1.13',\n", " py_version='py3',\n", " hyperparameters=hyperparameters,\n", " train_instance_count=ps_instance_count, \n", " train_instance_type=ps_instance_type,\n", " model_dir=model_dir,\n", " tags = [{'Key' : 'Project', 'Value' : 'cifar10'},{'Key' : 'TensorBoard', 'Value' : 'dist'}],\n", " distributions=distributions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can call the `fit` method of the Estimator object to start training. After training completes, the tf.keras model will be saved in the SavedModel .pb format so it can be served by a TensorFlow Serving container. Note that the model is only saved by the the lead node (disregard any warnings about the model not being saved by all the processes)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "remote_inputs = {'train' : inputs+'/train', 'validation' : inputs+'/validation', 'eval' : inputs+'/eval'}\n", "estimator_ps.fit(remote_inputs, wait=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After training is complete, it is always a good idea to take a look at training curves to diagnose problems, if any, during training and determine the representativeness of the training and validation datasets. We can do this with TensorBoard, and also with the Keras API: conveniently, the Keras `fit` invocation returns a data structure with the training history. In our training script, this history is saved on the lead training node, then uploaded with the model when training is complete.\n", "\n", "To retrieve the history, we first download the model locally, then unzip it to gain access to the history data structure. We can then simply load the history as JSON:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json \n", "\n", "!aws s3 cp {estimator_ps.model_data} ./ps_model/model.tar.gz\n", "!tar -xzf ./ps_model/model.tar.gz -C ./ps_model\n", "\n", "with open('./ps_model/ps_history.p', \"r\") as f:\n", " ps_history = json.load(f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can plot the history with two graphs, one for accuracy and another for loss. Each graph shows the results for both the training and validation datasets. Although training is a stochastic process that can vary significantly between training jobs, overall you are likely to see that the training curves are converging smoothly and steadily to higher accuracy and lower loss, while the validation curves are more jagged. This is due to the validation dataset being relatively small and thus not as representative as the training dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "def plot_training_curves(history): \n", " \n", " fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharex=True)\n", " ax = axes[0]\n", " ax.plot(history['acc'], label='train')\n", " ax.plot(history['val_acc'], label='validation')\n", " ax.set(\n", " title='model accuracy',\n", " ylabel='accuracy',\n", " xlabel='epoch')\n", " ax.legend()\n", "\n", " ax = axes[1]\n", " ax.plot(history['loss'], label='train')\n", " ax.plot(history['val_loss'], label='validation')\n", " ax.set(\n", " title='model loss',\n", " ylabel='loss',\n", " xlabel='epoch')\n", " ax.legend()\n", " fig.tight_layout()\n", " \n", "plot_training_curves(ps_history)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Besides saving the training history, we also can save other artifacts and data from the training job. For example, we can include a callback in the training script to save each model checkpoint after each training epoch is complete. 
These checkpoints will be saved to the same Amazon S3 folder as the model, in a zipped file named `output.tar.gz` as shown below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "artifacts_dir = estimator_ps.model_data.replace('model.tar.gz', '')\n", "print(artifacts_dir)\n", "!aws s3 ls --human-readable {artifacts_dir}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Distributed Training with Horovod\n", "\n", "Horovod is an open source distributed training framework for TensorFlow, Keras, PyTorch, and MXNet. It is an alternative to the more \"traditional\" parameter server method of distributed training demonstrated above. Horovod can be more performant than parameter servers in large, GPU-based clusters where large models are trained. In Amazon SageMaker, Horovod is only available with TensorFlow version 1.12 or newer. \n", "\n", "Only a few lines of code are necessary to use Horovod for distributed training of a Keras model defined by the tf.keras API. For details, see the `train_hvd.py` script included with this notebook; the changes primarily relate to:\n", "\n", "- importing Horovod.\n", "- initializing Horovod.\n", "- configuring GPU options and setting a Keras/TensorFlow session with those options.\n", "\n", "A generic sketch of this pattern appears just before the `fit` call below.\n", "\n", "The Estimator object for Horovod training is very similar to the parameter server Estimator above, except we specify a `distributions` parameter describing Horovod attributes such as the number of processes per host, which is set here to the number of GPUs per machine. Beyond these few simple parameters and the few lines of code in the training script, there is nothing else you need to do to use distributed training with Horovod; Amazon SageMaker handles the heavy lifting for you and manages the underlying cluster setup." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.tensorflow import TensorFlow\n", "\n", "hvd_instance_type = 'ml.p3.2xlarge'\n", "hvd_processes_per_host = 1\n", "hvd_instance_count = 2\n", "\n", "distributions = {'mpi': {\n", " 'enabled': True,\n", " 'processes_per_host': hvd_processes_per_host,\n", " 'custom_mpi_options': '-verbose --NCCL_DEBUG=INFO -x OMPI_MCA_btl_vader_single_copy_mechanism=none'\n", " }\n", " }\n", "\n", "hyperparameters = {'epochs': 60, 'batch-size' : 256}\n", "\n", "estimator_hvd = TensorFlow(\n", " git_config=git_config,\n", " source_dir='tf-distribution-options/code',\n", " entry_point='train_hvd.py',\n", " base_job_name='hvd-cifar10-tf', \n", " role=role,\n", " framework_version='1.13',\n", " py_version='py3',\n", " hyperparameters=hyperparameters,\n", " train_instance_count=hvd_instance_count, \n", " train_instance_type=hvd_instance_type,\n", " tags = [{'Key' : 'Project', 'Value' : 'cifar10'},{'Key' : 'TensorBoard', 'Value' : 'dist'}],\n", " distributions=distributions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With these changes to the Estimator, we can call its `fit` method to start training. After training completes, the tf.keras model will be saved in the SavedModel .pb format so it can be served by a TensorFlow Serving container. Note that the model is only saved by the master (Horovod rank 0) process (once again disregard any warnings about the model not being saved by all the processes)."
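, "\n", "\n",
"For reference, the Horovod-specific changes listed earlier generally follow the pattern below. This is an illustrative sketch of that pattern, not the exact contents of `train_hvd.py`:\n",
"\n",
"```python\n",
"import horovod.tensorflow.keras as hvd\n",
"import tensorflow as tf\n",
"\n",
"hvd.init()  # initialize Horovod (one process per GPU)\n",
"\n",
"# Pin each process to a single GPU and register the session with Keras.\n",
"config = tf.ConfigProto()\n",
"config.gpu_options.allow_growth = True\n",
"config.gpu_options.visible_device_list = str(hvd.local_rank())\n",
"tf.keras.backend.set_session(tf.Session(config=config))\n",
"\n",
"# Scale the learning rate by the number of workers and wrap the optimizer\n",
"# so gradients are averaged across workers with ring-allreduce.\n",
"opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(lr=0.01 * hvd.size(), momentum=0.9))\n",
"\n",
"# Broadcast initial variables from rank 0 so all workers start in sync,\n",
"# and save checkpoints only on rank 0 to avoid write conflicts.\n",
"callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]\n",
"if hvd.rank() == 0:\n",
"    callbacks.append(tf.keras.callbacks.ModelCheckpoint('/opt/ml/model/checkpoints/checkpoint-{epoch}.h5'))\n",
"\n",
"# model.compile(optimizer=opt, ...) and model.fit(..., callbacks=callbacks) then proceed as usual.\n",
"```\n"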
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "remote_inputs = {'train' : inputs+'/train', 'validation' : inputs+'/validation', 'eval' : inputs+'/eval'}\n", "estimator_hvd.fit(remote_inputs, wait=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now plot training curves for the Horovod training job similar to the curves we plotted for the parameter servers training job:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!aws s3 cp {estimator_hvd.model_data} ./hvd_model/model.tar.gz\n", "!tar -xzf ./hvd_model/model.tar.gz -C ./hvd_model\n", "\n", "with open('./hvd_model/hvd_history.p', \"r\") as f:\n", " hvd_history = json.load(f)\n", " \n", "plot_training_curves(hvd_history)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparing Results from Parameter Servers and Horovod\n", "\n", "Now that both training jobs are complete, we can compare the results. The CIFAR-10 dataset is relatively small so the training jobs do not run long enough to draw detailed conclusions. However, it is likely that you can observe some differences even though training time was short, and training is a stochastic process with different results every time. The training time can be approximated for our purposes by looking at the \"Billable seconds\" log output at the end of each training job (at the bottom of the log output beneath the `fit` invocation code cells).\n", "\n", "The Horovod training job tends to take a bit longer than the parameter server training job, while producing a somewhat more accurate model. The relative speed result is consistent with research showing that Horovod is more performant for larger clusters and models, while parameter servers have the edge for smaller clusters and models such as this one. Also, asynchronous model updates like those used by the parameter servers here require more epochs to converge to more accurate models, so it is not surprising if the parameter server model training completed faster for the same number of epochs as Horovord, but was less accurate. It also is likely that you can observe that the training curves for the Horovod training job are a bit smoother, reflecting the fact that synchronous gradient updates typically are less noisy than asynchronous updates.\n", "\n", "The following table summarizes some general guidelines regarding performance for each option. These rules aren’t absolute, and ultimately, the best choice depends on the specific use case. Typically, the performance significantly depends on how long it takes to share gradient updates during training. In turn, this is affected by the model size, gradients size, GPU specifications, and network speed.\n", "\n", "| | Better CPU Performance | Better GPU Performance |\n", "| :--- | :--- | :--- |\n", "| larger number of gradients/bigger model size | Parameter Servers | Parameter Servers, OR Horovod on a single instance with multiple GPUs |\n", "| smaller number of gradients/lesser model size | Parameter Servers | Horovod |\n", "\n", "Complexity is another consideration. Parameter servers are straightforward to use for one GPU per instance. However, to use multi-GPU instances, you must set up multiple towers, with each tower assigned to a different GPU. A “tower” is a function for computing inference and gradients for a single model replica, which in turn is a copy of a model training on a subset of the complete dataset. Towers involve a form of data parallelism. 
Horovod also employs data parallelism but abstracts away the implementation details.\n", "\n", "Finally, cluster size makes a difference. In larger clusters with many GPUs, parameter server all-to-all communication can overwhelm network bandwidth, reducing scaling efficiency among other adverse effects. In such situations, you might find Horovod a better option." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model Deployment with Amazon Elastic Inference\n", "\n", "Amazon SageMaker supports both real time inference and batch inference. In this notebook, we will focus on setting up an Amazon SageMaker hosted endpoint for real time inference with TensorFlow Serving (TFS). Additionally, we will discuss why and how to use Amazon Elastic Inference with the hosted endpoint.\n", "\n", "### Deploying the Model\n", "\n", "When considering the overall cost of a machine learning workload, inference often is the largest part, up to 90% of the total. If a GPU instance type is used for real time inference, it typically is not fully utilized because, unlike training, real time inference usually does not involve continuously sending large batches of data to the model. Elastic Inference provides GPU acceleration suited for inference, allowing you to add just the right amount of inference acceleration to a hosted endpoint for a fraction of the cost of using a full GPU instance.\n", "\n", "The `deploy` method of the Estimator object creates an endpoint that serves prediction requests in near real time. To utilize Elastic Inference with the SageMaker TFS container, simply provide an `accelerator_type` parameter, which determines the type of accelerator that is attached to your endpoint. Refer to the **Inference Acceleration** section of the [instance types chart](https://aws.amazon.com/sagemaker/pricing/instance-types) for a listing of the supported types of accelerators. \n", "\n", "Here we'll use a general purpose CPU compute instance type along with an Elastic Inference accelerator: together they are much cheaper than the smallest P3 GPU instance type." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor = estimator_hvd.deploy(initial_instance_count=1,\n", " instance_type='ml.m5.xlarge',\n", " accelerator_type='ml.eia1.medium')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By using a general purpose CPU instance with an Elastic Inference accelerator instead of a GPU instance, you can achieve substantial cost savings. As of Q4 2019, On-Demand pricing for those resources is \\\\$0.269 per hour (ml.m5.xlarge), plus \\\\$0.182 per hour (ml.eia1.medium), for a total of \\\\$0.451 per hour. The total cost compared to the pricing of the smallest P3 family (NVIDIA Volta V100) GPU instance is as follows:\n", "\n", "- Elastic Inference solution: \\\\$0.451 per hour\n", "- GPU instance ml.p3.2xlarge: \\\\$4.284 per hour\n", "\n", "To summarize, the Elastic Inference solution cost is about 10% of the cost of using a full P3 family GPU instance. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Labels and Sample Data\n", " \n", "Now that we have a Predictor object wrapping a real time Amazon SageMaker hosted endpoint, we'll define the label names and look at a sample of 10 images, one from each class."
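, "\n", "\n",
"(Optional) Before sending any requests, you can confirm exactly what was deployed, including the attached accelerator, by describing the endpoint configuration with boto3:\n",
"\n",
"```python\n",
"import boto3\n",
"\n",
"# Look up the endpoint configuration behind the deployed endpoint and print\n",
"# the instance type and Elastic Inference accelerator attached to it.\n",
"sm_client = boto3.client('sagemaker')\n",
"endpoint_desc = sm_client.describe_endpoint(EndpointName=predictor.endpoint)\n",
"config_desc = sm_client.describe_endpoint_config(EndpointConfigName=endpoint_desc['EndpointConfigName'])\n",
"variant = config_desc['ProductionVariants'][0]\n",
"print(variant['InstanceType'], variant.get('AcceleratorType'))\n",
"```\n"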
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.display import Image, display\n", "\n", "os.system(\"aws s3 cp s3://sagemaker-workshop-pdx/cifar-10-module/sample-img ./sample-img --recursive --quiet\")\n", "\n", "labels = ['airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck']\n", "\n", "images = []\n", "for entry in os.scandir('sample-img'):\n", " if entry.is_file() and entry.name.endswith(\"png\"):\n", " images.append('sample-img/' + entry.name)\n", "\n", "for image in images:\n", " display(Image(image))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pre/post-postprocessing Script\n", "\n", "The TFS container in Amazon SageMaker by default uses the TFS REST API to serve prediction requests. This requires the input data to be converted to JSON format. One way to do this is to create a Docker container to do the conversion, then create an overall Amazon SageMaker model that links the conversion container to the TFS container with the model. This is known as an Amazon SageMaker Inference Pipeline, as demonstrated in another [sample notebook](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker_batch_transform/working_with_tfrecords). \n", "\n", "However, as a more convenient alternative for many use cases, the Amazon SageMaker TFS container provides a data pre/post-processing script feature that allows you to simply supply a data transformation script. Using such a script, there is no need to build containers or directly work with Docker. The simplest form of a script must only (1) implement an `input_handler` and `output_handler` interface, as shown in the code below, (2) be named `inference.py`, and (3) be placed in a `/code` directory." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!wget -q https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-script-mode/master/tf-distribution-options/code/inference.py\n", "!cat ./inference.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On the input preprocessing side, the code takes an image read from Amazon S3 and converts it to the required TFS REST API input format. On the output postprocessing side, the script simply passes through the predictions in the standard TFS format without modifying them. Alternatively, we could have just returned a class label for the class with the highest score, or performed other postprocessing that would be helpful to the application consuming the predictions. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Requirements.txt\n", "\n", "Besides an `inference.py` script implementing the handler interface, it also may be necessary to supply a `requirements.txt` file to ensure any necessary dependencies are installed in the container along with the script. For this script, in addition to the Python standard libraries we used the Pillow and Numpy libraries." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!wget -q https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-script-mode/master/tf-distribution-options/code/requirements.txt\n", "!cat ./requirements.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make Predictions\n", "\n", "Next we'll set up the Predictor object created by the `deploy` method call above. 
Since we are using a preprocessing script, we need to specify the Predictor's content type as `application/x-image` and override the default (JSON) serializer. We can now get predictions about the sample data displayed above simply by providing the raw .png image bytes to the Predictor. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor.content_type = 'application/x-image'\n", "predictor.serializer = None\n", "\n", "def get_prediction(file_path):\n", " \n", " with open(file_path, \"rb\") as image:\n", " f = image.read()\n", " b = bytearray(f)\n", " return labels[np.argmax(predictor.predict(b)['predictions'], axis=1)[0]]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictions = [get_prediction(image) for image in images]\n", "print(predictions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Extensions\n", "\n", "Although we did not demonstrate them in this notebook, Amazon SageMaker provides additional ways to make distributed training more efficient and cost effective for very large datasets (a brief sketch of enabling two of them on an Estimator appears after this list):\n", "\n", "- **Sharding data**: in this example we used one large TFRecord file containing the entire CIFAR-10 dataset, which is relatively small. However, larger datasets might require that you shard the data into multiple files, particularly if Pipe Mode is used (see below). Sharding may be accomplished by specifying an Amazon S3 data source as a [manifest file or ShardedByS3Key](https://docs.aws.amazon.com/sagemaker/latest/dg/API_S3DataSource.html). \n", "\n", "- **VPC training**: performing Horovod training inside a VPC reduces the network latency between nodes, leading to higher performance and stability of Horovod training jobs.\n", "\n", "- **Pipe Mode**: using [Pipe Mode](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-running-container-inputdataconfig) reduces startup and training times. Pipe Mode streams training data from S3 as a Linux FIFO directly to the algorithm, without saving to disk. For a small dataset such as CIFAR-10, Pipe Mode does not provide any advantage, but for very large datasets where training is I/O bound rather than CPU/GPU bound, Pipe Mode can substantially reduce startup and training times.\n", "\n", "- **Amazon FSx for Lustre and Amazon EFS**: performance on large datasets in File Mode may be improved in some circumstances using either Amazon FSx for Lustre or Amazon EFS. For more details, please refer to the related [blog post](https://aws.amazon.com/blogs/machine-learning/speed-up-training-on-amazon-sagemaker-using-amazon-efs-or-amazon-fsx-for-lustre-file-systems).\n", "\n", "- **Managed Spot Training**: uses Amazon EC2 Spot instances to run training jobs instead of on-demand instances. Managed Spot Training can reduce the cost of training models by up to 90% compared to on-demand instances. For further details, please refer to the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/model-managed-spot-training.html)."
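, "\n", "\n",
"As a brief illustration of two of these options, the sketch below shows how Pipe Mode and Managed Spot Training might be enabled on an Estimator like the ones used earlier. The parameter names are from the version of the SageMaker Python SDK used in this notebook, the spot timeout values are arbitrary examples, and the training script would also need to read from the pipe (for example, with the `sagemaker_tensorflow` package's `PipeModeDataset`) rather than from local files:\n",
"\n",
"```python\n",
"from sagemaker.tensorflow import TensorFlow\n",
"\n",
"estimator_ext = TensorFlow(\n",
"    git_config=git_config,\n",
"    source_dir='tf-distribution-options/code',\n",
"    entry_point='train_hvd.py',\n",
"    base_job_name='spot-pipe-cifar10-tf',\n",
"    role=role,\n",
"    framework_version='1.13',\n",
"    py_version='py3',\n",
"    hyperparameters={'epochs': 60, 'batch-size': 256},\n",
"    train_instance_count=2,\n",
"    train_instance_type='ml.p3.2xlarge',\n",
"    input_mode='Pipe',               # stream training data from S3 instead of downloading it first\n",
"    train_use_spot_instances=True,   # run on Managed Spot capacity\n",
"    train_max_run=7200,              # maximum training time in seconds\n",
"    train_max_wait=14400,            # maximum total time, including waiting for Spot capacity\n",
"    distributions={'mpi': {'enabled': True, 'processes_per_host': 1}})\n",
"```\n"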
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Cleanup\n", "\n", "To avoid incurring charges due to a stray endpoint, delete the Amazon SageMaker endpoint if you no longer need it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sagemaker_session.delete_endpoint(predictor.endpoint)" ] } ], "metadata": { "kernelspec": { "display_name": "conda_tensorflow_p36", "language": "python", "name": "conda_tensorflow_p36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" }, "notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License." }, "nbformat": 4, "nbformat_minor": 2 }