{ "cells": [ { "cell_type": "markdown", "id": "verified-thirty", "metadata": {}, "source": [ "# Introduction to MPI on Amazon SageMaker\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "fc50ba05", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/training|distributed_training|mpi_on_sagemaker|intro|mpi_demo.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "b8772df8", "metadata": {}, "source": [ "\n", "Message Passing Interface (MPI) is the fundamental communication protocol for programming parallel computer programs. See its [wiki page](https://en.wikipedia.org/wiki/Message_Passing_Interface). [Open MPI](https://www.open-mpi.org/projects/user-docs/) is the implementation that's used as a basic building block for distributed training systems. \n", "\n", "In Python programs, you can interact with Open MPI APIs via [mpi4py](https://mpi4py.readthedocs.io/en/stable/overview.html) and easily convert your single-process python program into a parallel python program. \n", "\n", "Parallel processes can exist on one host (e.g. one EC2 instance) or multiple hosts (e.g. many EC2 instances). It's trivial to set up a parallel cluster (comm world, in MPI parlance) on one host via Open MPI, but it is less straight-forward to set up an MPI comm world across multiple instances. \n", "\n", "SageMaker does it for you. In this tutorial, you will go through a few basic (but exceeding important) [MPI communications](https://mpi4py.readthedocs.io/en/stable/tutorial.html) on SageMaker with **multiple instances** and you will verify that parallel processes across instances are indeed talking to each other. Those basic communications are the fundamental building blocks for distributed training." ] }, { "cell_type": "markdown", "id": "loose-google", "metadata": {}, "source": [ "## Environment \n", "We assume Open MPI and mpi4py have been installed in your environment. This is the case for SageMaker Notebook Instance or Studio. 
" ] }, { "cell_type": "markdown", "id": "bored-challenge", "metadata": {}, "source": [ "## Inspect the Python Program" ] }, { "cell_type": "code", "execution_count": 1, "id": "funky-synthesis", "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mfrom\u001b[39;49;00m \u001b[04m\u001b[36mmpi4py\u001b[39;49;00m \u001b[34mimport\u001b[39;49;00m MPI\n", "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36mnumpy\u001b[39;49;00m \u001b[34mas\u001b[39;49;00m \u001b[04m\u001b[36mnp\u001b[39;49;00m\n", "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36mtime\u001b[39;49;00m\n", "\n", "comm = MPI.COMM_WORLD\n", "size = comm.Get_size()\n", "rank = comm.Get_rank()\n", "\n", "\u001b[34mif\u001b[39;49;00m rank == \u001b[34m0\u001b[39;49;00m:\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33m\"\u001b[39;49;00m\u001b[33mNumber of MPI processes that will talk to each other:\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, size)\n", "\n", "\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32mpoint_to_point\u001b[39;49;00m():\n", " \u001b[33m\"\"\"Point to point communication\u001b[39;49;00m\n", "\u001b[33m Send a numpy array (buffer like object) from rank 0 to rank 1\u001b[39;49;00m\n", "\u001b[33m \"\"\"\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m rank == \u001b[34m0\u001b[39;49;00m:\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33m'\u001b[39;49;00m\u001b[33mpoint to point\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\n", " data = np.array([\u001b[34m0\u001b[39;49;00m, \u001b[34m1\u001b[39;49;00m, \u001b[34m2\u001b[39;49;00m], dtype=np.intc) \u001b[37m# int in C\u001b[39;49;00m\n", "\n", " \u001b[37m# remember the difference between\u001b[39;49;00m\n", " \u001b[37m# Upper case API and lower case API\u001b[39;49;00m\n", " \u001b[37m# Basically uppper case API directly calls C API\u001b[39;49;00m\n", " \u001b[37m# so it is fast\u001b[39;49;00m\n", " \u001b[37m# checkout https://mpi4py.readthedocs.io/en/stable/\u001b[39;49;00m\n", "\n", " comm.Send([data, MPI.INT], dest=\u001b[34m1\u001b[39;49;00m)\n", " \u001b[34melif\u001b[39;49;00m rank == \u001b[34m1\u001b[39;49;00m:\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33mf\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[33mHello I am rank \u001b[39;49;00m\u001b[33m{rank}\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\n", " data = np.empty(\u001b[34m3\u001b[39;49;00m, dtype=np.intc)\n", " comm.Recv([data, MPI.INT], source=\u001b[34m0\u001b[39;49;00m)\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33m'\u001b[39;49;00m\u001b[33mI received some data:\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, data)\n", " \n", " \u001b[34mif\u001b[39;49;00m rank == \u001b[34m0\u001b[39;49;00m:\n", " time.sleep(\u001b[34m1\u001b[39;49;00m) \u001b[37m# give some buffer time for execution to complete\u001b[39;49;00m\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33m\"\u001b[39;49;00m\u001b[33m=\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m*\u001b[34m50\u001b[39;49;00m)\n", " \u001b[34mreturn\u001b[39;49;00m\n", "\n", "\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32mbroadcast\u001b[39;49;00m():\n", " \u001b[33m\"\"\"Broadcast a numpy array from rank 0 to others\"\"\"\u001b[39;49;00m\n", "\n", " \u001b[34mif\u001b[39;49;00m rank == \u001b[34m0\u001b[39;49;00m:\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33mf\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[33mBroadcasting from rank \u001b[39;49;00m\u001b[33m{rank}\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\n", " data = 
np.arange(\u001b[34m10\u001b[39;49;00m, dtype=np.intc)\n", " \u001b[34melse\u001b[39;49;00m:\n", " data = np.empty(\u001b[34m10\u001b[39;49;00m, dtype=np.intc)\n", " \n", " comm.Bcast([data, MPI.INT], root=\u001b[34m0\u001b[39;49;00m)\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33mf\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mData at rank \u001b[39;49;00m\u001b[33m{rank}\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, data)\n", " \n", " \u001b[34mif\u001b[39;49;00m rank ==\u001b[34m0\u001b[39;49;00m:\n", " time.sleep(\u001b[34m1\u001b[39;49;00m)\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33m\"\u001b[39;49;00m\u001b[33m=\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m*\u001b[34m50\u001b[39;49;00m)\n", " \u001b[34mreturn\u001b[39;49;00m\n", "\n", "\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32mgather_reduce_broadcast\u001b[39;49;00m():\n", " \u001b[33m\"\"\"Gather numpy arrays from all ranks to rank 0\u001b[39;49;00m\n", "\u001b[33m then take average and broadcast result to other ranks\u001b[39;49;00m\n", "\u001b[33m \u001b[39;49;00m\n", "\u001b[33m It is a useful operation in distributed training:\u001b[39;49;00m\n", "\u001b[33m train a model in a few MPI workers with different \u001b[39;49;00m\n", "\u001b[33m input data, then take average weights on rank 0 and \u001b[39;49;00m\n", "\u001b[33m synchroinze weights on other ranks\u001b[39;49;00m\n", "\u001b[33m \"\"\"\u001b[39;49;00m\n", " \n", " \u001b[37m# stuff to gather at each rank\u001b[39;49;00m\n", " sendbuf = np.zeros(\u001b[34m10\u001b[39;49;00m, dtype=np.intc) + rank\n", " recvbuf = \u001b[34mNone\u001b[39;49;00m\n", " \n", " \u001b[34mif\u001b[39;49;00m rank == \u001b[34m0\u001b[39;49;00m:\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33m'\u001b[39;49;00m\u001b[33mGather and reduce\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\n", " recvbuf = np.empty([size, \u001b[34m10\u001b[39;49;00m], dtype=np.intc)\n", " comm.Gather(sendbuf, recvbuf, root=\u001b[34m0\u001b[39;49;00m)\n", " \n", " \u001b[34mif\u001b[39;49;00m rank == \u001b[34m0\u001b[39;49;00m:\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33mf\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mI am rank \u001b[39;49;00m\u001b[33m{rank}\u001b[39;49;00m\u001b[33m, data I gathered is: \u001b[39;49;00m\u001b[33m{recvbuf}\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", " \n", " \u001b[37m# take average\u001b[39;49;00m\n", " \u001b[37m# think of it as a prototype of\u001b[39;49;00m\n", " \u001b[37m# average weights, average gradients etc\u001b[39;49;00m\n", " avg = np.mean(recvbuf, axis=\u001b[34m0\u001b[39;49;00m, dtype=np.float)\n", " \n", " \u001b[34melse\u001b[39;49;00m:\n", " \u001b[37m# get averaged array from rank 0\u001b[39;49;00m\n", " \u001b[37m# think of it as a prototype of\u001b[39;49;00m\n", " \u001b[37m# synchronizing weights across different MPI procs\u001b[39;49;00m\n", " avg = np.empty(\u001b[34m10\u001b[39;49;00m, dtype=np.float) \n", " \n", " \u001b[37m# Note that the data type is float here\u001b[39;49;00m\n", " \u001b[37m# because we took average \u001b[39;49;00m\n", " comm.Bcast([avg, MPI.FLOAT], root=\u001b[34m0\u001b[39;49;00m)\n", " \n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33mf\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[33mI am rank \u001b[39;49;00m\u001b[33m{rank}\u001b[39;49;00m\u001b[33m, my avg is: \u001b[39;49;00m\u001b[33m{avg}\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\n", " \u001b[34mreturn\u001b[39;49;00m\n", " \n", " \n", "\u001b[34mif\u001b[39;49;00m \u001b[31m__name__\u001b[39;49;00m == 
\u001b[33m'\u001b[39;49;00m\u001b[33m__main__\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m:\n", " point_to_point() \n", " broadcast()\n", " gather_reduce_broadcast()\n" ] } ], "source": [ "!pygmentize mpi_demo.py" ] }, { "cell_type": "markdown", "id": "sustained-knock", "metadata": {}, "source": [ "See the program in action with 2 parallel processes in your current environment. Make sure you have at least 2 cores. Note that `mpi_demo.py` uses `np.float`, which is deprecated and was removed in NumPy 1.24; if your environment runs a newer NumPy, replace it with `float` (or `np.float64`) in the script." ] }, { "cell_type": "code", "execution_count": 2, "id": "ceramic-absence", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of MPI processes that will talk to each other: 2\n", "point to point\n", "Hello I am rank 1\n", "I received some data: [0 1 2]\n", "==================================================\n", "Broadcasting from rank 0\n", "Data at rank 0 [0 1 2 3 4 5 6 7 8 9]\n", "Data at rank 1 [0 1 2 3 4 5 6 7 8 9]\n", "==================================================\n", "Gather and reduce\n", "I am rank 0, data I gathered is: [[0 0 0 0 0 0 0 0 0 0]\n", " [1 1 1 1 1 1 1 1 1 1]]\n", "I am rank 0, my avg is: [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]\n", "I am rank 1, my avg is: [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]\n" ] } ], "source": [ "!mpirun -np 2 python mpi_demo.py" ] }, { "cell_type": "markdown", "id": "useful-territory", "metadata": {}, "source": [ "## Scale it on SageMaker\n", "You can run the above program with $n$ processes per host across $N$ hosts on SageMaker (and get a comm world of size $n\\times N$). In the remainder of this notebook, you will use the SageMaker TensorFlow deep learning container to run the above program. There is no particular reason for this choice: all SageMaker deep learning containers have Open MPI installed, so feel free to replace it with your favorite DLC. \n", "\n", "Check out the [SageMaker Python SDK Docs](https://sagemaker.readthedocs.io/en/stable/api/training/smd_model_parallel_general.html?highlight=mpi%20paramters#mpi-parameters) for the parameters needed to set up a distributed training job with MPI. " ] }, { "cell_type": "code", "execution_count": 3, "id": "large-country", "metadata": {}, "outputs": [], "source": [ "import sagemaker\n", "from sagemaker import get_execution_role\n", "from sagemaker.tensorflow import TensorFlow\n", "\n", "role = get_execution_role()\n", "\n", "# Running 2 processes per host\n", "# if we use 3 instances,\n", "# then we should see 6 MPI processes\n", "\n", "distribution = {\"mpi\": {\"enabled\": True, \"processes_per_host\": 2}}\n", "\n", "tfest = TensorFlow(\n", " entry_point=\"mpi_demo.py\",\n", " role=role,\n", " framework_version=\"2.3.0\",\n", " distribution=distribution,\n", " py_version=\"py37\",\n", " instance_count=3,\n", " instance_type=\"ml.c5.2xlarge\", # 8 cores\n", " output_path=\"s3://\" + sagemaker.Session().default_bucket() + \"/\" + \"mpi\",\n", ")" ] }, { "cell_type": "code", "execution_count": 4, "id": "adolescent-difficulty", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2021-05-11 19:56:11 Starting - Starting the training job...\n", "2021-05-11 19:56:35 Starting - Launching requested ML instancesProfilerReport-1620762971: InProgress\n", "......\n", "2021-05-11 19:57:36 Starting - Preparing the instances for training......\n", "2021-05-11 19:58:36 Downloading - Downloading input data...\n", "2021-05-11 19:59:09 Training - Training image download completed. 
Training in progress..\u001b[34m2021-05-11 19:59:14,435 sagemaker-training-toolkit INFO Imported framework sagemaker_tensorflow_container.training\u001b[0m\n", "\u001b[34m2021-05-11 19:59:14,442 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[34m2021-05-11 19:59:14,937 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[34m2021-05-11 19:59:14,951 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[34m2021-05-11 19:59:14,959 sagemaker-training-toolkit INFO Starting MPI run as worker node.\u001b[0m\n", "\u001b[34m2021-05-11 19:59:14,959 sagemaker-training-toolkit INFO Waiting for MPI Master to create SSH daemon.\u001b[0m\n", "\u001b[34m2021-05-11 19:59:14,961 sagemaker-training-toolkit INFO Cannot connect to host algo-1\u001b[0m\n", "\u001b[34m2021-05-11 19:59:14,961 sagemaker-training-toolkit INFO Connection failed with exception: \n", " [Errno None] Unable to connect to port 22 on 10.0.97.76\u001b[0m\n", "\u001b[34m2021-05-11 19:59:15,967 paramiko.transport INFO Connected (version 2.0, client OpenSSH_7.6p1)\u001b[0m\n", "\u001b[34m2021-05-11 19:59:16,062 paramiko.transport INFO Authentication (publickey) successful!\u001b[0m\n", "\u001b[34m2021-05-11 19:59:16,062 sagemaker-training-toolkit INFO Can connect to host algo-1\u001b[0m\n", "\u001b[34m2021-05-11 19:59:16,062 sagemaker-training-toolkit INFO MPI Master online, creating SSH daemon.\u001b[0m\n", "\u001b[34m2021-05-11 19:59:16,062 sagemaker-training-toolkit INFO Writing environment variables to /etc/environment for the MPI process.\u001b[0m\n", "\u001b[34m2021-05-11 19:59:16,072 sagemaker-training-toolkit INFO Waiting for MPI process to finish.\u001b[0m\n", "\u001b[35m2021-05-11 19:59:13,928 sagemaker-training-toolkit INFO Imported framework sagemaker_tensorflow_container.training\u001b[0m\n", "\u001b[35m2021-05-11 19:59:13,935 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[35m2021-05-11 19:59:14,398 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[35m2021-05-11 19:59:14,413 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[35m2021-05-11 19:59:14,421 sagemaker-training-toolkit INFO Starting MPI run as worker node.\u001b[0m\n", "\u001b[35m2021-05-11 19:59:14,422 sagemaker-training-toolkit INFO Waiting for MPI Master to create SSH daemon.\u001b[0m\n", "\u001b[35m2021-05-11 19:59:14,423 sagemaker-training-toolkit INFO Cannot connect to host algo-1\u001b[0m\n", "\u001b[35m2021-05-11 19:59:14,423 sagemaker-training-toolkit INFO Connection failed with exception: \n", " [Errno None] Unable to connect to port 22 on 10.0.97.76\u001b[0m\n", "\u001b[35m2021-05-11 19:59:15,430 paramiko.transport INFO Connected (version 2.0, client OpenSSH_7.6p1)\u001b[0m\n", "\u001b[35m2021-05-11 19:59:15,541 paramiko.transport INFO Authentication (publickey) successful!\u001b[0m\n", "\u001b[35m2021-05-11 19:59:15,542 sagemaker-training-toolkit INFO Can connect to host algo-1\u001b[0m\n", "\u001b[35m2021-05-11 19:59:15,542 sagemaker-training-toolkit INFO MPI Master online, creating SSH daemon.\u001b[0m\n", "\u001b[35m2021-05-11 19:59:15,542 sagemaker-training-toolkit INFO Writing environment variables to /etc/environment for the MPI process.\u001b[0m\n", "\u001b[35m2021-05-11 19:59:15,553 sagemaker-training-toolkit INFO Waiting for MPI process to finish.\u001b[0m\n", 
"\u001b[34m2021-05-11 19:59:14,525 sagemaker-training-toolkit INFO Imported framework sagemaker_tensorflow_container.training\u001b[0m\n", "\u001b[34m2021-05-11 19:59:14,532 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[34m2021-05-11 19:59:15,051 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[34m2021-05-11 19:59:15,065 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[34m2021-05-11 19:59:15,074 sagemaker-training-toolkit INFO Starting MPI run as worker node.\u001b[0m\n", "\u001b[34m2021-05-11 19:59:15,074 sagemaker-training-toolkit INFO Creating SSH daemon.\u001b[0m\n", "\u001b[34m2021-05-11 19:59:15,080 sagemaker-training-toolkit INFO Waiting for MPI workers to establish their SSH connections\u001b[0m\n", "\u001b[34m2021-05-11 19:59:15,082 sagemaker-training-toolkit INFO Cannot connect to host algo-2\u001b[0m\n", "\u001b[34m2021-05-11 19:59:15,082 sagemaker-training-toolkit INFO Connection failed with exception: \n", " [Errno None] Unable to connect to port 22 on 10.0.76.118\u001b[0m\n", "\u001b[34m2021-05-11 19:59:16,089 paramiko.transport INFO Connected (version 2.0, client OpenSSH_7.6p1)\u001b[0m\n", "\u001b[34m2021-05-11 19:59:16,198 paramiko.transport INFO Authentication (publickey) successful!\u001b[0m\n", "\u001b[34m2021-05-11 19:59:16,198 sagemaker-training-toolkit INFO Can connect to host algo-2\u001b[0m\n", "\u001b[34m2021-05-11 19:59:16,198 sagemaker-training-toolkit INFO Worker algo-2 available for communication\u001b[0m\n", "\u001b[34m2021-05-11 19:59:16,204 paramiko.transport INFO Connected (version 2.0, client OpenSSH_7.6p1)\u001b[0m\n", "\u001b[34m2021-05-11 19:59:16,315 paramiko.transport INFO Authentication (publickey) successful!\u001b[0m\n", "\u001b[34m2021-05-11 19:59:16,315 sagemaker-training-toolkit INFO Can connect to host algo-3\u001b[0m\n", "\u001b[34m2021-05-11 19:59:16,316 sagemaker-training-toolkit INFO Worker algo-3 available for communication\u001b[0m\n", "\u001b[34m2021-05-11 19:59:16,316 sagemaker-training-toolkit INFO Env Hosts: ['algo-1', 'algo-2', 'algo-3'] Hosts: ['algo-1:2', 'algo-2:2', 'algo-3:2'] process_per_hosts: 2 num_processes: 6\u001b[0m\n", "\u001b[34m2021-05-11 19:59:16,317 sagemaker-training-toolkit INFO Network interface name: eth0\u001b[0m\n", "\u001b[34m2021-05-11 19:59:16,323 sagemaker-training-toolkit INFO No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[34m2021-05-11 19:59:16,332 sagemaker-training-toolkit INFO Invoking user script\n", "\u001b[0m\n", "\u001b[34mTraining Env:\n", "\u001b[0m\n", "\u001b[34m{\n", " \"additional_framework_parameters\": {\n", " \"sagemaker_mpi_num_of_processes_per_host\": 2,\n", " \"sagemaker_mpi_custom_mpi_options\": \"\",\n", " \"sagemaker_mpi_enabled\": true\n", " },\n", " \"channel_input_dirs\": {},\n", " \"current_host\": \"algo-1\",\n", " \"framework_module\": \"sagemaker_tensorflow_container.training:main\",\n", " \"hosts\": [\n", " \"algo-1\",\n", " \"algo-2\",\n", " \"algo-3\"\n", " ],\n", " \"hyperparameters\": {\n", " \"model_dir\": \"/opt/ml/model\"\n", " },\n", " \"input_config_dir\": \"/opt/ml/input/config\",\n", " \"input_data_config\": {},\n", " \"input_dir\": \"/opt/ml/input\",\n", " \"is_master\": true,\n", " \"job_name\": \"tensorflow-training-2021-05-11-19-56-11-281\",\n", " \"log_level\": 20,\n", " \"master_hostname\": \"algo-1\",\n", " \"model_dir\": \"/opt/ml/model\",\n", " \"module_dir\": 
\"s3://sagemaker-us-west-2-688520471316/tensorflow-training-2021-05-11-19-56-11-281/source/sourcedir.tar.gz\",\n", " \"module_name\": \"mpi_demo\",\n", " \"network_interface_name\": \"eth0\",\n", " \"num_cpus\": 8,\n", " \"num_gpus\": 0,\n", " \"output_data_dir\": \"/opt/ml/output/data\",\n", " \"output_dir\": \"/opt/ml/output\",\n", " \"output_intermediate_dir\": \"/opt/ml/output/intermediate\",\n", " \"resource_config\": {\n", " \"current_host\": \"algo-1\",\n", " \"hosts\": [\n", " \"algo-1\",\n", " \"algo-2\",\n", " \"algo-3\"\n", " ],\n", " \"network_interface_name\": \"eth0\"\n", " },\n", " \"user_entry_point\": \"mpi_demo.py\"\u001b[0m\n", "\u001b[34m}\n", "\u001b[0m\n", "\u001b[34mEnvironment variables:\n", "\u001b[0m\n", "\u001b[34mSM_HOSTS=[\"algo-1\",\"algo-2\",\"algo-3\"]\u001b[0m\n", "\u001b[34mSM_NETWORK_INTERFACE_NAME=eth0\u001b[0m\n", "\u001b[34mSM_HPS={\"model_dir\":\"/opt/ml/model\"}\u001b[0m\n", "\u001b[34mSM_USER_ENTRY_POINT=mpi_demo.py\u001b[0m\n", "\u001b[34mSM_FRAMEWORK_PARAMS={\"sagemaker_mpi_custom_mpi_options\":\"\",\"sagemaker_mpi_enabled\":true,\"sagemaker_mpi_num_of_processes_per_host\":2}\u001b[0m\n", "\u001b[34mSM_RESOURCE_CONFIG={\"current_host\":\"algo-1\",\"hosts\":[\"algo-1\",\"algo-2\",\"algo-3\"],\"network_interface_name\":\"eth0\"}\u001b[0m\n", "\u001b[34mSM_INPUT_DATA_CONFIG={}\u001b[0m\n", "\u001b[34mSM_OUTPUT_DATA_DIR=/opt/ml/output/data\u001b[0m\n", "\u001b[34mSM_CHANNELS=[]\u001b[0m\n", "\u001b[34mSM_CURRENT_HOST=algo-1\u001b[0m\n", "\u001b[34mSM_MODULE_NAME=mpi_demo\u001b[0m\n", "\u001b[34mSM_LOG_LEVEL=20\u001b[0m\n", "\u001b[34mSM_FRAMEWORK_MODULE=sagemaker_tensorflow_container.training:main\u001b[0m\n", "\u001b[34mSM_INPUT_DIR=/opt/ml/input\u001b[0m\n", "\u001b[34mSM_INPUT_CONFIG_DIR=/opt/ml/input/config\u001b[0m\n", "\u001b[34mSM_OUTPUT_DIR=/opt/ml/output\u001b[0m\n", "\u001b[34mSM_NUM_CPUS=8\u001b[0m\n", "\u001b[34mSM_NUM_GPUS=0\u001b[0m\n", "\u001b[34mSM_MODEL_DIR=/opt/ml/model\u001b[0m\n", "\u001b[34mSM_MODULE_DIR=s3://sagemaker-us-west-2-688520471316/tensorflow-training-2021-05-11-19-56-11-281/source/sourcedir.tar.gz\u001b[0m\n", "\u001b[34mSM_TRAINING_ENV={\"additional_framework_parameters\":{\"sagemaker_mpi_custom_mpi_options\":\"\",\"sagemaker_mpi_enabled\":true,\"sagemaker_mpi_num_of_processes_per_host\":2},\"channel_input_dirs\":{},\"current_host\":\"algo-1\",\"framework_module\":\"sagemaker_tensorflow_container.training:main\",\"hosts\":[\"algo-1\",\"algo-2\",\"algo-3\"],\"hyperparameters\":{\"model_dir\":\"/opt/ml/model\"},\"input_config_dir\":\"/opt/ml/input/config\",\"input_data_config\":{},\"input_dir\":\"/opt/ml/input\",\"is_master\":true,\"job_name\":\"tensorflow-training-2021-05-11-19-56-11-281\",\"log_level\":20,\"master_hostname\":\"algo-1\",\"model_dir\":\"/opt/ml/model\",\"module_dir\":\"s3://sagemaker-us-west-2-688520471316/tensorflow-training-2021-05-11-19-56-11-281/source/sourcedir.tar.gz\",\"module_name\":\"mpi_demo\",\"network_interface_name\":\"eth0\",\"num_cpus\":8,\"num_gpus\":0,\"output_data_dir\":\"/opt/ml/output/data\",\"output_dir\":\"/opt/ml/output\",\"output_intermediate_dir\":\"/opt/ml/output/intermediate\",\"resource_config\":{\"current_host\":\"algo-1\",\"hosts\":[\"algo-1\",\"algo-2\",\"algo-3\"],\"network_interface_name\":\"eth0\"},\"user_entry_point\":\"mpi_demo.py\"}\u001b[0m\n", "\u001b[34mSM_USER_ARGS=[\"--model_dir\",\"/opt/ml/model\"]\u001b[0m\n", "\u001b[34mSM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate\u001b[0m\n", "\u001b[34mSM_HP_MODEL_DIR=/opt/ml/model\u001b[0m\n", 
"\u001b[34mPYTHONPATH=/opt/ml/code:/usr/local/bin:/usr/local/lib/python37.zip:/usr/local/lib/python3.7:/usr/local/lib/python3.7/lib-dynload:/usr/local/lib/python3.7/site-packages\n", "\u001b[0m\n", "\u001b[34mInvoking script with the following command:\n", "\u001b[0m\n", "\u001b[34mmpirun --host algo-1:2,algo-2:2,algo-3:2 -np 6 --allow-run-as-root --display-map --tag-output -mca btl_tcp_if_include eth0 -mca oob_tcp_if_include eth0 -mca plm_rsh_no_tree_spawn 1 -bind-to none -map-by slot -mca pml ob1 -mca btl ^openib -mca orte_abort_on_non_zero_status 1 -x NCCL_MIN_NRINGS=4 -x NCCL_SOCKET_IFNAME=eth0 -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH -x LD_PRELOAD=/usr/local/lib/python3.7/site-packages/gethostname.cpython-37m-x86_64-linux-gnu.so -x SM_HOSTS -x SM_NETWORK_INTERFACE_NAME -x SM_HPS -x SM_USER_ENTRY_POINT -x SM_FRAMEWORK_PARAMS -x SM_RESOURCE_CONFIG -x SM_INPUT_DATA_CONFIG -x SM_OUTPUT_DATA_DIR -x SM_CHANNELS -x SM_CURRENT_HOST -x SM_MODULE_NAME -x SM_LOG_LEVEL -x SM_FRAMEWORK_MODULE -x SM_INPUT_DIR -x SM_INPUT_CONFIG_DIR -x SM_OUTPUT_DIR -x SM_NUM_CPUS -x SM_NUM_GPUS -x SM_MODEL_DIR -x SM_MODULE_DIR -x SM_TRAINING_ENV -x SM_USER_ARGS -x SM_OUTPUT_INTERMEDIATE_DIR -x SM_HP_MODEL_DIR -x PYTHONPATH /usr/local/bin/python3.7 -m mpi4py mpi_demo.py --model_dir /opt/ml/model\n", "\n", "\n", " Data for JOB [44607,1] offset 0 Total slots allocated 6\n", "\n", " ======================== JOB MAP ========================\n", "\n", " Data for node: ip-10-0-97-76#011Num slots: 2#011Max slots: 0#011Num procs: 2\n", " #011Process OMPI jobid: [44607,1] App: 0 Process rank: 0 Bound: N/A\n", " #011Process OMPI jobid: [44607,1] App: 0 Process rank: 1 Bound: N/A\n", "\n", " Data for node: algo-2#011Num slots: 2#011Max slots: 0#011Num procs: 2\n", " #011Process OMPI jobid: [44607,1] App: 0 Process rank: 2 Bound: N/A\n", " #011Process OMPI jobid: [44607,1] App: 0 Process rank: 3 Bound: N/A\n", "\n", " Data for node: algo-3#011Num slots: 2#011Max slots: 0#011Num procs: 2\n", " #011Process OMPI jobid: [44607,1] App: 0 Process rank: 4 Bound: N/A\n", " #011Process OMPI jobid: [44607,1] App: 0 Process rank: 5 Bound: N/A\n", "\n", " =============================================================\u001b[0m\n", "\u001b[34m[1,1]:Hello I am rank 1\u001b[0m\n", "\u001b[34m[1,0]:Number of MPI processes that will talk to each other: 6\u001b[0m\n", "\u001b[34m[1,0]:point to point\u001b[0m\n", "\u001b[34m[1,1]:I received some data: [1,1]:[0 1 2]\u001b[0m\n", "\u001b[34m[1,0]:==================================================\u001b[0m\n", "\u001b[34m[1,0]:Broadcasting from rank 0\u001b[0m\n", "\u001b[34m[1,3]:Data at rank 3 [1,2]:Data at rank 2 [1,2]:[0 1 2 3 4 5 6 7 8 9]\u001b[0m\n", "\u001b[34m[1,3]:[0 1 2 3 4 5 6 7 8 9]\u001b[0m\n", "\u001b[34m[1,1]:Data at rank 1 [1,0]:Data at rank 0 [1,1]:[0 1 2 3 4 5 6 7 8 9]\u001b[0m\n", "\u001b[34m[1,0]:[0 1 2 3 4 5 6 7 8 9]\u001b[0m\n", "\u001b[34m[1,4]:Data at rank 4 [1,5]:Data at rank 5 [1,4]:[0 1 2 3 4 5 6 7 8 9]\u001b[0m\n", "\u001b[34m[1,5]:[0 1 2 3 4 5 6 7 8 9]\u001b[0m\n", "\u001b[34m[1,0]:==================================================\u001b[0m\n", "\u001b[34m[1,0]:Gather and reduce\u001b[0m\n", "\u001b[34m[1,0]:I am rank 0, data I gathered is: [[0 0 0 0 0 0 0 0 0 0]\u001b[0m\n", "\u001b[34m[1,0]: [1 1 1 1 1 1 1 1 1 1]\u001b[0m\n", "\u001b[34m[1,0]: [2 2 2 2 2 2 2 2 2 2]\u001b[0m\n", "\u001b[34m[1,0]: [3 3 3 3 3 3 3 3 3 3]\u001b[0m\n", "\u001b[34m[1,0]: [4 4 4 4 4 4 4 4 4 4]\u001b[0m\n", "\u001b[34m[1,0]: [5 5 5 5 5 5 5 5 5 5]]\u001b[0m\n", "\u001b[34m[1,0]:I am rank 0, 
my avg is: [2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5]\u001b[0m\n", "\u001b[34m[1,1]:I am rank 1, my avg is: [2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5]\u001b[0m\n", "\u001b[34m[1,4]:I am rank 4, my avg is: [2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5]\u001b[0m\n", "\u001b[34m[1,5]:I am rank 5, my avg is: [2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5]\u001b[0m\n", "\u001b[34m[1,2]:I am rank 2, my avg is: [2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5]\u001b[0m\n", "\u001b[34m[1,3]:I am rank 3, my avg is: [2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5]\u001b[0m\n", "\u001b[34m2021-05-11 19:59:19,754 sagemaker_tensorflow_container.training WARNING No model artifact is saved under path /opt/ml/model. Your training job will not save any model files to S3.\u001b[0m\n", "\u001b[34mFor details of how to construct your training script see:\u001b[0m\n", "\u001b[34mhttps://sagemaker.readthedocs.io/en/stable/using_tf.html#adapting-your-local-tensorflow-script\u001b[0m\n", "\u001b[34m2021-05-11 19:59:19,754 sagemaker-training-toolkit INFO Reporting training SUCCESS\u001b[0m\n", "\u001b[32m2021-05-11 19:59:49,799 sagemaker-training-toolkit INFO MPI process finished.\u001b[0m\n", "\u001b[32m2021-05-11 19:59:49,800 sagemaker_tensorflow_container.training WARNING No model artifact is saved under path /opt/ml/model. Your training job will not save any model files to S3.\u001b[0m\n", "\u001b[32mFor details of how to construct your training script see:\u001b[0m\n", "\u001b[32mhttps://sagemaker.readthedocs.io/en/stable/using_tf.html#adapting-your-local-tensorflow-script\u001b[0m\n", "\u001b[32m2021-05-11 19:59:49,800 sagemaker-training-toolkit INFO Reporting training SUCCESS\u001b[0m\n", "\u001b[35m2021-05-11 19:59:49,782 sagemaker-training-toolkit INFO MPI process finished.\u001b[0m\n", "\u001b[35m2021-05-11 19:59:49,783 sagemaker_tensorflow_container.training WARNING No model artifact is saved under path /opt/ml/model. Your training job will not save any model files to S3.\u001b[0m\n", "\u001b[35mFor details of how to construct your training script see:\u001b[0m\n", "\u001b[35mhttps://sagemaker.readthedocs.io/en/stable/using_tf.html#adapting-your-local-tensorflow-script\u001b[0m\n", "\u001b[35m2021-05-11 19:59:49,783 sagemaker-training-toolkit INFO Reporting training SUCCESS\u001b[0m\n", "\n", "2021-05-11 19:59:58 Uploading - Uploading generated training model\n", "2021-05-11 19:59:58 Completed - Training job completed\n", "Training seconds: 270\n", "Billable seconds: 270\n" ] } ], "source": [ "tfest.fit()" ] }, { "cell_type": "markdown", "id": "artificial-major", "metadata": {}, "source": [ "The stdout line \"Number of MPI processes that will talk to each other: 6\" confirms that the processes on all three hosts (2 per host) are included in the same comm world. " ] }, { "cell_type": "markdown", "id": "mounted-california", "metadata": {}, "source": [ "## Conclusion\n", "In this notebook, you went through some fundamental MPI operations, which form the backbone of many distributed training frameworks, and you ran them on SageMaker across multiple instances. You can scale this setup up to include more instances in a real ML project." ] }, { "attachments": {}, "cell_type": "markdown", "id": "2fd4f786", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/training|distributed_training|mpi_on_sagemaker|intro|mpi_demo.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/training|distributed_training|mpi_on_sagemaker|intro|mpi_demo.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/training|distributed_training|mpi_on_sagemaker|intro|mpi_demo.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/training|distributed_training|mpi_on_sagemaker|intro|mpi_demo.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/training|distributed_training|mpi_on_sagemaker|intro|mpi_demo.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/training|distributed_training|mpi_on_sagemaker|intro|mpi_demo.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/training|distributed_training|mpi_on_sagemaker|intro|mpi_demo.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/training|distributed_training|mpi_on_sagemaker|intro|mpi_demo.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/training|distributed_training|mpi_on_sagemaker|intro|mpi_demo.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/training|distributed_training|mpi_on_sagemaker|intro|mpi_demo.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/training|distributed_training|mpi_on_sagemaker|intro|mpi_demo.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/training|distributed_training|mpi_on_sagemaker|intro|mpi_demo.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/training|distributed_training|mpi_on_sagemaker|intro|mpi_demo.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/training|distributed_training|mpi_on_sagemaker|intro|mpi_demo.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/training|distributed_training|mpi_on_sagemaker|intro|mpi_demo.ipynb)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (Data Science 3.0)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/sagemaker-data-science-310-v1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }