{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Instruction Fine tune GPT NEO\n", "\n", "Language models have recently exploded in both size and popularity. In 2018, BERT-large entered the scene and, with its 340M parameters and novel transformer architecture, set the standard on NLP task accuracy. Within just a few years, state-of-the-art NLP model size has grown by more than 500x with models such as OpenAI’s 175 billion parameter GPT-3 and similarly sized open source Bloom 176B raising the bar on NLP accuracy. This increase in the number of parameters is driven by the simple and empirically-demonstrated positive relationship between model size and accuracy: more is better. With easy access from models zoos such as Hugging Face and improved accuracy in NLP tasks such as classification and text generation, practitioners are increasingly reaching for these large models. However, deploying them can be a challenge because of their size.\n", "\n", "In this notebook, we explore how to train a large language model - GPT-Neo on SageMaker using SageMaker Distributed Model Parallel Library.\n", "SageMaker provides distributed training libraries and supports various distributed training options for deep learning tasks such as computer vision (CV) and natural language processing (NLP). With SageMaker’s distributed training libraries, you can run highly scalable and cost-effective custom data parallel and model parallel deep learning training jobs. For training GPT-Neo model we will be using Sharded Data Parallel(SDP). Sharded data parallelism is a memory-saving distributed training technique that splits the training state of a model (model parameters, gradients, and optimizer states) across GPUs in a data parallel group." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Licence agreement\n", " - View license information https://github.com/EleutherAI/gpt-neox/blob/main/LICENSE before using the model.\n", " - This notebook is a sample notebook and not intended for production use. 
Please refer to the licence at https://github.com/aws/mit-0.\n", "\n", "\n", " \n", " \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Lets begin by installing SageMaker SDK and importing libraries" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: sagemaker in /opt/conda/lib/python3.7/site-packages (2.147.0)\n", "Collecting sagemaker\n", " Using cached sagemaker-2.150.0-py2.py3-none-any.whl\n", "Collecting tblib==1.7.0\n", " Using cached tblib-1.7.0-py2.py3-none-any.whl (12 kB)\n", "Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.7/site-packages (from sagemaker) (20.1)\n", "Requirement already satisfied: jsonschema in /opt/conda/lib/python3.7/site-packages (from sagemaker) (3.2.0)\n", "Requirement already satisfied: protobuf<4.0,>=3.1 in /opt/conda/lib/python3.7/site-packages (from sagemaker) (3.20.3)\n", "Requirement already satisfied: schema in /opt/conda/lib/python3.7/site-packages (from sagemaker) (0.7.5)\n", "Requirement already satisfied: google-pasta in /opt/conda/lib/python3.7/site-packages (from sagemaker) (0.2.0)\n", "Requirement already satisfied: boto3<2.0,>=1.26.28 in /opt/conda/lib/python3.7/site-packages (from sagemaker) (1.26.111)\n", "Requirement already satisfied: importlib-metadata<5.0,>=1.4.0 in /opt/conda/lib/python3.7/site-packages (from sagemaker) (4.13.0)\n", "Requirement already satisfied: pandas in /opt/conda/lib/python3.7/site-packages (from sagemaker) (1.3.5)\n", "Requirement already satisfied: platformdirs in /opt/conda/lib/python3.7/site-packages (from sagemaker) (3.2.0)\n", "Requirement already satisfied: protobuf3-to-dict<1.0,>=0.1.5 in /opt/conda/lib/python3.7/site-packages (from sagemaker) (0.1.5)\n", "Requirement already satisfied: numpy<2.0,>=1.9.0 in /opt/conda/lib/python3.7/site-packages (from sagemaker) (1.21.6)\n", "Requirement already satisfied: smdebug-rulesconfig==1.0.1 in /opt/conda/lib/python3.7/site-packages (from sagemaker) (1.0.1)\n", "Requirement already satisfied: cloudpickle==2.2.1 in /opt/conda/lib/python3.7/site-packages (from sagemaker) (2.2.1)\n", "Requirement already satisfied: attrs<23,>=20.3.0 in /opt/conda/lib/python3.7/site-packages (from sagemaker) (22.2.0)\n", "Requirement already satisfied: PyYAML==5.4.1 in /opt/conda/lib/python3.7/site-packages (from sagemaker) (5.4.1)\n", "Requirement already satisfied: pathos in /opt/conda/lib/python3.7/site-packages (from sagemaker) (0.3.0)\n", "Requirement already satisfied: botocore<1.30.0,>=1.29.111 in /opt/conda/lib/python3.7/site-packages (from boto3<2.0,>=1.26.28->sagemaker) (1.29.111)\n", "Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /opt/conda/lib/python3.7/site-packages (from boto3<2.0,>=1.26.28->sagemaker) (1.0.1)\n", "Requirement already satisfied: s3transfer<0.7.0,>=0.6.0 in /opt/conda/lib/python3.7/site-packages (from boto3<2.0,>=1.26.28->sagemaker) (0.6.0)\n", "Requirement already satisfied: typing-extensions>=3.6.4 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata<5.0,>=1.4.0->sagemaker) (4.5.0)\n", "Requirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata<5.0,>=1.4.0->sagemaker) (3.15.0)\n", "Requirement already satisfied: pyparsing>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from packaging>=20.0->sagemaker) (2.4.6)\n", "Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages 
(from packaging>=20.0->sagemaker) (1.14.0)\n", "Requirement already satisfied: pyrsistent>=0.14.0 in /opt/conda/lib/python3.7/site-packages (from jsonschema->sagemaker) (0.15.7)\n", "Requirement already satisfied: setuptools in /opt/conda/lib/python3.7/site-packages (from jsonschema->sagemaker) (59.3.0)\n", "Requirement already satisfied: pytz>=2017.3 in /opt/conda/lib/python3.7/site-packages (from pandas->sagemaker) (2019.3)\n", "Requirement already satisfied: python-dateutil>=2.7.3 in /opt/conda/lib/python3.7/site-packages (from pandas->sagemaker) (2.8.2)\n", "Requirement already satisfied: dill>=0.3.6 in /opt/conda/lib/python3.7/site-packages (from pathos->sagemaker) (0.3.6)\n", "Requirement already satisfied: multiprocess>=0.70.14 in /opt/conda/lib/python3.7/site-packages (from pathos->sagemaker) (0.70.14)\n", "Requirement already satisfied: pox>=0.3.2 in /opt/conda/lib/python3.7/site-packages (from pathos->sagemaker) (0.3.2)\n", "Requirement already satisfied: ppft>=1.7.6.6 in /opt/conda/lib/python3.7/site-packages (from pathos->sagemaker) (1.7.6.6)\n", "Requirement already satisfied: contextlib2>=0.5.5 in /opt/conda/lib/python3.7/site-packages (from schema->sagemaker) (0.6.0.post1)\n", "Requirement already satisfied: urllib3<1.27,>=1.25.4 in /opt/conda/lib/python3.7/site-packages (from botocore<1.30.0,>=1.29.111->boto3<2.0,>=1.26.28->sagemaker) (1.26.15)\n", "Installing collected packages: tblib, sagemaker\n", " Attempting uninstall: tblib\n", " Found existing installation: tblib 1.6.0\n", " Uninstalling tblib-1.6.0:\n", " Successfully uninstalled tblib-1.6.0\n", " Attempting uninstall: sagemaker\n", " Found existing installation: sagemaker 2.147.0\n", " Uninstalling sagemaker-2.147.0:\n", " Successfully uninstalled sagemaker-2.147.0\n", "Successfully installed sagemaker-2.150.0 tblib-1.7.0\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.2\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n" ] } ], "source": [ "! 
pip install -U sagemaker" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [] }, "outputs": [], "source": [ "import sagemaker\n", "from sagemaker.pytorch import PyTorch" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sagemaker role arn: arn:aws:iam::706553727873:role/service-role/AmazonSageMaker-ExecutionRole-20211019T121285\n", "sagemaker bucket: sagemaker-us-east-1-706553727873\n", "sagemaker session region: us-east-1\n" ] } ], "source": [ "sess = sagemaker.Session()\n", "# sagemaker session bucket -> used for uploading data, models and logs\n", "# sagemaker will automatically create this bucket if it not exists\n", "sagemaker_session_bucket=None\n", "if sagemaker_session_bucket is None and sess is not None:\n", " # set to default bucket if a bucket name is not given\n", " sagemaker_session_bucket = sess.default_bucket()\n", "\n", "role = sagemaker.get_execution_role()\n", "\n", "sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)\n", "\n", "print(f\"sagemaker role arn: {role}\")\n", "print(f\"sagemaker bucket: {sess.default_bucket()}\")\n", "print(f\"sagemaker session region: {sess.boto_region_name}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Data Preparation\n", "\n", "For running the training job we will use a dataset available in Huggingface datasets." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting datasets\n", " Downloading datasets-2.11.0-py3-none-any.whl (468 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m468.7/468.7 kB\u001b[0m \u001b[31m7.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m\n", "\u001b[?25hRequirement already satisfied: multiprocess in /opt/conda/lib/python3.7/site-packages (from datasets) (0.70.14)\n", "Collecting xxhash\n", " Using cached xxhash-3.2.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (213 kB)\n", "Requirement already satisfied: importlib-metadata in /opt/conda/lib/python3.7/site-packages (from datasets) (4.13.0)\n", "Requirement already satisfied: fsspec[http]>=2021.11.1 in /opt/conda/lib/python3.7/site-packages (from datasets) (2023.1.0)\n", "Requirement already satisfied: pandas in /opt/conda/lib/python3.7/site-packages (from datasets) (1.3.5)\n", "Requirement already satisfied: packaging in /opt/conda/lib/python3.7/site-packages (from datasets) (20.1)\n", "Collecting huggingface-hub<1.0.0,>=0.11.0\n", " Downloading huggingface_hub-0.14.1-py3-none-any.whl (224 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m224.5/224.5 kB\u001b[0m \u001b[31m4.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\n", "\u001b[?25hRequirement already satisfied: aiohttp in /opt/conda/lib/python3.7/site-packages (from datasets) (3.8.4)\n", "Requirement already satisfied: requests>=2.19.0 in /opt/conda/lib/python3.7/site-packages (from datasets) (2.28.2)\n", "Requirement already satisfied: numpy>=1.17 in /opt/conda/lib/python3.7/site-packages (from datasets) (1.21.6)\n", "Requirement already satisfied: dill<0.3.7,>=0.3.0 in /opt/conda/lib/python3.7/site-packages (from datasets) (0.3.6)\n", "Requirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.7/site-packages (from datasets) (5.4.1)\n", "Collecting tqdm>=4.62.1\n", " Using cached 
tqdm-4.65.0-py3-none-any.whl (77 kB)\n", "Collecting responses<0.19\n", " Using cached responses-0.18.0-py3-none-any.whl (38 kB)\n", "Requirement already satisfied: pyarrow>=8.0.0 in /opt/conda/lib/python3.7/site-packages (from datasets) (11.0.0)\n", "Requirement already satisfied: multidict<7.0,>=4.5 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (6.0.4)\n", "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (2.0.4)\n", "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (4.0.2)\n", "Requirement already satisfied: asynctest==0.13.0 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (0.13.0)\n", "Requirement already satisfied: attrs>=17.3.0 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (22.2.0)\n", "Requirement already satisfied: frozenlist>=1.1.1 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (1.3.3)\n", "Requirement already satisfied: yarl<2.0,>=1.0 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (1.8.2)\n", "Requirement already satisfied: aiosignal>=1.1.2 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (1.3.1)\n", "Requirement already satisfied: typing-extensions>=3.7.4 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (4.5.0)\n", "Requirement already satisfied: filelock in /opt/conda/lib/python3.7/site-packages (from huggingface-hub<1.0.0,>=0.11.0->datasets) (3.0.12)\n", "Collecting packaging\n", " Downloading packaging-23.1-py3-none-any.whl (48 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m48.9/48.9 kB\u001b[0m \u001b[31m1.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hRequirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests>=2.19.0->datasets) (2.8)\n", "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests>=2.19.0->datasets) (2022.12.7)\n", "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests>=2.19.0->datasets) (1.26.15)\n", "Requirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->datasets) (3.15.0)\n", "Requirement already satisfied: python-dateutil>=2.7.3 in /opt/conda/lib/python3.7/site-packages (from pandas->datasets) (2.8.2)\n", "Requirement already satisfied: pytz>=2017.3 in /opt/conda/lib/python3.7/site-packages (from pandas->datasets) (2019.3)\n", "Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas->datasets) (1.14.0)\n", "Installing collected packages: xxhash, tqdm, packaging, responses, huggingface-hub, datasets\n", " Attempting uninstall: tqdm\n", " Found existing installation: tqdm 4.42.1\n", " Uninstalling tqdm-4.42.1:\n", " Successfully uninstalled tqdm-4.42.1\n", " Attempting uninstall: packaging\n", " Found existing installation: packaging 20.1\n", " Uninstalling packaging-20.1:\n", " Successfully uninstalled packaging-20.1\n", "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n", "pytest-astropy 0.8.0 requires pytest-cov>=2.0, which is not installed.\n", "pytest-astropy 0.8.0 requires pytest-filter-subpackage>=0.1, which is not installed.\u001b[0m\u001b[31m\n", "\u001b[0mSuccessfully installed datasets-2.11.0 huggingface-hub-0.14.1 packaging-23.1 responses-0.18.0 tqdm-4.65.0 xxhash-3.2.0\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.2\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n" ] } ], "source": [ "! pip install datasets" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "7eb09a6680fe492d827f75a064e4289e", "version_major": 2, "version_minor": 0 }, "text/plain": [ "A Jupyter Widget" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading and preparing dataset parquet/tatsu-lab--alpaca to /root/.cache/huggingface/datasets/tatsu-lab___parquet/tatsu-lab--alpaca-9b55fb286e3c7ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec...\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "94d2eb43295041bf98eab44ffc7886f2", "version_major": 2, "version_minor": 0 }, "text/plain": [ "A Jupyter Widget" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "d66c506a729040b18de101e8307e7fbd", "version_major": 2, "version_minor": 0 }, "text/plain": [ "A Jupyter Widget" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "febb4891e36246118fef9312c4872f02", "version_major": 2, "version_minor": 0 }, "text/plain": [ "A Jupyter Widget" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "A Jupyter Widget" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Dataset parquet downloaded and prepared to /root/.cache/huggingface/datasets/tatsu-lab___parquet/tatsu-lab--alpaca-9b55fb286e3c7ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec. 
Subsequent calls will reuse this data.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "e66029cb320d4337a9bd1ee3ab39e1e8", "version_major": 2, "version_minor": 0 }, "text/plain": [ "A Jupyter Widget" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from datasets import load_dataset\n", "\n", "instruction_data = load_dataset('tatsu-lab/alpaca')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [] }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "instructionDF = pd.DataFrame(instruction_data[\"train\"])" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [] }, "outputs": [], "source": [ "train_df = instructionDF.iloc[:5000]\n", "valid_df = instructionDF.iloc[5000:7000]\n", "\n", "train_df.to_csv(\"train.csv\",index=False)\n", "valid_df.to_csv(\"valid.csv\",index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Upload the training data to s3" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [] }, "outputs": [], "source": [ "train_data_url = sess.upload_data(\n", " path=\"train.csv\",\n", " key_prefix=\"alpaca\",\n", ")\n", "\n", "valid_data_url = sess.upload_data(\n", " path=\"valid.csv\",\n", " key_prefix=\"alpaca\",\n", ")" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "training file path s3://sagemaker-us-east-1-706553727873/alpaca/train.csv\n", "validation file path s3://sagemaker-us-east-1-706553727873/alpaca/valid.csv\n" ] } ], "source": [ "print(f\"training file path {train_data_url}\")\n", "print(f\"validation file path {valid_data_url}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Train Model\n", "\n", "Now we are ready to run the training using SageMaker Estimator. A training script is required for SageMaker PyTorch estimator to run a model training job. Below is the script for fine-tuning a pretrained Hugging Face GPT-Neo model with the dataset we just put in the S3." 
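, "\n", "The training script tokenizes the `text` column of the Alpaca CSVs, appends the EOS token, concatenates the token ids, and splits them into fixed-length blocks of `block_size` tokens, copying the input ids into the labels for causal language modeling. The model is wrapped with `smp.DistributedModel`, the optimizer with `smp.DistributedOptimizer`, and the forward/backward passes are decorated with `@smp.step`. As a simplified sketch of the block-grouping step (an illustration, not the exact training code):\n", "\n", "```python\n", "# Simplified sketch of the group_texts logic in scripts/train.py:\n", "# concatenate token ids and cut them into blocks of block_size tokens,\n", "# dropping the remainder so every block is exactly block_size long.\n", "def group_into_blocks(token_ids, block_size):\n", "    total = (len(token_ids) // block_size) * block_size\n", "    return [token_ids[i : i + block_size] for i in range(0, total, block_size)]\n", "```"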
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36mtorch\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36mmath\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[34mfrom\u001b[39;49;00m \u001b[04m\u001b[36mtorch\u001b[39;49;00m\u001b[04m\u001b[36m.\u001b[39;49;00m\u001b[04m\u001b[36mutils\u001b[39;49;00m\u001b[04m\u001b[36m.\u001b[39;49;00m\u001b[04m\u001b[36mdata\u001b[39;49;00m \u001b[34mimport\u001b[39;49;00m DataLoader\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "\u001b[34mfrom\u001b[39;49;00m \u001b[04m\u001b[36mtransformers\u001b[39;49;00m \u001b[34mimport\u001b[39;49;00m (\u001b[37m\u001b[39;49;00m\n", " AutoModelForCausalLM,\u001b[37m\u001b[39;49;00m\n", " AutoTokenizer,\u001b[37m\u001b[39;49;00m\n", " default_data_collator,\u001b[37m\u001b[39;49;00m\n", " get_scheduler,\u001b[37m\u001b[39;49;00m\n", ")\u001b[37m\u001b[39;49;00m\n", "\u001b[34mfrom\u001b[39;49;00m \u001b[04m\u001b[36mitertools\u001b[39;49;00m \u001b[34mimport\u001b[39;49;00m chain\u001b[37m\u001b[39;49;00m\n", "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36mcopy\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "\u001b[34mfrom\u001b[39;49;00m \u001b[04m\u001b[36mdatasets\u001b[39;49;00m \u001b[34mimport\u001b[39;49;00m load_dataset\u001b[37m\u001b[39;49;00m\n", "\u001b[34mfrom\u001b[39;49;00m \u001b[04m\u001b[36mtqdm\u001b[39;49;00m \u001b[34mimport\u001b[39;49;00m tqdm\u001b[37m\u001b[39;49;00m\n", "\u001b[34mfrom\u001b[39;49;00m \u001b[04m\u001b[36mutils\u001b[39;49;00m \u001b[34mimport\u001b[39;49;00m parse_args\u001b[37m\u001b[39;49;00m\n", "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36msmdistributed\u001b[39;49;00m\u001b[04m\u001b[36m.\u001b[39;49;00m\u001b[04m\u001b[36mmodelparallel\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36msmdistributed\u001b[39;49;00m\u001b[04m\u001b[36m.\u001b[39;49;00m\u001b[04m\u001b[36mmodelparallel\u001b[39;49;00m\u001b[04m\u001b[36m.\u001b[39;49;00m\u001b[04m\u001b[36mtorch\u001b[39;49;00m \u001b[34mas\u001b[39;49;00m \u001b[04m\u001b[36msmp\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "\u001b[34mfrom\u001b[39;49;00m \u001b[04m\u001b[36mutils\u001b[39;49;00m \u001b[34mimport\u001b[39;49;00m is_main_process,main_process_first,wait_for_everyone\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@smp\u001b[39;49;00m.step\u001b[37m\u001b[39;49;00m\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32mtrain_step\u001b[39;49;00m(model, batch):\u001b[37m\u001b[39;49;00m\n", " loss = model(**batch)[\u001b[33m\"\u001b[39;49;00m\u001b[33mloss\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m]\u001b[37m\u001b[39;49;00m\n", " model.backward(loss)\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m loss\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@smp\u001b[39;49;00m.step\u001b[37m\u001b[39;49;00m\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32mtest_step\u001b[39;49;00m(model, batch):\u001b[37m\u001b[39;49;00m\n", " output = model(**batch)\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m output\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "\u001b[34mdef\u001b[39;49;00m 
\u001b[32mmain\u001b[39;49;00m():\u001b[37m\u001b[39;49;00m\n", " args = parse_args()\u001b[37m\u001b[39;49;00m\n", " smp.init()\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " text_column = \u001b[33m\"\u001b[39;49;00m\u001b[33mquestion\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " label_column = \u001b[33m\"\u001b[39;49;00m\u001b[33manswer\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " torch.manual_seed(args.seed)\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " tokenizer = AutoTokenizer.from_pretrained(args.model_name_or_path)\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m args.block_size \u001b[35mis\u001b[39;49;00m \u001b[34mNone\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " block_size = tokenizer.model_max_length\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m block_size > \u001b[34m1024\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[37m\u001b[39;49;00m\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mThe chosen tokenizer supports a `model_max_length` that is longer than the default `block_size` value\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33m of 1024. If you would like to use a longer `block_size` up to `tokenizer.model_max_length` you can\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33m override this default with `--block_size xxx`.\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " )\u001b[37m\u001b[39;49;00m\n", " block_size = \u001b[34m1024\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34melse\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m args.block_size > tokenizer.model_max_length:\u001b[37m\u001b[39;49;00m\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[37m\u001b[39;49;00m\n", " \u001b[33mf\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mThe block_size passed (\u001b[39;49;00m\u001b[33m{\u001b[39;49;00margs.block_size\u001b[33m}\u001b[39;49;00m\u001b[33m) is larger than the maximum length for the model\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[33mf\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33m(\u001b[39;49;00m\u001b[33m{\u001b[39;49;00mtokenizer.model_max_length\u001b[33m}\u001b[39;49;00m\u001b[33m). 
Using block_size=\u001b[39;49;00m\u001b[33m{\u001b[39;49;00mtokenizer.model_max_length\u001b[33m}\u001b[39;49;00m\u001b[33m.\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " )\u001b[37m\u001b[39;49;00m\n", " block_size = \u001b[36mmin\u001b[39;49;00m(args.block_size, tokenizer.model_max_length)\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " dataset = load_dataset(\u001b[37m\u001b[39;49;00m\n", " \u001b[33m'\u001b[39;49;00m\u001b[33mcsv\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, data_files={\u001b[37m\u001b[39;49;00m\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mtrain\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m: args.train_file,\u001b[37m\u001b[39;49;00m\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mvalidation\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m: args.validation_file,\u001b[37m\u001b[39;49;00m\n", " })\u001b[37m\u001b[39;49;00m\n", " \u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[34mdef\u001b[39;49;00m \u001b[32mpreprocess_function\u001b[39;49;00m(examples):\u001b[37m\u001b[39;49;00m\n", " inputs = [prompt + tokenizer.eos_token \u001b[34mfor\u001b[39;49;00m prompt \u001b[35min\u001b[39;49;00m examples[\u001b[33m\"\u001b[39;49;00m\u001b[33mtext\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m]]\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " model_inputs = tokenizer(inputs)\u001b[37m\u001b[39;49;00m\n", " model_inputs[\u001b[33m\"\u001b[39;49;00m\u001b[33mlabels\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m] = copy.deepcopy(model_inputs[\u001b[33m\"\u001b[39;49;00m\u001b[33minput_ids\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m])\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m model_inputs\u001b[37m\u001b[39;49;00m\n", " \u001b[37m\u001b[39;49;00m\n", " \u001b[34mdef\u001b[39;49;00m \u001b[32mgroup_texts\u001b[39;49;00m(examples):\u001b[37m\u001b[39;49;00m\n", " \u001b[37m# Concatenate all texts.\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " concatenated_examples = {k: \u001b[36mlist\u001b[39;49;00m(chain(*examples[k])) \u001b[34mfor\u001b[39;49;00m k \u001b[35min\u001b[39;49;00m examples.keys()}\u001b[37m\u001b[39;49;00m\n", " total_length = \u001b[36mlen\u001b[39;49;00m(concatenated_examples[\u001b[36mlist\u001b[39;49;00m(examples.keys())[\u001b[34m0\u001b[39;49;00m]])\u001b[37m\u001b[39;49;00m\n", " \u001b[37m# We drop the small remainder, we could add padding if the model supported it instead of this drop, you can\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[37m# customize this part to your needs.\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m total_length >= block_size:\u001b[37m\u001b[39;49;00m\n", " total_length = (total_length // block_size) * block_size\u001b[37m\u001b[39;49;00m\n", " \u001b[37m# Split by chunks of max_len.\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " result = {\u001b[37m\u001b[39;49;00m\n", " k: [t[i : i + block_size] \u001b[34mfor\u001b[39;49;00m i \u001b[35min\u001b[39;49;00m \u001b[36mrange\u001b[39;49;00m(\u001b[34m0\u001b[39;49;00m, total_length, block_size)]\u001b[37m\u001b[39;49;00m\n", " \u001b[34mfor\u001b[39;49;00m k, t \u001b[35min\u001b[39;49;00m concatenated_examples.items()\u001b[37m\u001b[39;49;00m\n", " }\u001b[37m\u001b[39;49;00m\n", " result[\u001b[33m\"\u001b[39;49;00m\u001b[33mlabels\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m] = result[\u001b[33m\"\u001b[39;49;00m\u001b[33minput_ids\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m].copy()\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m 
result\u001b[37m\u001b[39;49;00m\n", " \u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m main_process_first(smp.rank()):\u001b[37m\u001b[39;49;00m\n", " tokenized_datasets = dataset.map(\u001b[37m\u001b[39;49;00m\n", " preprocess_function,\u001b[37m\u001b[39;49;00m\n", " batched=\u001b[34mTrue\u001b[39;49;00m,\u001b[37m\u001b[39;49;00m\n", " num_proc=args.preprocessing_num_workers,\u001b[37m\u001b[39;49;00m\n", " remove_columns=dataset[\u001b[33m\"\u001b[39;49;00m\u001b[33mtrain\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m].column_names,\u001b[37m\u001b[39;49;00m\n", " load_from_cache_file=\u001b[34mTrue\u001b[39;49;00m,\u001b[37m\u001b[39;49;00m\n", " desc=\u001b[33m\"\u001b[39;49;00m\u001b[33mRunning tokenizer on dataset\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\u001b[37m\u001b[39;49;00m\n", " )\u001b[37m\u001b[39;49;00m\n", " processed_datasets = tokenized_datasets.map(\u001b[37m\u001b[39;49;00m\n", " group_texts,\u001b[37m\u001b[39;49;00m\n", " batched=\u001b[34mTrue\u001b[39;49;00m,\u001b[37m\u001b[39;49;00m\n", " num_proc=args.preprocessing_num_workers,\u001b[37m\u001b[39;49;00m\n", " desc=\u001b[33mf\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mGrouping texts in chunks of \u001b[39;49;00m\u001b[33m{\u001b[39;49;00mblock_size\u001b[33m}\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\u001b[37m\u001b[39;49;00m\n", " )\u001b[37m\u001b[39;49;00m\n", " \u001b[37m\u001b[39;49;00m\n", " wait_for_everyone()\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " train_dataset = processed_datasets[\u001b[33m\"\u001b[39;49;00m\u001b[33mtrain\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m]\u001b[37m\u001b[39;49;00m\n", " eval_dataset = processed_datasets[\u001b[33m\"\u001b[39;49;00m\u001b[33mvalidation\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m]\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " train_sampler = torch.utils.data.DistributedSampler(\u001b[37m\u001b[39;49;00m\n", " train_dataset,\u001b[37m\u001b[39;49;00m\n", " shuffle=\u001b[34mTrue\u001b[39;49;00m,\u001b[37m\u001b[39;49;00m\n", " seed=args.seed,\u001b[37m\u001b[39;49;00m\n", " rank=smp.dp_rank(),\u001b[37m\u001b[39;49;00m\n", " num_replicas=smp.dp_size(),\u001b[37m\u001b[39;49;00m\n", " drop_last=\u001b[34mTrue\u001b[39;49;00m,\u001b[37m\u001b[39;49;00m\n", " )\u001b[37m\u001b[39;49;00m\n", " \u001b[37m\u001b[39;49;00m\n", " eval_sampler = torch.utils.data.DistributedSampler(\u001b[37m\u001b[39;49;00m\n", " eval_dataset,\u001b[37m\u001b[39;49;00m\n", " shuffle=\u001b[34mTrue\u001b[39;49;00m,\u001b[37m\u001b[39;49;00m\n", " seed=args.seed,\u001b[37m\u001b[39;49;00m\n", " rank=smp.dp_rank(),\u001b[37m\u001b[39;49;00m\n", " num_replicas=smp.dp_size(),\u001b[37m\u001b[39;49;00m\n", " drop_last=\u001b[34mTrue\u001b[39;49;00m,\u001b[37m\u001b[39;49;00m\n", " )\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " train_dataloader = DataLoader(\u001b[37m\u001b[39;49;00m\n", " train_dataset, sampler=train_sampler, collate_fn=default_data_collator, batch_size=args.per_device_train_batch_size, pin_memory=\u001b[34mTrue\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " )\u001b[37m\u001b[39;49;00m\n", " eval_dataloader = DataLoader(\u001b[37m\u001b[39;49;00m\n", " eval_dataset,sampler=eval_sampler, collate_fn=default_data_collator, batch_size=args.per_device_eval_batch_size, pin_memory=\u001b[34mTrue\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " )\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " 
\u001b[36mprint\u001b[39;49;00m(\u001b[36mnext\u001b[39;49;00m(\u001b[36miter\u001b[39;49;00m(train_dataloader)))\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[37m# creating model\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m smp.model_creation(\u001b[37m\u001b[39;49;00m\n", " tensor_parallelism=\u001b[34mTrue\u001b[39;49;00m,\u001b[37m\u001b[39;49;00m\n", " dtype=torch.bfloat16,\u001b[37m\u001b[39;49;00m\n", " flash_attention=\u001b[34mTrue\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " ):\u001b[37m\u001b[39;49;00m\n", " model = AutoModelForCausalLM.from_pretrained(args.model_name_or_path,cache_dir=\u001b[33m\"\u001b[39;49;00m\u001b[33m/tmp\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,torch_dtype=torch.bfloat16)\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " model = smp.DistributedModel(model, trace_device=\u001b[33m\"\u001b[39;49;00m\u001b[33mgpu\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[37m# Optimizer\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[37m# Split weights in two groups, one with weight decay and the other not.\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " no_decay = [\u001b[33m\"\u001b[39;49;00m\u001b[33mbias\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, \u001b[33m\"\u001b[39;49;00m\u001b[33mLayerNorm.weight\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, \u001b[33m\"\u001b[39;49;00m\u001b[33mlayer_norm.weight\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m]\u001b[37m\u001b[39;49;00m\n", " optimizer_grouped_parameters = [\u001b[37m\u001b[39;49;00m\n", " {\u001b[37m\u001b[39;49;00m\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mparams\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m: [p \u001b[34mfor\u001b[39;49;00m n, p \u001b[35min\u001b[39;49;00m model.named_parameters() \u001b[34mif\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m \u001b[36many\u001b[39;49;00m(nd \u001b[35min\u001b[39;49;00m n \u001b[34mfor\u001b[39;49;00m nd \u001b[35min\u001b[39;49;00m no_decay)],\u001b[37m\u001b[39;49;00m\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mweight_decay\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m: args.weight_decay,\u001b[37m\u001b[39;49;00m\n", " },\u001b[37m\u001b[39;49;00m\n", " {\u001b[37m\u001b[39;49;00m\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mparams\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m: [p \u001b[34mfor\u001b[39;49;00m n, p \u001b[35min\u001b[39;49;00m model.named_parameters() \u001b[34mif\u001b[39;49;00m \u001b[36many\u001b[39;49;00m(nd \u001b[35min\u001b[39;49;00m n \u001b[34mfor\u001b[39;49;00m nd \u001b[35min\u001b[39;49;00m no_decay)],\u001b[37m\u001b[39;49;00m\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mweight_decay\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m: \u001b[34m0.0\u001b[39;49;00m,\u001b[37m\u001b[39;49;00m\n", " },\u001b[37m\u001b[39;49;00m\n", " ]\u001b[37m\u001b[39;49;00m\n", " optimizer = torch.optim.AdamW(optimizer_grouped_parameters, lr=args.learning_rate)\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[37m# Scheduler and math around the number of training steps.\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " overrode_max_train_steps = \u001b[34mFalse\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " num_update_steps_per_epoch = math.ceil(\u001b[36mlen\u001b[39;49;00m(train_dataloader) / args.gradient_accumulation_steps)\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m args.max_train_steps \u001b[35mis\u001b[39;49;00m 
\u001b[34mNone\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch\u001b[37m\u001b[39;49;00m\n", " overrode_max_train_steps = \u001b[34mTrue\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " lr_scheduler = get_scheduler(\u001b[37m\u001b[39;49;00m\n", " name=args.lr_scheduler_type,\u001b[37m\u001b[39;49;00m\n", " optimizer=optimizer,\u001b[37m\u001b[39;49;00m\n", " num_warmup_steps=args.num_warmup_steps * args.gradient_accumulation_steps,\u001b[37m\u001b[39;49;00m\n", " num_training_steps=args.max_train_steps * args.gradient_accumulation_steps,\u001b[37m\u001b[39;49;00m\n", " )\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " transformer_layers = model.get_module().transformer.seq_layers\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " smp.set_activation_checkpointing(\u001b[37m\u001b[39;49;00m\n", " transformer_layers, pack_args_as_tuple=\u001b[34mTrue\u001b[39;49;00m, strategy=\u001b[33m'\u001b[39;49;00m\u001b[33meach\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " wait_for_everyone()\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " optimizer = smp.DistributedOptimizer(optimizer)\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " device = torch.device(\u001b[33mf\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mcuda:\u001b[39;49;00m\u001b[33m{\u001b[39;49;00msmp.local_rank()\u001b[33m}\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " num_update_steps_per_epoch = math.ceil(\u001b[36mlen\u001b[39;49;00m(train_dataloader) / args.gradient_accumulation_steps)\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m overrode_max_train_steps:\u001b[37m\u001b[39;49;00m\n", " args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch\u001b[37m\u001b[39;49;00m\n", " \u001b[37m# Afterwards we recalculate our number of training epochs\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " args.num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[34mfor\u001b[39;49;00m epoch \u001b[35min\u001b[39;49;00m \u001b[36mrange\u001b[39;49;00m(args.num_train_epochs):\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " model.train()\u001b[37m\u001b[39;49;00m\n", " total_loss = \u001b[34m0\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mfor\u001b[39;49;00m step, batch \u001b[35min\u001b[39;49;00m \u001b[36menumerate\u001b[39;49;00m(tqdm(train_dataloader)):\u001b[37m\u001b[39;49;00m\n", " \u001b[37m\u001b[39;49;00m\n", " batch = {k: v.to(device) \u001b[34mfor\u001b[39;49;00m k, v \u001b[35min\u001b[39;49;00m batch.items()}\u001b[37m\u001b[39;49;00m\n", " loss = train_step(model,batch)\u001b[37m\u001b[39;49;00m\n", " total_loss += loss.reduce_mean().detach().float()\u001b[37m\u001b[39;49;00m\n", " optimizer.step()\u001b[37m\u001b[39;49;00m\n", " lr_scheduler.step()\u001b[37m\u001b[39;49;00m\n", " optimizer.zero_grad()\u001b[37m\u001b[39;49;00m\n", " \u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " train_epoch_loss = total_loss / \u001b[36mlen\u001b[39;49;00m(train_dataloader)\u001b[37m\u001b[39;49;00m\n", " train_ppl = torch.exp(train_epoch_loss)\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " 
\u001b[34mif\u001b[39;49;00m is_main_process(smp.rank()):\u001b[37m\u001b[39;49;00m\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33mf\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33m{\u001b[39;49;00mepoch\u001b[33m=}\u001b[39;49;00m\u001b[33m: \u001b[39;49;00m\u001b[33m{\u001b[39;49;00mtrain_ppl\u001b[33m=}\u001b[39;49;00m\u001b[33m \u001b[39;49;00m\u001b[33m{\u001b[39;49;00mtrain_epoch_loss\u001b[33m=}\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " model.eval()\u001b[37m\u001b[39;49;00m\n", " eval_preds = []\u001b[37m\u001b[39;49;00m\n", " eval_loss = \u001b[34m0\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mfor\u001b[39;49;00m estep, batch \u001b[35min\u001b[39;49;00m \u001b[36menumerate\u001b[39;49;00m(tqdm(eval_dataloader)):\u001b[37m\u001b[39;49;00m\n", " batch = {k: v.to(device) \u001b[34mfor\u001b[39;49;00m k, v \u001b[35min\u001b[39;49;00m batch.items()}\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m torch.no_grad():\u001b[37m\u001b[39;49;00m\n", " outputs = test_step(model,batch)\u001b[37m\u001b[39;49;00m\n", " loss = outputs[\u001b[33m\"\u001b[39;49;00m\u001b[33mloss\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m].reduce_mean()\u001b[37m\u001b[39;49;00m\n", " eval_loss += loss.detach().float()\u001b[37m\u001b[39;49;00m\n", " logits_mb = outputs[\u001b[33m\"\u001b[39;49;00m\u001b[33mlogits\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m]\u001b[37m\u001b[39;49;00m\n", " logits = torch.cat(\u001b[36mtuple\u001b[39;49;00m(logits_mb.outputs), dim=\u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " eval_preds.extend(\u001b[37m\u001b[39;49;00m\n", " tokenizer.batch_decode(torch.argmax(logits, -\u001b[34m1\u001b[39;49;00m).detach().cpu().numpy(), skip_special_tokens=\u001b[34mTrue\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " )\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " eval_epoch_loss = eval_loss / \u001b[36mlen\u001b[39;49;00m(eval_dataloader)\u001b[37m\u001b[39;49;00m\n", " eval_ppl = torch.exp(eval_epoch_loss)\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m is_main_process(smp.rank()):\u001b[37m\u001b[39;49;00m\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33mf\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33m{\u001b[39;49;00mepoch\u001b[33m=}\u001b[39;49;00m\u001b[33m: \u001b[39;49;00m\u001b[33m{\u001b[39;49;00meval_ppl\u001b[33m=}\u001b[39;49;00m\u001b[33m \u001b[39;49;00m\u001b[33m{\u001b[39;49;00meval_epoch_loss\u001b[33m=}\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[37m# save the checkpoint\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[37m\u001b[39;49;00m\n", " smp.save_checkpoint(args.checkpoint_dir,\u001b[37m\u001b[39;49;00m\n", " tag=\u001b[33mf\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33mgptneo_3b_model.pt\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\u001b[37m\u001b[39;49;00m\n", " partial=\u001b[34mFalse\u001b[39;49;00m,\u001b[37m\u001b[39;49;00m\n", " model=model,\u001b[37m\u001b[39;49;00m\n", " optimizer=optimizer)\u001b[37m\u001b[39;49;00m\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33m\"\u001b[39;49;00m\u001b[33msaving the final model\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " wait_for_everyone()\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m is_main_process(smp.rank()):\u001b[37m\u001b[39;49;00m\n", " 
tokenizer.save_pretrained(args.checkpoint_dir)\u001b[37m\u001b[39;49;00m\n", " \u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " wait_for_everyone()\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "\u001b[34mif\u001b[39;49;00m \u001b[31m__name__\u001b[39;49;00m == \u001b[33m\"\u001b[39;49;00m\u001b[33m__main__\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " main()\u001b[37m\u001b[39;49;00m\n" ] } ], "source": [ "!pygmentize ./scripts/train.py" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "tags": [] }, "outputs": [], "source": [ "hyperparameters = {}\n", "SM_DATA_DIR = \"/opt/ml/input/data\" \n", "\n", "hyperparameters[\"model_name_or_path\"] = \"EleutherAI/gpt-neo-2.7B\"\n", "hyperparameters[\"checkpoint_dir\"] = \"/opt/ml/checkpoints\"\n", "hyperparameters[\"train_file\"] = f\"{SM_DATA_DIR}/train/train.csv\"\n", "hyperparameters[\"validation_file\"] = f\"{SM_DATA_DIR}/valid/valid.csv\"\n", "hyperparameters[\"per_device_train_batch_size\"] = 1\n", "hyperparameters[\"per_device_eval_batch_size\"] = 1\n", "hyperparameters[\"block_size\"] = 2048\n", "hyperparameters[\"num_train_epochs\"] = 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Store model files as checkpoints for easy deployment\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "tags": [] }, "outputs": [], "source": [ "\n", "checkpoint_dir = \"/opt/ml/checkpoints\"\n", "checkpoint_s3_path = \"s3://\" + sess.default_bucket() + \"/gptneo-checkpoints\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Setup params for Sharded Data Parallel (SDP)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "tags": [] }, "outputs": [], "source": [ "smp_options = {\n", " \"enabled\":True,\n", " \"parameters\": { # Required\n", " \"pipeline_parallel_degree\": 1, # Required\n", " \"ddp\": True,\n", " \"ddp_dist_backend\": \"auto\",\n", " # parameters for sharded data parallelism\n", " \"sharded_data_parallel_degree\": 4, # Add this to activate sharded data parallelism\n", " \"partitions\":1,\n", " \"offload_activations\": True, \n", " \"fp16\":True,\n", " \"skip_tracing\": True\n", "\n", " }\n", "}\n", "\n", "mpi_options = {\n", " \"enabled\" : True, # Required\n", " \"processes_per_host\" : 4 # Required\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Start the training job\n", "We use g5.12.xlarge which consists of 4 GPU to shard the model states and run the training." 
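, "\n", "With `sharded_data_parallel_degree` and `processes_per_host` both set to 4, the model parameters, gradients, and optimizer states are sharded across the four GPUs of the instance. Once the estimator below is defined, the job is launched by passing the train and validation CSV locations as input channels; the channel names must match the `/opt/ml/input/data/<channel>` paths used in the hyperparameters. An illustrative launch call (assuming the S3 paths uploaded earlier):\n", "\n", "```python\n", "# Illustrative fit call: channel names \"train\" and \"valid\" map to\n", "# /opt/ml/input/data/train and /opt/ml/input/data/valid inside the container.\n", "estimator.fit({\"train\": train_data_url, \"valid\": valid_data_url})\n", "```"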
] }, { "cell_type": "code", "execution_count": 13, "metadata": { "tags": [] }, "outputs": [], "source": [ "\n", "base_job_name = \"gpt-neo-instruction-fine-tuning\"\n", "estimator = PyTorch(\n", " base_job_name=base_job_name,\n", " source_dir=\"./scripts\",\n", " entry_point=\"train.py\",\n", " role=role,\n", " framework_version=\"2.0.0\",\n", " py_version=\"py310\",\n", " instance_count=1,\n", " instance_type=\"ml.g5.12xlarge\",\n", " hyperparameters=hyperparameters,\n", " checkpoint_local_path=checkpoint_dir,\n", " checkpoint_s3_uri=checkpoint_s3_path,\n", " disable_profiler=True,\n", " distribution={\n", " \"smdistributed\": {\"modelparallel\": smp_options},\n", " \"mpi\": mpi_options\n", " }, \n", " keep_alive_period_in_seconds = 15*60 # 15mins\n", "\n", ")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.\n", "INFO:sagemaker:Creating training-job with name: gpt-neo-instruction-fine-tuning-2023-04-27-14-10-18-045\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Using provided s3_resource\n", "2023-04-27 14:10:18 Starting - Starting the training job...\n", "2023-04-27 14:10:44 Starting - Preparing the instances for training......\n", "2023-04-27 14:11:39 Downloading - Downloading input data...\n", "2023-04-27 14:11:59 Training - Downloading the training image...........................\n", "2023-04-27 14:16:35 Training - Training image download completed. Training in progress......\u001b[34mbash: cannot set terminal process group (-1): Inappropriate ioctl for device\u001b[0m\n", "\u001b[34mbash: no job control in this shell\u001b[0m\n", "\u001b[34m2023-04-27 14:17:26,049 sagemaker-training-toolkit INFO Imported framework sagemaker_pytorch_container.training\u001b[0m\n", "\u001b[34m2023-04-27 14:17:26,079 sagemaker-training-toolkit INFO No Neurons detected (normal if no neurons installed)\u001b[0m\n", "\u001b[34m2023-04-27 14:17:26,088 sagemaker_pytorch_container.training INFO Block until all host DNS lookups succeed.\u001b[0m\n", "\u001b[34m2023-04-27 14:17:26,090 sagemaker_pytorch_container.training INFO Invoking user training script.\u001b[0m\n", "\u001b[34m2023-04-27 14:17:26,323 sagemaker-training-toolkit INFO Installing dependencies from requirements.txt:\u001b[0m\n", "\u001b[34m/opt/conda/bin/python3.10 -m pip install -r requirements.txt\u001b[0m\n", "\u001b[34mCollecting transformers==4.21.0\u001b[0m\n", "\u001b[34mDownloading transformers-4.21.0-py3-none-any.whl (4.7 MB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.7/4.7 MB 99.3 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mCollecting datasets\u001b[0m\n", "\u001b[34mDownloading datasets-2.11.0-py3-none-any.whl (468 kB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 468.7/468.7 kB 80.4 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mCollecting sentencepiece!=0.1.92\u001b[0m\n", "\u001b[34mDownloading sentencepiece-0.1.98-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 92.2 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mCollecting evaluate\u001b[0m\n", "\u001b[34mDownloading evaluate-0.4.0-py3-none-any.whl (81 kB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 81.4/81.4 kB 24.8 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mRequirement already satisfied: 
packaging>=20.0 in /opt/conda/lib/python3.10/site-packages (from transformers==4.21.0->-r requirements.txt (line 1)) (23.1)\u001b[0m\n", "\u001b[34mCollecting huggingface-hub<1.0,>=0.1.0\u001b[0m\n", "\u001b[34mDownloading huggingface_hub-0.14.1-py3-none-any.whl (224 kB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 224.5/224.5 kB 51.2 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mCollecting regex!=2019.12.17\u001b[0m\n", "\u001b[34mDownloading regex-2023.3.23-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (769 kB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 769.6/769.6 kB 82.9 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mRequirement already satisfied: tqdm>=4.27 in /opt/conda/lib/python3.10/site-packages (from transformers==4.21.0->-r requirements.txt (line 1)) (4.65.0)\u001b[0m\n", "\u001b[34mRequirement already satisfied: requests in /opt/conda/lib/python3.10/site-packages (from transformers==4.21.0->-r requirements.txt (line 1)) (2.28.2)\u001b[0m\n", "\u001b[34mCollecting tokenizers!=0.11.3,<0.13,>=0.11.1\u001b[0m\n", "\u001b[34mDownloading tokenizers-0.12.1-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 131.1 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mRequirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.10/site-packages (from transformers==4.21.0->-r requirements.txt (line 1)) (5.4.1)\u001b[0m\n", "\u001b[34mRequirement already satisfied: filelock in /opt/conda/lib/python3.10/site-packages (from transformers==4.21.0->-r requirements.txt (line 1)) (3.11.0)\u001b[0m\n", "\u001b[34mRequirement already satisfied: numpy>=1.17 in /opt/conda/lib/python3.10/site-packages (from transformers==4.21.0->-r requirements.txt (line 1)) (1.23.5)\u001b[0m\n", "\u001b[34mCollecting xxhash\u001b[0m\n", "\u001b[34mDownloading xxhash-3.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (212 kB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 212.5/212.5 kB 52.3 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mRequirement already satisfied: fsspec[http]>=2021.11.1 in /opt/conda/lib/python3.10/site-packages (from datasets->-r requirements.txt (line 2)) (2023.4.0)\u001b[0m\n", "\u001b[34mRequirement already satisfied: multiprocess in /opt/conda/lib/python3.10/site-packages (from datasets->-r requirements.txt (line 2)) (0.70.14)\u001b[0m\n", "\u001b[34mRequirement already satisfied: dill<0.3.7,>=0.3.0 in /opt/conda/lib/python3.10/site-packages (from datasets->-r requirements.txt (line 2)) (0.3.6)\u001b[0m\n", "\u001b[34mRequirement already satisfied: pandas in /opt/conda/lib/python3.10/site-packages (from datasets->-r requirements.txt (line 2)) (2.0.0)\u001b[0m\n", "\u001b[34mCollecting aiohttp\u001b[0m\n", "\u001b[34mDownloading aiohttp-3.8.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 85.3 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mCollecting responses<0.19\u001b[0m\n", "\u001b[34mDownloading responses-0.18.0-py3-none-any.whl (38 kB)\u001b[0m\n", "\u001b[34mRequirement already satisfied: pyarrow>=8.0.0 in /opt/conda/lib/python3.10/site-packages (from datasets->-r requirements.txt (line 2)) (11.0.0)\u001b[0m\n", "\u001b[34mRequirement already satisfied: charset-normalizer<4.0,>=2.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets->-r requirements.txt (line 2)) (3.1.0)\u001b[0m\n", "\u001b[34mCollecting 
frozenlist>=1.1.1\u001b[0m\n", "\u001b[34mDownloading frozenlist-1.3.3-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (149 kB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 149.6/149.6 kB 35.1 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mCollecting multidict<7.0,>=4.5\u001b[0m\n", "\u001b[34mDownloading multidict-6.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (114 kB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 114.5/114.5 kB 29.8 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mCollecting async-timeout<5.0,>=4.0.0a3\u001b[0m\n", "\u001b[34mDownloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)\u001b[0m\n", "\u001b[34mCollecting aiosignal>=1.1.2\u001b[0m\n", "\u001b[34mDownloading aiosignal-1.3.1-py3-none-any.whl (7.6 kB)\u001b[0m\n", "\u001b[34mCollecting yarl<2.0,>=1.0\u001b[0m\n", "\u001b[34mDownloading yarl-1.9.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (268 kB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 268.8/268.8 kB 50.9 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mRequirement already satisfied: attrs>=17.3.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets->-r requirements.txt (line 2)) (22.2.0)\u001b[0m\n", "\u001b[34mRequirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.10/site-packages (from huggingface-hub<1.0,>=0.1.0->transformers==4.21.0->-r requirements.txt (line 1)) (4.5.0)\u001b[0m\n", "\u001b[34mRequirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.10/site-packages (from requests->transformers==4.21.0->-r requirements.txt (line 1)) (1.26.15)\u001b[0m\n", "\u001b[34mRequirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests->transformers==4.21.0->-r requirements.txt (line 1)) (3.4)\u001b[0m\n", "\u001b[34mRequirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.10/site-packages (from requests->transformers==4.21.0->-r requirements.txt (line 1)) (2022.12.7)\u001b[0m\n", "\u001b[34mRequirement already satisfied: python-dateutil>=2.8.2 in /opt/conda/lib/python3.10/site-packages (from pandas->datasets->-r requirements.txt (line 2)) (2.8.2)\u001b[0m\n", "\u001b[34mRequirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.10/site-packages (from pandas->datasets->-r requirements.txt (line 2)) (2023.3)\u001b[0m\n", "\u001b[34mRequirement already satisfied: tzdata>=2022.1 in /opt/conda/lib/python3.10/site-packages (from pandas->datasets->-r requirements.txt (line 2)) (2023.3)\u001b[0m\n", "\u001b[34mRequirement already satisfied: six>=1.5 in /opt/conda/lib/python3.10/site-packages (from python-dateutil>=2.8.2->pandas->datasets->-r requirements.txt (line 2)) (1.16.0)\u001b[0m\n", "\u001b[34mInstalling collected packages: tokenizers, sentencepiece, xxhash, regex, multidict, frozenlist, async-timeout, yarl, responses, huggingface-hub, aiosignal, transformers, aiohttp, datasets, evaluate\u001b[0m\n", "\u001b[34mSuccessfully installed aiohttp-3.8.4 aiosignal-1.3.1 async-timeout-4.0.2 datasets-2.11.0 evaluate-0.4.0 frozenlist-1.3.3 huggingface-hub-0.14.1 multidict-6.0.4 regex-2023.3.23 responses-0.18.0 sentencepiece-0.1.98 tokenizers-0.12.1 transformers-4.21.0 xxhash-3.2.0 yarl-1.9.2\u001b[0m\n", "\u001b[34mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\n", "\u001b[34m2023-04-27 14:17:35,153 sagemaker-training-toolkit INFO Waiting for the process to finish and give a return code.\u001b[0m\n", "\u001b[34m2023-04-27 14:17:35,153 sagemaker-training-toolkit INFO Done waiting for a return code. Received 0 from exiting process.\u001b[0m\n", "\u001b[34m2023-04-27 14:17:35,184 sagemaker-training-toolkit INFO No Neurons detected (normal if no neurons installed)\u001b[0m\n", "\u001b[34m2023-04-27 14:17:35,225 sagemaker-training-toolkit INFO No Neurons detected (normal if no neurons installed)\u001b[0m\n", "\u001b[34m2023-04-27 14:17:35,235 sagemaker-training-toolkit INFO Starting MPI run as worker node.\u001b[0m\n", "\u001b[34m2023-04-27 14:17:35,235 sagemaker-training-toolkit INFO Creating SSH daemon.\u001b[0m\n", "\u001b[34m2023-04-27 14:17:35,235 sagemaker-training-toolkit INFO Waiting for MPI workers to establish their SSH connections\u001b[0m\n", "\u001b[34m2023-04-27 14:17:35,235 sagemaker-training-toolkit INFO Env Hosts: ['algo-1'] Hosts: ['algo-1:4'] process_per_hosts: 4 num_processes: 4\u001b[0m\n", "\u001b[34m2023-04-27 14:17:35,236 sagemaker-training-toolkit INFO Network interface name: eth0\u001b[0m\n", "\u001b[34m2023-04-27 14:17:35,267 sagemaker-training-toolkit INFO No Neurons detected (normal if no neurons installed)\u001b[0m\n", "\u001b[34m2023-04-27 14:17:35,309 sagemaker-training-toolkit INFO No Neurons detected (normal if no neurons installed)\u001b[0m\n", "\u001b[34m2023-04-27 14:17:35,318 sagemaker-training-toolkit INFO Invoking user script\u001b[0m\n", "\u001b[34mTraining Env:\u001b[0m\n", "\u001b[34m{\n", " \"additional_framework_parameters\": {\n", " \"sagemaker_distributed_dataparallel_enabled\": false,\n", " \"sagemaker_instance_type\": \"ml.g5.12xlarge\",\n", " \"sagemaker_mpi_custom_mpi_options\": \"\",\n", " \"sagemaker_mpi_enabled\": true,\n", " \"sagemaker_mpi_num_of_processes_per_host\": 4\n", " },\n", " \"channel_input_dirs\": {\n", " \"train\": \"/opt/ml/input/data/train\",\n", " \"valid\": \"/opt/ml/input/data/valid\"\n", " },\n", " \"current_host\": \"algo-1\",\n", " \"current_instance_group\": \"homogeneousCluster\",\n", " \"current_instance_group_hosts\": [\n", " \"algo-1\"\n", " ],\n", " \"current_instance_type\": \"ml.g5.12xlarge\",\n", " \"distribution_hosts\": [\n", " \"algo-1\"\n", " ],\n", " \"distribution_instance_groups\": [\n", " \"homogeneousCluster\"\n", " ],\n", " \"framework_module\": \"sagemaker_pytorch_container.training:main\",\n", " \"hosts\": [\n", " \"algo-1\"\n", " ],\n", " \"hyperparameters\": {\n", " \"block_size\": 2048,\n", " \"checkpoint_dir\": \"/opt/ml/checkpoints\",\n", " \"model_name_or_path\": \"EleutherAI/gpt-neo-2.7B\",\n", " \"mp_parameters\": {\n", " \"pipeline_parallel_degree\": 1,\n", " \"ddp\": true,\n", " \"ddp_dist_backend\": \"auto\",\n", " \"sharded_data_parallel_degree\": 4,\n", " \"partitions\": 1,\n", " \"offload_activations\": true,\n", " \"fp16\": true,\n", " \"skip_tracing\": true\n", " },\n", " \"num_train_epochs\": 2,\n", " \"per_device_eval_batch_size\": 1,\n", " \"per_device_train_batch_size\": 1,\n", " \"train_file\": \"/opt/ml/input/data/train/train.csv\",\n", " \"validation_file\": \"/opt/ml/input/data/valid/valid.csv\"\n", " },\n", " \"input_config_dir\": \"/opt/ml/input/config\",\n", " \"input_data_config\": {\n", " \"train\": {\n", " \"TrainingInputMode\": \"File\",\n", " \"S3DistributionType\": \"FullyReplicated\",\n", " \"RecordWrapperType\": \"None\"\n", 
" },\n", " \"valid\": {\n", " \"TrainingInputMode\": \"File\",\n", " \"S3DistributionType\": \"FullyReplicated\",\n", " \"RecordWrapperType\": \"None\"\n", " }\n", " },\n", " \"input_dir\": \"/opt/ml/input\",\n", " \"instance_groups\": [\n", " \"homogeneousCluster\"\n", " ],\n", " \"instance_groups_dict\": {\n", " \"homogeneousCluster\": {\n", " \"instance_group_name\": \"homogeneousCluster\",\n", " \"instance_type\": \"ml.g5.12xlarge\",\n", " \"hosts\": [\n", " \"algo-1\"\n", " ]\n", " }\n", " },\n", " \"is_hetero\": false,\n", " \"is_master\": true,\n", " \"is_modelparallel_enabled\": true,\n", " \"is_smddpmprun_installed\": true,\n", " \"job_name\": \"gpt-neo-instruction-fine-tuning-2023-04-27-14-10-18-045\",\n", " \"log_level\": 20,\n", " \"master_hostname\": \"algo-1\",\n", " \"model_dir\": \"/opt/ml/model\",\n", " \"module_dir\": \"s3://sagemaker-us-east-1-706553727873/gpt-neo-instruction-fine-tuning-2023-04-27-14-10-18-045/source/sourcedir.tar.gz\",\n", " \"module_name\": \"train\",\n", " \"network_interface_name\": \"eth0\",\n", " \"num_cpus\": 48,\n", " \"num_gpus\": 4,\n", " \"num_neurons\": 0,\n", " \"output_data_dir\": \"/opt/ml/output/data\",\n", " \"output_dir\": \"/opt/ml/output\",\n", " \"output_intermediate_dir\": \"/opt/ml/output/intermediate\",\n", " \"resource_config\": {\n", " \"current_host\": \"algo-1\",\n", " \"current_instance_type\": \"ml.g5.12xlarge\",\n", " \"current_group_name\": \"homogeneousCluster\",\n", " \"hosts\": [\n", " \"algo-1\"\n", " ],\n", " \"instance_groups\": [\n", " {\n", " \"instance_group_name\": \"homogeneousCluster\",\n", " \"instance_type\": \"ml.g5.12xlarge\",\n", " \"hosts\": [\n", " \"algo-1\"\n", " ]\n", " }\n", " ],\n", " \"network_interface_name\": \"eth0\"\n", " },\n", " \"user_entry_point\": \"train.py\"\u001b[0m\n", "\u001b[34m}\u001b[0m\n", "\u001b[34mEnvironment variables:\u001b[0m\n", "\u001b[34mSM_HOSTS=[\"algo-1\"]\u001b[0m\n", "\u001b[34mSM_NETWORK_INTERFACE_NAME=eth0\u001b[0m\n", "\u001b[34mSM_HPS={\"block_size\":2048,\"checkpoint_dir\":\"/opt/ml/checkpoints\",\"model_name_or_path\":\"EleutherAI/gpt-neo-2.7B\",\"mp_parameters\":{\"ddp\":true,\"ddp_dist_backend\":\"auto\",\"fp16\":true,\"offload_activations\":true,\"partitions\":1,\"pipeline_parallel_degree\":1,\"sharded_data_parallel_degree\":4,\"skip_tracing\":true},\"num_train_epochs\":2,\"per_device_eval_batch_size\":1,\"per_device_train_batch_size\":1,\"train_file\":\"/opt/ml/input/data/train/train.csv\",\"validation_file\":\"/opt/ml/input/data/valid/valid.csv\"}\u001b[0m\n", "\u001b[34mSM_USER_ENTRY_POINT=train.py\u001b[0m\n", "\u001b[34mSM_FRAMEWORK_PARAMS={\"sagemaker_distributed_dataparallel_enabled\":false,\"sagemaker_instance_type\":\"ml.g5.12xlarge\",\"sagemaker_mpi_custom_mpi_options\":\"\",\"sagemaker_mpi_enabled\":true,\"sagemaker_mpi_num_of_processes_per_host\":4}\u001b[0m\n", "\u001b[34mSM_RESOURCE_CONFIG={\"current_group_name\":\"homogeneousCluster\",\"current_host\":\"algo-1\",\"current_instance_type\":\"ml.g5.12xlarge\",\"hosts\":[\"algo-1\"],\"instance_groups\":[{\"hosts\":[\"algo-1\"],\"instance_group_name\":\"homogeneousCluster\",\"instance_type\":\"ml.g5.12xlarge\"}],\"network_interface_name\":\"eth0\"}\u001b[0m\n", "\u001b[34mSM_INPUT_DATA_CONFIG={\"train\":{\"RecordWrapperType\":\"None\",\"S3DistributionType\":\"FullyReplicated\",\"TrainingInputMode\":\"File\"},\"valid\":{\"RecordWrapperType\":\"None\",\"S3DistributionType\":\"FullyReplicated\",\"TrainingInputMode\":\"File\"}}\u001b[0m\n", 
"\u001b[34mSM_OUTPUT_DATA_DIR=/opt/ml/output/data\u001b[0m\n", "\u001b[34mSM_CHANNELS=[\"train\",\"valid\"]\u001b[0m\n", "\u001b[34mSM_CURRENT_HOST=algo-1\u001b[0m\n", "\u001b[34mSM_CURRENT_INSTANCE_TYPE=ml.g5.12xlarge\u001b[0m\n", "\u001b[34mSM_CURRENT_INSTANCE_GROUP=homogeneousCluster\u001b[0m\n", "\u001b[34mSM_CURRENT_INSTANCE_GROUP_HOSTS=[\"algo-1\"]\u001b[0m\n", "\u001b[34mSM_INSTANCE_GROUPS=[\"homogeneousCluster\"]\u001b[0m\n", "\u001b[34mSM_INSTANCE_GROUPS_DICT={\"homogeneousCluster\":{\"hosts\":[\"algo-1\"],\"instance_group_name\":\"homogeneousCluster\",\"instance_type\":\"ml.g5.12xlarge\"}}\u001b[0m\n", "\u001b[34mSM_DISTRIBUTION_INSTANCE_GROUPS=[\"homogeneousCluster\"]\u001b[0m\n", "\u001b[34mSM_IS_HETERO=false\u001b[0m\n", "\u001b[34mSM_MODULE_NAME=train\u001b[0m\n", "\u001b[34mSM_LOG_LEVEL=20\u001b[0m\n", "\u001b[34mSM_FRAMEWORK_MODULE=sagemaker_pytorch_container.training:main\u001b[0m\n", "\u001b[34mSM_INPUT_DIR=/opt/ml/input\u001b[0m\n", "\u001b[34mSM_INPUT_CONFIG_DIR=/opt/ml/input/config\u001b[0m\n", "\u001b[34mSM_OUTPUT_DIR=/opt/ml/output\u001b[0m\n", "\u001b[34mSM_NUM_CPUS=48\u001b[0m\n", "\u001b[34mSM_NUM_GPUS=4\u001b[0m\n", "\u001b[34mSM_NUM_NEURONS=0\u001b[0m\n", "\u001b[34mSM_MODEL_DIR=/opt/ml/model\u001b[0m\n", "\u001b[34mSM_MODULE_DIR=s3://sagemaker-us-east-1-706553727873/gpt-neo-instruction-fine-tuning-2023-04-27-14-10-18-045/source/sourcedir.tar.gz\u001b[0m\n", "\u001b[34mSM_TRAINING_ENV={\"additional_framework_parameters\":{\"sagemaker_distributed_dataparallel_enabled\":false,\"sagemaker_instance_type\":\"ml.g5.12xlarge\",\"sagemaker_mpi_custom_mpi_options\":\"\",\"sagemaker_mpi_enabled\":true,\"sagemaker_mpi_num_of_processes_per_host\":4},\"channel_input_dirs\":{\"train\":\"/opt/ml/input/data/train\",\"valid\":\"/opt/ml/input/data/valid\"},\"current_host\":\"algo-1\",\"current_instance_group\":\"homogeneousCluster\",\"current_instance_group_hosts\":[\"algo-1\"],\"current_instance_type\":\"ml.g5.12xlarge\",\"distribution_hosts\":[\"algo-1\"],\"distribution_instance_groups\":[\"homogeneousCluster\"],\"framework_module\":\"sagemaker_pytorch_container.training:main\",\"hosts\":[\"algo-1\"],\"hyperparameters\":{\"block_size\":2048,\"checkpoint_dir\":\"/opt/ml/checkpoints\",\"model_name_or_path\":\"EleutherAI/gpt-neo-2.7B\",\"mp_parameters\":{\"ddp\":true,\"ddp_dist_backend\":\"auto\",\"fp16\":true,\"offload_activations\":true,\"partitions\":1,\"pipeline_parallel_degree\":1,\"sharded_data_parallel_degree\":4,\"skip_tracing\":true},\"num_train_epochs\":2,\"per_device_eval_batch_size\":1,\"per_device_train_batch_size\":1,\"train_file\":\"/opt/ml/input/data/train/train.csv\",\"validation_file\":\"/opt/ml/input/data/valid/valid.csv\"},\"input_config_dir\":\"/opt/ml/input/config\",\"input_data_config\":{\"train\":{\"RecordWrapperType\":\"None\",\"S3DistributionType\":\"FullyReplicated\",\"TrainingInputMode\":\"File\"},\"valid\":{\"RecordWrapperType\":\"None\",\"S3DistributionType\":\"FullyReplicated\",\"TrainingInputMode\":\"File\"}},\"input_dir\":\"/opt/ml/input\",\"instance_groups\":[\"homogeneousCluster\"],\"instance_groups_dict\":{\"homogeneousCluster\":{\"hosts\":[\"algo-1\"],\"instance_group_name\":\"homogeneousCluster\",\"instance_type\":\"ml.g5.12xlarge\"}},\"is_hetero\":false,\"is_master\":true,\"is_modelparallel_enabled\":null,\"is_smddpmprun_installed\":true,\"job_name\":\"gpt-neo-instruction-fine-tuning-2023-04-27-14-10-18-045\",\"log_level\":20,\"master_hostname\":\"algo-1\",\"model_dir\":\"/opt/ml/model\",\"module_dir\":\"s3://sagemaker-us-east-1-706553727873/g
pt-neo-instruction-fine-tuning-2023-04-27-14-10-18-045/source/sourcedir.tar.gz\",\"module_name\":\"train\",\"network_interface_name\":\"eth0\",\"num_cpus\":48,\"num_gpus\":4,\"num_neurons\":0,\"output_data_dir\":\"/opt/ml/output/data\",\"output_dir\":\"/opt/ml/output\",\"output_intermediate_dir\":\"/opt/ml/output/intermediate\",\"resource_config\":{\"current_group_name\":\"homogeneousCluster\",\"current_host\":\"algo-1\",\"current_instance_type\":\"ml.g5.12xlarge\",\"hosts\":[\"algo-1\"],\"instance_groups\":[{\"hosts\":[\"algo-1\"],\"instance_group_name\":\"homogeneousCluster\",\"instance_type\":\"ml.g5.12xlarge\"}],\"network_interface_name\":\"eth0\"},\"user_entry_point\":\"train.py\"}\u001b[0m\n", "\u001b[34mSM_USER_ARGS=[\"--block_size\",\"2048\",\"--checkpoint_dir\",\"/opt/ml/checkpoints\",\"--model_name_or_path\",\"EleutherAI/gpt-neo-2.7B\",\"--mp_parameters\",\"ddp=True,ddp_dist_backend=auto,fp16=True,offload_activations=True,partitions=1,pipeline_parallel_degree=1,sharded_data_parallel_degree=4,skip_tracing=True\",\"--num_train_epochs\",\"2\",\"--per_device_eval_batch_size\",\"1\",\"--per_device_train_batch_size\",\"1\",\"--train_file\",\"/opt/ml/input/data/train/train.csv\",\"--validation_file\",\"/opt/ml/input/data/valid/valid.csv\"]\u001b[0m\n", "\u001b[34mSM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate\u001b[0m\n", "\u001b[34mSM_CHANNEL_TRAIN=/opt/ml/input/data/train\u001b[0m\n", "\u001b[34mSM_CHANNEL_VALID=/opt/ml/input/data/valid\u001b[0m\n", "\u001b[34mSM_HP_BLOCK_SIZE=2048\u001b[0m\n", "\u001b[34mSM_HP_CHECKPOINT_DIR=/opt/ml/checkpoints\u001b[0m\n", "\u001b[34mSM_HP_MODEL_NAME_OR_PATH=EleutherAI/gpt-neo-2.7B\u001b[0m\n", "\u001b[34mSM_HP_MP_PARAMETERS={\"ddp\":true,\"ddp_dist_backend\":\"auto\",\"fp16\":true,\"offload_activations\":true,\"partitions\":1,\"pipeline_parallel_degree\":1,\"sharded_data_parallel_degree\":4,\"skip_tracing\":true}\u001b[0m\n", "\u001b[34mSM_HP_NUM_TRAIN_EPOCHS=2\u001b[0m\n", "\u001b[34mSM_HP_PER_DEVICE_EVAL_BATCH_SIZE=1\u001b[0m\n", "\u001b[34mSM_HP_PER_DEVICE_TRAIN_BATCH_SIZE=1\u001b[0m\n", "\u001b[34mSM_HP_TRAIN_FILE=/opt/ml/input/data/train/train.csv\u001b[0m\n", "\u001b[34mSM_HP_VALIDATION_FILE=/opt/ml/input/data/valid/valid.csv\u001b[0m\n", "\u001b[34mPYTHONPATH=/opt/ml/code:/opt/conda/bin:/opt/conda/lib/python310.zip:/opt/conda/lib/python3.10:/opt/conda/lib/python3.10/lib-dynload:/opt/conda/lib/python3.10/site-packages\u001b[0m\n", "\u001b[34mInvoking script with the following command:\u001b[0m\n", "\u001b[34mmpirun --host algo-1:4 -np 4 --allow-run-as-root --display-map --tag-output -mca btl_tcp_if_include eth0 -mca oob_tcp_if_include eth0 -mca plm_rsh_no_tree_spawn 1 -bind-to none -map-by slot -mca pml ob1 -mca btl ^openib -mca orte_abort_on_non_zero_status 1 -mca btl_vader_single_copy_mechanism none -x NCCL_MIN_NRINGS=4 -x NCCL_SOCKET_IFNAME=eth0 -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH -x LD_PRELOAD=/opt/conda/lib/python3.10/site-packages/gethostname.cpython-310-x86_64-linux-gnu.so -x SM_HOSTS -x SM_NETWORK_INTERFACE_NAME -x SM_HPS -x SM_USER_ENTRY_POINT -x SM_FRAMEWORK_PARAMS -x SM_RESOURCE_CONFIG -x SM_INPUT_DATA_CONFIG -x SM_OUTPUT_DATA_DIR -x SM_CHANNELS -x SM_CURRENT_HOST -x SM_CURRENT_INSTANCE_TYPE -x SM_CURRENT_INSTANCE_GROUP -x SM_CURRENT_INSTANCE_GROUP_HOSTS -x SM_INSTANCE_GROUPS -x SM_INSTANCE_GROUPS_DICT -x SM_DISTRIBUTION_INSTANCE_GROUPS -x SM_IS_HETERO -x SM_MODULE_NAME -x SM_LOG_LEVEL -x SM_FRAMEWORK_MODULE -x SM_INPUT_DIR -x SM_INPUT_CONFIG_DIR -x SM_OUTPUT_DIR -x SM_NUM_CPUS -x SM_NUM_GPUS -x SM_NUM_NEURONS -x 
SM_MODEL_DIR -x SM_MODULE_DIR -x SM_TRAINING_ENV -x SM_USER_ARGS -x SM_OUTPUT_INTERMEDIATE_DIR -x SM_CHANNEL_TRAIN -x SM_CHANNEL_VALID -x SM_HP_BLOCK_SIZE -x SM_HP_CHECKPOINT_DIR -x SM_HP_MODEL_NAME_OR_PATH -x SM_HP_MP_PARAMETERS -x SM_HP_NUM_TRAIN_EPOCHS -x SM_HP_PER_DEVICE_EVAL_BATCH_SIZE -x SM_HP_PER_DEVICE_TRAIN_BATCH_SIZE -x SM_HP_TRAIN_FILE -x SM_HP_VALIDATION_FILE -x PYTHONPATH smddpmprun -i ml.g5.12xlarge --allow-bypass /opt/conda/bin/python3.10 -m mpi4py train.py --block_size 2048 --checkpoint_dir /opt/ml/checkpoints --model_name_or_path EleutherAI/gpt-neo-2.7B --mp_parameters ddp=True,ddp_dist_backend=auto,fp16=True,offload_activations=True,partitions=1,pipeline_parallel_degree=1,sharded_data_parallel_degree=4,skip_tracing=True --num_train_epochs 2 --per_device_eval_batch_size 1 --per_device_train_batch_size 1 --train_file /opt/ml/input/data/train/train.csv --validation_file /opt/ml/input/data/valid/valid.csv\u001b[0m\n", "\u001b[34m2023-04-27 14:17:35,349 sagemaker-training-toolkit INFO No Neurons detected (normal if no neurons installed)\u001b[0m\n", "\u001b[34m2023-04-27 14:17:37,425 sagemaker-training-toolkit INFO Exceptions not imported for SageMaker TF as Tensorflow is not installed.\u001b[0m\n", "\u001b[34mData for JOB [41125,1] offset 0 Total slots allocated 4\n", " ======================== JOB MAP ========================\n", " Data for node: algo-1#011Num slots: 4#011Max slots: 0#011Num procs: 4\n", " #011Process OMPI jobid: [41125,1] App: 0 Process rank: 0 Bound: N/A\n", " #011Process OMPI jobid: [41125,1] App: 0 Process rank: 1 Bound: N/A\n", " #011Process OMPI jobid: [41125,1] App: 0 Process rank: 2 Bound: N/A\n", " #011Process OMPI jobid: [41125,1] App: 0 Process rank: 3 Bound: N/A\n", " =============================================================\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:SMDDPCollectivesInitWarning: The system is not compatible or not configured to run SMDDP collectives optimized for AWS infrastructure. The training job will fall back to NCCL.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:Refer to the following information to troubleshoot the issue:\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#011The instance type is not supported for running the training job. Please use one of the following: ml.p4d.24xlarge or ml.p4de.24xlarge.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:SMDDPCollectivesInitWarning: The system is not compatible or not configured to run SMDDP collectives optimized for AWS infrastructure. The training job will fall back to NCCL.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Refer to the following information to troubleshoot the issue:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#011The instance type is not supported for running the training job. Please use one of the following: ml.p4d.24xlarge or ml.p4de.24xlarge.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:SMDDPCollectivesInitWarning: The system is not compatible or not configured to run SMDDP collectives optimized for AWS infrastructure. The training job will fall back to NCCL.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:Refer to the following information to troubleshoot the issue:\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#011The instance type is not supported for running the training job. Please use one of the following: ml.p4d.24xlarge or ml.p4de.24xlarge.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:SMDDPCollectivesInitWarning: The system is not compatible or not configured to run SMDDP collectives optimized for AWS infrastructure. 
The training job will fall back to NCCL.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:Refer to the following information to troubleshoot the issue:\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#011The instance type is not supported for running the training job. Please use one of the following: ml.p4d.24xlarge or ml.p4de.24xlarge.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:[2023-04-27 14:17:43.036: W smdistributed/modelparallel/backend/core.py:423] smddp backend does not support training on single node. Falling back to nccl backend.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.036: W smdistributed/modelparallel/backend/core.py:423] smddp backend does not support training on single node. Falling back to nccl backend.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:[2023-04-27 14:17:43.036: W smdistributed/modelparallel/backend/core.py:423] smddp backend does not support training on single node. Falling back to nccl backend.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:[2023-04-27 14:17:43.036: W smdistributed/modelparallel/backend/core.py:423] smddp backend does not support training on single node. Falling back to nccl backend.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.037: I smdistributed/modelparallel/backend/core.py:481] [smddp] SMDATAPARALLEL_DEVICE_NAME is defined in os.environ: False.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.037: I smdistributed/modelparallel/backend/core.py:481] [smddp] SAGEMAKER_INSTANCE_TYPE is defined in os.environ: False.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:[2023-04-27 14:17:43.037: I smdistributed/modelparallel/backend/core.py:481] [smddp] SMDATAPARALLEL_DEVICE_NAME is defined in os.environ: False.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:[2023-04-27 14:17:43.037: I smdistributed/modelparallel/backend/core.py:481] [smddp] SMDATAPARALLEL_DEVICE_NAME is defined in os.environ: False.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.037: I smdistributed/modelparallel/torch/state_mod.py:100] [0] Initializing torch distributed process groups with nccl backend\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:[2023-04-27 14:17:43.037: I smdistributed/modelparallel/backend/core.py:481] [smddp] SAGEMAKER_INSTANCE_TYPE is defined in os.environ: False.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:[2023-04-27 14:17:43.037: I smdistributed/modelparallel/backend/core.py:481] [smddp] SAGEMAKER_INSTANCE_TYPE is defined in os.environ: False.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:[2023-04-27 14:17:43.037: I smdistributed/modelparallel/backend/core.py:481] [smddp] SMDATAPARALLEL_DEVICE_NAME is defined in os.environ: False.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:[2023-04-27 14:17:43.037: I smdistributed/modelparallel/torch/state_mod.py:100] [1] Initializing torch distributed process groups with nccl backend\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:[2023-04-27 14:17:43.037: I smdistributed/modelparallel/torch/state_mod.py:100] [2] Initializing torch distributed process groups with nccl backend\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:[2023-04-27 14:17:43.037: I smdistributed/modelparallel/backend/core.py:481] [smddp] SAGEMAKER_INSTANCE_TYPE is defined in os.environ: False.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:[2023-04-27 14:17:43.038: I smdistributed/modelparallel/torch/state_mod.py:100] [3] Initializing torch distributed process groups with nccl backend\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: 
store_based_barrier_key:1 to store for rank: 1\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 2\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 3\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 2\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 3\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes.\u001b[0m\n", 
"\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 2\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 3\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:4 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:4 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 3\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 1\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:5 to store for rank: 2\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:5 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:5 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:5 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:5 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 2\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 1\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 3\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:6 to store for rank: 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:6 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 
0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:6 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:6 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:6 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 2\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 1\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:7 to store for rank: 3\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:7 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:7 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:7 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:7 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 2\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 3\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 1\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:8 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:8 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:8 to store for rank: 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 1\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:8 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 3\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:8 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:9 to store for rank: 2\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 1: 
Completed store-based barrier for key:store_based_barrier_key:9 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:9 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:9 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 1\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 2\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:9 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 3\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:10 to store for rank: 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:10 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:10 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 3\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:10 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:10 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 1\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:11 to store for rank: 2\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:11 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:11 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:11 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 2\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 1\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:11 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 
0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:12 to store for rank: 3\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:12 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:12 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 3\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:12 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:12 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 2\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:13 to store for rank: 1\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:13 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:13 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 1\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 2\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:13 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:13 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:14 to store for rank: 3\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:14 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 3\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:14 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:14 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:14 with 4 nodes.\u001b[0m\n", 
"\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 1\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 2\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:15 to store for rank: 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:15 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:15 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.116: I smdistributed/modelparallel/torch/state_mod.py:163] [0] Finished initializing torch distributed process groups. pp_rank: 0, tp_rank: 0, dp_rank: 0, rdp_rank: 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:[2023-04-27 14:17:43.116: I smdistributed/modelparallel/torch/state_mod.py:163] [3] Finished initializing torch distributed process groups. pp_rank: 0, tp_rank: 0, dp_rank: 3, rdp_rank: 3\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.117: I smdistributed/modelparallel/torch/throttler.py:37] Using NCCL throttle limit of 8.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.117: I smdistributed/modelparallel/backend/config.py:314] Configuration parameters:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.117: I smdistributed/modelparallel/backend/config.py:317] activation_loading_horizon: 4\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.117: I smdistributed/modelparallel/backend/config.py:317] active_microbatches: 1\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.117: I smdistributed/modelparallel/backend/config.py:317] auto_partition: True\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.117: I smdistributed/modelparallel/backend/config.py:317] bf16: False\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.117: I smdistributed/modelparallel/backend/config.py:317] contiguous: True\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.117: I smdistributed/modelparallel/backend/config.py:317] ddp: True\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.117: I smdistributed/modelparallel/backend/config.py:317] ddp_dist_backend: auto\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.117: I smdistributed/modelparallel/backend/config.py:317] ddp_port: None\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.117: I smdistributed/modelparallel/backend/config.py:317] default_partition: None\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.117: I smdistributed/modelparallel/backend/config.py:317] delayed_parameter_initialization: False\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.118: I smdistributed/modelparallel/backend/config.py:317] fp16: True\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.118: I smdistributed/modelparallel/backend/config.py:317] fp16_params: False\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.118: I smdistributed/modelparallel/backend/config.py:317] horovod: False\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.118: I smdistributed/modelparallel/backend/config.py:317] memory_weight: 
0.8\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.118: I smdistributed/modelparallel/backend/config.py:317] microbatches: 1\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.118: I smdistributed/modelparallel/backend/config.py:317] offload_activations: True\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.118: I smdistributed/modelparallel/backend/config.py:317] optimize: speed\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.118: I smdistributed/modelparallel/backend/config.py:317] pipeline: interleaved\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.118: I smdistributed/modelparallel/backend/config.py:317] pipeline_parallel_degree: 1\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.118: I smdistributed/modelparallel/backend/config.py:317] placement_strategy: cluster\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.118: I smdistributed/modelparallel/backend/config.py:317] predefined_hooks: True\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.118: I smdistributed/modelparallel/backend/config.py:317] prescaled_batch: False\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.118: I smdistributed/modelparallel/backend/config.py:317] sdp_gradient_clipping: 1.0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.118: I smdistributed/modelparallel/backend/config.py:317] sdp_hierarchical_allgather: True\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.118: I smdistributed/modelparallel/backend/config.py:317] sdp_max_live_parameters: 1000000000\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.119: I smdistributed/modelparallel/backend/config.py:317] sdp_param_persistence_threshold: 1000000\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.119: I smdistributed/modelparallel/backend/config.py:317] sdp_reduce_bucket_size: 500000000\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.119: I smdistributed/modelparallel/backend/config.py:317] shard_optimizer_state: False\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.119: I smdistributed/modelparallel/backend/config.py:317] sharded_data_parallel_degree: 4\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:[2023-04-27 14:17:43.119: W smdistributed/modelparallel/torch/__init__.py:115] SageMaker model parallelism is initialized already. Ignoring the smp.init() call.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.119: I smdistributed/modelparallel/backend/config.py:317] skip_tracing: True\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.119: I smdistributed/modelparallel/backend/config.py:317] tensor_parallel_degree: 1\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.119: I smdistributed/modelparallel/backend/config.py:317] tensor_parallel_seed: 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.119: W smdistributed/modelparallel/backend/config.py:323] WARNING: \"fp16_params\" is a deprecated config key, please use \"fp16\" instead\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:17:43.121: W smdistributed/modelparallel/torch/__init__.py:115] SageMaker model parallelism is initialized already. 
Ignoring the smp.init() call.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:15 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:15 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:[2023-04-27 14:17:43.126: I smdistributed/modelparallel/torch/state_mod.py:163] [2] Finished initializing torch distributed process groups. pp_rank: 0, tp_rank: 0, dp_rank: 2, rdp_rank: 2\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:[2023-04-27 14:17:43.126: I smdistributed/modelparallel/torch/state_mod.py:163] [1] Finished initializing torch distributed process groups. pp_rank: 0, tp_rank: 0, dp_rank: 1, rdp_rank: 1\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:[2023-04-27 14:17:43.128: W smdistributed/modelparallel/torch/__init__.py:115] SageMaker model parallelism is initialized already. Ignoring the smp.init() call.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:[2023-04-27 14:17:43.129: W smdistributed/modelparallel/torch/__init__.py:115] SageMaker model parallelism is initialized already. Ignoring the smp.init() call.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Downloading tokenizer_config.json: 0%| | 0.00/200 [00:00:#015Downloading tokenizer_config.json: 100%|██████████| 200/200 [00:00<00:00, 214kB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Downloading config.json: 0%| | 0.00/1.42k [00:00:#015Downloading config.json: 100%|██████████| 1.42k/1.42k [00:00<00:00, 3.15MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Downloading vocab.json: 0%| | 0.00/779k [00:00:#015Downloading vocab.json: 100%|██████████| 779k/779k [00:00<00:00, 88.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Downloading merges.txt: 0%| | 0.00/446k [00:00:#015Downloading merges.txt: 100%|██████████| 446k/446k [00:00<00:00, 90.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Downloading special_tokens_map.json: 0%| | 0.00/90.0 [00:00:#015Downloading special_tokens_map.json: 100%|██████████| 90.0/90.0 [00:00<00:00, 199kB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:Downloading and preparing dataset csv/default to /root/.cache/huggingface/datasets/csv/default-128b04a1742f086e/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1...\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading data files: 0%| | 0/2 [00:00:#015Downloading data files: 100%|██████████| 2/2 [00:00<00:00, 10686.12it/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Extracting data files: 0%| | 0/2 [00:00:#015Extracting data files: 100%|██████████| 2/2 [00:00<00:00, 2129.09it/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Generating train split: 0 examples [00:00, ? examples/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Generating validation split: 0 examples [00:00, ? examples/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:Dataset csv downloaded and prepared to /root/.cache/huggingface/datasets/csv/default-128b04a1742f086e/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1. 
Subsequent calls will reuse this data.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 0%| | 0/2 [00:00:#015100%|██████████| 2/2 [00:00<00:00, 1051.07it/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:WARNING:datasets.builder:Found cached dataset csv (/root/.cache/huggingface/datasets/csv/default-128b04a1742f086e/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 0%| | 0/2 [00:00:#015100%|██████████| 2/2 [00:00<00:00, 1038.84it/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:WARNING:datasets.builder:Found cached dataset csv (/root/.cache/huggingface/datasets/csv/default-128b04a1742f086e/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 0%| | 0/2 [00:00:#015100%|██████████| 2/2 [00:00<00:00, 1079.06it/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:WARNING:datasets.builder:Found cached dataset csv (/root/.cache/huggingface/datasets/csv/default-128b04a1742f086e/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 0%| | 0/2 [00:00:#015100%|██████████| 2/2 [00:00<00:00, 992.62it/s][1,mpirank:1,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Running tokenizer on dataset: 0%| | 0/5000 [00:00:#015Running tokenizer on dataset: 20%|██ | 1000/5000 [00:00<00:00, 7299.14 examples/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Running tokenizer on dataset: 40%|████ | 2000/5000 [00:00<00:00, 7959.23 examples/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Running tokenizer on dataset: 60%|██████ | 3000/5000 [00:00<00:00, 8139.44 examples/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Running tokenizer on dataset: 80%|████████ | 4000/5000 [00:00<00:00, 8145.28 examples/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Running tokenizer on dataset: 100%|██████████| 5000/5000 [00:00<00:00, 8154.76 examples/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Running tokenizer on dataset: 0%| | 0/2000 [00:00:#015Running tokenizer on dataset: 50%|█████ | 1000/2000 [00:00<00:00, 8447.54 examples/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Running tokenizer on dataset: 100%|██████████| 2000/2000 [00:00<00:00, 8138.36 examples/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Grouping texts in chunks of 2048: 0%| | 0/5000 [00:00:#015Grouping texts in chunks of 2048: 20%|██ | 1000/5000 [00:00<00:00, 6708.28 examples/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Grouping texts in chunks of 2048: 40%|████ | 2000/5000 [00:00<00:00, 6728.22 examples/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Grouping texts in chunks of 2048: 60%|██████ | 3000/5000 [00:00<00:00, 6751.56 examples/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Grouping texts in chunks of 2048: 80%|████████ | 4000/5000 [00:00<00:00, 6603.77 examples/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Grouping texts in chunks of 2048: 100%|██████████| 5000/5000 [00:00<00:00, 6650.76 examples/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Grouping texts in chunks of 2048: 0%| | 0/2000 [00:00:#015Grouping texts in chunks of 2048: 50%|█████ | 1000/2000 [00:00<00:00, 6436.56 examples/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015Grouping texts in chunks of 2048: 100%|██████████| 2000/2000 [00:00<00:00, 6216.25 examples/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:96 [0] NCCL INFO Bootstrap : Using eth0:10.0.76.67<0>\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:96 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:96 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:96 [0] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:NCCL version 2.16.2+cuda11.8\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:98 [3] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:97 [2] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:95 [1] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] NCCL INFO NET/OFI Using aws-ofi-nccl 1.5.0aws\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] nccl_net_ofi_init:1444 NCCL WARN NET/OFI Only EFA provider is supported\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] nccl_net_ofi_init:1483 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] NCCL INFO NET/Socket : Using [0]eth0:10.0.76.67<0>\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:97 [2] NCCL INFO Bootstrap : Using eth0:10.0.76.67<0>\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:97 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:97 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:279 [2] NCCL INFO NET/OFI Using aws-ofi-nccl 1.5.0aws\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:279 [2] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:279 [2] nccl_net_ofi_init:1444 NCCL WARN NET/OFI Only EFA provider is supported\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:279 [2] nccl_net_ofi_init:1483 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:279 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:279 [2] NCCL INFO NET/Socket : Using [0]eth0:10.0.76.67<0>\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:279 [2] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:98 [3] NCCL INFO Bootstrap : Using eth0:10.0.76.67<0>\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:98 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:98 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:280 [3] NCCL INFO NET/OFI Using 
aws-ofi-nccl 1.5.0aws\u001b[0m\n",
"\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:280 [3] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n",
"\u001b[34m[1,mpirank:3,algo-1]:\u001b[0m\n",
"\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:280 [3] nccl_net_ofi_init:1444 NCCL WARN NET/OFI Only EFA provider is supported\u001b[0m\n",
"\u001b[34m[1,mpirank:3,algo-1]:\u001b[0m\n",
"\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:280 [3] nccl_net_ofi_init:1483 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n",
"\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:280 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n",
"\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:280 [3] NCCL INFO NET/Socket : Using [0]eth0:10.0.76.67<0>\u001b[0m\n",
"\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:280 [3] NCCL INFO Using network Socket\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:95 [1] NCCL INFO Bootstrap : Using eth0:10.0.76.67<0>\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:95 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:95 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO NET/OFI Using aws-ofi-nccl 1.5.0aws\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] nccl_net_ofi_init:1444 NCCL WARN NET/OFI Only EFA provider is supported\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] nccl_net_ofi_init:1483 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO NET/Socket : Using [0]eth0:10.0.76.67<0>\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO Using network Socket\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)\u001b[0m\n",
"[... output truncated: repeated NCCL 'P2P is disabled' / 'Could not enable P2P' messages for the remaining GPU pairs and ranks omitted ...]\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO NCCL_MIN_NRINGS set by environment to 4.\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO P2P Chunksize set to 131072\u001b[0m\n",
"\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:279 [2] NCCL INFO NCCL_MIN_NRINGS set by environment to 4.\u001b[0m\n",
"\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:279 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1\u001b[0m\n",
"\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:279 [2] NCCL INFO P2P Chunksize set to 131072\u001b[0m\n",
"\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:280 [3] NCCL INFO NCCL_MIN_NRINGS set by environment to 4.\u001b[0m\n",
"\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:280 [3] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] -1/-1/-1->3->2 [2] -1/-1/-1->3->2 [3] -1/-1/-1->3->2\u001b[0m\n",
"\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:280 [3] NCCL INFO P2P Chunksize set to 131072\u001b[0m\n",
"\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] NCCL INFO NCCL_MIN_NRINGS set by environment to 4.\u001b[0m\n",
"\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] NCCL INFO Channel 00/04 : 0 1 2 3\u001b[0m\n",
"\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] NCCL INFO Channel 01/04 : 0 1 2 3\u001b[0m\n",
"\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] NCCL INFO Channel 02/04 : 0 1 2 3\u001b[0m\n",
"\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] NCCL INFO Channel 03/04 : 0 1 2 3\u001b[0m\n",
"\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1\u001b[0m\n",
"\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] NCCL INFO P2P Chunksize set to 131072\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO Channel 00 : 1[1c0] -> 2[1d0] via SHM/direct/direct\u001b[0m\n",
"\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:279 [2] NCCL INFO Channel 00 : 2[1d0] -> 3[1e0] via SHM/direct/direct\u001b[0m\n",
"\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:280 [3] NCCL INFO Channel 00 : 3[1e0] -> 0[1b0] via SHM/direct/direct\u001b[0m\n",
"\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] NCCL INFO Channel 00 : 0[1b0] -> 1[1c0] via SHM/direct/direct\u001b[0m\n",
"[... output truncated: remaining per-channel SHM connection messages and repeated P2P notices omitted ...]\n",
"\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:279 [2] NCCL INFO Connected all rings\u001b[0m\n",
"\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:280 [3] NCCL INFO Connected all rings\u001b[0m\n",
"\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] NCCL INFO Connected all rings\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO Connected all rings\u001b[0m\n",
"\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:280 [3] NCCL INFO Connected all trees\u001b[0m\n",
"\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:280 [3] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512\u001b[0m\n",
"\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:280 [3] NCCL INFO 4 coll channels, 4 p2p channels, 2 p2p channels per peer\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO Channel 00 : 1[1c0] -> 0[1b0] via SHM/direct/direct\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. 
You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO Channel 02 : 1[1c0] -> 0[1b0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO Channel 03 : 1[1c0] -> 0[1b0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] NCCL INFO 4 coll channels, 4 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:279 [2] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:279 [2] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:279 [2] NCCL INFO 4 coll channels, 4 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO 4 coll channels, 4 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:280 [3] NCCL INFO comm 0x5578886bc580 rank 3 nranks 4 cudaDev 3 busId 1e0 commId 0x57fc5f080b648ad9 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:278 [0] NCCL INFO comm 0x56425acf6ea0 rank 0 nranks 4 cudaDev 0 busId 1b0 commId 0x57fc5f080b648ad9 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:279 [2] NCCL INFO comm 0x56344f3af730 rank 2 nranks 4 cudaDev 2 busId 1d0 commId 0x57fc5f080b648ad9 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:281 [1] NCCL INFO comm 0x5629306ae440 rank 1 nranks 4 cudaDev 1 busId 1c0 commId 0x57fc5f080b648ad9 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:WARNING:datasets.arrow_dataset:Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-128b04a1742f086e/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-f91a8f3a9dd8776b.arrow\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:WARNING:datasets.arrow_dataset:Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-128b04a1742f086e/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-f91a8f3a9dd8776b.arrow\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:WARNING:datasets.arrow_dataset:Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-128b04a1742f086e/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-f91a8f3a9dd8776b.arrow\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:WARNING:datasets.arrow_dataset:Loading cached processed dataset at 
/root/.cache/huggingface/datasets/csv/default-128b04a1742f086e/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-9f39cf9e3fd05af1.arrow\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:WARNING:datasets.arrow_dataset:Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-128b04a1742f086e/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-4950cce7e75d9289.arrow\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:WARNING:datasets.arrow_dataset:Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-128b04a1742f086e/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-416c413cf53b149a.arrow\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:WARNING:datasets.arrow_dataset:Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-128b04a1742f086e/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-9f39cf9e3fd05af1.arrow\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:WARNING:datasets.arrow_dataset:Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-128b04a1742f086e/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-9f39cf9e3fd05af1.arrow\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:WARNING:datasets.arrow_dataset:Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-128b04a1742f086e/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-4950cce7e75d9289.arrow\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:WARNING:datasets.arrow_dataset:Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-128b04a1742f086e/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-4950cce7e75d9289.arrow\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:WARNING:datasets.arrow_dataset:Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-128b04a1742f086e/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-416c413cf53b149a.arrow\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:WARNING:datasets.arrow_dataset:Loading cached processed dataset at /root/.cache/huggingface/datasets/csv/default-128b04a1742f086e/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1/cache-416c413cf53b149a.arrow\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:{'input_ids': tensor([[11018, 82, 4468, ..., 262, 6817, 286]]), 'attention_mask': tensor([[1, 1, 1, ..., 1, 1, 1]]), 'labels': tensor([[11018, 82, 4468, ..., 262, 6817, 286]])}\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:{'input_ids': tensor([[ 20, 13, 22104, ..., 19430, 257, 2882]]), 'attention_mask': tensor([[1, 1, 1, ..., 1, 1, 1]]), 'labels': tensor([[ 20, 13, 22104, ..., 19430, 257, 2882]])}\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:{'input_ids': tensor([[3554, 198, 198, ..., 428, 6827, 13]]), 'attention_mask': tensor([[1, 1, 1, ..., 1, 1, 1]]), 'labels': tensor([[3554, 198, 198, ..., 428, 6827, 13]])}\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:{'input_ids': tensor([[21017, 18261, 25, ..., 11219, 13, 9461]]), 'attention_mask': tensor([[1, 1, 1, ..., 1, 1, 1]]), 'labels': tensor([[21017, 18261, 25, ..., 11219, 13, 9461]])}\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading config.json: 0%| | 0.00/1.42k [00:00:#015Downloading config.json: 100%|██████████| 1.42k/1.42k [00:00<00:00, 2.73MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 0%| | 0.00/9.94G [00:00:#015Downloading 
pytorch_model.bin: 0%| | 10.1M/9.94G [00:00<01:41, 106MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 49%|████▉ | 4.89G/9.94G 
[00:42<00:44, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 49%|████▉ | 4.90G/9.94G [00:42<00:44, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 49%|████▉ | 4.91G/9.94G [00:43<00:44, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 50%|████▉ | 4.93G/9.94G [00:43<00:43, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 50%|████▉ | 4.94G/9.94G [00:43<00:43, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 50%|████▉ | 4.95G/9.94G [00:43<00:43, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 50%|████▉ | 4.96G/9.94G [00:43<00:43, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 50%|█████ | 4.97G/9.94G [00:43<00:43, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 50%|█████ | 4.98G/9.94G [00:43<00:43, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 50%|█████ | 4.99G/9.94G [00:43<00:43, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 50%|█████ | 5.01G/9.94G [00:43<00:43, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 50%|█████ | 5.02G/9.94G [00:43<00:43, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 51%|█████ | 5.03G/9.94G [00:44<00:43, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 51%|█████ | 5.04G/9.94G [00:44<00:43, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 51%|█████ | 5.05G/9.94G [00:44<00:42, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 51%|█████ | 5.06G/9.94G [00:44<00:42, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 51%|█████ | 5.07G/9.94G [00:44<00:42, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 51%|█████ | 5.08G/9.94G [00:44<00:42, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 51%|█████▏ | 5.10G/9.94G [00:44<00:42, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 51%|█████▏ | 5.11G/9.94G [00:44<00:42, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 52%|█████▏ | 5.12G/9.94G [00:44<00:42, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 52%|█████▏ | 5.13G/9.94G [00:44<00:42, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 52%|█████▏ | 5.14G/9.94G [00:45<00:42, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 52%|█████▏ | 5.15G/9.94G [00:45<00:41, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 52%|█████▏ | 5.16G/9.94G [00:45<00:41, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 52%|█████▏ | 5.18G/9.94G [00:45<00:41, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 52%|█████▏ | 5.19G/9.94G [00:45<00:41, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 52%|█████▏ | 5.20G/9.94G [00:45<00:41, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 52%|█████▏ | 
5.21G/9.94G [00:45<00:41, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 53%|█████▎ | 5.22G/9.94G [00:45<00:41, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 53%|█████▎ | 5.23G/9.94G [00:45<00:41, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 53%|█████▎ | 5.24G/9.94G [00:45<00:41, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 53%|█████▎ | 5.26G/9.94G [00:46<00:41, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 53%|█████▎ | 5.27G/9.94G [00:46<00:41, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 53%|█████▎ | 5.28G/9.94G [00:46<00:41, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 53%|█████▎ | 5.29G/9.94G [00:46<00:41, 120MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 53%|█████▎ | 5.30G/9.94G [00:46<00:41, 120MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 53%|█████▎ | 5.31G/9.94G [00:46<00:41, 121MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 54%|█████▎ | 5.32G/9.94G [00:46<00:40, 121MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 54%|█████▎ | 5.34G/9.94G [00:46<00:40, 121MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 54%|█████▍ | 5.35G/9.94G [00:46<00:40, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 54%|█████▍ | 5.36G/9.94G [00:46<00:40, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 54%|█████▍ | 5.37G/9.94G [00:47<00:40, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 54%|█████▍ | 5.38G/9.94G [00:47<00:40, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 54%|█████▍ | 5.39G/9.94G [00:47<00:40, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 54%|█████▍ | 5.40G/9.94G [00:47<00:39, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 54%|█████▍ | 5.41G/9.94G [00:47<00:39, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 55%|█████▍ | 5.43G/9.94G [00:47<00:39, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 55%|█████▍ | 5.44G/9.94G [00:47<00:39, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 55%|█████▍ | 5.45G/9.94G [00:47<00:39, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 55%|█████▍ | 5.46G/9.94G [00:47<00:39, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 55%|█████▌ | 5.47G/9.94G [00:47<00:39, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 55%|█████▌ | 5.48G/9.94G [00:48<00:39, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 55%|█████▌ | 5.49G/9.94G [00:48<00:39, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 55%|█████▌ | 5.51G/9.94G [00:48<00:39, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 56%|█████▌ | 5.52G/9.94G [00:48<00:38, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading 
pytorch_model.bin: 56%|█████▌ | 5.53G/9.94G [00:48<00:38, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 56%|█████▌ | 5.54G/9.94G [00:48<00:38, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 56%|█████▌ | 5.55G/9.94G [00:48<00:38, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 56%|█████▌ | 5.56G/9.94G [00:48<00:38, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 56%|█████▌ | 5.57G/9.94G [00:48<00:38, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 56%|█████▌ | 5.59G/9.94G [00:48<00:38, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 56%|█████▋ | 5.60G/9.94G [00:49<00:38, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 56%|█████▋ | 5.61G/9.94G [00:49<00:38, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 57%|█████▋ | 5.62G/9.94G [00:49<00:38, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 57%|█████▋ | 5.63G/9.94G [00:49<00:37, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 57%|█████▋ | 5.64G/9.94G [00:49<00:37, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 57%|█████▋ | 5.65G/9.94G [00:49<00:37, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 57%|█████▋ | 5.67G/9.94G [00:49<00:37, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 57%|█████▋ | 5.68G/9.94G [00:49<00:37, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 57%|█████▋ | 5.69G/9.94G [00:49<00:37, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 57%|█████▋ | 5.70G/9.94G [00:49<00:37, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 57%|█████▋ | 5.71G/9.94G [00:50<00:37, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 58%|█████▊ | 5.72G/9.94G [00:50<00:37, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 58%|█████▊ | 5.73G/9.94G [00:50<00:37, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 58%|█████▊ | 5.74G/9.94G [00:50<00:37, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 58%|█████▊ | 5.76G/9.94G [00:50<00:36, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 58%|█████▊ | 5.77G/9.94G [00:50<00:36, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 58%|█████▊ | 5.78G/9.94G [00:50<00:36, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 58%|█████▊ | 5.79G/9.94G [00:50<00:36, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 58%|█████▊ | 5.80G/9.94G [00:50<00:36, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 58%|█████▊ | 5.81G/9.94G [00:50<00:36, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 59%|█████▊ | 5.82G/9.94G [00:51<00:36, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 59%|█████▊ | 5.84G/9.94G [00:51<00:36, 122MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 59%|█████▉ | 5.85G/9.94G [00:51<00:36, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 59%|█████▉ | 5.86G/9.94G [00:51<00:35, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 59%|█████▉ | 5.87G/9.94G [00:51<00:35, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 59%|█████▉ | 5.88G/9.94G [00:51<00:35, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 59%|█████▉ | 5.89G/9.94G [00:51<00:35, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 59%|█████▉ | 5.90G/9.94G [00:51<00:35, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 60%|█████▉ | 5.92G/9.94G [00:51<00:35, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 60%|█████▉ | 5.93G/9.94G [00:51<00:35, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 60%|█████▉ | 5.94G/9.94G [00:52<00:35, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 60%|█████▉ | 5.95G/9.94G [00:52<00:35, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 60%|█████▉ | 5.96G/9.94G [00:52<00:35, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 60%|██████ | 5.97G/9.94G [00:52<00:35, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 60%|██████ | 5.98G/9.94G [00:52<00:34, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 60%|██████ | 5.99G/9.94G [00:52<00:34, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 60%|██████ | 6.01G/9.94G [00:52<00:34, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 61%|██████ | 6.02G/9.94G [00:52<00:34, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 61%|██████ | 6.03G/9.94G [00:52<00:34, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 61%|██████ | 6.04G/9.94G [00:52<00:34, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 61%|██████ | 6.05G/9.94G [00:53<00:34, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 61%|██████ | 6.06G/9.94G [00:53<00:34, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 61%|██████ | 6.07G/9.94G [00:53<00:33, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 61%|██████ | 6.09G/9.94G [00:53<00:33, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 61%|██████▏ | 6.10G/9.94G [00:53<00:33, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 61%|██████▏ | 6.11G/9.94G [00:53<00:33, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 62%|██████▏ | 6.12G/9.94G [00:53<00:33, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 62%|██████▏ | 6.13G/9.94G [00:53<00:33, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 62%|██████▏ | 6.14G/9.94G [00:53<00:33, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 62%|██████▏ | 6.15G/9.94G 
[00:53<00:33, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 62%|██████▏ | 6.17G/9.94G [00:54<00:33, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 62%|██████▏ | 6.18G/9.94G [00:54<00:33, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 62%|██████▏ | 6.19G/9.94G [00:54<00:32, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 62%|██████▏ | 6.20G/9.94G [00:54<00:32, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 62%|██████▏ | 6.21G/9.94G [00:54<00:32, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 63%|██████▎ | 6.22G/9.94G [00:54<00:32, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 63%|██████▎ | 6.23G/9.94G [00:54<00:32, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 63%|██████▎ | 6.25G/9.94G [00:54<00:32, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 63%|██████▎ | 6.26G/9.94G [00:54<00:32, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 63%|██████▎ | 6.27G/9.94G [00:54<00:32, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 63%|██████▎ | 6.28G/9.94G [00:55<00:32, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 63%|██████▎ | 6.29G/9.94G [00:55<00:31, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 63%|██████▎ | 6.30G/9.94G [00:55<00:31, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 64%|██████▎ | 6.31G/9.94G [00:55<00:31, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 64%|██████▎ | 6.33G/9.94G [00:55<00:31, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 64%|██████▍ | 6.34G/9.94G [00:55<00:31, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 64%|██████▍ | 6.35G/9.94G [00:55<00:31, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 64%|██████▍ | 6.36G/9.94G [00:55<00:31, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 64%|██████▍ | 6.37G/9.94G [00:55<00:31, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 64%|██████▍ | 6.38G/9.94G [00:55<00:31, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 64%|██████▍ | 6.39G/9.94G [00:56<00:31, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 64%|██████▍ | 6.40G/9.94G [00:56<00:31, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 65%|██████▍ | 6.42G/9.94G [00:56<00:30, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 65%|██████▍ | 6.43G/9.94G [00:56<00:30, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 65%|██████▍ | 6.44G/9.94G [00:56<00:30, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 65%|██████▍ | 6.45G/9.94G [00:56<00:30, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 65%|██████▌ | 6.46G/9.94G [00:56<00:30, 122MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 65%|██████▌ | 6.47G/9.94G [00:56<00:30, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 65%|██████▌ | 6.48G/9.94G [00:56<00:30, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 65%|██████▌ | 6.50G/9.94G [00:56<00:30, 120MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 65%|██████▌ | 6.51G/9.94G [00:57<00:31, 117MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 66%|██████▌ | 6.52G/9.94G [00:57<00:31, 118MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 66%|██████▌ | 6.53G/9.94G [00:57<00:36, 101MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 66%|██████▌ | 6.54G/9.94G [00:57<00:34, 106MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 66%|██████▌ | 6.55G/9.94G [00:57<00:33, 109MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 66%|██████▌ | 6.56G/9.94G [00:57<00:32, 112MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 66%|██████▌ | 6.57G/9.94G [00:57<00:31, 115MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 66%|██████▋ | 6.59G/9.94G [00:57<00:30, 117MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 66%|██████▋ | 6.60G/9.94G [00:57<00:30, 118MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 66%|██████▋ | 6.61G/9.94G [00:58<00:30, 117MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 67%|██████▋ | 6.62G/9.94G [00:58<00:30, 118MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 67%|██████▋ | 6.63G/9.94G [00:58<00:29, 119MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 67%|██████▋ | 6.64G/9.94G [00:58<00:29, 120MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 67%|██████▋ | 6.65G/9.94G [00:58<00:29, 121MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 67%|██████▋ | 6.66G/9.94G [00:58<00:29, 119MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 67%|██████▋ | 6.68G/9.94G [00:58<00:29, 119MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 67%|██████▋ | 6.69G/9.94G [00:58<00:29, 120MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 67%|██████▋ | 6.70G/9.94G [00:58<00:28, 120MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 68%|██████▊ | 6.71G/9.94G [00:58<00:28, 120MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 68%|██████▊ | 6.72G/9.94G [00:59<00:28, 121MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 68%|██████▊ | 6.73G/9.94G [00:59<00:28, 121MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 68%|██████▊ | 6.74G/9.94G [00:59<00:28, 120MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 68%|██████▊ | 6.75G/9.94G [00:59<00:28, 120MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 68%|██████▊ | 6.77G/9.94G [00:59<00:28, 118MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 
68%|██████▊ | 6.78G/9.94G [00:59<00:28, 119MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 68%|██████▊ | 6.79G/9.94G [00:59<00:28, 119MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 68%|██████▊ | 6.80G/9.94G [00:59<00:28, 120MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 69%|██████▊ | 6.81G/9.94G [00:59<00:28, 117MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 69%|██████▊ | 6.82G/9.94G [00:59<00:28, 118MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 69%|██████▉ | 6.83G/9.94G [01:00<00:27, 119MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 69%|██████▉ | 6.85G/9.94G [01:00<00:27, 120MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 69%|██████▉ | 6.86G/9.94G [01:00<00:27, 121MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 69%|██████▉ | 6.87G/9.94G [01:00<00:27, 121MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 69%|██████▉ | 6.88G/9.94G [01:00<00:27, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 69%|██████▉ | 6.89G/9.94G [01:00<00:26, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 69%|██████▉ | 6.90G/9.94G [01:00<00:26, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 70%|██████▉ | 6.91G/9.94G [01:00<00:26, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 70%|██████▉ | 6.92G/9.94G [01:00<00:26, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 70%|██████▉ | 6.94G/9.94G [01:00<00:26, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 70%|██████▉ | 6.95G/9.94G [01:01<00:26, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 70%|███████ | 6.96G/9.94G [01:01<00:26, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 70%|███████ | 6.97G/9.94G [01:01<00:26, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 70%|███████ | 6.98G/9.94G [01:01<00:26, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 70%|███████ | 6.99G/9.94G [01:01<00:26, 121MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 70%|███████ | 7.00G/9.94G [01:01<00:25, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 71%|███████ | 7.02G/9.94G [01:01<00:25, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 71%|███████ | 7.03G/9.94G [01:01<00:25, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 71%|███████ | 7.04G/9.94G [01:01<00:25, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 71%|███████ | 7.05G/9.94G [01:01<00:25, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 71%|███████ | 7.06G/9.94G [01:02<00:25, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 71%|███████ | 7.07G/9.94G [01:02<00:25, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 71%|███████▏ | 7.08G/9.94G [01:02<00:25, 122MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 71%|███████▏ | 7.10G/9.94G [01:02<00:24, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 72%|███████▏ | 7.11G/9.94G [01:02<00:24, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 72%|███████▏ | 7.12G/9.94G [01:02<00:24, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 72%|███████▏ | 7.13G/9.94G [01:02<00:24, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 72%|███████▏ | 7.14G/9.94G [01:02<00:24, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 72%|███████▏ | 7.15G/9.94G [01:02<00:24, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 72%|███████▏ | 7.16G/9.94G [01:02<00:24, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 72%|███████▏ | 7.18G/9.94G [01:03<00:24, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 72%|███████▏ | 7.19G/9.94G [01:03<00:24, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 72%|███████▏ | 7.20G/9.94G [01:03<00:24, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 73%|███████▎ | 7.21G/9.94G [01:03<00:24, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 73%|███████▎ | 7.22G/9.94G [01:03<00:23, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 73%|███████▎ | 7.23G/9.94G [01:03<00:23, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 73%|███████▎ | 7.24G/9.94G [01:03<00:23, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 73%|███████▎ | 7.26G/9.94G [01:03<00:23, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 73%|███████▎ | 7.27G/9.94G [01:03<00:23, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 73%|███████▎ | 7.28G/9.94G [01:03<00:23, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 73%|███████▎ | 7.29G/9.94G [01:04<00:23, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 73%|███████▎ | 7.30G/9.94G [01:04<00:23, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 74%|███████▎ | 7.31G/9.94G [01:04<00:23, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 74%|███████▎ | 7.32G/9.94G [01:04<00:23, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 74%|███████▍ | 7.34G/9.94G [01:04<00:22, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 74%|███████▍ | 7.35G/9.94G [01:04<00:22, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 74%|███████▍ | 7.36G/9.94G [01:04<00:22, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 74%|███████▍ | 7.37G/9.94G [01:04<00:22, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 74%|███████▍ | 7.38G/9.94G [01:04<00:22, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 74%|███████▍ | 7.39G/9.94G [01:04<00:22, 122MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 74%|███████▍ | 7.40G/9.94G [01:05<00:22, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 75%|███████▍ | 7.41G/9.94G [01:05<00:22, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 75%|███████▍ | 7.43G/9.94G [01:05<00:22, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 75%|███████▍ | 7.44G/9.94G [01:05<00:22, 121MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 75%|███████▍ | 7.45G/9.94G [01:05<00:22, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 75%|███████▌ | 7.46G/9.94G [01:05<00:21, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 75%|███████▌ | 7.47G/9.94G [01:05<00:21, 121MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 75%|███████▌ | 7.48G/9.94G [01:05<00:21, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 75%|███████▌ | 7.49G/9.94G [01:05<00:21, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 76%|███████▌ | 7.51G/9.94G [01:05<00:21, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 76%|███████▌ | 7.52G/9.94G [01:06<00:21, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 76%|███████▌ | 7.53G/9.94G [01:06<00:21, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 76%|███████▌ | 7.54G/9.94G [01:06<00:21, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 76%|███████▌ | 7.55G/9.94G [01:06<00:21, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 76%|███████▌ | 7.56G/9.94G [01:06<00:20, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 76%|███████▌ | 7.57G/9.94G [01:06<00:20, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 76%|███████▋ | 7.58G/9.94G [01:06<00:20, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 76%|███████▋ | 7.60G/9.94G [01:06<00:20, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 77%|███████▋ | 7.61G/9.94G [01:06<00:20, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 77%|███████▋ | 7.62G/9.94G [01:07<00:20, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 77%|███████▋ | 7.63G/9.94G [01:07<00:20, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 77%|███████▋ | 7.64G/9.94G [01:07<00:20, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 77%|███████▋ | 7.65G/9.94G [01:07<00:20, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 77%|███████▋ | 7.66G/9.94G [01:07<00:20, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 77%|███████▋ | 7.68G/9.94G [01:07<00:19, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 77%|███████▋ | 7.69G/9.94G [01:07<00:19, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 77%|███████▋ | 7.70G/9.94G [01:07<00:19, 121MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 78%|███████▊ | 7.71G/9.94G [01:07<00:19, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 78%|███████▊ | 7.72G/9.94G [01:07<00:19, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 78%|███████▊ | 7.73G/9.94G [01:08<00:19, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 78%|███████▊ | 7.74G/9.94G [01:08<00:19, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 78%|███████▊ | 7.76G/9.94G [01:08<00:19, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 78%|███████▊ | 7.77G/9.94G [01:08<00:19, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 78%|███████▊ | 7.78G/9.94G [01:08<00:19, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 78%|███████▊ | 7.79G/9.94G [01:08<00:18, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 78%|███████▊ | 7.80G/9.94G [01:08<00:18, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 79%|███████▊ | 7.81G/9.94G [01:08<00:18, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 79%|███████▊ | 7.82G/9.94G [01:08<00:18, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 79%|███████▉ | 7.83G/9.94G [01:08<00:18, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 79%|███████▉ | 7.85G/9.94G [01:09<00:18, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 79%|███████▉ | 7.86G/9.94G [01:09<00:18, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 79%|███████▉ | 7.87G/9.94G [01:09<00:18, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 79%|███████▉ | 7.88G/9.94G [01:09<00:18, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 79%|███████▉ | 7.89G/9.94G [01:09<00:18, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 80%|███████▉ | 7.90G/9.94G [01:09<00:17, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 80%|███████▉ | 7.91G/9.94G [01:09<00:17, 121MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 80%|███████▉ | 7.93G/9.94G [01:09<00:17, 121MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 80%|███████▉ | 7.94G/9.94G [01:09<00:17, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 80%|███████▉ | 7.95G/9.94G [01:09<00:17, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 80%|████████ | 7.96G/9.94G [01:10<00:17, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 80%|████████ | 7.97G/9.94G [01:10<00:17, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 80%|████████ | 7.98G/9.94G [01:10<00:16, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 80%|████████ | 8.00G/9.94G [01:10<00:16, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 81%|████████ | 8.01G/9.94G [01:10<00:16, 125MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 81%|████████ | 8.02G/9.94G [01:10<00:16, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 81%|████████ | 8.03G/9.94G [01:10<00:16, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 81%|████████ | 8.04G/9.94G [01:10<00:16, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 81%|████████ | 8.05G/9.94G [01:10<00:16, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 81%|████████ | 8.07G/9.94G [01:10<00:16, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 81%|████████▏ | 8.08G/9.94G [01:11<00:16, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 81%|████████▏ | 8.09G/9.94G [01:11<00:15, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 81%|████████▏ | 8.10G/9.94G [01:11<00:15, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 82%|████████▏ | 8.11G/9.94G [01:11<00:15, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 82%|████████▏ | 8.12G/9.94G [01:11<00:15, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 82%|████████▏ | 8.14G/9.94G [01:11<00:15, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 82%|████████▏ | 8.15G/9.94G [01:11<00:15, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 82%|████████▏ | 8.16G/9.94G [01:11<00:15, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 82%|████████▏ | 8.17G/9.94G [01:11<00:15, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 82%|████████▏ | 8.18G/9.94G [01:11<00:15, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 82%|████████▏ | 8.19G/9.94G [01:12<00:14, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 83%|████████▎ | 8.21G/9.94G [01:12<00:14, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 83%|████████▎ | 8.22G/9.94G [01:12<00:14, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 83%|████████▎ | 8.23G/9.94G [01:12<00:14, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 83%|████████▎ | 8.24G/9.94G [01:12<00:14, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 83%|████████▎ | 8.25G/9.94G [01:12<00:14, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 83%|████████▎ | 8.26G/9.94G [01:12<00:14, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 83%|████████▎ | 8.28G/9.94G [01:12<00:14, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 83%|████████▎ | 8.29G/9.94G [01:12<00:14, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 83%|████████▎ | 8.30G/9.94G [01:12<00:14, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 84%|████████▎ | 8.31G/9.94G [01:13<00:13, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 84%|████████▎ | 8.32G/9.94G [01:13<00:13, 126MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 84%|████████▍ | 8.33G/9.94G [01:13<00:13, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 84%|████████▍ | 8.35G/9.94G [01:13<00:13, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 84%|████████▍ | 8.36G/9.94G [01:13<00:13, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 84%|████████▍ | 8.37G/9.94G [01:13<00:13, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 84%|████████▍ | 8.38G/9.94G [01:13<00:13, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 84%|████████▍ | 8.39G/9.94G [01:13<00:13, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 85%|████████▍ | 8.40G/9.94G [01:13<00:13, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 85%|████████▍ | 8.42G/9.94G [01:13<00:13, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 85%|████████▍ | 8.43G/9.94G [01:14<00:12, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 85%|████████▍ | 8.44G/9.94G [01:14<00:12, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 85%|████████▌ | 8.45G/9.94G [01:14<00:12, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 85%|████████▌ | 8.46G/9.94G [01:14<00:12, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 85%|████████▌ | 8.47G/9.94G [01:14<00:12, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 85%|████████▌ | 8.49G/9.94G [01:14<00:12, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 85%|████████▌ | 8.50G/9.94G [01:14<00:12, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 86%|████████▌ | 8.51G/9.94G [01:14<00:12, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 86%|████████▌ | 8.52G/9.94G [01:14<00:12, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 86%|████████▌ | 8.53G/9.94G [01:14<00:12, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 86%|████████▌ | 8.54G/9.94G [01:15<00:11, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 86%|████████▌ | 8.56G/9.94G [01:15<00:11, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 86%|████████▌ | 8.57G/9.94G [01:15<00:11, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 86%|████████▋ | 8.58G/9.94G [01:15<00:11, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 86%|████████▋ | 8.59G/9.94G [01:15<00:11, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 87%|████████▋ | 8.60G/9.94G [01:15<00:11, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 87%|████████▋ | 8.62G/9.94G [01:15<00:11, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 87%|████████▋ | 8.63G/9.94G [01:15<00:11, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 87%|████████▋ | 8.64G/9.94G [01:15<00:11, 125MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 87%|████████▋ | 8.65G/9.94G [01:15<00:11, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 87%|████████▋ | 8.66G/9.94G [01:16<00:10, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 87%|████████▋ | 8.67G/9.94G [01:16<00:10, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 87%|████████▋ | 8.69G/9.94G [01:16<00:10, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 87%|████████▋ | 8.70G/9.94G [01:16<00:10, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 88%|████████▊ | 8.71G/9.94G [01:16<00:10, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 88%|████████▊ | 8.72G/9.94G [01:16<00:10, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 88%|████████▊ | 8.73G/9.94G [01:16<00:10, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 88%|████████▊ | 8.74G/9.94G [01:16<00:10, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 88%|████████▊ | 8.76G/9.94G [01:16<00:10, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 88%|████████▊ | 8.77G/9.94G [01:16<00:10, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 88%|████████▊ | 8.78G/9.94G [01:17<00:09, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 88%|████████▊ | 8.79G/9.94G [01:17<00:09, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 89%|████████▊ | 8.80G/9.94G [01:17<00:09, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 89%|████████▊ | 8.81G/9.94G [01:17<00:09, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 89%|████████▉ | 8.83G/9.94G [01:17<00:09, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 89%|████████▉ | 8.84G/9.94G [01:17<00:09, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 89%|████████▉ | 8.85G/9.94G [01:17<00:09, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 89%|████████▉ | 8.86G/9.94G [01:17<00:09, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 89%|████████▉ | 8.87G/9.94G [01:17<00:09, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 89%|████████▉ | 8.88G/9.94G [01:17<00:09, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 89%|████████▉ | 8.90G/9.94G [01:18<00:08, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 90%|████████▉ | 8.91G/9.94G [01:18<00:08, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 90%|████████▉ | 8.92G/9.94G [01:18<00:08, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 90%|████████▉ | 8.93G/9.94G [01:18<00:08, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 90%|████████▉ | 8.94G/9.94G [01:18<00:08, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 90%|█████████ | 8.95G/9.94G [01:18<00:08, 124MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 90%|█████████ | 8.97G/9.94G [01:18<00:08, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 90%|█████████ | 8.98G/9.94G [01:18<00:08, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 90%|█████████ | 8.99G/9.94G [01:18<00:08, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 91%|█████████ | 9.00G/9.94G [01:18<00:08, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 91%|█████████ | 9.01G/9.94G [01:19<00:08, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 91%|█████████ | 9.02G/9.94G [01:19<00:07, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 91%|█████████ | 9.03G/9.94G [01:19<00:07, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 91%|█████████ | 9.05G/9.94G [01:19<00:07, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 91%|█████████ | 9.06G/9.94G [01:19<00:07, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 91%|█████████ | 9.07G/9.94G [01:19<00:07, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 91%|█████████▏| 9.08G/9.94G [01:19<00:07, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 91%|█████████▏| 9.09G/9.94G [01:19<00:07, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 92%|█████████▏| 9.10G/9.94G [01:19<00:07, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 92%|█████████▏| 9.12G/9.94G [01:19<00:07, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 92%|█████████▏| 9.13G/9.94G [01:20<00:06, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 92%|█████████▏| 9.14G/9.94G [01:20<00:06, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 92%|█████████▏| 9.15G/9.94G [01:20<00:06, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 92%|█████████▏| 9.16G/9.94G [01:20<00:06, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 92%|█████████▏| 9.17G/9.94G [01:20<00:06, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 92%|█████████▏| 9.19G/9.94G [01:20<00:06, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 93%|█████████▎| 9.20G/9.94G [01:20<00:06, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 93%|█████████▎| 9.21G/9.94G [01:20<00:06, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 93%|█████████▎| 9.22G/9.94G [01:20<00:06, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 93%|█████████▎| 9.23G/9.94G [01:20<00:06, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 93%|█████████▎| 9.24G/9.94G [01:21<00:05, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 93%|█████████▎| 9.26G/9.94G [01:21<00:05, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 93%|█████████▎| 9.27G/9.94G [01:21<00:05, 126MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 93%|█████████▎| 9.28G/9.94G [01:21<00:05, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 93%|█████████▎| 9.29G/9.94G [01:21<00:05, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 94%|█████████▎| 9.30G/9.94G [01:21<00:05, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 94%|█████████▎| 9.32G/9.94G [01:21<00:05, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 94%|█████████▍| 9.33G/9.94G [01:21<00:05, 120MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 94%|█████████▍| 9.34G/9.94G [01:21<00:05, 121MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 94%|█████████▍| 9.35G/9.94G [01:21<00:05, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 94%|█████████▍| 9.36G/9.94G [01:22<00:05, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 94%|█████████▍| 9.37G/9.94G [01:22<00:04, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 94%|█████████▍| 9.39G/9.94G [01:22<00:04, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 95%|█████████▍| 9.40G/9.94G [01:22<00:04, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 95%|█████████▍| 9.41G/9.94G [01:22<00:04, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 95%|█████████▍| 9.42G/9.94G [01:22<00:04, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 95%|█████████▍| 9.43G/9.94G [01:22<00:04, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 95%|█████████▌| 9.44G/9.94G [01:22<00:04, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 95%|█████████▌| 9.46G/9.94G [01:22<00:04, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 95%|█████████▌| 9.47G/9.94G [01:22<00:04, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 95%|█████████▌| 9.48G/9.94G [01:23<00:04, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 95%|█████████▌| 9.49G/9.94G [01:23<00:03, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 96%|█████████▌| 9.50G/9.94G [01:23<00:03, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 96%|█████████▌| 9.51G/9.94G [01:23<00:03, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 96%|█████████▌| 9.52G/9.94G [01:23<00:03, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 96%|█████████▌| 9.54G/9.94G [01:23<00:03, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 96%|█████████▌| 9.55G/9.94G [01:23<00:03, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 96%|█████████▌| 9.56G/9.94G [01:23<00:03, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 96%|█████████▋| 9.57G/9.94G [01:23<00:03, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 96%|█████████▋| 9.58G/9.94G [01:23<00:03, 123MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 97%|█████████▋| 9.59G/9.94G [01:24<00:03, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 97%|█████████▋| 9.61G/9.94G [01:24<00:02, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 97%|█████████▋| 9.62G/9.94G [01:24<00:02, 122MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 97%|█████████▋| 9.63G/9.94G [01:24<00:02, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 97%|█████████▋| 9.64G/9.94G [01:24<00:02, 124MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 97%|█████████▋| 9.65G/9.94G [01:24<00:02, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 97%|█████████▋| 9.66G/9.94G [01:24<00:02, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 97%|█████████▋| 9.68G/9.94G [01:24<00:02, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 97%|█████████▋| 9.69G/9.94G [01:24<00:02, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 98%|█████████▊| 9.70G/9.94G [01:24<00:02, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 98%|█████████▊| 9.71G/9.94G [01:25<00:01, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 98%|█████████▊| 9.72G/9.94G [01:25<00:01, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 98%|█████████▊| 9.73G/9.94G [01:25<00:01, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 98%|█████████▊| 9.75G/9.94G [01:25<00:01, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 98%|█████████▊| 9.76G/9.94G [01:25<00:01, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 98%|█████████▊| 9.77G/9.94G [01:25<00:01, 125MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 98%|█████████▊| 9.78G/9.94G [01:25<00:01, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 99%|█████████▊| 9.79G/9.94G [01:25<00:01, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 99%|█████████▊| 9.80G/9.94G [01:25<00:01, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 99%|█████████▉| 9.82G/9.94G [01:25<00:01, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 99%|█████████▉| 9.83G/9.94G [01:26<00:00, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 99%|█████████▉| 9.84G/9.94G [01:26<00:00, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 99%|█████████▉| 9.85G/9.94G [01:26<00:00, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 99%|█████████▉| 9.86G/9.94G [01:26<00:00, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 99%|█████████▉| 9.87G/9.94G [01:26<00:00, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 99%|█████████▉| 9.89G/9.94G [01:26<00:00, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 100%|█████████▉| 9.90G/9.94G [01:26<00:00, 126MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 100%|█████████▉| 9.91G/9.94G [01:26<00:00, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 100%|█████████▉| 9.92G/9.94G [01:26<00:00, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 100%|█████████▉| 9.93G/9.94G [01:26<00:00, 126MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015Downloading pytorch_model.bin: 100%|██████████| 9.94G/9.94G [01:27<00:00, 123MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:[2023-04-27 14:20:04.076: I smdistributed/modelparallel/torch/model.py:153] [3] Bit16_Module initialized, using dtype torch.float16\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:20:05.292: I smdistributed/modelparallel/torch/model.py:153] [0] Bit16_Module initialized, using dtype torch.float16\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:[2023-04-27 14:20:05.321: I smdistributed/modelparallel/torch/model.py:153] [1] Bit16_Module initialized, using dtype torch.float16\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:[2023-04-27 14:20:06.334: I smdistributed/modelparallel/torch/model.py:153] [2] Bit16_Module initialized, using dtype torch.float16\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:20:21.501: W smdistributed/modelparallel/torch/nn/transformer.py:214] attention_in_fp32 is only supported in SMP's regular attention implementation, so disabling flash_attention. You may consider training with flash_attention with bfloat16 training which does not require attention_in_fp32 for numerical stability and might result in better performance.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:20:27.615: W smdistributed/modelparallel/torch/nn/transformer.py:214] attention_in_fp32 is only supported in SMP's regular attention implementation, so disabling flash_attention. You may consider training with flash_attention with bfloat16 training which does not require attention_in_fp32 for numerical stability and might result in better performance.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:20:33.335: W smdistributed/modelparallel/torch/nn/transformer.py:214] attention_in_fp32 is only supported in SMP's regular attention implementation, so disabling flash_attention. You may consider training with flash_attention with bfloat16 training which does not require attention_in_fp32 for numerical stability and might result in better performance.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:20:39.004: W smdistributed/modelparallel/torch/nn/transformer.py:214] attention_in_fp32 is only supported in SMP's regular attention implementation, so disabling flash_attention. You may consider training with flash_attention with bfloat16 training which does not require attention_in_fp32 for numerical stability and might result in better performance.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:20:44.657: W smdistributed/modelparallel/torch/nn/transformer.py:214] attention_in_fp32 is only supported in SMP's regular attention implementation, so disabling flash_attention. You may consider training with flash_attention with bfloat16 training which does not require attention_in_fp32 for numerical stability and might result in better performance.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:20:50.306: W smdistributed/modelparallel/torch/nn/transformer.py:214] attention_in_fp32 is only supported in SMP's regular attention implementation, so disabling flash_attention. 
You may consider training with flash_attention with bfloat16 training which does not require attention_in_fp32 for numerical stability and might result in better performance.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:13.133: W smdistributed/modelparallel/torch/nn/transformer.py:214] attention_in_fp32 is only supported in SMP's regular attention implementation, so disabling flash_attention. You may consider training with flash_attention with bfloat16 training which does not require attention_in_fp32 for numerical stability and might result in better performance.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:18.903: W smdistributed/modelparallel/torch/nn/transformer.py:214] attention_in_fp32 is only supported in SMP's regular attention implementation, so disabling flash_attention. You may consider training with flash_attention with bfloat16 training which does not require attention_in_fp32 for numerical stability and might result in better performance.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:32.391: W smdistributed/modelparallel/torch/module_manager.py:975] pack_args_as_tuple argument is deprecated, and will be removed in a future version of smp. This argument is a no-op and is not required.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:[2023-04-27 14:23:32.543: W smdistributed/modelparallel/torch/module_manager.py:975] pack_args_as_tuple argument is deprecated, and will be removed in a future version of smp. This argument is a no-op and is not required.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:[2023-04-27 14:23:33.848: W smdistributed/modelparallel/torch/module_manager.py:975] pack_args_as_tuple argument is deprecated, and will be removed in a future version of smp. This argument is a no-op and is not required.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:[2023-04-27 14:23:38.010: W smdistributed/modelparallel/torch/module_manager.py:975] pack_args_as_tuple argument is deprecated, and will be removed in a future version of smp. This argument is a no-op and is not required.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:38,019] [INFO] [stage3.py:661:__init__] Reduce bucket size 500000000\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:38,019] [INFO] [stage3.py:662:__init__] Allgather bucket size 50000000\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:Using /root/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:Creating extension directory /root/.cache/torch_extensions/py310_cu118/utils...\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:Using /root/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Using /root/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:Using /root/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:Emitting ninja build file /root/.cache/torch_extensions/py310_cu118/utils/build.ninja...\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:Building extension module utils...\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N)\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:[1/2] c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\\\"_gcc\\\" -DPYBIND11_STDLIB=\\\"_libstdcpp\\\" -DPYBIND11_BUILD_ABI=\\\"_cxxabi1011\\\" -isystem /opt/conda/lib/python3.10/site-packages/torch/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.10/site-packages/torch/include/THC -isystem /opt/conda/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /opt/conda/lib/python3.10/site-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:[2/2] c++ flatten_unflatten.o -shared -L/opt/conda/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o utils.so\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:Loading extension module utils...\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:Time to load utils op: 14.18284296989441 seconds\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:[2023-04-27 14:23:52,906] [INFO] [partition_parameters.py:824:_init_zero2d_config] partition parameters context: local shard True, shard_size \"4\", hierarchy all-gather False\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:rank 2 enable local shard at partition parameters Init\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 2\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:Loading extension module utils...\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:Time to load utils op: 14.222181797027588 seconds\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:[2023-04-27 14:23:52,947] [INFO] [partition_parameters.py:824:_init_zero2d_config] partition parameters context: local shard True, shard_size \"4\", hierarchy all-gather False\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:rank 1 enable local shard at partition parameters Init\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 1\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:Loading extension module utils...\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Loading extension module utils...\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:Time to load utils op: 14.22524356842041 seconds\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Time to load utils op: 14.226637125015259 seconds\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:[2023-04-27 14:23:52,954] [INFO] [partition_parameters.py:824:_init_zero2d_config] partition parameters context: local shard True, shard_size \"4\", hierarchy all-gather False\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:rank 3 enable local shard at partition parameters Init\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 3\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:52,954] [INFO] [partition_parameters.py:824:_init_zero2d_config] partition parameters context: local shard True, shard_size \"4\", hierarchy all-gather False\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:rank 0 enable local shard at partition parameters Init\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:shard size 4\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:mp size 1\u001b[0m\n", 
"\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:16 to store for rank: 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:rank 0, model_parallel_rank 0, shard group 0/4\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:rank 0 replicate group 0/1\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:created shard groups and replicate groups based on shard size 4 and mp size 1\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:16 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:16 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:rank 2, model_parallel_rank 0, shard group 2/4\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:rank 2 replicate group 0/1\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:16 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:rank 1, model_parallel_rank 0, shard group 1/4\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:rank 1 replicate group 0/1\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:rank 3, model_parallel_rank 0, shard group 3/4\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:rank 3 replicate group 0/1\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:16 with 4 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:53,350] [INFO] [stage3.py:1036:_zero2d_setups] rank 0, local shard True\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:53,351] [INFO] [stage3.py:1045:_zero2d_config_shard_groups] rank 0 enable local shard at DS stage3 optimizer\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:53,351] [INFO] [stage3.py:1051:_zero2d_config_shard_groups] ds_param_shard_group , ds_param_repli_group None ds_param_shard_size 4 ds_param_repli_size 1\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:[2023-04-27 14:23:53,364] [INFO] [stage3.py:1036:_zero2d_setups] rank 1, local shard True \u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:[2023-04-27 14:23:53,364] [INFO] [stage3.py:1036:_zero2d_setups] rank 3, local shard True\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:[2023-04-27 14:23:53,365] [INFO] [stage3.py:1045:_zero2d_config_shard_groups] rank 1 enable local shard at DS stage3 optimizer\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:[2023-04-27 14:23:53,365] [INFO] [stage3.py:1051:_zero2d_config_shard_groups] ds_param_shard_group , ds_param_repli_group None ds_param_shard_size 4 ds_param_repli_size 1\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:[2023-04-27 14:23:53,365] [INFO] [stage3.py:1045:_zero2d_config_shard_groups] rank 3 enable local shard at DS stage3 optimizer\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:[2023-04-27 14:23:53,365] [INFO] [stage3.py:1051:_zero2d_config_shard_groups] ds_param_shard_group , ds_param_repli_group None ds_param_shard_size 4 ds_param_repli_size 1\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:[2023-04-27 14:23:53,372] [INFO] [stage3.py:1036:_zero2d_setups] rank 2, local shard True\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:[2023-04-27 14:23:53,373] [INFO] [stage3.py:1045:_zero2d_config_shard_groups] rank 2 enable local shard at DS stage3 optimizer\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:[2023-04-27 14:23:53,373] [INFO] 
[stage3.py:1051:_zero2d_config_shard_groups] ds_param_shard_group , ds_param_repli_group None ds_param_shard_size 4 ds_param_repli_size 1\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:54,574] [INFO] [stage3.py:875:__init__] optimizer state initialized\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:54,576] [INFO] [stage3.py:913:__init__] optimizer state initialized\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Could not enable P2P between dev 3(=1e0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 2 and 1. 
You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 0. 
You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 3(=1e0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 3(=1e0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 3(=1e0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 2. 
You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 3(=1e0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 3 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 1 and 3. 
You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 3(=1e0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 3(=1e0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 1 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 2 and 3. 
You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P Chunksize set to 131072\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO P2P Chunksize set to 131072\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] -1/-1/-1->3->2 [2] -1/-1/-1->3->2 [3] -1/-1/-1->3->2\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P Chunksize set to 131072\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Channel 00/04 : 0 1 2 3\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Channel 01/04 : 0 1 2 3\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Channel 02/04 : 0 1 2 3\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Channel 03/04 : 0 1 2 3\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P Chunksize set to 131072\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 3(=1e0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. 
You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 3(=1e0) and dev 2(=1d0)\u001b[0m\n", "[... repeated NCCL INFO P2P-disabled messages omitted ...]\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. 
You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 3(=1e0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Channel 00 : 2[1d0] -> 3[1e0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Channel 00 : 3[1e0] -> 0[1b0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 3(=1e0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Channel 00 : 0[1b0] -> 1[1c0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Channel 00 : 1[1c0] -> 2[1d0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Channel 01 : 2[1d0] -> 3[1e0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. 
You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Channel 01 : 3[1e0] -> 0[1b0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 3(=1e0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Channel 01 : 0[1b0] -> 1[1c0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Channel 02 : 2[1d0] -> 3[1e0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Channel 01 : 1[1c0] -> 2[1d0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Channel 02 : 3[1e0] -> 0[1b0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 3(=1e0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Channel 02 : 0[1b0] -> 1[1c0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Channel 02 : 1[1c0] -> 2[1d0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. 
You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Channel 03 : 2[1d0] -> 3[1e0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Channel 03 : 3[1e0] -> 0[1b0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Channel 03 : 0[1b0] -> 1[1c0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Channel 03 : 1[1c0] -> 2[1d0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 3(=1e0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Channel 00 : 3[1e0] -> 2[1d0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 3(=1e0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Channel 01 : 3[1e0] -> 2[1d0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 3(=1e0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Channel 02 : 3[1e0] -> 2[1d0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO P2P is disabled between connected GPUs 3 and 2. 
You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Could not enable P2P between dev 3(=1e0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Channel 03 : 3[1e0] -> 2[1d0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO P2P is disabled between connected GPUs 0 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Could not enable P2P between dev 0(=1b0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 2. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 2(=1d0)\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 3. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 3(=1e0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. 
You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Channel 00 : 1[1c0] -> 0[1b0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Channel 00 : 2[1d0] -> 1[1c0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Channel 01 : 2[1d0] -> 1[1c0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Channel 01 : 1[1c0] -> 0[1b0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Channel 02 : 2[1d0] -> 1[1c0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO P2P is disabled between connected GPUs 2 and 1. You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Could not enable P2P between dev 2(=1d0) and dev 1(=1c0)\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Channel 02 : 1[1c0] -> 0[1b0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO P2P is disabled between connected GPUs 1 and 0. 
You can repress this message with NCCL_IGNORE_DISABLED_P2P=1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Could not enable P2P between dev 1(=1c0) and dev 0(=1b0)\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Channel 03 : 2[1d0] -> 1[1c0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Channel 03 : 1[1c0] -> 0[1b0] via SHM/direct/direct\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO 4 coll channels, 4 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO 4 coll channels, 4 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO 4 coll channels, 4 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO 4 coll channels, 4 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:329 [0] NCCL INFO comm 0x56433d5df500 rank 0 nranks 4 cudaDev 0 busId 1b0 commId 0xa063461e75349cf7 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:331 [3] NCCL INFO comm 0x557893ac51f0 rank 3 nranks 4 cudaDev 3 busId 1e0 commId 0xa063461e75349cf7 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:332 [2] NCCL INFO comm 0x56352e22ed80 rank 2 nranks 4 cudaDev 2 busId 1d0 commId 0xa063461e75349cf7 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:330 [1] NCCL INFO comm 0x562a0f52b740 rank 1 nranks 4 cudaDev 1 busId 1c0 commId 0xa063461e75349cf7 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 0%| | 0/65 [00:00:#015 0%| | 0/65 [00:00:#015 0%| | 0/65 [00:00:#015 0%| | 0/65 [00:00:[2023-04-27 14:23:56.030: I smdistributed/modelparallel/torch/worker.py:300] Tracing on GPU. If the model parameters do not fit in a single GPU, you can set trace_device to `cpu`.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:56.057: I smdistributed/modelparallel/torch/model.py:665] Partition assignments:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:56.057: I smdistributed/modelparallel/torch/model.py:674] main: 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:56.069: I smdistributed/modelparallel/torch/model.py:599] Number of parameters on partition 0 are 420. 
420 require grads\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:56.076: I smdistributed/modelparallel/torch/model.py:725] Finished partitioning the model\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:542: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:542: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:542: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:542: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:56.141: I smdistributed/modelparallel/torch/model.py:734] Broadcasted parameters and buffers for partition 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2547: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. 
Please use torch.distributed.all_gather_into_tensor instead.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2547: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2547: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:2547: UserWarning: torch.distributed._all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:56.448: W smdistributed/modelparallel/torch/nn/transformer.py:2098] Using fused softmax kernel in attention computation, which ignores the attention mask input. To use an attention mask that masks at least one token, disable the fused softmax kernel by passing fused_softmax=False into the smp.tensor_parallelism or smp.set_tensor_parallelism calls.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:56.487: W smdistributed/modelparallel/torch/nn/transformer.py:2098] Using fused softmax kernel in attention computation, which ignores the attention mask input. To use an attention mask that masks at least one token, disable the fused softmax kernel by passing fused_softmax=False into the smp.tensor_parallelism or smp.set_tensor_parallelism calls.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:56.533: W smdistributed/modelparallel/torch/nn/transformer.py:2098] Using fused softmax kernel in attention computation, which ignores the attention mask input. To use an attention mask that masks at least one token, disable the fused softmax kernel by passing fused_softmax=False into the smp.tensor_parallelism or smp.set_tensor_parallelism calls.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:56.580: W smdistributed/modelparallel/torch/nn/transformer.py:2098] Using fused softmax kernel in attention computation, which ignores the attention mask input. To use an attention mask that masks at least one token, disable the fused softmax kernel by passing fused_softmax=False into the smp.tensor_parallelism or smp.set_tensor_parallelism calls.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:56.626: W smdistributed/modelparallel/torch/nn/transformer.py:2098] Using fused softmax kernel in attention computation, which ignores the attention mask input. To use an attention mask that masks at least one token, disable the fused softmax kernel by passing fused_softmax=False into the smp.tensor_parallelism or smp.set_tensor_parallelism calls.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:56.673: W smdistributed/modelparallel/torch/nn/transformer.py:2098] Using fused softmax kernel in attention computation, which ignores the attention mask input. 
To use an attention mask that masks at least one token, disable the fused softmax kernel by passing fused_softmax=False into the smp.tensor_parallelism or smp.set_tensor_parallelism calls.\u001b[0m\n", "[... repeated fused softmax kernel warnings omitted ...]\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:57.782: W smdistributed/modelparallel/torch/nn/transformer.py:2098] Using fused softmax kernel in attention computation, which ignores the attention mask input. 
To use an attention mask that masks at least one token, disable the fused softmax kernel by passing fused_softmax=False into the smp.tensor_parallelism or smp.set_tensor_parallelism calls.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:57.829: W smdistributed/modelparallel/torch/nn/transformer.py:2098] Using fused softmax kernel in attention computation, which ignores the attention mask input. To use an attention mask that masks at least one token, disable the fused softmax kernel by passing fused_softmax=False into the smp.tensor_parallelism or smp.set_tensor_parallelism calls.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:23:57.875: W smdistributed/modelparallel/torch/nn/transformer.py:2098] Using fused softmax kernel in attention computation, which ignores the attention mask input. To use an attention mask that masks at least one token, disable the fused softmax kernel by passing fused_softmax=False into the smp.tensor_parallelism or smp.set_tensor_parallelism calls.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:NCCL version 2.16.2+cuda11.8\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:NCCL version 2.16.2+cuda11.8\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:NCCL version 2.16.2+cuda11.8\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 00/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 01/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 02/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 00/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 01/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 02/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 03/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 04/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 05/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 06/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 07/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 08/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 03/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 04/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 05/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 06/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 07/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 08/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 09/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 10/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 11/32 : 0\u001b[0m\n", 
"\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 12/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 13/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 14/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 15/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 16/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 17/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 18/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 19/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 09/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 10/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 11/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 12/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 13/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 14/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 15/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 16/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 17/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 18/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 19/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 20/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 21/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 22/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 23/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 24/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 25/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 26/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 27/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 28/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 29/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 20/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 21/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 22/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 23/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 24/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 25/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 26/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 27/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 28/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 29/32 : 0\u001b[0m\n", 
"\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 30/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Channel 31/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 30/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Channel 31/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO P2P Chunksize set to 131072\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO P2P Chunksize set to 131072\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 00/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 01/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 02/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 03/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 04/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 05/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 06/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 07/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 08/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 09/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 10/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 11/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 12/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 13/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 14/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 15/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] 
NCCL INFO Channel 00/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 16/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 17/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 18/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 19/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 01/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 02/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 03/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 20/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 21/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 22/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 04/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 05/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 06/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 07/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 08/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 09/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 10/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 11/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 12/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 13/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 14/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 15/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 16/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 17/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 18/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 19/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 23/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 24/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 25/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 26/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 27/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 28/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 29/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 30/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Channel 31/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 20/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] 
-1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 21/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 22/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 23/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO P2P Chunksize set to 131072\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 24/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 25/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 26/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 27/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 28/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 29/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 30/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Channel 31/32 : 0\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO P2P Chunksize set to 131072\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] 
NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:98:370 [3] NCCL INFO comm 0x7f27f80fdf30 rank 0 nranks 1 cudaDev 3 busId 1e0 commId 0x7a3d67aa9ea73e85 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:97:372 [2] NCCL INFO comm 0x7f18680fef30 rank 0 nranks 1 cudaDev 2 busId 1d0 commId 0x778685a5911b7208 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:95:376 [1] NCCL INFO comm 0x7f75340fdff0 rank 0 nranks 1 cudaDev 1 busId 1c0 commId 0x35a049b677133db - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:96:375 [0] NCCL INFO comm 0x7fa7d80fef30 rank 0 nranks 1 cudaDev 0 busId 1b0 commId 0xff70fe6d97ce0910 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:3015: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:3015: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:3015: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:3015: UserWarning: torch.distributed._reduce_scatter_base is a private function and will be deprecated. Please use torch.distributed.reduce_scatter_tensor instead.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 2%|▏ | 1/65 [00:06<06:47, 6.37s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 2%|▏ | 1/65 [00:06<06:47, 6.37s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:24:02,418] [WARNING] [stage3.py:3069:step] 1 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 2%|▏ | 1/65 [00:06<06:49, 6.39s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 2%|▏ | 1/65 [00:06<06:49, 6.39s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 3%|▎ | 2/65 [00:10<05:30, 5.24s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 3%|▎ | 2/65 [00:10<05:31, 5.27s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 3%|▎ | 2/65 [00:10<05:32, 5.27s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:24:06,912] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. 
this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 3%|▎ | 2/65 [00:10<05:32, 5.28s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 5%|▍ | 3/65 [00:15<05:03, 4.90s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 5%|▍ | 3/65 [00:15<05:04, 4.92s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:24:11,413] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 5%|▍ | 3/65 [00:15<05:05, 4.92s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 5%|▍ | 3/65 [00:15<05:05, 4.93s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 6%|▌ | 4/65 [00:19<04:49, 4.74s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 6%|▌ | 4/65 [00:19<04:49, 4.74s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:24:15,878] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 6%|▌ | 4/65 [00:19<04:49, 4.74s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 6%|▌ | 4/65 [00:19<04:49, 4.74s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 8%|▊ | 5/65 [00:24<04:38, 4.64s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 8%|▊ | 5/65 [00:24<04:38, 4.64s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 8%|▊ | 5/65 [00:24<04:39, 4.65s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:24:20,369] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 8%|▊ | 5/65 [00:24<04:39, 4.65s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 9%|▉ | 6/65 [00:28<04:30, 4.59s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 9%|▉ | 6/65 [00:28<04:31, 4.60s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:24:24,846] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 9%|▉ | 6/65 [00:28<04:30, 4.59s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 9%|▉ | 6/65 [00:28<04:31, 4.59s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 11%|█ | 7/65 [00:33<04:24, 4.55s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 11%|█ | 7/65 [00:33<04:23, 4.55s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 11%|█ | 7/65 [00:33<04:23, 4.55s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:24:29,313] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 11%|█ | 7/65 [00:33<04:23, 4.55s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 12%|█▏ | 8/65 [00:37<04:18, 4.54s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 12%|█▏ | 8/65 [00:37<04:18, 4.53s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 12%|█▏ | 8/65 [00:37<04:18, 4.54s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:24:33,834] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 12%|█▏ | 8/65 [00:37<04:18, 4.54s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 14%|█▍ | 9/65 [00:42<04:12, 4.51s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 14%|█▍ | 9/65 [00:42<04:13, 4.52s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:24:38,273] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 14%|█▍ | 9/65 [00:42<04:12, 4.51s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 14%|█▍ | 9/65 [00:42<04:13, 4.53s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 15%|█▌ | 10/65 [00:46<04:08, 4.52s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 15%|█▌ | 10/65 [00:46<04:08, 4.52s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:24:42,840] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 15%|█▌ | 10/65 [00:46<04:08, 4.53s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 15%|█▌ | 10/65 [00:46<04:08, 4.53s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 17%|█▋ | 11/65 [00:51<04:03, 4.51s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 17%|█▋ | 11/65 [00:51<04:03, 4.51s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 17%|█▋ | 11/65 [00:51<04:03, 4.51s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:24:47,314] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 17%|█▋ | 11/65 [00:51<04:03, 4.51s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 18%|█▊ | 12/65 [00:55<03:57, 4.49s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 18%|█▊ | 12/65 [00:55<03:58, 4.49s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 18%|█▊ | 12/65 [00:55<03:57, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:24:51,744] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 18%|█▊ | 12/65 [00:55<03:57, 4.49s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 20%|██ | 13/65 [01:00<03:52, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 20%|██ | 13/65 [01:00<03:53, 4.49s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:24:56,213] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 20%|██ | 13/65 [01:00<03:53, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 20%|██ | 13/65 [01:00<03:53, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 22%|██▏ | 14/65 [01:04<03:48, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 22%|██▏ | 14/65 [01:04<03:48, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 22%|██▏ | 14/65 [01:04<03:48, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:25:00,688] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 22%|██▏ | 14/65 [01:04<03:48, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 23%|██▎ | 15/65 [01:09<03:43, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 23%|██▎ | 15/65 [01:09<03:43, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:25:05,135] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 23%|██▎ | 15/65 [01:09<03:43, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 23%|██▎ | 15/65 [01:09<03:43, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 25%|██▍ | 16/65 [01:13<03:38, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 25%|██▍ | 16/65 [01:13<03:39, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:25:09,615] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 25%|██▍ | 16/65 [01:13<03:39, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 25%|██▍ | 16/65 [01:13<03:39, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 26%|██▌ | 17/65 [01:18<03:34, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 26%|██▌ | 17/65 [01:18<03:35, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:25:14,093] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 26%|██▌ | 17/65 [01:18<03:34, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 26%|██▌ | 17/65 [01:18<03:35, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 28%|██▊ | 18/65 [01:22<03:30, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 28%|██▊ | 18/65 [01:22<03:30, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 28%|██▊ | 18/65 [01:22<03:30, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:25:18,581] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 28%|██▊ | 18/65 [01:22<03:30, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 29%|██▉ | 19/65 [01:26<03:25, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 29%|██▉ | 19/65 [01:27<03:25, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 29%|██▉ | 19/65 [01:27<03:25, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:25:23,049] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 29%|██▉ | 19/65 [01:27<03:25, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 31%|███ | 20/65 [01:31<03:21, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 31%|███ | 20/65 [01:31<03:21, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 31%|███ | 20/65 [01:31<03:21, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:25:27,549] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 31%|███ | 20/65 [01:31<03:21, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 32%|███▏ | 21/65 [01:35<03:16, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:25:31,984] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 32%|███▏ | 21/65 [01:35<03:16, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 32%|███▏ | 21/65 [01:35<03:17, 4.49s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 32%|███▏ | 21/65 [01:36<03:17, 4.49s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 34%|███▍ | 22/65 [01:40<03:13, 4.49s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 34%|███▍ | 22/65 [01:40<03:13, 4.49s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:25:36,541] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 34%|███▍ | 22/65 [01:40<03:13, 4.50s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 34%|███▍ | 22/65 [01:40<03:13, 4.49s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 35%|███▌ | 23/65 [01:44<03:08, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 35%|███▌ | 23/65 [01:44<03:08, 4.49s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:25:40,991] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 35%|███▌ | 23/65 [01:44<03:08, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 35%|███▌ | 23/65 [01:44<03:08, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 37%|███▋ | 24/65 [01:49<03:03, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 37%|███▋ | 24/65 [01:49<03:04, 4.49s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 37%|███▋ | 24/65 [01:49<03:04, 4.49s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:25:45,507] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 37%|███▋ | 24/65 [01:49<03:04, 4.49s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 38%|███▊ | 25/65 [01:53<02:58, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 38%|███▊ | 25/65 [01:53<02:59, 4.49s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 38%|███▊ | 25/65 [01:53<02:58, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:25:49,956] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 38%|███▊ | 25/65 [01:53<02:59, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 40%|████ | 26/65 [01:58<02:54, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 40%|████ | 26/65 [01:58<02:54, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:25:54,406] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 40%|████ | 26/65 [01:58<02:54, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 40%|████ | 26/65 [01:58<02:54, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 42%|████▏ | 27/65 [02:02<02:50, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 42%|████▏ | 27/65 [02:02<02:49, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:25:58,875] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 42%|████▏ | 27/65 [02:02<02:49, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 42%|████▏ | 27/65 [02:02<02:49, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 43%|████▎ | 28/65 [02:07<02:45, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 43%|████▎ | 28/65 [02:07<02:45, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 43%|████▎ | 28/65 [02:07<02:45, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:26:03,348] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 43%|████▎ | 28/65 [02:07<02:45, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 45%|████▍ | 29/65 [02:11<02:40, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 45%|████▍ | 29/65 [02:11<02:40, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:26:07,804] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 45%|████▍ | 29/65 [02:11<02:40, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 45%|████▍ | 29/65 [02:11<02:40, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 46%|████▌ | 30/65 [02:16<02:36, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:26:12,257] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 46%|████▌ | 30/65 [02:16<02:36, 4.46s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 46%|████▌ | 30/65 [02:16<02:36, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 46%|████▌ | 30/65 [02:16<02:36, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 48%|████▊ | 31/65 [02:20<02:31, 4.46s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 48%|████▊ | 31/65 [02:20<02:31, 4.46s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 48%|████▊ | 31/65 [02:20<02:31, 4.46s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:26:16,725] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 48%|████▊ | 31/65 [02:20<02:31, 4.46s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 49%|████▉ | 32/65 [02:25<02:27, 4.46s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 49%|████▉ | 32/65 [02:25<02:27, 4.46s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 49%|████▉ | 32/65 [02:25<02:27, 4.46s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:26:21,188] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 49%|████▉ | 32/65 [02:25<02:27, 4.46s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 51%|█████ | 33/65 [02:29<02:22, 4.45s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 51%|█████ | 33/65 [02:29<02:22, 4.46s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 51%|█████ | 33/65 [02:29<02:22, 4.45s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:26:25,627] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 51%|█████ | 33/65 [02:29<02:22, 4.46s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 52%|█████▏ | 34/65 [02:34<02:18, 4.45s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 52%|█████▏ | 34/65 [02:34<02:18, 4.46s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:26:30,085] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. 
if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 52%|█████▏ | 34/65 [02:34<02:18, 4.46s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 52%|█████▏ | 34/65 [02:34<02:18, 4.46s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 54%|█████▍ | 35/65 [02:38<02:13, 4.45s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 54%|█████▍ | 35/65 [02:38<02:13, 4.46s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:26:34,575] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 54%|█████▍ | 35/65 [02:38<02:13, 4.47s/it][1,mpirank:0,algo-1]:#015 54%|█████▍ | 35/65 [02:38<02:14, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 55%|█████▌ | 36/65 [02:42<02:09, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 55%|█████▌ | 36/65 [02:42<02:09, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:26:39,038] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 55%|█████▌ | 36/65 [02:43<02:09, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 55%|█████▌ | 36/65 [02:43<02:09, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 57%|█████▋ | 37/65 [02:47<02:05, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 57%|█████▋ | 37/65 [02:47<02:05, 4.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:26:43,519] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 57%|█████▋ | 37/65 [02:47<02:05, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 57%|█████▋ | 37/65 [02:47<02:05, 4.47s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 58%|█████▊ | 38/65 [02:52<02:01, 4.50s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 58%|█████▊ | 38/65 [02:52<02:01, 4.50s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:26:48,093] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. 
if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n",
"\u001b[34m[1,mpirank:0,algo-1]:#015100%|██████████| 65/65 [04:53<00:00, 4.48s/it]\u001b[0m\n",
"\u001b[34m[1,mpirank:0,algo-1]:#015100%|██████████| 65/65 [04:53<00:00, 4.51s/it]\u001b[0m\n",
"\u001b[34m[1,mpirank:0,algo-1]:epoch=0: train_ppl=tensor(3724.6062, device='cuda:0') train_epoch_loss=tensor(8.2227, device='cuda:0')\u001b[0m\n",
"\u001b[34m[1,mpirank:0,algo-1]:#015100%|██████████| 27/27 [00:32<00:00, 1.20s/it]\u001b[0m\n",
"\u001b[34m[1,mpirank:0,algo-1]:epoch=0: eval_ppl=tensor(1515.2729, device='cuda:0') eval_epoch_loss=tensor(7.3234, device='cuda:0')\u001b[0m\n",
"\u001b[34m[1,mpirank:2,algo-1]:#015 2%|▏ | 1/65 [00:04<04:47, 4.50s/it]\u001b[0m\n",
"\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:29:25,903] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n",
"\u001b[34m[1,mpirank:1,algo-1]:#015 66%|██████▌ | 43/65 [03:30<01:46, 4.86s/it]\u001b[0m\n",
"\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:32:51,716] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. 
if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 66%|██████▌ | 43/65 [03:30<01:47, 4.88s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 66%|██████▌ | 43/65 [03:30<01:47, 4.90s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 66%|██████▌ | 43/65 [03:30<01:47, 4.88s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:32:56,365] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 68%|██████▊ | 44/65 [03:35<01:41, 4.83s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 68%|██████▊ | 44/65 [03:35<01:41, 4.84s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 68%|██████▊ | 44/65 [03:35<01:41, 4.84s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 68%|██████▊ | 44/65 [03:35<01:41, 4.84s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:33:01,110] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 69%|██████▉ | 45/65 [03:39<01:36, 4.80s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 69%|██████▉ | 45/65 [03:39<01:35, 4.79s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 69%|██████▉ | 45/65 [03:39<01:35, 4.79s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 69%|██████▉ | 45/65 [03:39<01:36, 4.81s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:33:06,077] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 71%|███████ | 46/65 [03:44<01:32, 4.85s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 71%|███████ | 46/65 [03:44<01:32, 4.86s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 71%|███████ | 46/65 [03:44<01:32, 4.86s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 71%|███████ | 46/65 [03:44<01:32, 4.86s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:33:11,057] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 72%|███████▏ | 47/65 [03:49<01:28, 4.89s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 72%|███████▏ | 47/65 [03:49<01:27, 4.89s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 72%|███████▏ | 47/65 [03:49<01:28, 4.90s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 72%|███████▏ | 47/65 [03:49<01:28, 4.90s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:33:15,879] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 74%|███████▍ | 48/65 [03:54<01:22, 4.87s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 74%|███████▍ | 48/65 [03:54<01:22, 4.86s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 74%|███████▍ | 48/65 [03:54<01:22, 4.88s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 74%|███████▍ | 48/65 [03:54<01:22, 4.88s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:33:21,050] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 75%|███████▌ | 49/65 [03:59<01:19, 4.96s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 75%|███████▌ | 49/65 [03:59<01:19, 4.95s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 75%|███████▌ | 49/65 [03:59<01:19, 4.96s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 75%|███████▌ | 49/65 [03:59<01:19, 4.96s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 77%|███████▋ | 50/65 [04:04<01:13, 4.88s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 77%|███████▋ | 50/65 [04:04<01:13, 4.91s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:33:25,878] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 77%|███████▋ | 50/65 [04:04<01:13, 4.92s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 77%|███████▋ | 50/65 [04:04<01:14, 4.94s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:33:30,940] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. 
if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 78%|███████▊ | 51/65 [04:09<01:09, 4.96s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 78%|███████▊ | 51/65 [04:09<01:09, 4.97s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 78%|███████▊ | 51/65 [04:09<01:09, 4.98s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 78%|███████▊ | 51/65 [04:09<01:10, 5.00s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 80%|████████ | 52/65 [04:14<01:04, 4.97s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:33:35,956] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 80%|████████ | 52/65 [04:14<01:04, 4.98s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 80%|████████ | 52/65 [04:14<01:04, 4.97s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 80%|████████ | 52/65 [04:14<01:04, 4.97s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 82%|████████▏ | 53/65 [04:19<00:59, 4.94s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 82%|████████▏ | 53/65 [04:19<00:59, 4.96s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:33:40,932] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 82%|████████▏ | 53/65 [04:19<00:59, 4.98s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 82%|████████▏ | 53/65 [04:19<00:59, 4.95s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:33:45,654] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 83%|████████▎ | 54/65 [04:24<00:53, 4.90s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 83%|████████▎ | 54/65 [04:24<00:53, 4.91s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 83%|████████▎ | 54/65 [04:24<00:53, 4.90s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 83%|████████▎ | 54/65 [04:24<00:53, 4.91s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 85%|████████▍ | 55/65 [04:29<00:48, 4.86s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:33:50,461] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 85%|████████▍ | 55/65 [04:29<00:48, 4.87s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 85%|████████▍ | 55/65 [04:29<00:48, 4.86s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 85%|████████▍ | 55/65 [04:29<00:48, 4.87s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 86%|████████▌ | 56/65 [04:33<00:43, 4.84s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 86%|████████▌ | 56/65 [04:33<00:43, 4.85s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:33:55,303] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 86%|████████▌ | 56/65 [04:33<00:43, 4.86s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 86%|████████▌ | 56/65 [04:33<00:43, 4.85s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:34:00,549] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 88%|████████▊ | 57/65 [04:39<00:39, 4.98s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 88%|████████▊ | 57/65 [04:39<00:40, 5.00s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 88%|████████▊ | 57/65 [04:39<00:39, 4.99s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 88%|████████▊ | 57/65 [04:39<00:39, 4.99s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:34:05,197] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. 
if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 89%|████████▉ | 58/65 [04:43<00:34, 4.88s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 89%|████████▉ | 58/65 [04:44<00:34, 4.92s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 89%|████████▉ | 58/65 [04:44<00:34, 4.92s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 89%|████████▉ | 58/65 [04:44<00:34, 4.94s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 91%|█████████ | 59/65 [04:48<00:29, 4.92s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:34:10,293] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 91%|█████████ | 59/65 [04:48<00:29, 4.94s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 91%|█████████ | 59/65 [04:48<00:29, 4.93s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 91%|█████████ | 59/65 [04:49<00:29, 4.94s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:34:15,257] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 92%|█████████▏| 60/65 [04:53<00:24, 4.95s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 92%|█████████▏| 60/65 [04:53<00:24, 4.94s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 92%|█████████▏| 60/65 [04:53<00:24, 4.96s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 92%|█████████▏| 60/65 [04:53<00:24, 4.95s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:34:20,021] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 94%|█████████▍| 61/65 [04:58<00:19, 4.89s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 94%|█████████▍| 61/65 [04:58<00:19, 4.89s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 94%|█████████▍| 61/65 [04:58<00:19, 4.90s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 94%|█████████▍| 61/65 [04:58<00:19, 4.91s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 95%|█████████▌| 62/65 [05:03<00:14, 4.84s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:34:24,803] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 95%|█████████▌| 62/65 [05:03<00:14, 4.86s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 95%|█████████▌| 62/65 [05:03<00:14, 4.84s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 95%|█████████▌| 62/65 [05:03<00:14, 4.85s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 97%|█████████▋| 63/65 [05:08<00:09, 4.85s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:34:29,654] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 97%|█████████▋| 63/65 [05:08<00:09, 4.86s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 97%|█████████▋| 63/65 [05:08<00:09, 4.84s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 97%|█████████▋| 63/65 [05:08<00:09, 4.85s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 98%|█████████▊| 64/65 [05:13<00:04, 4.93s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:34:34,767] [WARNING] [stage3.py:3069:step] 3 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. 
If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 98%|█████████▊| 64/65 [05:13<00:04, 4.93s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 98%|█████████▊| 64/65 [05:13<00:04, 4.92s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 98%|█████████▊| 64/65 [05:13<00:04, 4.92s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015100%|██████████| 65/65 [05:18<00:00, 4.94s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015100%|██████████| 65/65 [05:18<00:00, 4.90s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 0%| | 0/27 [00:00:#015100%|██████████| 65/65 [05:18<00:00, 4.96s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015100%|██████████| 65/65 [05:18<00:00, 4.90s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 0%| | 0/27 [00:00:[2023-04-27 14:34:39,860] [WARNING] [stage3.py:3069:step] 2 pytorch allocator cache flushes since last step. this happens when there is high memory pressure and is detrimental to performance. if this is happening frequently consider adjusting settings to reduce memory consumption. If you are unable to make the cache flushes go away consider adding torch.cuda.empty_cache() calls in your training loop to ensure that all ranks flush their caches at the same time\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015100%|██████████| 65/65 [05:18<00:00, 4.98s/it][1,mpirank:0,algo-1]:#015100%|██████████| 65/65 [05:18<00:00, 4.90s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015100%|██████████| 65/65 [05:18<00:00, 4.98s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015100%|██████████| 65/65 [05:18<00:00, 4.90s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:epoch=1: train_ppl=tensor(1265.1027, device='cuda:0') train_epoch_loss=tensor(7.1429, device='cuda:0')\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 0%| | 0/27 [00:00:#015 0%| | 0/27 [00:00:#015 4%|▎ | 1/27 [00:01<00:38, 1.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 4%|▎ | 1/27 [00:01<00:42, 1.62s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 4%|▎ | 1/27 [00:01<00:41, 1.59s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 4%|▎ | 1/27 [00:01<00:38, 1.48s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 7%|▋ | 2/27 [00:02<00:32, 1.30s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 7%|▋ | 2/27 [00:02<00:33, 1.36s/it][1,mpirank:0,algo-1]:#015 7%|▋ | 2/27 [00:02<00:32, 1.30s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 7%|▋ | 2/27 [00:02<00:33, 1.35s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 11%|█ | 3/27 [00:03<00:29, 1.25s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 11%|█ | 3/27 [00:03<00:30, 1.28s/it][1,mpirank:0,algo-1]:#015 11%|█ | 3/27 [00:03<00:29, 1.25s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 11%|█ | 3/27 [00:03<00:30, 1.27s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 15%|█▍ | 4/27 [00:05<00:28, 1.22s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 15%|█▍ | 4/27 [00:05<00:28, 1.24s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 15%|█▍ | 4/27 [00:05<00:28, 1.24s/it][1,mpirank:0,algo-1]:#015 15%|█▍ | 4/27 [00:05<00:28, 1.22s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 19%|█▊ | 5/27 [00:06<00:26, 1.21s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 19%|█▊ | 5/27 [00:06<00:26, 1.22s/it][1,mpirank:0,algo-1]:#015 19%|█▊ | 5/27 [00:06<00:26, 1.21s/it]\u001b[0m\n", 
"\u001b[34m[1,mpirank:1,algo-1]:#015 19%|█▊ | 5/27 [00:06<00:26, 1.22s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 22%|██▏ | 6/27 [00:07<00:25, 1.20s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 22%|██▏ | 6/27 [00:07<00:25, 1.21s/it][1,mpirank:0,algo-1]:#015 22%|██▏ | 6/27 [00:07<00:25, 1.20s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 22%|██▏ | 6/27 [00:07<00:25, 1.21s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 26%|██▌ | 7/27 [00:08<00:23, 1.20s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 26%|██▌ | 7/27 [00:08<00:24, 1.20s/it][1,mpirank:0,algo-1]:#015 26%|██▌ | 7/27 [00:08<00:23, 1.20s/it][1,mpirank:1,algo-1]:#015 26%|██▌ | 7/27 [00:08<00:23, 1.20s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 30%|██▉ | 8/27 [00:09<00:22, 1.19s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 30%|██▉ | 8/27 [00:09<00:22, 1.20s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 30%|██▉ | 8/27 [00:09<00:22, 1.19s/it][1,mpirank:0,algo-1]:#015 30%|██▉ | 8/27 [00:09<00:22, 1.19s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 33%|███▎ | 9/27 [00:10<00:21, 1.19s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 33%|███▎ | 9/27 [00:11<00:21, 1.19s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 33%|███▎ | 9/27 [00:11<00:21, 1.19s/it][1,mpirank:0,algo-1]:#015 33%|███▎ | 9/27 [00:10<00:21, 1.19s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 37%|███▋ | 10/27 [00:12<00:20, 1.19s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 37%|███▋ | 10/27 [00:12<00:20, 1.19s/it][1,mpirank:0,algo-1]:#015 37%|███▋ | 10/27 [00:12<00:20, 1.19s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 37%|███▋ | 10/27 [00:12<00:20, 1.19s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 41%|████ | 11/27 [00:13<00:18, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 41%|████ | 11/27 [00:13<00:18, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 41%|████ | 11/27 [00:13<00:18, 1.18s/it][1,mpirank:2,algo-1]:#015 41%|████ | 11/27 [00:13<00:18, 1.19s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 44%|████▍ | 12/27 [00:14<00:17, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 44%|████▍ | 12/27 [00:14<00:17, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 44%|████▍ | 12/27 [00:14<00:17, 1.18s/it][1,mpirank:0,algo-1]:#015 44%|████▍ | 12/27 [00:14<00:17, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 48%|████▊ | 13/27 [00:15<00:16, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 48%|████▊ | 13/27 [00:15<00:16, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 48%|████▊ | 13/27 [00:15<00:16, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 48%|████▊ | 13/27 [00:15<00:16, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 52%|█████▏ | 14/27 [00:16<00:15, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 52%|█████▏ | 14/27 [00:16<00:15, 1.18s/it][1,mpirank:0,algo-1]:#015 52%|█████▏ | 14/27 [00:16<00:15, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 52%|█████▏ | 14/27 [00:16<00:15, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 56%|█████▌ | 15/27 [00:18<00:14, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 56%|█████▌ | 15/27 [00:18<00:14, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 56%|█████▌ | 15/27 [00:18<00:14, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 56%|█████▌ | 15/27 [00:18<00:14, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 59%|█████▉ | 16/27 
[00:19<00:13, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 59%|█████▉ | 16/27 [00:19<00:13, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 59%|█████▉ | 16/27 [00:19<00:13, 1.18s/it][1,mpirank:1,algo-1]:#015 59%|█████▉ | 16/27 [00:19<00:13, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 63%|██████▎ | 17/27 [00:20<00:11, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 63%|██████▎ | 17/27 [00:20<00:11, 1.18s/it][1,mpirank:0,algo-1]:#015 63%|██████▎ | 17/27 [00:20<00:11, 1.18s/it][1,mpirank:1,algo-1]:#015 63%|██████▎ | 17/27 [00:20<00:11, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 67%|██████▋ | 18/27 [00:21<00:10, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 67%|██████▋ | 18/27 [00:21<00:10, 1.18s/it][1,mpirank:0,algo-1]:#015 67%|██████▋ | 18/27 [00:21<00:10, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 67%|██████▋ | 18/27 [00:21<00:10, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 70%|███████ | 19/27 [00:22<00:09, 1.19s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 70%|███████ | 19/27 [00:22<00:09, 1.19s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 70%|███████ | 19/27 [00:22<00:09, 1.19s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 70%|███████ | 19/27 [00:22<00:09, 1.19s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 74%|███████▍ | 20/27 [00:23<00:08, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 74%|███████▍ | 20/27 [00:24<00:08, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 74%|███████▍ | 20/27 [00:23<00:08, 1.18s/it][1,mpirank:1,algo-1]:#015 74%|███████▍ | 20/27 [00:24<00:08, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 78%|███████▊ | 21/27 [00:25<00:07, 1.19s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 78%|███████▊ | 21/27 [00:25<00:07, 1.19s/it][1,mpirank:0,algo-1]:#015 78%|███████▊ | 21/27 [00:25<00:07, 1.19s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 78%|███████▊ | 21/27 [00:25<00:07, 1.19s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 81%|████████▏ | 22/27 [00:26<00:05, 1.19s/it][1,mpirank:2,algo-1]:#015 81%|████████▏ | 22/27 [00:26<00:05, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 81%|████████▏ | 22/27 [00:26<00:05, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 81%|████████▏ | 22/27 [00:26<00:05, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 85%|████████▌ | 23/27 [00:27<00:04, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 85%|████████▌ | 23/27 [00:27<00:04, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 85%|████████▌ | 23/27 [00:27<00:04, 1.18s/it][1,mpirank:0,algo-1]:#015 85%|████████▌ | 23/27 [00:27<00:04, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 89%|████████▉ | 24/27 [00:28<00:03, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 89%|████████▉ | 24/27 [00:28<00:03, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 89%|████████▉ | 24/27 [00:28<00:03, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 89%|████████▉ | 24/27 [00:28<00:03, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 93%|█████████▎| 25/27 [00:29<00:02, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 93%|█████████▎| 25/27 [00:30<00:02, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 93%|█████████▎| 25/27 [00:29<00:02, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 93%|█████████▎| 25/27 [00:29<00:02, 1.18s/it]\u001b[0m\n", 
"\u001b[34m[1,mpirank:3,algo-1]:#015 96%|█████████▋| 26/27 [00:31<00:01, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 96%|█████████▋| 26/27 [00:31<00:01, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 96%|█████████▋| 26/27 [00:31<00:01, 1.18s/it][1,mpirank:1,algo-1]:#015 96%|█████████▋| 26/27 [00:31<00:01, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015100%|██████████| 27/27 [00:32<00:00, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015100%|██████████| 27/27 [00:32<00:00, 1.19s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015100%|██████████| 27/27 [00:32<00:00, 1.18s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015100%|██████████| 27/27 [00:32<00:00, 1.18s/it][1,mpirank:1,algo-1]:#015100%|██████████| 27/27 [00:32<00:00, 1.18s/it][1,mpirank:2,algo-1]:#015100%|██████████| 27/27 [00:32<00:00, 1.20s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015100%|██████████| 27/27 [00:32<00:00, 1.20s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015100%|██████████| 27/27 [00:32<00:00, 1.19s/it]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:epoch=1: eval_ppl=tensor(1468.0151, device='cuda:0') eval_epoch_loss=tensor(7.2917, device='cuda:0')\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:[2023-04-27 14:35:12.153: I smdistributed/modelparallel/torch/checkpoint.py:241] [2] Saving full checkpoint with tag gptneo_3b_model.pt to /opt/ml/checkpoints\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:[2023-04-27 14:35:12.153: I smdistributed/modelparallel/torch/checkpoint.py:241] [3] Saving full checkpoint with tag gptneo_3b_model.pt to /opt/ml/checkpoints\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:[2023-04-27 14:35:12.153: I smdistributed/modelparallel/torch/checkpoint.py:241] [1] Saving full checkpoint with tag gptneo_3b_model.pt to /opt/ml/checkpoints\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:35:12.154: I smdistributed/modelparallel/torch/checkpoint.py:241] [0] Saving full checkpoint with tag gptneo_3b_model.pt to /opt/ml/checkpoints\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:[2023-04-27 14:35:22.729: W smdistributed/modelparallel/torch/checkpoint.py:269] Saving optimizer for full checkpoint is not supported, skipping...\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:[2023-04-27 14:35:22.729: W smdistributed/modelparallel/torch/checkpoint.py:269] Saving optimizer for full checkpoint is not supported, skipping...\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:[2023-04-27 14:35:22.729: W smdistributed/modelparallel/torch/checkpoint.py:269] Saving optimizer for full checkpoint is not supported, skipping...\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-04-27 14:35:22.866: W smdistributed/modelparallel/torch/checkpoint.py:269] Saving optimizer for full checkpoint is not supported, skipping...\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:saving the final model\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:saving the final model\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:saving the final model\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:saving the final model\u001b[0m\n", "\u001b[34m2023-04-27 14:35:24,571 sagemaker-training-toolkit INFO Waiting for the process to finish and give a return code.\u001b[0m\n", "\u001b[34m2023-04-27 14:35:24,571 sagemaker-training-toolkit INFO Done waiting for a return code. 
Received 0 from exiting process.\u001b[0m\n", "\u001b[34m2023-04-27 14:35:24,571 sagemaker-training-toolkit INFO Begin writing status file from leader node to worker nodes (if any)\u001b[0m\n", "\u001b[34m2023-04-27 14:35:54,599 sagemaker-training-toolkit INFO Finished writing status file from leader node to worker nodes (if any)\u001b[0m\n", "\u001b[34m2023-04-27 14:35:54,600 sagemaker-training-toolkit INFO Reporting training SUCCESS\u001b[0m\n", "\n", "2023-04-27 14:36:01 Uploading - Uploading generated training model\n", "2023-04-27 14:36:01 Completed - Training job completed\n", "Training seconds: 1463\n", "Billable seconds: 1463\n" ] } ], "source": [ "estimator.fit({\"train\":train_data_url,\"valid\":valid_data_url})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Store the checkpoint path to reuse in the deploy notebook" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%store checkpoint_s3_path" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": 
"ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 
33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, 
"hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 } ], "instance_type": "ml.m5.large", "kernelspec": { "display_name": "Python 3 (Data Science)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" }, "vscode": { "interpreter": { "hash": "2df149412efc1526e813459d121195dcad0cc0c344007149632d30b7359a266e" } } }, "nbformat": 4, "nbformat_minor": 4 }