{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 2_training_oxford-pet_ddp(Distributed Data Parallel)\n", "---\n", "\n", "In this module, we train a PyTorch model with multi-GPU distributed training to make effective use of the Amazon SageMaker API." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [], "source": [ "install_needed = True\n", "# install_needed = False" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "installing deps and restarting kernel\n" ] } ], "source": [ "import sys\n", "import IPython\n", "\n", "if install_needed:\n", "    print(\"installing deps and restarting kernel\")\n", "    !{sys.executable} -m pip install --upgrade pip --quiet\n", "    !{sys.executable} -m pip install -U wget split-folders --quiet\n", "    IPython.Application.instance().kernel.do_shutdown(True)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## 1. SageMaker notebook overview\n", "

A SageMaker notebook is a fully managed, container-based service. Users cannot see the containers directly, but internally it works as described below.

\n", "

\n", "\n", "- **S3 (Simple Storage Service)** : Object storage used to hold the training data files and the training results: the model, checkpoints, event files for TensorBoard, and logs.\n", "- **SageMaker Notebook** : Provides the environment for writing and debugging training scripts and for developing the Python code that runs the actual training.\n", "- **Amazon Elastic Container Registry(ECR)** : A fully managed Docker container registry that makes it easy to store, manage, and deploy Docker container images. SageMaker provides default containers, so you do not need to register a container image in ECR yourself. If you need a separate training or deployment environment, you can build a custom container image, register it in ECR, and use that environment.\n", "\n", "

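Training and the hosting service used for inference can each run in separate container environments. Because these container environments scale out easily, many training jobs and hosted endpoints can run at the same time. \n", "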

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Import the basic packages needed for SageMaker training.

\n", "

boto3 offers a convenient abstraction that hides the HTTP API calls, and provides Python classes for working with AWS resources such as Amazon EC2 instances and S3 buckets.

\n", "

The SageMaker Python SDK is an open-source library for training and deploying machine learning models on Amazon SageMaker.

" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "tags": [] }, "outputs": [], "source": [ "import os\n", "import sagemaker\n", "from pathlib import Path\n", "from time import strftime\n", "\n", "sagemaker_session = sagemaker.Session()\n", "\n", "bucket = sagemaker_session.default_bucket()\n", "role = sagemaker.get_execution_role()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'2.169.0'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sagemaker.__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset overview, split, and upload to S3\n", "

The image data used for this training is the Oxford-IIIT Pet Dataset. It provides about 200 images for each of 37 dog and cat breeds, and its ground truth covers classification, object detection, and segmentation. In this module we use a subset of the images to train a classifier over the 37 classes.

\n", "

\n", "

To train on the image files, upload them to the SageMaker Notebook environment. The folder structure must be organized as follows.

\n", "
\n",
    "
\n", " image_path/class1/Aimage_1
\n", " Aimage_2
\n", " ...
\n", " Aimage_N
\n", " image_path/class2/Bimage_1
\n", " Bimage_2
\n", " ...
\n", " Bimage_M
\n", "
\n", "
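The layout above can be scanned into (file, label) pairs with a few lines of standard-library Python. This is a minimal sketch, not the loader used by the training script; the function name `list_samples` is hypothetical.

```python
from pathlib import Path


def list_samples(image_path):
    # Hypothetical helper: each immediate subfolder name is treated as a class label.
    root = Path(image_path)
    classes = sorted(d.name for d in root.iterdir() if d.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    samples = []
    for name in classes:
        # Files directly inside a class folder become samples of that class.
        for file in sorted((root / name).iterdir()):
            if file.is_file():
                samples.append((str(file), class_to_idx[name]))
    return samples, class_to_idx
```

Sorting the class names keeps the label indices stable across runs, which matters when several processes build the mapping independently.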
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

For SageMaker training, upload the folders split into train/val to a prefix under the S3 bucket specified earlier.
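As a rough illustration of how local files map to object keys under that prefix, the sketch below computes the S3 URI each file would receive. It only builds strings and uploads nothing; `plan_s3_uris`, the bucket name, and the prefix are hypothetical placeholders.

```python
from pathlib import Path


def plan_s3_uris(local_dir, bucket, prefix):
    # Hypothetical helper: pair every file under local_dir with its destination URI.
    root = Path(local_dir)
    plan = []
    for file in sorted(root.rglob('*')):
        if file.is_file():
            # The key keeps the path relative to local_dir, e.g. train/<class>/<file>.
            key = f'{prefix}/{file.relative_to(root).as_posix()}'
            plan.append((str(file), f's3://{bucket}/{key}'))
    return plan
```

A recursive copy keeps the same relative structure, so the train/val and class folders remain visible under the prefix.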

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Uploading the data to S3\n", "We upload our datasets to an S3 location. The `sagemaker.Session.upload_data` function can do this; the cell further below uses `aws s3 cp` instead. The `inputs` variable identifies the location, and we use it later when we start the training job.\n" ] },
{ "cell_type": "code", "execution_count": 10, "metadata": { "tags": [] }, "outputs": [], "source": [ "def make_dir(img_path, delete=True):\n", "    import shutil, os\n", "    try:\n", "        # With delete=True, start from an empty directory; otherwise keep existing contents.\n", "        if delete and os.path.exists(img_path):\n", "            shutil.rmtree(img_path)\n", "        os.makedirs(img_path, exist_ok=True)\n", "    except OSError as e:\n", "        print(f\"Error: {e}\")" ] },
{ "cell_type": "code", "execution_count": 11, "metadata": { "tags": [] }, "outputs": [], "source": [ "!rm -rf dataset/oxford_dataset" ] },
{ "cell_type": "code", "execution_count": 12, "metadata": { "tags": [] }, "outputs": [], "source": [ "rawimg_path = 'dataset/oxford_dataset/raw'\n", "dataset_path = 'dataset/oxford_dataset/dataset'\n", "output_dir = 'dataset/oxford_dataset/output'\n", "\n", "make_dir(rawimg_path)\n", "make_dir(dataset_path)\n", "make_dir(output_dir)" ] },
{ "cell_type": "code", "execution_count": 13, "metadata": { "tags": [] }, "outputs": [], "source": [ "# These imports are needed here because this cell runs before the main import cell.\n", "import os\n", "import tarfile\n", "import wget\n", "\n", "if not (os.path.isfile(\"images.tar.gz\") and tarfile.is_tarfile(\"images.tar.gz\")):\n", "    wget.download('https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz')\n", "with tarfile.open(\"images.tar.gz\") as tar:\n", "    tar.extractall(path=rawimg_path)" ] },
{ "cell_type": "code", "execution_count": 14, "metadata": { "tags": [] }, "outputs": [], "source": [ "import cv2\n", "import os\n", "import glob\n", "import wget\n", "import tarfile\n", "import splitfolders\n", "import numpy as np\n", "import shutil\n", "\n", "def checkImage(path):\n", "    try:\n", "        with open(path, 'rb') as f:\n", "            data = f.read()\n", "            f.seek(-2, 2)\n", "            value = f.read()\n", "\n", "        encoded_img = np.frombuffer(data, dtype=np.uint8)\n", "        img_cv = cv2.imdecode(encoded_img, cv2.IMREAD_COLOR)\n", "        # Valid only if OpenCV can decode it and the file ends with the JPEG end-of-image marker.\n", "        return img_cv is not None and img_cv.shape[0] > 0 and value == b'\\xff\\xd9'\n", "    except Exception:\n", "        return False" ] },
{ "cell_type": "code", "execution_count": 15, "metadata": { "tags": [] }, "outputs": [], "source": [ "corrupt_img = ['Egyptian_Mau_14.jpg','Egyptian_Mau_139.jpg','Egyptian_Mau_145.jpg','Egyptian_Mau_156.jpg',\n", "               'Egyptian_Mau_167.jpg','Egyptian_Mau_177.jpg','Egyptian_Mau_186.jpg','Egyptian_Mau_191.jpg',\n", "               'Abyssinian_5.jpg','Abyssinian_34.jpg','chihuahua_121.jpg','beagle_116.jpg']" ] },
{ "cell_type": "code", "execution_count": 16, "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dataset/oxford_dataset/raw/images/Egyptian_Mau_191.jpg\n", "dataset/oxford_dataset/raw/images/Abyssinian_100.mat\n", "dataset/oxford_dataset/raw/images/Abyssinian_102.mat\n", "dataset/oxford_dataset/raw/images/Egyptian_Mau_14.jpg\n", "dataset/oxford_dataset/raw/images/Egyptian_Mau_156.jpg\n", "dataset/oxford_dataset/raw/images/Egyptian_Mau_139.jpg\n", "dataset/oxford_dataset/raw/images/Egyptian_Mau_167.jpg\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Corrupt JPEG data: premature end of data segment\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "dataset/oxford_dataset/raw/images/beagle_116.jpg\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Corrupt JPEG data: 240 extraneous bytes before marker 0xd9\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "dataset/oxford_dataset/raw/images/chihuahua_121.jpg\n", "dataset/oxford_dataset/raw/images/Egyptian_Mau_138.jpg\n", "dataset/oxford_dataset/raw/images/Abyssinian_34.jpg\n", "dataset/oxford_dataset/raw/images/Abyssinian_101.mat\n", "dataset/oxford_dataset/raw/images/Egyptian_Mau_177.jpg\n", "dataset/oxford_dataset/raw/images/Egyptian_Mau_145.jpg\n", "dataset/oxford_dataset/raw/images/Egyptian_Mau_186.jpg\n", 
"dataset/oxford_dataset/raw/images/Abyssinian_5.jpg\n" ] } ], "source": [ "file_dir = os.path.join(rawimg_path, 'images')\n", "\n", "for file_path in glob.glob(file_dir + \"/*\"):\n", "    filename = os.path.basename(file_path)\n", "    if checkImage(file_path) and filename not in corrupt_img:\n", "        # The class name is the filename without its trailing _<index>.jpg part.\n", "        dir_name = filename.split(\"_\")\n", "        dir_name.pop()\n", "        dir_name = \"_\".join(dir_name)\n", "        dir_path = os.path.join(output_dir, dir_name)\n", "        make_dir(dir_path, False)\n", "        target_name = os.path.join(dir_path, filename)\n", "        shutil.copyfile(file_path, target_name)\n", "    else:\n", "        print(file_path)" ] },
{ "cell_type": "code", "execution_count": 17, "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Copying files: 7377 files [00:01, 4977.55 files/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Note: AWS CLI version 2, the latest major version of the AWS CLI, is now stable and recommended for general use. For more information, see the AWS CLI version 2 installation instructions at: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html\n", "\n", "usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]\n", "To see help text, you can run:\n", "\n", "  aws help\n", "  aws <command> help\n", "  aws <command> <subcommand> help\n", "aws: error: the following arguments are required: paths\n", "Note: AWS CLI version 2, the latest major version of the AWS CLI, is now stable and recommended for general use. For more information, see the AWS CLI version 2 installation instructions at: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html\n", "\n", "usage: aws [options] <command> <subcommand> [<subcommand> ...] 
[parameters]\n", "To see help text, you can run:\n", "\n", "  aws help\n", "  aws <command> help\n", "  aws <command> <subcommand> help\n", "aws: error: the following arguments are required: paths\n" ] } ], "source": [ "splitfolders.ratio(output_dir, output=dataset_path, seed=1337, ratio=(.8, .1, .1)) # default values\n", "inputs = 's3://{}/{}'.format(bucket, 'oxford_pet_dataset')\n", "# Use inputs here: s3_data_path is only defined in a later cell.\n", "!aws s3 rm $inputs --quiet --recursive\n", "!aws s3 cp $dataset_path $inputs --quiet --recursive" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Distributed Training\n", "\n", "Multi-GPU distributed training on AWS can use both `data_parallel` and `model_parallel`; the example below focuses on data parallelism. \n", "\n", "\n", "\n", "\n", "- **[SageMaker Distributed Data Parallel](https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel.html)** : Provides a data parallel distributed training algorithm optimized for AWS SageMaker, using the AWS network infrastructure and Balanced Fusion Buffers.\n", "\n", "- **DataParallel (DP)** : Splits a mini-batch of data samples into several smaller mini-batches and computes each of them in parallel. It is used for multi-GPU training on a single host and for CPU computation. The drawback of DP is that communication cost rises as the number of GPUs grows, which degrades performance; this is generally reported to occur with 4 or more GPUs. GPU 0 also tends to use more memory than the other GPUs. \n", "\n", "- **Distributed Data Parallel (DDP)** : Implements data parallelism at the module level and uses the communication collectives of the torch.distributed package to synchronize gradients, parameters, and buffers. It is used in cases such as multi-host, multi-GPU training, where work is spread both within and across processes. Within a process, DDP replicates the input module to the devices specified in device_ids, scatters the input along the batch dimension accordingly, and gathers the outputs on output_device, similar to DataParallel. \n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Model training with Distributed Data Parallel\n", "\n", "\n", "The training script provides the code you need for distributed data parallel (DDP) training. 
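One part of data parallel training is splitting the dataset so that each process sees a different shard. The sketch below mimics, in plain Python and without shuffling, the index striding that a sampler such as `torch.utils.data.distributed.DistributedSampler` performs. Whether the training script uses that sampler is an assumption, and `shard_indices` is a hypothetical name.

```python
import math


def shard_indices(num_samples, world_size, rank):
    # Hypothetical helper: return the sample indices assigned to one process.
    if num_samples == 0:
        return []
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    indices = list(range(num_samples))
    # Pad by repeating indices from the start so every rank gets the same count.
    padding = total - num_samples
    indices += (indices * math.ceil(padding / len(indices)))[:padding]
    # Each rank takes every world_size-th index, starting at its own rank.
    return indices[rank:total:world_size]


# 2 hosts x 8 GPUs gives 16 processes, matching the instance setup used in this notebook.
world_size = 2 * 8
shards = [shard_indices(100, world_size, rank) for rank in range(world_size)]
```

Equal shard sizes matter because every process must run the same number of steps for the gradient synchronization to line up.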
The training script is very similar to a PyTorch training script you might run outside of SageMaker.\n", "\n", "In the following code blocks, you can update the estimator to use a different instance type, instance count, and distribution strategy. The estimator also receives the training script (`pytorch_oxford_ddp.py`) as its entry point." ] },
{ "cell_type": "code", "execution_count": 48, "metadata": { "tags": [] }, "outputs": [], "source": [ "metric_definitions=[\n", "    {'Name': 'train:Time', 'Regex': 'Train_Time=(.*?):'},\n", "    {'Name': 'train:Loss', 'Regex': 'Train_Loss=(.*?):'},\n", "    {'Name': 'train:Prec@1', 'Regex': 'Train_Prec@1=(.*?):'},\n", "    {'Name': 'train:Prec@5', 'Regex': 'Train_Prec@5=(.*?):'},\n", "    {'Name': 'test:Time', 'Regex': 'Test_Time=(.*?):'},\n", "    {'Name': 'test:Loss', 'Regex': 'Test_Loss=(.*?):'},\n", "    {'Name': 'test:Prec@1', 'Regex': 'Test_Prec@1=(.*?):'},\n", "    {'Name': 'test:Prec@5', 'Regex': 'Test_Prec@5=(.*?):'}\n", "]" ] },
{ "cell_type": "code", "execution_count": 73, "metadata": { "tags": [] }, "outputs": [], "source": [ "hyperparameters = {\n", "    # 'model_name' : 'resnext101_32x8d',\n", "    'model_name' : 'swin_b',\n", "    'num-classes' : 37,\n", "    'height' : 128,\n", "    'width' : 128,\n", "    'num-epochs': 15,\n", "    'batch-size' : 80, # 80 128 136\n", "    'test-batch-size' : 200, \n", "    'lr': 0.0001,\n", "    # 'backend': 'nccl', \n", "    'backend': 'smddp', \n", "}" ] },
{ "cell_type": "code", "execution_count": 74, "metadata": { "tags": [] }, "outputs": [], "source": [ "distribution = {}\n", "\n", "if hyperparameters['backend'] == 'nccl':\n", "    ### Run with mpirun\n", "    distribution[\"mpi\"]={\"enabled\": True}\n", "elif hyperparameters['backend'] == 'smddp':\n", "    ### SageMaker DDP\n", "    distribution[\"smdistributed\"] = {\"dataparallel\": {\"enabled\": True}}" ] },
{ "cell_type": "code", "execution_count": 75, "metadata": { "tags": [] }, "outputs": [], "source": [ "instance_type = 'ml.p3.16xlarge' # 'ml.p3.16xlarge', 'ml.p3dn.24xlarge', 
'ml.p4d.24xlarge', 'local_gpu'\n", "# instance_type = 'local_gpu'\n", "instance_count = 2\n", "max_run = 1*60*60" ] },
{ "cell_type": "code", "execution_count": 76, "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole\n" ] }, { "data": { "text/plain": [ "'s3://sagemaker-us-west-2-322537213286/oxford_pet_dataset'" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "if instance_type =='local_gpu':\n", "    from sagemaker.local import LocalSession\n", "\n", "    sagemaker_session = LocalSession()\n", "    sagemaker_session.config = {'local': {'local_code': True}}\n", "    s3_data_path = f'file://{Path.cwd()}/dataset/oxford_dataset/dataset'\n", "else:\n", "    sagemaker_session = sagemaker.Session()\n", "    s3_data_path = inputs\n", "s3_data_path" ] },
{ "cell_type": "code", "execution_count": 77, "metadata": { "tags": [] }, "outputs": [], "source": [ "from sagemaker.pytorch import PyTorch\n", "\n", "estimator = PyTorch(entry_point='pytorch_oxford_ddp.py',\n", "                    source_dir=f'{Path.cwd()}/training_code/oxford',\n", "                    role=role,\n", "                    framework_version='1.13.1',\n", "                    py_version='py39',\n", "                    instance_count=instance_count,\n", "                    instance_type=instance_type,\n", "                    distribution=distribution,\n", "                    metric_definitions=metric_definitions,\n", "                    disable_profiler=True,\n", "                    debugger_hook_config=False,\n", "                    max_run=max_run,\n", "                    hyperparameters=hyperparameters,\n", "                    sagemaker_session=sagemaker_session\n", "                    )" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "After we've constructed our `PyTorch` object, we can fit it using the data we uploaded to S3. 
SageMaker makes sure our data is available in the local filesystem, so our training script can simply read the data from disk.\n" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.\n", "INFO:sagemaker:Creating training-job with name: oxford-ml-p3-16xlarge-0702-08581688288298\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Using provided s3_resource\n" ] } ], "source": [ "current_time = strftime(\"%m%d-%H%M%s\")\n", "i_type = instance_type.replace('.','-')\n", "job_name = f'oxford-{i_type}-{current_time}'\n", "\n", "estimator.fit(\n", " inputs={'training': s3_data_path}, \n", " job_name=job_name,\n", " wait=False\n", ")" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2023-07-02 08:58:19 Starting - Starting the training job......\n", "2023-07-02 08:59:15 Starting - Preparing the instances for training.........\n", "2023-07-02 09:00:26 Downloading - Downloading input data......\n", "2023-07-02 09:01:21 Training - Downloading the training image..................\n", "2023-07-02 09:04:28 Training - Training image download completed. 
Training in progress..\u001b[34mbash: cannot set terminal process group (-1): Inappropriate ioctl for device\u001b[0m\n", "\u001b[34mbash: no job control in this shell\u001b[0m\n", "\u001b[34m2023-07-02 09:04:55,777 sagemaker-training-toolkit INFO Imported framework sagemaker_pytorch_container.training\u001b[0m\n", "\u001b[34m2023-07-02 09:04:55,842 sagemaker-training-toolkit INFO No Neurons detected (normal if no neurons installed)\u001b[0m\n", "\u001b[34m2023-07-02 09:04:55,855 sagemaker_pytorch_container.training INFO Block until all host DNS lookups succeed.\u001b[0m\n", "\u001b[34m2023-07-02 09:04:55,858 sagemaker_pytorch_container.training INFO Invoking SMDataParallel\u001b[0m\n", "\u001b[34m2023-07-02 09:04:55,858 sagemaker_pytorch_container.training INFO Invoking user training script.\u001b[0m\n", "\u001b[34m2023-07-02 09:04:56,110 sagemaker-training-toolkit INFO Installing dependencies from requirements.txt:\u001b[0m\n", "\u001b[34m/opt/conda/bin/python3.9 -m pip install -r requirements.txt\u001b[0m\n", "\u001b[34mCollecting albumentations (from -r requirements.txt (line 1))\u001b[0m\n", "\u001b[34mDownloading albumentations-1.3.1-py3-none-any.whl (125 kB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 125.7/125.7 kB 6.2 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mRequirement already satisfied: numpy>=1.11.1 in /opt/conda/lib/python3.9/site-packages (from albumentations->-r requirements.txt (line 1)) (1.23.5)\u001b[0m\n", "\u001b[34mRequirement already satisfied: scipy>=1.1.0 in /opt/conda/lib/python3.9/site-packages (from albumentations->-r requirements.txt (line 1)) (1.10.1)\u001b[0m\n", "\u001b[34mCollecting scikit-image>=0.16.1 (from albumentations->-r requirements.txt (line 1))\u001b[0m\n", "\u001b[34mDownloading scikit_image-0.21.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.8 MB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.8/13.8 MB 78.7 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mRequirement 
already satisfied: PyYAML in /opt/conda/lib/python3.9/site-packages (from albumentations->-r requirements.txt (line 1)) (5.4.1)\u001b[0m\n", "\u001b[34mCollecting qudida>=0.0.4 (from albumentations->-r requirements.txt (line 1))\u001b[0m\n", "\u001b[34mDownloading qudida-0.0.4-py3-none-any.whl (3.5 kB)\u001b[0m\n", "\u001b[34mCollecting opencv-python-headless>=4.1.1 (from albumentations->-r requirements.txt (line 1))\u001b[0m\n", "\u001b[34mDownloading opencv_python_headless-4.8.0.74-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (49.1 MB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.1/49.1 MB 37.1 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mRequirement already satisfied: scikit-learn>=0.19.1 in /opt/conda/lib/python3.9/site-packages (from qudida>=0.0.4->albumentations->-r requirements.txt (line 1)) (1.2.2)\u001b[0m\n", "\u001b[34mRequirement already satisfied: typing-extensions in /opt/conda/lib/python3.9/site-packages (from qudida>=0.0.4->albumentations->-r requirements.txt (line 1)) (4.5.0)\u001b[0m\n", "\u001b[34mRequirement already satisfied: networkx>=2.8 in /opt/conda/lib/python3.9/site-packages (from scikit-image>=0.16.1->albumentations->-r requirements.txt (line 1)) (3.1)\u001b[0m\n", "\u001b[34mRequirement already satisfied: pillow>=9.0.1 in /opt/conda/lib/python3.9/site-packages (from scikit-image>=0.16.1->albumentations->-r requirements.txt (line 1)) (9.5.0)\u001b[0m\n", "\u001b[34mRequirement already satisfied: imageio>=2.27 in /opt/conda/lib/python3.9/site-packages (from scikit-image>=0.16.1->albumentations->-r requirements.txt (line 1)) (2.28.0)\u001b[0m\n", "\u001b[34mCollecting tifffile>=2022.8.12 (from scikit-image>=0.16.1->albumentations->-r requirements.txt (line 1))\u001b[0m\n", "\u001b[34mDownloading tifffile-2023.4.12-py3-none-any.whl (219 kB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 219.4/219.4 kB 42.2 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mCollecting PyWavelets>=1.1.1 (from 
scikit-image>=0.16.1->albumentations->-r requirements.txt (line 1))\u001b[0m\n", "\u001b[34mDownloading PyWavelets-1.4.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.9 MB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.9/6.9 MB 89.2 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mRequirement already satisfied: packaging>=21 in /opt/conda/lib/python3.9/site-packages (from scikit-image>=0.16.1->albumentations->-r requirements.txt (line 1)) (23.1)\u001b[0m\n", "\u001b[34mCollecting lazy_loader>=0.2 (from scikit-image>=0.16.1->albumentations->-r requirements.txt (line 1))\u001b[0m\n", "\u001b[34mDownloading lazy_loader-0.3-py3-none-any.whl (9.1 kB)\u001b[0m\n", "\u001b[34mRequirement already satisfied: joblib>=1.1.1 in /opt/conda/lib/python3.9/site-packages (from scikit-learn>=0.19.1->qudida>=0.0.4->albumentations->-r requirements.txt (line 1)) (1.2.0)\u001b[0m\n", "\u001b[34mRequirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.9/site-packages (from scikit-learn>=0.19.1->qudida>=0.0.4->albumentations->-r requirements.txt (line 1)) (3.1.0)\u001b[0m\n", "\u001b[34mbash: cannot set terminal process group (-1): Inappropriate ioctl for device\u001b[0m\n", "\u001b[34mbash: no job control in this shell\u001b[0m\n", "\u001b[34m2023-07-02 09:04:55,342 sagemaker-training-toolkit INFO Imported framework sagemaker_pytorch_container.training\u001b[0m\n", "\u001b[34m2023-07-02 09:04:55,407 sagemaker-training-toolkit INFO No Neurons detected (normal if no neurons installed)\u001b[0m\n", "\u001b[34m2023-07-02 09:04:55,419 sagemaker_pytorch_container.training INFO Block until all host DNS lookups succeed.\u001b[0m\n", "\u001b[34m2023-07-02 09:04:55,422 sagemaker_pytorch_container.training INFO Invoking SMDataParallel\u001b[0m\n", "\u001b[34m2023-07-02 09:04:55,422 sagemaker_pytorch_container.training INFO Invoking user training script.\u001b[0m\n", "\u001b[34m2023-07-02 09:04:55,673 sagemaker-training-toolkit INFO Installing 
dependencies from requirements.txt:\u001b[0m\n", "\u001b[34m/opt/conda/bin/python3.9 -m pip install -r requirements.txt\u001b[0m\n", "\u001b[34mCollecting albumentations (from -r requirements.txt (line 1))\u001b[0m\n", "\u001b[34mDownloading albumentations-1.3.1-py3-none-any.whl (125 kB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 125.7/125.7 kB 6.5 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mRequirement already satisfied: numpy>=1.11.1 in /opt/conda/lib/python3.9/site-packages (from albumentations->-r requirements.txt (line 1)) (1.23.5)\u001b[0m\n", "\u001b[34mRequirement already satisfied: scipy>=1.1.0 in /opt/conda/lib/python3.9/site-packages (from albumentations->-r requirements.txt (line 1)) (1.10.1)\u001b[0m\n", "\u001b[34mCollecting scikit-image>=0.16.1 (from albumentations->-r requirements.txt (line 1))\u001b[0m\n", "\u001b[34mDownloading scikit_image-0.21.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.8 MB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.8/13.8 MB 78.8 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mRequirement already satisfied: PyYAML in /opt/conda/lib/python3.9/site-packages (from albumentations->-r requirements.txt (line 1)) (5.4.1)\u001b[0m\n", "\u001b[34mCollecting qudida>=0.0.4 (from albumentations->-r requirements.txt (line 1))\u001b[0m\n", "\u001b[34mDownloading qudida-0.0.4-py3-none-any.whl (3.5 kB)\u001b[0m\n", "\u001b[34mCollecting opencv-python-headless>=4.1.1 (from albumentations->-r requirements.txt (line 1))\u001b[0m\n", "\u001b[34mDownloading opencv_python_headless-4.8.0.74-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (49.1 MB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.1/49.1 MB 35.5 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mRequirement already satisfied: scikit-learn>=0.19.1 in /opt/conda/lib/python3.9/site-packages (from qudida>=0.0.4->albumentations->-r requirements.txt (line 1)) (1.2.2)\u001b[0m\n", "\u001b[34mRequirement already 
satisfied: typing-extensions in /opt/conda/lib/python3.9/site-packages (from qudida>=0.0.4->albumentations->-r requirements.txt (line 1)) (4.5.0)\u001b[0m\n", "\u001b[34mRequirement already satisfied: networkx>=2.8 in /opt/conda/lib/python3.9/site-packages (from scikit-image>=0.16.1->albumentations->-r requirements.txt (line 1)) (3.1)\u001b[0m\n", "\u001b[34mRequirement already satisfied: pillow>=9.0.1 in /opt/conda/lib/python3.9/site-packages (from scikit-image>=0.16.1->albumentations->-r requirements.txt (line 1)) (9.5.0)\u001b[0m\n", "\u001b[34mRequirement already satisfied: imageio>=2.27 in /opt/conda/lib/python3.9/site-packages (from scikit-image>=0.16.1->albumentations->-r requirements.txt (line 1)) (2.28.0)\u001b[0m\n", "\u001b[34mCollecting tifffile>=2022.8.12 (from scikit-image>=0.16.1->albumentations->-r requirements.txt (line 1))\u001b[0m\n", "\u001b[34mDownloading tifffile-2023.4.12-py3-none-any.whl (219 kB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 219.4/219.4 kB 36.3 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mCollecting PyWavelets>=1.1.1 (from scikit-image>=0.16.1->albumentations->-r requirements.txt (line 1))\u001b[0m\n", "\u001b[34mDownloading PyWavelets-1.4.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.9 MB)\u001b[0m\n", "\u001b[34m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.9/6.9 MB 86.5 MB/s eta 0:00:00\u001b[0m\n", "\u001b[34mRequirement already satisfied: packaging>=21 in /opt/conda/lib/python3.9/site-packages (from scikit-image>=0.16.1->albumentations->-r requirements.txt (line 1)) (23.1)\u001b[0m\n", "\u001b[34mCollecting lazy_loader>=0.2 (from scikit-image>=0.16.1->albumentations->-r requirements.txt (line 1))\u001b[0m\n", "\u001b[34mDownloading lazy_loader-0.3-py3-none-any.whl (9.1 kB)\u001b[0m\n", "\u001b[34mRequirement already satisfied: joblib>=1.1.1 in /opt/conda/lib/python3.9/site-packages (from scikit-learn>=0.19.1->qudida>=0.0.4->albumentations->-r requirements.txt (line 1)) (1.2.0)\u001b[0m\n", 
"\u001b[34mRequirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.9/site-packages (from scikit-learn>=0.19.1->qudida>=0.0.4->albumentations->-r requirements.txt (line 1)) (3.1.0)\u001b[0m\n", "\u001b[34mInstalling collected packages: tifffile, PyWavelets, opencv-python-headless, lazy_loader, scikit-image, qudida, albumentations\u001b[0m\n", "\u001b[35mInstalling collected packages: tifffile, PyWavelets, opencv-python-headless, lazy_loader, scikit-image, qudida, albumentations\u001b[0m\n", "\u001b[34mSuccessfully installed PyWavelets-1.4.1 albumentations-1.3.1 lazy_loader-0.3 opencv-python-headless-4.8.0.74 qudida-0.0.4 scikit-image-0.21.0 tifffile-2023.4.12\u001b[0m\n", "\u001b[34mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\n", "\u001b[35mSuccessfully installed PyWavelets-1.4.1 albumentations-1.3.1 lazy_loader-0.3 opencv-python-headless-4.8.0.74 qudida-0.0.4 scikit-image-0.21.0 tifffile-2023.4.12\u001b[0m\n", "\u001b[35mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\n", "\u001b[35m[notice] A new release of pip is available: 23.1.1 -> 23.1.2\u001b[0m\n", "\u001b[35m[notice] To update, run: pip install --upgrade pip\u001b[0m\n", "\u001b[34m[notice] A new release of pip is available: 23.1.1 -> 23.1.2\u001b[0m\n", "\u001b[34m[notice] To update, run: pip install --upgrade pip\u001b[0m\n", "\u001b[34m2023-07-02 09:05:02,535 sagemaker-training-toolkit INFO Waiting for the process to finish and give a return code.\u001b[0m\n", "\u001b[34m2023-07-02 09:05:02,535 sagemaker-training-toolkit INFO Done waiting for a return code. 
Received 0 from exiting process.\u001b[0m\n", "\u001b[34m2023-07-02 09:05:02,615 sagemaker-training-toolkit INFO No Neurons detected (normal if no neurons installed)\u001b[0m\n", "\u001b[34m2023-07-02 09:05:02,704 sagemaker-training-toolkit INFO No Neurons detected (normal if no neurons installed)\u001b[0m\n", "\u001b[34m2023-07-02 09:05:02,716 sagemaker-training-toolkit INFO Starting MPI run as worker node.\u001b[0m\n", "\u001b[34m2023-07-02 09:05:02,717 sagemaker-training-toolkit INFO Creating SSH daemon.\u001b[0m\n", "\u001b[34m2023-07-02 09:05:02,719 sagemaker-training-toolkit INFO Waiting for MPI workers to establish their SSH connections\u001b[0m\n", "\u001b[34m2023-07-02 09:05:02,720 sagemaker-training-toolkit INFO Cannot connect to host algo-2 at port 22. Retrying...\u001b[0m\n", "\u001b[34m2023-07-02 09:05:02,720 sagemaker-training-toolkit INFO Connection closed\u001b[0m\n", "\u001b[35m2023-07-02 09:05:02,802 sagemaker-training-toolkit INFO Waiting for the process to finish and give a return code.\u001b[0m\n", "\u001b[35m2023-07-02 09:05:02,803 sagemaker-training-toolkit INFO Done waiting for a return code. 
Received 0 from exiting process.\u001b[0m\n", "\u001b[35m2023-07-02 09:05:02,875 sagemaker-training-toolkit INFO No Neurons detected (normal if no neurons installed)\u001b[0m\n", "\u001b[35m2023-07-02 09:05:02,959 sagemaker-training-toolkit INFO No Neurons detected (normal if no neurons installed)\u001b[0m\n", "\u001b[35m2023-07-02 09:05:02,972 sagemaker-training-toolkit INFO Starting MPI run as worker node.\u001b[0m\n", "\u001b[35m2023-07-02 09:05:02,972 sagemaker-training-toolkit INFO Waiting for MPI Master to create SSH daemon.\u001b[0m\n", "\u001b[35m2023-07-02 09:05:02,985 paramiko.transport INFO Connected (version 2.0, client OpenSSH_8.2p1)\u001b[0m\n", "\u001b[35m2023-07-02 09:05:03,156 paramiko.transport INFO Authentication (publickey) successful!\u001b[0m\n", "\u001b[35m2023-07-02 09:05:03,156 sagemaker-training-toolkit INFO Can connect to host algo-1\u001b[0m\n", "\u001b[35m2023-07-02 09:05:03,156 sagemaker-training-toolkit INFO MPI Master online, creating SSH daemon.\u001b[0m\n", "\u001b[35m2023-07-02 09:05:03,156 sagemaker-training-toolkit INFO Writing environment variables to /etc/environment for the MPI process.\u001b[0m\n", "\u001b[35m2023-07-02 09:05:03,161 sagemaker-training-toolkit INFO Waiting for MPI process to finish.\u001b[0m\n", "\u001b[34m2023-07-02 09:05:03,734 paramiko.transport INFO Connected (version 2.0, client OpenSSH_8.2p1)\u001b[0m\n", "\u001b[34m2023-07-02 09:05:03,906 paramiko.transport INFO Authentication (publickey) successful!\u001b[0m\n", "\u001b[34m2023-07-02 09:05:03,906 sagemaker-training-toolkit INFO Can connect to host algo-2 at port 22\u001b[0m\n", "\u001b[34m2023-07-02 09:05:03,907 sagemaker-training-toolkit INFO Connection closed\u001b[0m\n", "\u001b[34m2023-07-02 09:05:03,907 sagemaker-training-toolkit INFO Worker algo-2 available for communication\u001b[0m\n", "\u001b[34m2023-07-02 09:05:03,907 sagemaker-training-toolkit INFO Network interface name: eth0\u001b[0m\n", "\u001b[34m2023-07-02 09:05:03,907 
sagemaker-training-toolkit INFO Host: ['algo-1', 'algo-2']\u001b[0m\n", "\u001b[34m2023-07-02 09:05:03,908 sagemaker-training-toolkit INFO instance type: ml.p3.16xlarge\u001b[0m\n", "\u001b[34m2023-07-02 09:05:03,908 sagemaker-training-toolkit INFO Env Hosts: ['algo-1', 'algo-2'] Hosts: ['algo-1:8', 'algo-2:8'] process_per_hosts: 8 num_processes: 16\u001b[0m\n", "\u001b[34m2023-07-02 09:05:03,978 sagemaker-training-toolkit INFO No Neurons detected (normal if no neurons installed)\u001b[0m\n", "\u001b[34m2023-07-02 09:05:03,992 sagemaker-training-toolkit INFO Invoking user script\u001b[0m\n", "\u001b[34mTraining Env:\u001b[0m\n", "\u001b[34m{\n", " \"additional_framework_parameters\": {\n", " \"sagemaker_distributed_dataparallel_custom_mpi_options\": \"\",\n", " \"sagemaker_distributed_dataparallel_enabled\": true,\n", " \"sagemaker_instance_type\": \"ml.p3.16xlarge\"\n", " },\n", " \"channel_input_dirs\": {\n", " \"training\": \"/opt/ml/input/data/training\"\n", " },\n", " \"current_host\": \"algo-1\",\n", " \"current_instance_group\": \"homogeneousCluster\",\n", " \"current_instance_group_hosts\": [\n", " \"algo-1\",\n", " \"algo-2\"\n", " ],\n", " \"current_instance_type\": \"ml.p3.16xlarge\",\n", " \"distribution_hosts\": [\n", " \"algo-1\",\n", " \"algo-2\"\n", " ],\n", " \"distribution_instance_groups\": [\n", " \"homogeneousCluster\"\n", " ],\n", " \"framework_module\": \"sagemaker_pytorch_container.training:main\",\n", " \"hosts\": [\n", " \"algo-1\",\n", " \"algo-2\"\n", " ],\n", " \"hyperparameters\": {\n", " \"backend\": \"smddp\",\n", " \"batch-size\": 80,\n", " \"height\": 128,\n", " \"lr\": 0.0001,\n", " \"model_name\": \"swin_b\",\n", " \"num-classes\": 37,\n", " \"num-epochs\": 15,\n", " \"test-batch-size\": 200,\n", " \"width\": 128\n", " },\n", " \"input_config_dir\": \"/opt/ml/input/config\",\n", " \"input_data_config\": {\n", " \"training\": {\n", " \"TrainingInputMode\": \"File\",\n", " \"S3DistributionType\": \"FullyReplicated\",\n", " 
\"RecordWrapperType\": \"None\"\n", " }\n", " },\n", " \"input_dir\": \"/opt/ml/input\",\n", " \"instance_groups\": [\n", " \"homogeneousCluster\"\n", " ],\n", " \"instance_groups_dict\": {\n", " \"homogeneousCluster\": {\n", " \"instance_group_name\": \"homogeneousCluster\",\n", " \"instance_type\": \"ml.p3.16xlarge\",\n", " \"hosts\": [\n", " \"algo-1\",\n", " \"algo-2\"\n", " ]\n", " }\n", " },\n", " \"is_hetero\": false,\n", " \"is_master\": true,\n", " \"is_modelparallel_enabled\": null,\n", " \"is_smddpmprun_installed\": true,\n", " \"job_name\": \"oxford-ml-p3-16xlarge-0702-08581688288298\",\n", " \"log_level\": 20,\n", " \"master_hostname\": \"algo-1\",\n", " \"model_dir\": \"/opt/ml/model\",\n", " \"module_dir\": \"s3://sagemaker-us-west-2-322537213286/oxford-ml-p3-16xlarge-0702-08581688288298/source/sourcedir.tar.gz\",\n", " \"module_name\": \"pytorch_oxford_ddp\",\n", " \"network_interface_name\": \"eth0\",\n", " \"num_cpus\": 64,\n", " \"num_gpus\": 8,\n", " \"num_neurons\": 0,\n", " \"output_data_dir\": \"/opt/ml/output/data\",\n", " \"output_dir\": \"/opt/ml/output\",\n", " \"output_intermediate_dir\": \"/opt/ml/output/intermediate\",\n", " \"resource_config\": {\n", " \"current_host\": \"algo-1\",\n", " \"current_instance_type\": \"ml.p3.16xlarge\",\n", " \"current_group_name\": \"homogeneousCluster\",\n", " \"hosts\": [\n", " \"algo-1\",\n", " \"algo-2\"\n", " ],\n", " \"instance_groups\": [\n", " {\n", " \"instance_group_name\": \"homogeneousCluster\",\n", " \"instance_type\": \"ml.p3.16xlarge\",\n", " \"hosts\": [\n", " \"algo-1\",\n", " \"algo-2\"\n", " ]\n", " }\n", " ],\n", " \"network_interface_name\": \"eth0\"\n", " },\n", " \"user_entry_point\": \"pytorch_oxford_ddp.py\"\u001b[0m\n", "\u001b[34m}\u001b[0m\n", "\u001b[34mEnvironment variables:\u001b[0m\n", "\u001b[34mSM_HOSTS=[\"algo-1\",\"algo-2\"]\u001b[0m\n", "\u001b[34mSM_NETWORK_INTERFACE_NAME=eth0\u001b[0m\n", 
"\u001b[34mSM_HPS={\"backend\":\"smddp\",\"batch-size\":80,\"height\":128,\"lr\":0.0001,\"model_name\":\"swin_b\",\"num-classes\":37,\"num-epochs\":15,\"test-batch-size\":200,\"width\":128}\u001b[0m\n", "\u001b[34mSM_USER_ENTRY_POINT=pytorch_oxford_ddp.py\u001b[0m\n", "\u001b[34mSM_FRAMEWORK_PARAMS={\"sagemaker_distributed_dataparallel_custom_mpi_options\":\"\",\"sagemaker_distributed_dataparallel_enabled\":true,\"sagemaker_instance_type\":\"ml.p3.16xlarge\"}\u001b[0m\n", "\u001b[34mSM_RESOURCE_CONFIG={\"current_group_name\":\"homogeneousCluster\",\"current_host\":\"algo-1\",\"current_instance_type\":\"ml.p3.16xlarge\",\"hosts\":[\"algo-1\",\"algo-2\"],\"instance_groups\":[{\"hosts\":[\"algo-1\",\"algo-2\"],\"instance_group_name\":\"homogeneousCluster\",\"instance_type\":\"ml.p3.16xlarge\"}],\"network_interface_name\":\"eth0\"}\u001b[0m\n", "\u001b[34mSM_INPUT_DATA_CONFIG={\"training\":{\"RecordWrapperType\":\"None\",\"S3DistributionType\":\"FullyReplicated\",\"TrainingInputMode\":\"File\"}}\u001b[0m\n", "\u001b[34mSM_OUTPUT_DATA_DIR=/opt/ml/output/data\u001b[0m\n", "\u001b[34mSM_CHANNELS=[\"training\"]\u001b[0m\n", "\u001b[34mSM_CURRENT_HOST=algo-1\u001b[0m\n", "\u001b[34mSM_CURRENT_INSTANCE_TYPE=ml.p3.16xlarge\u001b[0m\n", "\u001b[34mSM_CURRENT_INSTANCE_GROUP=homogeneousCluster\u001b[0m\n", "\u001b[34mSM_CURRENT_INSTANCE_GROUP_HOSTS=[\"algo-1\",\"algo-2\"]\u001b[0m\n", "\u001b[34mSM_INSTANCE_GROUPS=[\"homogeneousCluster\"]\u001b[0m\n", "\u001b[34mSM_INSTANCE_GROUPS_DICT={\"homogeneousCluster\":{\"hosts\":[\"algo-1\",\"algo-2\"],\"instance_group_name\":\"homogeneousCluster\",\"instance_type\":\"ml.p3.16xlarge\"}}\u001b[0m\n", "\u001b[34mSM_DISTRIBUTION_INSTANCE_GROUPS=[\"homogeneousCluster\"]\u001b[0m\n", "\u001b[34mSM_IS_HETERO=false\u001b[0m\n", "\u001b[34mSM_MODULE_NAME=pytorch_oxford_ddp\u001b[0m\n", "\u001b[34mSM_LOG_LEVEL=20\u001b[0m\n", "\u001b[34mSM_FRAMEWORK_MODULE=sagemaker_pytorch_container.training:main\u001b[0m\n", 
"\u001b[34mSM_INPUT_DIR=/opt/ml/input\u001b[0m\n", "\u001b[34mSM_INPUT_CONFIG_DIR=/opt/ml/input/config\u001b[0m\n", "\u001b[34mSM_OUTPUT_DIR=/opt/ml/output\u001b[0m\n", "\u001b[34mSM_NUM_CPUS=64\u001b[0m\n", "\u001b[34mSM_NUM_GPUS=8\u001b[0m\n", "\u001b[34mSM_NUM_NEURONS=0\u001b[0m\n", "\u001b[34mSM_MODEL_DIR=/opt/ml/model\u001b[0m\n", "\u001b[34mSM_MODULE_DIR=s3://sagemaker-us-west-2-322537213286/oxford-ml-p3-16xlarge-0702-08581688288298/source/sourcedir.tar.gz\u001b[0m\n", "\u001b[34mSM_TRAINING_ENV={\"additional_framework_parameters\":{\"sagemaker_distributed_dataparallel_custom_mpi_options\":\"\",\"sagemaker_distributed_dataparallel_enabled\":true,\"sagemaker_instance_type\":\"ml.p3.16xlarge\"},\"channel_input_dirs\":{\"training\":\"/opt/ml/input/data/training\"},\"current_host\":\"algo-1\",\"current_instance_group\":\"homogeneousCluster\",\"current_instance_group_hosts\":[\"algo-1\",\"algo-2\"],\"current_instance_type\":\"ml.p3.16xlarge\",\"distribution_hosts\":[\"algo-1\",\"algo-2\"],\"distribution_instance_groups\":[\"homogeneousCluster\"],\"framework_module\":\"sagemaker_pytorch_container.training:main\",\"hosts\":[\"algo-1\",\"algo-2\"],\"hyperparameters\":{\"backend\":\"smddp\",\"batch-size\":80,\"height\":128,\"lr\":0.0001,\"model_name\":\"swin_b\",\"num-classes\":37,\"num-epochs\":15,\"test-batch-size\":200,\"width\":128},\"input_config_dir\":\"/opt/ml/input/config\",\"input_data_config\":{\"training\":{\"RecordWrapperType\":\"None\",\"S3DistributionType\":\"FullyReplicated\",\"TrainingInputMode\":\"File\"}},\"input_dir\":\"/opt/ml/input\",\"instance_groups\":[\"homogeneousCluster\"],\"instance_groups_dict\":{\"homogeneousCluster\":{\"hosts\":[\"algo-1\",\"algo-2\"],\"instance_group_name\":\"homogeneousCluster\",\"instance_type\":\"ml.p3.16xlarge\"}},\"is_hetero\":false,\"is_master\":true,\"is_modelparallel_enabled\":null,\"is_smddpmprun_installed\":true,\"job_name\":\"oxford-ml-p3-16xlarge-0702-08581688288298\",\"log_level\":20,\"master_hostname\":\"alg
o-1\",\"model_dir\":\"/opt/ml/model\",\"module_dir\":\"s3://sagemaker-us-west-2-322537213286/oxford-ml-p3-16xlarge-0702-08581688288298/source/sourcedir.tar.gz\",\"module_name\":\"pytorch_oxford_ddp\",\"network_interface_name\":\"eth0\",\"num_cpus\":64,\"num_gpus\":8,\"num_neurons\":0,\"output_data_dir\":\"/opt/ml/output/data\",\"output_dir\":\"/opt/ml/output\",\"output_intermediate_dir\":\"/opt/ml/output/intermediate\",\"resource_config\":{\"current_group_name\":\"homogeneousCluster\",\"current_host\":\"algo-1\",\"current_instance_type\":\"ml.p3.16xlarge\",\"hosts\":[\"algo-1\",\"algo-2\"],\"instance_groups\":[{\"hosts\":[\"algo-1\",\"algo-2\"],\"instance_group_name\":\"homogeneousCluster\",\"instance_type\":\"ml.p3.16xlarge\"}],\"network_interface_name\":\"eth0\"},\"user_entry_point\":\"pytorch_oxford_ddp.py\"}\u001b[0m\n", "\u001b[34mSM_USER_ARGS=[\"--backend\",\"smddp\",\"--batch-size\",\"80\",\"--height\",\"128\",\"--lr\",\"0.0001\",\"--model_name\",\"swin_b\",\"--num-classes\",\"37\",\"--num-epochs\",\"15\",\"--test-batch-size\",\"200\",\"--width\",\"128\"]\u001b[0m\n", "\u001b[34mSM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate\u001b[0m\n", "\u001b[34mSM_CHANNEL_TRAINING=/opt/ml/input/data/training\u001b[0m\n", "\u001b[34mSM_HP_BACKEND=smddp\u001b[0m\n", "\u001b[34mSM_HP_BATCH-SIZE=80\u001b[0m\n", "\u001b[34mSM_HP_HEIGHT=128\u001b[0m\n", "\u001b[34mSM_HP_LR=0.0001\u001b[0m\n", "\u001b[34mSM_HP_MODEL_NAME=swin_b\u001b[0m\n", "\u001b[34mSM_HP_NUM-CLASSES=37\u001b[0m\n", "\u001b[34mSM_HP_NUM-EPOCHS=15\u001b[0m\n", "\u001b[34mSM_HP_TEST-BATCH-SIZE=200\u001b[0m\n", "\u001b[34mSM_HP_WIDTH=128\u001b[0m\n", "\u001b[34mPYTHONPATH=/opt/ml/code:/opt/conda/bin:/opt/conda/lib/python39.zip:/opt/conda/lib/python3.9:/opt/conda/lib/python3.9/lib-dynload:/opt/conda/lib/python3.9/site-packages\u001b[0m\n", "\u001b[34mInvoking script with the following command:\u001b[0m\n", "\u001b[34mmpirun --host algo-1:8,algo-2:8 -np 16 --allow-run-as-root --tag-output --oversubscribe 
-mca btl_tcp_if_include eth0 -mca oob_tcp_if_include eth0 -mca plm_rsh_no_tree_spawn 1 -mca pml ob1 -mca btl ^openib -mca orte_abort_on_non_zero_status 1 -mca btl_vader_single_copy_mechanism none -mca plm_rsh_num_concurrent 2 -x NCCL_SOCKET_IFNAME=eth0 -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH -x SMDATAPARALLEL_USE_HOMOGENEOUS=1 -x FI_PROVIDER=efa -x RDMAV_FORK_SAFE=1 -x LD_PRELOAD=/opt/conda/lib/python3.9/site-packages/gethostname.cpython-39-x86_64-linux-gnu.so -x SMDATAPARALLEL_SERVER_ADDR=algo-1 -x SMDATAPARALLEL_SERVER_PORT=7592 -x SAGEMAKER_INSTANCE_TYPE=ml.p3.16xlarge smddprun /opt/conda/bin/python3.9 -m mpi4py pytorch_oxford_ddp.py --backend smddp --batch-size 80 --height 128 --lr 0.0001 --model_name swin_b --num-classes 37 --num-epochs 15 --test-batch-size 200 --width 128\u001b[0m\n", "\u001b[34mWarning: Permanently added 'algo-2,10.0.134.115' (ECDSA) to the list of known hosts.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:curl: /opt/conda/lib/libcurl.so.4: no version information available (required by curl)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:curl: /opt/conda/lib/libcurl.so.4: no version information available (required by curl)\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:curl: /opt/conda/lib/libcurl.so.4: no version information available (required by curl)\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:curl: /opt/conda/lib/libcurl.so.4: no version information available (required by curl)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:curl: /opt/conda/lib/libcurl.so.4: no version information available (required by curl)\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:curl: /opt/conda/lib/libcurl.so.4: no version information available (required by curl)\u001b[0m\n", "\u001b[35m2023-07-02 09:05:06,172 sagemaker-training-toolkit INFO Process[es]: [psutil.Process(pid=68, name='orted', status='sleeping', started='09:05:04')]\u001b[0m\n", "\u001b[35m2023-07-02 09:05:06,173 sagemaker-training-toolkit INFO Orted process found [psutil.Process(pid=68, 
name='orted', status='sleeping', started='09:05:04')]\u001b[0m\n", "\u001b[35m2023-07-02 09:05:06,173 sagemaker-training-toolkit INFO Waiting for orted process [psutil.Process(pid=68, name='orted', status='sleeping', started='09:05:04')]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:start main function\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:start main function\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:start main function\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:start main function\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:start main function\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:start main function\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:start main function\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:start main function\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:start main function\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:start main function\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:start main function\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:start main function\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:start main function\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:start main function\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:start main function\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:start main function\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:DDP Mode\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:DDP Mode\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:DDP Mode\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:DDP Mode\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:DDP Mode\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:DDP Mode\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:DDP Mode\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:DDP Mode\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:DDP Mode\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:DDP Mode\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:DDP Mode\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:DDP Mode\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:DDP 
Mode\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:DDP Mode\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:DDP Mode\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:DDP Mode\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Bootstrap : Using eth0:10.0.137.57<0>\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:NCCL version 2.14.3+cuda11.7\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", 
"\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO cudaDriverVersion 12000\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] find_ofi_provider:608 NCCL WARN NET/OFI Couldn't find any optimal provider\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] ofi_init:1355 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO NET/Socket : Using [0]eth0:10.0.137.57<0>\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Bootstrap : Using eth0:10.0.137.57<0>\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Bootstrap : Using eth0:10.0.137.57<0>\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO NET/Plugin: Failed to 
find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] find_ofi_provider:608 NCCL WARN NET/OFI Couldn't find any optimal provider\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] ofi_init:1355 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO NET/Socket : Using [0]eth0:10.0.137.57<0>\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] find_ofi_provider:608 NCCL WARN NET/OFI Couldn't find any optimal provider\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] ofi_init:1355 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO NET/Socket : Using [0]eth0:10.0.137.57<0>\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Bootstrap : Using eth0:10.0.137.57<0>\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO NET/Plugin: Failed to 
find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Bootstrap : Using eth0:10.0.137.57<0>\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Bootstrap : Using eth0:10.0.137.57<0>\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Bootstrap : Using eth0:10.0.134.115<0>\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Bootstrap : Using eth0:10.0.134.115<0>\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Bootstrap : Using 
eth0:10.0.134.115<0>\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Bootstrap : Using eth0:10.0.137.57<0>\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 
[3] NCCL INFO Bootstrap : Using eth0:10.0.134.115<0>\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Bootstrap : Using eth0:10.0.134.115<0>\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Bootstrap : Using eth0:10.0.134.115<0>\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Bootstrap : Using eth0:10.0.134.115<0>\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] find_ofi_provider:608 NCCL WARN NET/OFI Couldn't find any optimal provider\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] ofi_init:1355 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO 
NET/Socket : Using [0]eth0:10.0.137.57<0>\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Bootstrap : Using eth0:10.0.137.57<0>\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment 
variable to 1\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Bootstrap : Using eth0:10.0.134.115<0>\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] find_ofi_provider:608 NCCL WARN NET/OFI Couldn't find any optimal provider\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] ofi_init:1355 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] find_ofi_provider:608 NCCL WARN NET/OFI Couldn't find any optimal provider\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] ofi_init:1355 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO NET/Socket : Using [0]eth0:10.0.137.57<0>\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO NET/Socket : Using [0]eth0:10.0.137.57<0>\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] find_ofi_provider:608 NCCL WARN NET/OFI Couldn't find any optimal provider\u001b[0m\n", 
"\u001b[34m[1,mpirank:7,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] ofi_init:1355 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO NET/Socket : Using [0]eth0:10.0.137.57<0>\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO NET/OFI Using aws-ofi-nccl 1.4.0aws\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO NET/OFI Setting FI_EFA_FORK_SAFE environment variable to 1\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] find_ofi_provider:608 NCCL WARN NET/OFI Couldn't find any optimal provider\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] ofi_init:1355 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO NET/Socket : Using [0]eth0:10.0.137.57<0>\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] find_ofi_provider:608 NCCL WARN NET/OFI Couldn't find any optimal provider\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] ofi_init:1355 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:\u001b[0m\n", 
"\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] find_ofi_provider:608 NCCL WARN NET/OFI Couldn't find any optimal provider\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] ofi_init:1355 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO NET/Socket : Using [0]eth0:10.0.134.115<0>\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO NET/Socket : Using [0]eth0:10.0.134.115<0>\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] find_ofi_provider:608 NCCL WARN NET/OFI Couldn't find any optimal provider\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] ofi_init:1355 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO NET/Socket : Using [0]eth0:10.0.134.115<0>\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] find_ofi_provider:608 NCCL WARN NET/OFI Couldn't find any optimal provider\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] ofi_init:1355 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO NCCL_IB_DISABLE set 
by environment to 1.\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] find_ofi_provider:608 NCCL WARN NET/OFI Couldn't find any optimal provider\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] ofi_init:1355 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO NET/Socket : Using [0]eth0:10.0.134.115<0>\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO NET/Socket : Using [0]eth0:10.0.134.115<0>\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] find_ofi_provider:608 NCCL WARN NET/OFI Couldn't find any optimal provider\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] ofi_init:1355 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] find_ofi_provider:608 NCCL WARN NET/OFI Couldn't find any optimal provider\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] ofi_init:1355 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO NET/Socket : Using 
[0]eth0:10.0.134.115<0>\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO NET/Socket : Using [0]eth0:10.0.134.115<0>\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] find_ofi_provider:608 NCCL WARN NET/OFI Couldn't find any optimal provider\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] ofi_init:1355 NCCL WARN NET/OFI aws-ofi-nccl initialization failed\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO NCCL_IB_DISABLE set by environment to 1.\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO NET/Socket : Using [0]eth0:10.0.134.115<0>\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Trees [0] 4/-1/-1->7->6 [1] 4/-1/-1->7->6\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Trees [0] -1/-1/-1->4->7 [1] -1/-1/-1->4->7\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Trees [0] 2/-1/-1->3->0 [1] 2/-1/-1->3->0\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Trees [0] 1/-1/-1->2->3 [1] 1/-1/-1->2->3\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Trees [0] 6/-1/-1->5->1 [1] 6/-1/-1->5->1\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Trees [0] 5/-1/-1->1->2 [1] 5/-1/-1->1->2\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 00/02 : 0 3 2 1 5 6 7 4 8 11 10 9 13 14 15 12\u001b[0m\n", 
"\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 01/02 : 0 3 2 1 5 6 7 4 8 11 10 9 13 14 15 12\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Trees [0] 3/8/-1->0->-1 [1] 3/-1/-1->0->8\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Trees [0] 9/-1/-1->10->11 [1] 9/-1/-1->10->11\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Trees [0] 12/-1/-1->15->14 [1] 12/-1/-1->15->14\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Trees [0] -1/-1/-1->12->15 [1] -1/-1/-1->12->15\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Trees [0] 10/-1/-1->11->8 [1] 10/-1/-1->11->8\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Trees [0] 13/-1/-1->9->10 [1] 13/-1/-1->9->10\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Trees [0] 14/-1/-1->13->9 [1] 14/-1/-1->13->9\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Trees [0] 15/-1/-1->14->13 [1] 15/-1/-1->14->13\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Trees [0] 11/-1/-1->8->0 [1] 11/0/-1->8->-1\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 00/0 : 9[180] -> 13[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 00/0 : 5[1c0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 00/0 : 13[1c0] -> 14[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 00/0 : 1[180] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 00/0 : 8[170] -> 11[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 00/0 : 0[170] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 
01/0 : 9[180] -> 13[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 01/0 : 5[1c0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 01/0 : 8[170] -> 11[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 01/0 : 13[1c0] -> 14[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 01/0 : 1[180] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 01/0 : 0[170] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 00/0 : 10[190] -> 9[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 00/0 : 14[1d0] -> 15[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 00/0 : 6[1d0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 00/0 : 2[190] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 00/0 : 12[1b0] -> 0[170] [send] via NET/Socket/0\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 01/0 : 6[1d0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 01/0 : 10[190] -> 9[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 01/0 : 14[1d0] -> 15[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 00/0 : 4[1b0] -> 8[170] [send] via NET/Socket/0\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 00/0 : 11[1a0] -> 10[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 01/0 : 2[190] -> 1[180] via P2P/IPC\u001b[0m\n", 
"\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 00/0 : 3[1a0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 01/0 : 11[1a0] -> 10[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 00/0 : 15[1e0] -> 12[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 01/0 : 3[1a0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 00/0 : 7[1e0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 01/0 : 15[1e0] -> 12[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 01/0 : 7[1e0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 00/0 : 14[1d0] -> 13[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 00/0 : 13[1c0] -> 9[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 01/0 : 14[1d0] -> 13[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 01/0 : 13[1c0] -> 9[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 00/0 : 6[1d0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Connected all 
rings\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 00/0 : 5[1c0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 01/0 : 6[1d0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 01/0 : 5[1c0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 00/0 : 9[180] -> 10[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 01/0 : 9[180] -> 10[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 00/0 : 1[180] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 00/0 : 10[190] -> 11[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 01/0 : 1[180] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 01/0 : 10[190] -> 11[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 00/0 : 2[190] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 01/0 : 2[190] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 00/0 : 11[1a0] -> 8[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 01/0 : 11[1a0] -> 8[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL 
INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 00/1 : 10[190] -> 12[1b0] via P2P/indirect/8[170]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 00/0 : 3[1a0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 00/1 : 9[180] -> 12[1b0] via P2P/indirect/8[170]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 00/1 : 2[190] -> 4[1b0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 01/0 : 3[1a0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer\u001b[0m\n", 
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 00/1 : 1[180] -> 4[1b0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 01/1 : 10[190] -> 12[1b0] via P2P/indirect/8[170]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 01/1 : 2[190] -> 4[1b0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 01/1 : 9[180] -> 12[1b0] via P2P/indirect/8[170]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 01/1 : 1[180] -> 4[1b0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 01/0 : 4[1b0] -> 8[170] [send] via NET/Socket/0\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 01/0 : 12[1b0] -> 0[170] [send] via NET/Socket/0\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:1288 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 00/0 : 4[1b0] -> 8[170] [receive] via NET/Socket/0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:1355 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread\u001b[0m\n", 
"\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 00/0 : 12[1b0] -> 0[170] [receive] via NET/Socket/0\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:1288 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 01/0 : 4[1b0] -> 8[170] [receive] via NET/Socket/0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:1355 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 01/0 : 12[1b0] -> 0[170] [receive] via NET/Socket/0\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 00/0 : 4[1b0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 01/0 : 4[1b0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 00/0 : 12[1b0] -> 15[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 01/0 : 12[1b0] -> 15[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO 2 coll channels, 2 p2p 
channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 00/0 : 7[1e0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 01/0 : 7[1e0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 00/0 : 15[1e0] -> 14[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 01/0 : 15[1e0] -> 14[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Connected all 
trees\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:1288 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 00/0 : 0[170] -> 8[170] [receive] via NET/Socket/0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:1355 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 00/0 : 8[170] -> 0[170] [receive] via NET/Socket/0\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:1288 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 01/0 : 0[170] -> 8[170] [receive] via NET/Socket/0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:1355 [0] NCCL INFO NET/Socket: Using 2 threads and 8 sockets per thread\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 01/0 : 8[170] -> 0[170] [receive] via NET/Socket/0\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 00/0 : 8[170] -> 0[170] [send] via NET/Socket/0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 00/0 : 0[170] -> 8[170] [send] via NET/Socket/0\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 01/0 : 8[170] -> 0[170] [send] via NET/Socket/0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 01/0 : 0[170] -> 8[170] [send] via NET/Socket/0\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO 
threadThresholds 8/8/64 | 128/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 00/1 : 8[170] -> 13[1c0] via P2P/indirect/9[180]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 00/1 : 11[1a0] -> 12[1b0] via P2P/indirect/8[170]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 00/1 : 0[170] -> 5[1c0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 00/1 : 3[1a0] -> 4[1b0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 01/1 : 11[1a0] -> 12[1b0] via P2P/indirect/8[170]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 01/1 : 8[170] -> 13[1c0] via 
P2P/indirect/9[180]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 01/1 : 3[1a0] -> 4[1b0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 01/1 : 0[170] -> 5[1c0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 00/1 : 11[1a0] -> 13[1c0] via P2P/indirect/9[180]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 00/1 : 3[1a0] -> 5[1c0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 01/1 : 11[1a0] -> 13[1c0] via P2P/indirect/9[180]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 01/1 : 3[1a0] -> 5[1c0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 00/1 : 11[1a0] -> 14[1d0] via P2P/indirect/15[1e0]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 00/1 : 10[190] -> 13[1c0] via P2P/indirect/9[180]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 00/1 : 3[1a0] -> 6[1d0] via P2P/indirect/7[1e0]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 00/1 : 2[190] -> 5[1c0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 01/1 : 11[1a0] -> 14[1d0] via P2P/indirect/15[1e0]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 01/1 : 10[190] -> 13[1c0] via P2P/indirect/9[180]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 01/1 : 2[190] -> 5[1c0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 01/1 : 3[1a0] -> 6[1d0] via P2P/indirect/7[1e0]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 00/1 : 12[1b0] -> 9[180] via 
P2P/indirect/13[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 00/1 : 9[180] -> 14[1d0] via P2P/indirect/13[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 00/1 : 10[190] -> 15[1e0] via P2P/indirect/14[1d0]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 00/1 : 4[1b0] -> 1[180] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 00/1 : 1[180] -> 6[1d0] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 00/1 : 2[190] -> 7[1e0] via P2P/indirect/6[1d0]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 01/1 : 12[1b0] -> 9[180] via P2P/indirect/13[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 01/1 : 4[1b0] -> 1[180] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 01/1 : 9[180] -> 14[1d0] via P2P/indirect/13[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 01/1 : 10[190] -> 15[1e0] via P2P/indirect/14[1d0]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 01/1 : 1[180] -> 6[1d0] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 01/1 : 2[190] -> 7[1e0] via P2P/indirect/6[1d0]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 00/1 : 9[180] -> 15[1e0] via P2P/indirect/11[1a0]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 00/1 : 13[1c0] -> 8[170] via P2P/indirect/12[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 00/1 : 8[170] -> 14[1d0] via P2P/indirect/12[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 00/1 : 0[170] -> 6[1d0] via 
P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 00/1 : 5[1c0] -> 0[170] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 01/1 : 13[1c0] -> 8[170] via P2P/indirect/12[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 00/1 : 1[180] -> 7[1e0] via P2P/indirect/3[1a0]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 01/1 : 5[1c0] -> 0[170] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 01/1 : 9[180] -> 15[1e0] via P2P/indirect/11[1a0]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 01/1 : 0[170] -> 6[1d0] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 01/1 : 8[170] -> 14[1d0] via P2P/indirect/12[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 01/1 : 1[180] -> 7[1e0] via P2P/indirect/3[1a0]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 00/1 : 8[170] -> 15[1e0] via P2P/indirect/12[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 00/1 : 14[1d0] -> 8[170] via P2P/indirect/12[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 01/1 : 8[170] -> 15[1e0] via P2P/indirect/12[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 00/1 : 6[1d0] -> 0[170] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 00/1 : 0[170] -> 7[1e0] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 01/1 : 14[1d0] -> 8[170] via P2P/indirect/12[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 01/1 : 6[1d0] -> 0[170] via P2P/indirect/4[1b0]\u001b[0m\n", 
"\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 01/1 : 0[170] -> 7[1e0] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 00/1 : 15[1e0] -> 8[170] via P2P/indirect/12[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 01/1 : 15[1e0] -> 8[170] via P2P/indirect/12[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 00/1 : 7[1e0] -> 0[170] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 01/1 : 7[1e0] -> 0[170] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 00/1 : 15[1e0] -> 9[180] via P2P/indirect/13[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 01/1 : 15[1e0] -> 9[180] via P2P/indirect/13[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 00/1 : 7[1e0] -> 1[180] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 01/1 : 7[1e0] -> 1[180] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 00/1 : 14[1d0] -> 9[180] via P2P/indirect/13[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 00/1 : 15[1e0] -> 10[190] via P2P/indirect/11[1a0]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 00/1 : 7[1e0] -> 2[190] via P2P/indirect/3[1a0]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 01/1 : 14[1d0] -> 9[180] via P2P/indirect/13[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 00/1 : 6[1d0] -> 1[180] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 01/1 : 15[1e0] -> 10[190] via P2P/indirect/11[1a0]\u001b[0m\n", 
"\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 01/1 : 7[1e0] -> 2[190] via P2P/indirect/3[1a0]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 01/1 : 6[1d0] -> 1[180] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 00/1 : 13[1c0] -> 10[190] via P2P/indirect/9[180]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 00/1 : 5[1c0] -> 2[190] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 00/1 : 14[1d0] -> 11[1a0] via P2P/indirect/10[190]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 01/1 : 13[1c0] -> 10[190] via P2P/indirect/9[180]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 00/1 : 6[1d0] -> 3[1a0] via P2P/indirect/2[190]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 01/1 : 14[1d0] -> 11[1a0] via P2P/indirect/10[190]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 01/1 : 5[1c0] -> 2[190] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 01/1 : 6[1d0] -> 3[1a0] via P2P/indirect/2[190]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 00/1 : 13[1c0] -> 11[1a0] via P2P/indirect/9[180]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 00/1 : 5[1c0] -> 3[1a0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 00/1 : 12[1b0] -> 10[190] via P2P/indirect/14[1d0]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 00/1 : 4[1b0] -> 2[190] via P2P/indirect/6[1d0]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 01/1 : 13[1c0] -> 11[1a0] via P2P/indirect/9[180]\u001b[0m\n", 
"\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 01/1 : 12[1b0] -> 10[190] via P2P/indirect/14[1d0]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 01/1 : 4[1b0] -> 2[190] via P2P/indirect/6[1d0]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 00/1 : 12[1b0] -> 11[1a0] via P2P/indirect/8[170]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 01/1 : 5[1c0] -> 3[1a0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 01/1 : 12[1b0] -> 11[1a0] via P2P/indirect/8[170]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 00/1 : 4[1b0] -> 3[1a0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 01/1 : 4[1b0] -> 3[1a0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO comm 0x561378d17780 rank 2 nranks 16 cudaDev 2 busId 190 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO comm 0x56024a3c5210 rank 1 nranks 16 cudaDev 1 busId 180 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO comm 0x56039d2aa540 rank 12 nranks 16 cudaDev 4 busId 1b0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO comm 0x55707bbdf320 rank 15 nranks 16 cudaDev 7 busId 1e0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO comm 0x5600dc60cb50 rank 5 nranks 16 cudaDev 5 busId 1c0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO comm 0x5572e795d650 rank 9 nranks 16 cudaDev 1 busId 180 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO comm 0x564db584f6f0 rank 6 nranks 16 cudaDev 6 busId 1d0 - Init COMPLETE\u001b[0m\n", 
"\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO comm 0x56242eef3780 rank 10 nranks 16 cudaDev 2 busId 190 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO comm 0x56488c875190 rank 4 nranks 16 cudaDev 4 busId 1b0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO comm 0x559ef39502d0 rank 14 nranks 16 cudaDev 6 busId 1d0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO comm 0x565245beb570 rank 8 nranks 16 cudaDev 0 busId 170 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO comm 0x562722c4e440 rank 7 nranks 16 cudaDev 7 busId 1e0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:NCCL version 2.14.3+cuda11.7\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO comm 0x564bf2a9f520 rank 3 nranks 16 cudaDev 3 busId 1a0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO comm 0x560bb622a530 rank 0 nranks 16 cudaDev 0 busId 170 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO comm 0x55d3d168cde0 rank 13 nranks 16 cudaDev 5 busId 1c0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Using network 
Socket\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO comm 0x559039580860 rank 11 nranks 16 cudaDev 3 busId 1a0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Using network Socket\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Trees [0] 6/-1/-1->5->1 [1] 6/-1/-1->5->1 [2] 1/-1/-1->5->6 [3] 1/-1/-1->5->6 [4] 4/-1/-1->5->7 [5] 7/-1/-1->5->4 [6] 6/-1/-1->5->1 [7] 6/-1/-1->5->1 [8] 1/-1/-1->5->6 [9] 1/-1/-1->5->6 [10] 4/-1/-1->5->7 [11] 7/-1/-1->5->4\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 00/12 : 0 3 2 1 5 6 7 4\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 01/12 : 0 3 2 1 5 6 7 4\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 02/12 : 0 4 7 6 5 1 2 3\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 03/12 : 0 4 7 6 5 1 2 3\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 04/12 : 0 1 3 7 5 4 6 2\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 05/12 : 0 2 6 4 5 7 3 1\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 06/12 : 0 3 2 1 5 6 7 4\u001b[0m\n", 
"\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 07/12 : 0 3 2 1 5 6 7 4\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 08/12 : 0 4 7 6 5 1 2 3\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 09/12 : 0 4 7 6 5 1 2 3\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 10/12 : 0 1 3 7 5 4 6 2\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 11/12 : 0 2 6 4 5 7 3 1\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Trees [0] 3/-1/-1->0->-1 [1] 3/-1/-1->0->-1 [2] 4/-1/-1->0->-1 [3] 4/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 2/-1/-1->0->-1 [6] 3/-1/-1->0->-1 [7] 3/-1/-1->0->-1 [8] 4/-1/-1->0->-1 [9] 4/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 2/-1/-1->0->-1\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Trees [0] 1/-1/-1->2->3 [1] 1/-1/-1->2->3 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] -1/-1/-1->2->6 [5] 6/-1/-1->2->0 [6] 1/-1/-1->2->3 [7] 1/-1/-1->2->3 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] -1/-1/-1->2->6 [11] 6/-1/-1->2->0\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Trees [0] 5/-1/-1->1->2 [1] 5/-1/-1->1->2 [2] 2/-1/-1->1->5 [3] 2/-1/-1->1->5 [4] 3/-1/-1->1->0 [5] -1/-1/-1->1->3 [6] 5/-1/-1->1->2 [7] 5/-1/-1->1->2 [8] 2/-1/-1->1->5 [9] 2/-1/-1->1->5 [10] 3/-1/-1->1->0 [11] -1/-1/-1->1->3\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Trees [0] 4/-1/-1->7->6 [1] 4/-1/-1->7->6 [2] 6/-1/-1->7->4 [3] 6/-1/-1->7->4 [4] 5/-1/-1->7->3 [5] 3/-1/-1->7->5 [6] 4/-1/-1->7->6 [7] 4/-1/-1->7->6 [8] 6/-1/-1->7->4 [9] 6/-1/-1->7->4 [10] 5/-1/-1->7->3 [11] 3/-1/-1->7->5\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 5/-1/-1->6->7 [3] 5/-1/-1->6->7 [4] 2/-1/-1->6->4 [5] 4/-1/-1->6->2 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 5/-1/-1->6->7 [9] 5/-1/-1->6->7 [10] 
2/-1/-1->6->4 [11] 4/-1/-1->6->2\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Trees [0] -1/-1/-1->4->7 [1] -1/-1/-1->4->7 [2] 7/-1/-1->4->0 [3] 7/-1/-1->4->0 [4] 6/-1/-1->4->5 [5] 5/-1/-1->4->6 [6] -1/-1/-1->4->7 [7] -1/-1/-1->4->7 [8] 7/-1/-1->4->0 [9] 7/-1/-1->4->0 [10] 6/-1/-1->4->5 [11] 5/-1/-1->4->6\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Trees [0] 2/-1/-1->3->0 [1] 2/-1/-1->3->0 [2] -1/-1/-1->3->2 [3] -1/-1/-1->3->2 [4] 7/-1/-1->3->1 [5] 1/-1/-1->3->7 [6] 2/-1/-1->3->0 [7] 2/-1/-1->3->0 [8] -1/-1/-1->3->2 [9] -1/-1/-1->3->2 [10] 7/-1/-1->3->1 [11] 1/-1/-1->3->7\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 05/0 : 4[1b0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 04/0 : 0[170] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 10/0 : 0[170] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 11/0 : 4[1b0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 5/-1/-1->6->7 [3] 5/-1/-1->6->7 [4] 2/-1/-1->6->4 [5] 4/-1/-1->6->2 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 5/-1/-1->6->7 [9] 5/-1/-1->6->7 [10] 2/-1/-1->6->4 [11] 4/-1/-1->6->2\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Trees [0] 2/-1/-1->3->0 [1] 2/-1/-1->3->0 [2] -1/-1/-1->3->2 [3] -1/-1/-1->3->2 [4] 7/-1/-1->3->1 [5] 1/-1/-1->3->7 [6] 2/-1/-1->3->0 [7] 2/-1/-1->3->0 [8] -1/-1/-1->3->2 [9] -1/-1/-1->3->2 [10] 7/-1/-1->3->1 [11] 1/-1/-1->3->7\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 00/12 : 0 3 2 1 5 6 7 4\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 01/12 : 0 3 2 1 5 6 7 4\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO 
Channel 02/12 : 0 4 7 6 5 1 2 3\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 03/12 : 0 4 7 6 5 1 2 3\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 04/12 : 0 1 3 7 5 4 6 2\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 05/12 : 0 2 6 4 5 7 3 1\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 06/12 : 0 3 2 1 5 6 7 4\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 07/12 : 0 3 2 1 5 6 7 4\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 08/12 : 0 4 7 6 5 1 2 3\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 09/12 : 0 4 7 6 5 1 2 3\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 10/12 : 0 1 3 7 5 4 6 2\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 11/12 : 0 2 6 4 5 7 3 1\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Trees [0] -1/-1/-1->4->7 [1] -1/-1/-1->4->7 [2] 7/-1/-1->4->0 [3] 7/-1/-1->4->0 [4] 6/-1/-1->4->5 [5] 5/-1/-1->4->6 [6] -1/-1/-1->4->7 [7] -1/-1/-1->4->7 [8] 7/-1/-1->4->0 [9] 7/-1/-1->4->0 [10] 6/-1/-1->4->5 [11] 5/-1/-1->4->6\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Trees [0] 5/-1/-1->1->2 [1] 5/-1/-1->1->2 [2] 2/-1/-1->1->5 [3] 2/-1/-1->1->5 [4] 3/-1/-1->1->0 [5] -1/-1/-1->1->3 [6] 5/-1/-1->1->2 [7] 5/-1/-1->1->2 [8] 2/-1/-1->1->5 [9] 2/-1/-1->1->5 [10] 3/-1/-1->1->0 [11] -1/-1/-1->1->3\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Trees [0] 4/-1/-1->7->6 [1] 4/-1/-1->7->6 [2] 6/-1/-1->7->4 [3] 6/-1/-1->7->4 [4] 5/-1/-1->7->3 [5] 3/-1/-1->7->5 [6] 4/-1/-1->7->6 [7] 4/-1/-1->7->6 [8] 6/-1/-1->7->4 [9] 6/-1/-1->7->4 [10] 5/-1/-1->7->3 [11] 3/-1/-1->7->5\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Trees [0] 1/-1/-1->2->3 [1] 
1/-1/-1->2->3 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] -1/-1/-1->2->6 [5] 6/-1/-1->2->0 [6] 1/-1/-1->2->3 [7] 1/-1/-1->2->3 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] -1/-1/-1->2->6 [11] 6/-1/-1->2->0\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Trees [0] 6/-1/-1->5->1 [1] 6/-1/-1->5->1 [2] 1/-1/-1->5->6 [3] 1/-1/-1->5->6 [4] 4/-1/-1->5->7 [5] 7/-1/-1->5->4 [6] 6/-1/-1->5->1 [7] 6/-1/-1->5->1 [8] 1/-1/-1->5->6 [9] 1/-1/-1->5->6 [10] 4/-1/-1->5->7 [11] 7/-1/-1->5->4\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Trees [0] 3/-1/-1->0->-1 [1] 3/-1/-1->0->-1 [2] 4/-1/-1->0->-1 [3] 4/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 2/-1/-1->0->-1 [6] 3/-1/-1->0->-1 [7] 3/-1/-1->0->-1 [8] 4/-1/-1->0->-1 [9] 4/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 2/-1/-1->0->-1\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 05/0 : 4[1b0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 04/0 : 0[170] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 00/0 : 5[1c0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 02/0 : 1[180] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 01/0 : 5[1c0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 03/0 : 1[180] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 00/0 : 6[1d0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 02/0 : 2[190] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 06/0 : 5[1c0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 08/0 : 1[180] -> 2[190] via 
P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 01/0 : 6[1d0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 03/0 : 2[190] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 11/0 : 4[1b0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 09/0 : 1[180] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 07/0 : 5[1c0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 06/0 : 6[1d0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 08/0 : 2[190] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 07/0 : 6[1d0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 09/0 : 2[190] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 00/0 : 5[1c0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 02/0 : 1[180] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 05/0 : 0[170] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 04/0 : 4[1b0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 03/0 : 1[180] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 01/0 : 5[1c0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 02/0 : 2[190] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 06/0 : 5[1c0] 
-> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 08/0 : 1[180] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 11/0 : 0[170] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 00/0 : 6[1d0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 03/0 : 2[190] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 09/0 : 1[180] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 07/0 : 5[1c0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 10/0 : 4[1b0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 01/0 : 6[1d0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 08/0 : 2[190] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 09/0 : 2[190] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 06/0 : 6[1d0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 04/0 : 4[1b0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 07/0 : 6[1d0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 10/0 : 4[1b0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 05/0 : 5[1c0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 11/0 : 5[1c0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 04/0 
: 1[180] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 10/0 : 1[180] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 05/0 : 5[1c0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 11/0 : 5[1c0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 02/0 : 4[1b0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 03/0 : 4[1b0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 02/0 : 4[1b0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 03/0 : 4[1b0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 00/0 : 0[170] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 04/0 : 6[1d0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 08/0 : 4[1b0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 04/0 : 6[1d0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 08/0 : 4[1b0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 02/0 : 5[1c0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 10/0 : 0[170] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 10/0 : 6[1d0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 09/0 : 4[1b0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO 
Channel 10/0 : 6[1d0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 09/0 : 4[1b0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 01/0 : 0[170] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 03/0 : 5[1c0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 08/0 : 5[1c0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 05/0 : 2[190] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 06/0 : 0[170] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 00/0 : 1[180] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 05/0 : 0[170] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 11/0 : 2[190] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 09/0 : 5[1c0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 07/0 : 0[170] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 02/0 : 5[1c0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 01/0 : 1[180] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 11/0 : 0[170] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 04/0 : 1[180] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 03/0 : 5[1c0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 
[1] NCCL INFO Channel 06/0 : 1[180] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 10/0 : 1[180] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 05/0 : 7[1e0] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 00/0 : 0[170] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 08/0 : 5[1c0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 11/0 : 7[1e0] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 07/0 : 1[180] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 01/0 : 0[170] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 04/0 : 2[190] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 05/0 : 6[1d0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 05/0 : 2[190] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 00/0 : 4[1b0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 06/0 : 0[170] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 09/0 : 5[1c0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 10/0 : 2[190] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 11/0 : 2[190] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 11/0 : 6[1d0] -> 4[1b0] via P2P/IPC\u001b[0m\n", 
"\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 01/0 : 4[1b0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 07/0 : 0[170] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 05/0 : 7[1e0] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 06/0 : 4[1b0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 11/0 : 7[1e0] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 07/0 : 4[1b0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 04/0 : 3[1a0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 00/0 : 1[180] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 00/0 : 4[1b0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 04/0 : 2[190] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 05/0 : 6[1d0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 10/0 : 3[1a0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 01/0 : 1[180] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 01/0 : 4[1b0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 10/0 : 2[190] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 11/0 : 6[1d0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 06/0 : 1[180] -> 5[1c0] via 
P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 02/0 : 0[170] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 06/0 : 4[1b0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 07/0 : 1[180] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 03/0 : 0[170] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 07/0 : 4[1b0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 04/0 : 3[1a0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 00/0 : 7[1e0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 10/0 : 3[1a0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 08/0 : 0[170] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 02/0 : 3[1a0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 01/0 : 7[1e0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 03/0 : 3[1a0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 09/0 : 0[170] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 08/0 : 3[1a0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 02/0 : 0[170] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 06/0 : 7[1e0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 07/0 : 
7[1e0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 09/0 : 3[1a0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 03/0 : 0[170] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 00/0 : 7[1e0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 08/0 : 0[170] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 02/0 : 3[1a0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 09/0 : 0[170] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 01/0 : 7[1e0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 03/0 : 3[1a0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 06/0 : 7[1e0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 08/0 : 3[1a0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 07/0 : 7[1e0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 09/0 : 3[1a0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 05/0 : 3[1a0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 04/0 : 7[1e0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 11/0 : 3[1a0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 10/0 : 7[1e0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO 
Channel 00/0 : 3[1a0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 05/0 : 3[1a0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 04/0 : 7[1e0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 02/0 : 7[1e0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 11/0 : 3[1a0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 10/0 : 7[1e0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 01/0 : 3[1a0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 03/0 : 7[1e0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 06/0 : 3[1a0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 00/0 : 3[1a0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 08/0 : 7[1e0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 07/0 : 3[1a0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 09/0 : 7[1e0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 05/0 : 1[180] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 02/0 : 7[1e0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 00/0 : 2[190] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 02/0 : 6[1d0] -> 5[1c0] via P2P/IPC\u001b[0m\n", 
"\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 04/0 : 5[1c0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 11/0 : 1[180] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 01/0 : 3[1a0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 03/0 : 6[1d0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 01/0 : 2[190] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 10/0 : 5[1c0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 08/0 : 6[1d0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 06/0 : 2[190] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 03/0 : 7[1e0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 06/0 : 3[1a0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 09/0 : 6[1d0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 07/0 : 2[190] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 04/0 : 4[1b0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 08/0 : 7[1e0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 07/0 : 3[1a0] -> 2[190] via P2P/IPC\u001b[0m\n", 
"\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 09/0 : 7[1e0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 05/0 : 1[180] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 10/0 : 4[1b0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 00/0 : 2[190] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 02/0 : 6[1d0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 04/0 : 5[1c0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 11/0 : 1[180] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 01/0 : 2[190] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 03/0 : 6[1d0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 10/0 : 5[1c0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 06/0 : 2[190] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 08/0 : 6[1d0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 07/0 : 2[190] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 09/0 : 6[1d0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 04/0 : 4[1b0] -> 5[1c0] via P2P/IPC\u001b[0m\n", 
"\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 00/0 : 1[180] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 10/0 : 4[1b0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 01/0 : 1[180] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 02/0 : 5[1c0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 06/0 : 1[180] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 03/0 : 5[1c0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 00/0 : 1[180] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 07/0 : 1[180] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 08/0 : 5[1c0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 09/0 : 5[1c0] 
-> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 01/0 : 1[180] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 00/0 : 2[190] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 02/0 : 6[1d0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 01/0 : 2[190] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 03/0 : 6[1d0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 06/0 : 1[180] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 05/0 : 4[1b0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 06/0 : 2[190] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 08/0 : 6[1d0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 07/0 : 1[180] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 11/0 : 4[1b0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 07/0 : 2[190] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 09/0 : 6[1d0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 02/0 : 6[1d0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 03/0 : 6[1d0] -> 7[1e0] via 
P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Connected all rings\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 08/0 : 6[1d0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 09/0 : 6[1d0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 00/0 : 2[190] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 02/0 : 5[1c0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 05/0 : 1[180] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 01/0 : 2[190] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 03/0 : 5[1c0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 06/0 : 2[190] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 08/0 : 5[1c0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 07/0 : 2[190] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 11/0 : 1[180] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 09/0 : 5[1c0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 04/0 : 5[1c0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 10/0 : 5[1c0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 05/0 : 4[1b0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 04/0 : 2[190] -> 6[1d0] via 
P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 05/0 : 1[180] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 11/0 : 4[1b0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 10/0 : 2[190] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 05/0 : 3[1a0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 00/0 : 4[1b0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 11/0 : 1[180] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 11/0 : 3[1a0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 01/0 : 4[1b0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 04/0 : 5[1c0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 02/0 : 1[180] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 05/0 : 6[1d0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 06/0 : 4[1b0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 03/0 : 1[180] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 10/0 : 5[1c0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 07/0 : 4[1b0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 11/0 : 6[1d0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 08/0 : 
1[180] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 09/0 : 1[180] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 04/0 : 2[190] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 05/0 : 3[1a0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 10/0 : 2[190] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 00/0 : 4[1b0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 04/0 : 6[1d0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 05/0 : 2[190] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 11/0 : 3[1a0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 10/0 : 6[1d0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 00/0 : 5[1c0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 11/0 : 2[190] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 01/0 : 4[1b0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 06/0 : 4[1b0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 01/0 : 5[1c0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 00/0 : 5[1c0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 07/0 : 4[1b0] -> 7[1e0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO 
Channel 02/0 : 4[1b0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 06/0 : 5[1c0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 01/0 : 5[1c0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 05/0 : 6[1d0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 07/0 : 5[1c0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 03/0 : 4[1b0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 06/0 : 5[1c0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 04/0 : 7[1e0] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 08/0 : 4[1b0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 11/0 : 6[1d0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 02/0 : 1[180] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 10/0 : 7[1e0] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 07/0 : 5[1c0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 09/0 : 4[1b0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 03/0 : 1[180] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 02/0 : 4[1b0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 08/0 : 1[180] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 
[4] NCCL INFO Channel 03/0 : 4[1b0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 09/0 : 1[180] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 04/0 : 6[1d0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 05/0 : 2[190] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 04/0 : 7[1e0] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 08/0 : 4[1b0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 10/0 : 6[1d0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 10/0 : 7[1e0] -> 3[1a0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 09/0 : 4[1b0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 11/0 : 2[190] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 02/0 : 7[1e0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 00/0 : 3[1a0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 03/0 : 7[1e0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 01/0 : 3[1a0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 08/0 : 7[1e0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 06/0 : 3[1a0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 09/0 : 7[1e0] -> 4[1b0] via P2P/IPC\u001b[0m\n", 
"\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 07/0 : 3[1a0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 02/0 : 7[1e0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 00/0 : 3[1a0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 03/0 : 7[1e0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 01/0 : 3[1a0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 05/0 : 7[1e0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 04/0 : 3[1a0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 11/0 : 7[1e0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 08/0 : 7[1e0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 06/0 : 3[1a0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 10/0 : 3[1a0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 09/0 : 7[1e0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 07/0 : 3[1a0] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 00/0 : 7[1e0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 02/0 : 3[1a0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 05/0 : 7[1e0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 04/0 : 3[1a0] -> 1[180] via 
P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 01/0 : 7[1e0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 11/0 : 7[1e0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 10/0 : 3[1a0] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 03/0 : 3[1a0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 06/0 : 7[1e0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 08/0 : 3[1a0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 07/0 : 7[1e0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 00/0 : 7[1e0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 09/0 : 3[1a0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 05/0 : 5[1c0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 00/0 : 6[1d0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 02/0 : 3[1a0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 04/0 : 1[180] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 02/0 : 2[190] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 01/0 : 6[1d0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 11/0 : 5[1c0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 10/0 : 
1[180] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 03/0 : 2[190] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 06/0 : 6[1d0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 08/0 : 2[190] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 01/0 : 7[1e0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 03/0 : 3[1a0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 07/0 : 6[1d0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 09/0 : 2[190] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 06/0 : 7[1e0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 08/0 : 3[1a0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 07/0 : 7[1e0] -> 6[1d0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 09/0 : 
3[1a0] -> 2[190] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 05/0 : 5[1c0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 04/0 : 1[180] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 02/0 : 2[190] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 00/0 : 6[1d0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 10/0 : 1[180] -> 0[170] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 11/0 : 5[1c0] -> 4[1b0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 03/0 : 2[190] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 01/0 : 6[1d0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 08/1 : 7[1e0] -> 0[170] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 08/0 : 2[190] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO 
Channel 06/0 : 6[1d0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 08/1 : 3[1a0] -> 4[1b0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 09/0 : 2[190] -> 1[180] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 07/0 : 6[1d0] -> 5[1c0] via P2P/IPC\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO threadThresholds 8/8/64 | 
64/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 04/1 : 6[1d0] -> 0[170] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 04/1 : 2[190] -> 4[1b0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 09/1 : 3[1a0] -> 4[1b0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 09/1 : 7[1e0] -> 0[170] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 08/1 : 3[1a0] -> 4[1b0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 04/1 : 7[1e0] -> 1[180] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 04/1 : 3[1a0] -> 5[1c0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 05/1 : 6[1d0] -> 0[170] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Connected all 
trees\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 08/1 : 7[1e0] -> 0[170] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 05/1 : 2[190] -> 4[1b0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 05/1 : 7[1e0] -> 1[180] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 05/1 : 3[1a0] -> 5[1c0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 04/1 : 2[190] -> 4[1b0] via P2P/indirect/0[170]\u001b[0m\n", 
"\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Connected all trees\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO 12 coll channels, 16 p2p channels, 2 p2p channels per peer\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 04/1 : 6[1d0] -> 0[170] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 09/1 : 7[1e0] -> 0[170] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 09/1 : 3[1a0] -> 4[1b0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 12/1 : 7[1e0] -> 2[190] via P2P/indirect/3[1a0]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 12/1 : 3[1a0] -> 6[1d0] via P2P/indirect/7[1e0]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 04/1 : 7[1e0] -> 1[180] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 05/1 : 6[1d0] -> 0[170] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 04/1 : 3[1a0] -> 5[1c0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 05/1 : 2[190] -> 4[1b0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 12/1 : 5[1c0] -> 0[170] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO Channel 13/1 : 7[1e0] -> 2[190] via P2P/indirect/3[1a0]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 05/1 : 7[1e0] -> 1[180] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO Channel 13/1 : 3[1a0] -> 
6[1d0] via P2P/indirect/7[1e0]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 05/1 : 3[1a0] -> 5[1c0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 12/1 : 1[180] -> 4[1b0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 12/1 : 7[1e0] -> 2[190] via P2P/indirect/3[1a0]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 12/1 : 6[1d0] -> 1[180] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 13/1 : 5[1c0] -> 0[170] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 13/1 : 1[180] -> 4[1b0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 12/1 : 2[190] -> 5[1c0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 12/1 : 3[1a0] -> 6[1d0] via P2P/indirect/7[1e0]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 10/1 : 0[170] -> 5[1c0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO Channel 13/1 : 7[1e0] -> 2[190] via P2P/indirect/3[1a0]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 13/1 : 6[1d0] -> 1[180] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 11/1 : 0[170] -> 5[1c0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 10/1 : 4[1b0] -> 1[180] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 13/1 : 2[190] -> 5[1c0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO Channel 13/1 : 3[1a0] -> 6[1d0] via P2P/indirect/7[1e0]\u001b[0m\n", 
"\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 12/1 : 5[1c0] -> 0[170] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 11/1 : 4[1b0] -> 1[180] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 12/1 : 6[1d0] -> 1[180] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 12/1 : 1[180] -> 4[1b0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 12/1 : 2[190] -> 5[1c0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 13/1 : 5[1c0] -> 0[170] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 13/1 : 6[1d0] -> 1[180] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 13/1 : 1[180] -> 4[1b0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 13/1 : 2[190] -> 5[1c0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 10/1 : 6[1d0] -> 3[1a0] via P2P/indirect/2[190]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 10/1 : 0[170] -> 5[1c0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 10/1 : 1[180] -> 6[1d0] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO Channel 11/1 : 6[1d0] -> 3[1a0] via P2P/indirect/2[190]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 10/1 : 4[1b0] -> 1[180] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 11/1 : 1[180] -> 6[1d0] via P2P/indirect/5[1c0]\u001b[0m\n", 
"\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 10/1 : 5[1c0] -> 2[190] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 10/1 : 2[190] -> 7[1e0] via P2P/indirect/6[1d0]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 11/1 : 0[170] -> 5[1c0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 06/1 : 4[1b0] -> 2[190] via P2P/indirect/6[1d0]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 11/1 : 5[1c0] -> 2[190] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 11/1 : 4[1b0] -> 1[180] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 06/1 : 1[180] -> 7[1e0] via P2P/indirect/3[1a0]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO Channel 11/1 : 2[190] -> 7[1e0] via P2P/indirect/6[1d0]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 07/1 : 4[1b0] -> 2[190] via P2P/indirect/6[1d0]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 10/1 : 2[190] -> 7[1e0] via P2P/indirect/6[1d0]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO Channel 07/1 : 1[180] -> 7[1e0] via P2P/indirect/3[1a0]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 06/1 : 0[170] -> 6[1d0] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 10/1 : 6[1d0] -> 3[1a0] via P2P/indirect/2[190]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 10/1 : 5[1c0] -> 2[190] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO Channel 11/1 : 2[190] -> 7[1e0] via P2P/indirect/6[1d0]\u001b[0m\n", 
"\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 10/1 : 1[180] -> 6[1d0] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 11/1 : 5[1c0] -> 2[190] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO Channel 11/1 : 6[1d0] -> 3[1a0] via P2P/indirect/2[190]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 07/1 : 0[170] -> 6[1d0] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 11/1 : 1[180] -> 6[1d0] via P2P/indirect/5[1c0]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 06/1 : 5[1c0] -> 3[1a0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 14/1 : 0[170] -> 7[1e0] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO Channel 07/1 : 5[1c0] -> 3[1a0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 06/1 : 0[170] -> 6[1d0] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO Channel 15/1 : 0[170] -> 7[1e0] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 06/1 : 4[1b0] -> 2[190] via P2P/indirect/6[1d0]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 14/1 : 4[1b0] -> 3[1a0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 07/1 : 0[170] -> 6[1d0] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 06/1 : 5[1c0] -> 3[1a0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO Channel 15/1 : 4[1b0] -> 3[1a0] via P2P/indirect/0[170]\u001b[0m\n", 
"\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 07/1 : 4[1b0] -> 2[190] via P2P/indirect/6[1d0]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 06/1 : 1[180] -> 7[1e0] via P2P/indirect/3[1a0]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO Channel 07/1 : 5[1c0] -> 3[1a0] via P2P/indirect/1[180]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO Channel 07/1 : 1[180] -> 7[1e0] via P2P/indirect/3[1a0]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 14/1 : 0[170] -> 7[1e0] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 14/1 : 4[1b0] -> 3[1a0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO Channel 15/1 : 0[170] -> 7[1e0] via P2P/indirect/4[1b0]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO Channel 15/1 : 4[1b0] -> 3[1a0] via P2P/indirect/0[170]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:algo-2:217:217 [6] NCCL INFO comm 0x559ef3b4b330 rank 6 nranks 8 cudaDev 6 busId 1d0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:algo-2:212:212 [3] NCCL INFO comm 0x55903977b8c0 rank 3 nranks 8 cudaDev 3 busId 1a0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:algo-2:213:213 [1] NCCL INFO comm 0x5572e7b586b0 rank 1 nranks 8 cudaDev 1 busId 180 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:algo-2:208:208 [7] NCCL INFO comm 0x55707bdda380 rank 7 nranks 8 cudaDev 7 busId 1e0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:algo-2:605:605 [0] NCCL INFO comm 0x565245de65d0 rank 0 nranks 8 cudaDev 0 busId 170 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:algo-2:216:216 [5] NCCL INFO comm 0x55d3d1887e40 rank 5 nranks 8 cudaDev 5 busId 1c0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:algo-2:142:142 [4] NCCL INFO comm 
0x56039d4a55a0 rank 4 nranks 8 cudaDev 4 busId 1b0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:algo-2:209:209 [2] NCCL INFO comm 0x56242f0ee7e0 rank 2 nranks 8 cudaDev 2 busId 190 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:algo-1:339:339 [1] NCCL INFO comm 0x56024a5c0270 rank 1 nranks 8 cudaDev 1 busId 180 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:algo-1:342:342 [5] NCCL INFO comm 0x5600dc807bb0 rank 5 nranks 8 cudaDev 5 busId 1c0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:algo-1:208:208 [3] NCCL INFO comm 0x564bf2c9a580 rank 3 nranks 8 cudaDev 3 busId 1a0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:algo-1:210:210 [4] NCCL INFO comm 0x56488ca701f0 rank 4 nranks 8 cudaDev 4 busId 1b0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:algo-1:206:206 [2] NCCL INFO comm 0x561378f127e0 rank 2 nranks 8 cudaDev 2 busId 190 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:algo-1:343:343 [6] NCCL INFO comm 0x564db5a4a750 rank 6 nranks 8 cudaDev 6 busId 1d0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:algo-1:669:669 [0] NCCL INFO comm 0x560bb2c9aae0 rank 0 nranks 8 cudaDev 0 busId 170 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Running smdistributed.dataparallel v1.7.0\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:SMDDP: Multi node ENA mode\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:algo-1:344:344 [7] NCCL INFO comm 0x562722e494a0 rank 7 nranks 8 cudaDev 7 busId 1e0 - Init COMPLETE\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2\u001b[0m\n", 
"\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 8\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 9\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 10\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 11\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 6\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 7\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 5\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 4\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 12\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 13\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 14\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 15\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:INFO:torch.distributed.distributed_c10d:Rank 8: Completed store-based barrier for 
key:store_based_barrier_key:1 with 16 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:INFO:torch.distributed.distributed_c10d:Rank 12: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:INFO:torch.distributed.distributed_c10d:Rank 10: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:=> using pre-trained model 'swin_b'\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:=> using pre-trained model 'swin_b'\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:INFO:torch.distributed.distributed_c10d:Rank 14: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:INFO:torch.distributed.distributed_c10d:Rank 13: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:INFO:torch.distributed.distributed_c10d:Rank 15: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:=> using pre-trained model 'swin_b'\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:=> using pre-trained model 'swin_b'\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:=> using pre-trained model 'swin_b'\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:=> using pre-trained model 'swin_b'\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. 
The current behavior is equivalent to passing `weights=Swin_B_Weights.IMAGENET1K_V1`. You can also use `weights=Swin_B_Weights.DEFAULT` to get the most up-to-date weights.\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]: warnings.warn(msg)\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=Swin_B_Weights.IMAGENET1K_V1`. 
You can also use `weights=Swin_B_Weights.DEFAULT` to get the most up-to-date weights.\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]: warnings.warn(msg)\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=Swin_B_Weights.IMAGENET1K_V1`. You can also use `weights=Swin_B_Weights.DEFAULT` to get the most up-to-date weights.\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]: warnings.warn(msg)\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=Swin_B_Weights.IMAGENET1K_V1`. 
You can also use `weights=Swin_B_Weights.DEFAULT` to get the most up-to-date weights.\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]: warnings.warn(msg)\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=Swin_B_Weights.IMAGENET1K_V1`. You can also use `weights=Swin_B_Weights.DEFAULT` to get the most up-to-date weights.\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]: warnings.warn(msg)\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=Swin_B_Weights.IMAGENET1K_V1`. 
You can also use `weights=Swin_B_Weights.DEFAULT` to get the most up-to-date weights.\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]: warnings.warn(msg)\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:INFO:torch.distributed.distributed_c10d:Rank 11: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:=> using pre-trained model 'swin_b'\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=Swin_B_Weights.IMAGENET1K_V1`. 
You can also use `weights=Swin_B_Weights.DEFAULT` to get the most up-to-date weights.\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]: warnings.warn(msg)\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:=> using pre-trained model 'swin_b'\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:=> using pre-trained model 'swin_b'\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:=> using pre-trained model 'swin_b'\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:=> using pre-trained model 'swin_b'\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and 
may be removed in the future. The current behavior is equivalent to passing `weights=Swin_B_Weights.IMAGENET1K_V1`. You can also use `weights=Swin_B_Weights.DEFAULT` to get the most up-to-date weights.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]: warnings.warn(msg)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=Swin_B_Weights.IMAGENET1K_V1`. You can also use `weights=Swin_B_Weights.DEFAULT` to get the most up-to-date weights.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]: warnings.warn(msg)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=Swin_B_Weights.IMAGENET1K_V1`. 
You can also use `weights=Swin_B_Weights.DEFAULT` to get the most up-to-date weights.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]: warnings.warn(msg)\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=Swin_B_Weights.IMAGENET1K_V1`. You can also use `weights=Swin_B_Weights.DEFAULT` to get the most up-to-date weights.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]: warnings.warn(msg)\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:INFO:torch.distributed.distributed_c10d:Rank 9: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:=> using pre-trained model 'swin_b'\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=Swin_B_Weights.IMAGENET1K_V1`. 
You can also use `weights=Swin_B_Weights.DEFAULT` to get the most up-to-date weights.\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]: warnings.warn(msg)\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 6: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 7: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 5: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:INFO:torch.distributed.distributed_c10d:Rank 4: Completed store-based barrier for key:store_based_barrier_key:1 with 16 nodes.\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:=> using pre-trained model 'swin_b'\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:=> using pre-trained model 'swin_b'\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:=> using pre-trained model 'swin_b'\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:=> using pre-trained model 'swin_b'\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, 
please use 'weights' instead.\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=Swin_B_Weights.IMAGENET1K_V1`. You can also use `weights=Swin_B_Weights.DEFAULT` to get the most up-to-date weights.\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]: warnings.warn(msg)\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=Swin_B_Weights.IMAGENET1K_V1`. You can also use `weights=Swin_B_Weights.DEFAULT` to get the most up-to-date weights.\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]: warnings.warn(msg)\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]: warnings.warn(\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=Swin_B_Weights.IMAGENET1K_V1`. 
You can also use `weights=Swin_B_Weights.DEFAULT` to get the most up-to-date weights.\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]: warnings.warn(msg)\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=Swin_B_Weights.IMAGENET1K_V1`. You can also use `weights=Swin_B_Weights.DEFAULT` to get the most up-to-date weights.\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]: warnings.warn(msg)\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:Downloading: \"https://download.pytorch.org/models/swin_b-68c6b09e.pth\" to /root/.cache/torch/hub/checkpoints/swin_b-68c6b09e.pth\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:Downloading: \"https://download.pytorch.org/models/swin_b-68c6b09e.pth\" to /root/.cache/torch/hub/checkpoints/swin_b-68c6b09e.pth\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:Downloading: \"https://download.pytorch.org/models/swin_b-68c6b09e.pth\" to /root/.cache/torch/hub/checkpoints/swin_b-68c6b09e.pth\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:Downloading: \"https://download.pytorch.org/models/swin_b-68c6b09e.pth\" to /root/.cache/torch/hub/checkpoints/swin_b-68c6b09e.pth\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:Downloading: \"https://download.pytorch.org/models/swin_b-68c6b09e.pth\" to /root/.cache/torch/hub/checkpoints/swin_b-68c6b09e.pth\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:Downloading: \"https://download.pytorch.org/models/swin_b-68c6b09e.pth\" to /root/.cache/torch/hub/checkpoints/swin_b-68c6b09e.pth\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:Downloading: \"https://download.pytorch.org/models/swin_b-68c6b09e.pth\" to /root/.cache/torch/hub/checkpoints/swin_b-68c6b09e.pth\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:Downloading: \"https://download.pytorch.org/models/swin_b-68c6b09e.pth\" to 
/root/.cache/torch/hub/checkpoints/swin_b-68c6b09e.pth\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:Downloading: \"https://download.pytorch.org/models/swin_b-68c6b09e.pth\" to /root/.cache/torch/hub/checkpoints/swin_b-68c6b09e.pth\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:Downloading: \"https://download.pytorch.org/models/swin_b-68c6b09e.pth\" to /root/.cache/torch/hub/checkpoints/swin_b-68c6b09e.pth\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Downloading: \"https://download.pytorch.org/models/swin_b-68c6b09e.pth\" to /root/.cache/torch/hub/checkpoints/swin_b-68c6b09e.pth\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:Downloading: \"https://download.pytorch.org/models/swin_b-68c6b09e.pth\" to /root/.cache/torch/hub/checkpoints/swin_b-68c6b09e.pth\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:Downloading: \"https://download.pytorch.org/models/swin_b-68c6b09e.pth\" to /root/.cache/torch/hub/checkpoints/swin_b-68c6b09e.pth\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:Downloading: \"https://download.pytorch.org/models/swin_b-68c6b09e.pth\" to /root/.cache/torch/hub/checkpoints/swin_b-68c6b09e.pth\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 0%| | 0.00/335M [00:00:#015 0%| | 0.00/335M [00:00:#015 0%| | 0.00/335M [00:00:#015 0%| | 0.00/335M [00:00:#015 0%| | 0.00/335M [00:00:#015 0%| | 0.00/335M [00:00:#015 0%| | 0.00/335M [00:00:#015 0%| | 0.00/335M [00:00:#015 0%| | 0.00/335M [00:00:#015 0%| | 0.00/335M [00:00:#015 0%| | 0.00/335M [00:00:#015 0%| | 0.00/335M [00:00:#015 0%| | 0.00/335M [00:00:#015 0%| | 0.00/335M [00:00:Downloading: \"https://download.pytorch.org/models/swin_b-68c6b09e.pth\" to /root/.cache/torch/hub/checkpoints/swin_b-68c6b09e.pth\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:Downloading: \"https://download.pytorch.org/models/swin_b-68c6b09e.pth\" to /root/.cache/torch/hub/checkpoints/swin_b-68c6b09e.pth\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 2%|▏ | 6.88M/335M [00:00<00:04, 71.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 
2%|▏ | 7.32M/335M [00:00<00:04, 76.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 2%|▏ | 7.49M/335M [00:00<00:04, 78.5MB/s][1,mpirank:11,algo-2]:#015 2%|▏ | 7.51M/335M [00:00<00:04, 78.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 2%|▏ | 7.59M/335M [00:00<00:04, 79.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 2%|▏ | 7.84M/335M [00:00<00:04, 81.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 0%| | 0.00/335M [00:00:#015 3%|▎ | 8.56M/335M [00:00<00:03, 89.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 3%|▎ | 8.73M/335M [00:00<00:03, 91.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 3%|▎ | 8.91M/335M [00:00<00:03, 93.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 3%|▎ | 9.07M/335M [00:00<00:03, 95.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 3%|▎ | 8.92M/335M [00:00<00:03, 93.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 3%|▎ | 9.13M/335M [00:00<00:03, 95.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 3%|▎ | 9.23M/335M [00:00<00:03, 96.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 0%| | 0.00/335M [00:00:#015 3%|▎ | 9.50M/335M [00:00<00:03, 99.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 4%|▍ | 13.7M/335M [00:00<00:04, 68.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 4%|▍ | 14.5M/335M [00:00<00:02, 152MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 4%|▍ | 14.6M/335M [00:00<00:04, 70.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 4%|▍ | 15.0M/335M [00:00<00:04, 71.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 4%|▍ | 15.0M/335M [00:00<00:04, 70.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 5%|▍ | 15.2M/335M [00:00<00:04, 71.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 5%|▍ | 15.6M/335M [00:00<00:04, 71.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 3%|▎ | 10.0M/335M [00:00<00:03, 105MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 5%|▌ | 17.1M/335M 
[00:00<00:04, 73.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 5%|▌ | 17.5M/335M [00:00<00:04, 74.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 5%|▌ | 17.8M/335M [00:00<00:04, 74.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 5%|▌ | 17.8M/335M [00:00<00:04, 75.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 5%|▌ | 18.1M/335M [00:00<00:04, 75.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 5%|▌ | 18.2M/335M [00:00<00:04, 75.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 5%|▌ | 18.4M/335M [00:00<00:04, 76.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 6%|▌ | 19.0M/335M [00:00<00:04, 77.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 6%|▌ | 20.2M/335M [00:00<00:04, 66.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 6%|▋ | 21.8M/335M [00:00<00:04, 68.4MB/s][1,mpirank:8,algo-2]:#015 6%|▋ | 21.4M/335M [00:00<00:04, 66.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 6%|▋ | 21.8M/335M [00:00<00:04, 68.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 7%|▋ | 22.0M/335M [00:00<00:04, 68.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 6%|▌ | 20.0M/335M [00:00<00:03, 104MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 7%|▋ | 22.5M/335M [00:00<00:04, 69.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 7%|▋ | 24.3M/335M [00:00<00:04, 66.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 7%|▋ | 24.7M/335M [00:00<00:05, 65.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 7%|▋ | 25.1M/335M [00:00<00:05, 64.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 8%|▊ | 25.3M/335M [00:00<00:05, 64.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 8%|▊ | 25.6M/335M [00:00<00:05, 62.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 8%|▊ | 25.7M/335M [00:00<00:05, 61.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 8%|▊ | 25.9M/335M [00:00<00:05, 62.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 8%|▊ | 
26.6M/335M [00:00<00:05, 57.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 8%|▊ | 26.6M/335M [00:00<00:05, 62.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 8%|▊ | 27.8M/335M [00:00<00:05, 58.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 8%|▊ | 28.3M/335M [00:00<00:05, 58.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 8%|▊ | 28.3M/335M [00:00<00:05, 58.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 9%|▊ | 28.6M/335M [00:00<00:05, 59.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 9%|▊ | 29.0M/335M [00:00<00:04, 75.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 9%|▊ | 29.2M/335M [00:00<00:05, 59.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 9%|▉ | 29.9M/335M [00:00<00:03, 81.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 9%|▉ | 30.7M/335M [00:00<00:05, 62.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 9%|▉ | 31.1M/335M [00:00<00:05, 62.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 9%|▉ | 31.4M/335M [00:00<00:05, 63.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 9%|▉ | 31.6M/335M [00:00<00:05, 63.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 9%|▉ | 31.8M/335M [00:00<00:05, 63.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 10%|▉ | 31.9M/335M [00:00<00:05, 62.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 10%|▉ | 32.1M/335M [00:00<00:05, 62.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 10%|▉ | 32.5M/335M [00:00<00:05, 59.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 10%|▉ | 32.9M/335M [00:00<00:05, 62.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 10%|█ | 33.7M/335M [00:00<00:05, 59.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 10%|█ | 34.1M/335M [00:00<00:05, 57.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 10%|█ | 34.1M/335M [00:00<00:05, 56.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 10%|█ | 34.3M/335M [00:00<00:06, 50.2MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:15,algo-2]:#015 10%|█ | 35.1M/335M [00:00<00:06, 45.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 11%|█ | 36.7M/335M [00:00<00:06, 45.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 11%|█ | 37.2M/335M [00:00<00:09, 32.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 11%|█ | 37.6M/335M [00:00<00:09, 31.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 11%|█▏ | 37.8M/335M [00:00<00:09, 31.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 11%|█▏ | 38.0M/335M [00:00<00:09, 31.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 11%|█▏ | 38.2M/335M [00:00<00:08, 37.5MB/s][1,mpirank:1,algo-1]:#015 11%|█▏ | 38.1M/335M [00:00<00:08, 38.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 11%|█▏ | 38.1M/335M [00:00<00:09, 31.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 11%|█▏ | 38.2M/335M [00:00<00:10, 30.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 11%|█▏ | 38.3M/335M [00:00<00:09, 31.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 12%|█▏ | 39.0M/335M [00:00<00:09, 32.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 12%|█▏ | 39.3M/335M [00:00<00:10, 30.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 12%|█▏ | 39.5M/335M [00:00<00:10, 30.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 12%|█▏ | 39.5M/335M [00:00<00:10, 30.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 12%|█▏ | 39.6M/335M [00:00<00:10, 30.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 12%|█▏ | 39.8M/335M [00:00<00:09, 31.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 12%|█▏ | 41.5M/335M [00:00<00:09, 32.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 12%|█▏ | 41.6M/335M [00:00<00:08, 35.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 13%|█▎ | 43.5M/335M [00:01<00:08, 37.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 13%|█▎ | 44.0M/335M [00:01<00:08, 37.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 13%|█▎ | 
44.1M/335M [00:01<00:08, 37.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 13%|█▎ | 44.2M/335M [00:00<00:07, 42.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 13%|█▎ | 44.2M/335M [00:00<00:07, 42.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 13%|█▎ | 44.2M/335M [00:01<00:08, 37.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 13%|█▎ | 44.2M/335M [00:01<00:08, 36.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 13%|█▎ | 44.2M/335M [00:01<00:08, 37.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 13%|█▎ | 44.5M/335M [00:01<00:08, 36.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 13%|█▎ | 44.6M/335M [00:01<00:08, 35.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 13%|█▎ | 44.6M/335M [00:01<00:08, 34.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 13%|█▎ | 44.7M/335M [00:01<00:08, 34.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 13%|█▎ | 44.7M/335M [00:01<00:08, 34.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 13%|█▎ | 45.0M/335M [00:01<00:08, 35.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 14%|█▎ | 45.8M/335M [00:01<00:08, 35.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 14%|█▎ | 46.0M/335M [00:01<00:08, 37.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 14%|█▍ | 48.5M/335M [00:01<00:08, 35.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 15%|█▍ | 48.9M/335M [00:01<00:08, 34.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 15%|█▍ | 49.0M/335M [00:01<00:08, 35.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 15%|█▍ | 49.0M/335M [00:01<00:08, 36.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 15%|█▍ | 49.1M/335M [00:01<00:08, 35.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 15%|█▍ | 49.2M/335M [00:01<00:08, 34.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 15%|█▍ | 49.1M/335M [00:01<00:08, 34.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 15%|█▍ | 49.2M/335M 
[00:01<00:08, 36.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 15%|█▍ | 49.2M/335M [00:01<00:08, 36.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 15%|█▍ | 49.2M/335M [00:01<00:08, 36.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 15%|█▍ | 49.2M/335M [00:01<00:08, 34.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 15%|█▍ | 49.3M/335M [00:01<00:08, 36.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 15%|█▍ | 49.8M/335M [00:01<00:08, 34.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 15%|█▍ | 49.9M/335M [00:01<00:07, 39.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 15%|█▍ | 50.0M/335M [00:01<00:07, 40.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 15%|█▍ | 50.3M/335M [00:01<00:08, 37.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 16%|█▌ | 53.8M/335M [00:01<00:07, 39.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 16%|█▌ | 54.2M/335M [00:01<00:07, 39.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 16%|█▌ | 54.2M/335M [00:01<00:07, 39.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 16%|█▌ | 54.2M/335M [00:01<00:07, 39.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 16%|█▌ | 54.5M/335M [00:01<00:07, 39.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 16%|█▌ | 54.4M/335M [00:01<00:07, 39.3MB/s][1,mpirank:14,algo-2]:#015 16%|█▌ | 54.5M/335M [00:01<00:07, 40.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 16%|█▌ | 54.5M/335M [00:01<00:07, 39.3MB/s][1,mpirank:0,algo-1]:#015 16%|█▋ | 54.5M/335M [00:01<00:07, 40.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 16%|█▌ | 54.5M/335M [00:01<00:07, 40.2MB/s][1,mpirank:9,algo-2]:#015 16%|█▌ | 54.5M/335M [00:01<00:07, 39.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 16%|█▋ | 54.6M/335M [00:01<00:07, 40.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 16%|█▋ | 55.2M/335M [00:01<00:07, 39.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 16%|█▋ | 55.3M/335M 
[00:01<00:06, 43.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 17%|█▋ | 55.4M/335M [00:01<00:06, 43.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 17%|█▋ | 55.5M/335M [00:01<00:07, 41.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 17%|█▋ | 58.5M/335M [00:01<00:06, 41.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 18%|█▊ | 58.8M/335M [00:01<00:07, 41.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 18%|█▊ | 58.9M/335M [00:01<00:07, 41.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 18%|█▊ | 58.9M/335M [00:01<00:06, 42.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 18%|█▊ | 59.0M/335M [00:01<00:06, 41.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 18%|█▊ | 59.0M/335M [00:01<00:07, 41.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 18%|█▊ | 59.0M/335M [00:01<00:07, 41.1MB/s][1,mpirank:8,algo-2]:#015 18%|█▊ | 59.0M/335M [00:01<00:07, 41.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 18%|█▊ | 59.2M/335M [00:01<00:06, 42.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 18%|█▊ | 59.1M/335M [00:01<00:06, 42.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 18%|█▊ | 59.2M/335M [00:01<00:06, 42.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 18%|█▊ | 59.1M/335M [00:01<00:06, 42.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 18%|█▊ | 60.0M/335M [00:01<00:06, 42.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 18%|█▊ | 60.4M/335M [00:01<00:06, 45.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 18%|█▊ | 60.4M/335M [00:01<00:06, 45.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 18%|█▊ | 60.5M/335M [00:01<00:06, 44.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 19%|█▉ | 64.2M/335M [00:01<00:06, 46.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 19%|█▉ | 64.6M/335M [00:01<00:06, 46.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 19%|█▉ | 64.7M/335M [00:01<00:06, 46.2MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:4,algo-1]:#015 19%|█▉ | 64.7M/335M [00:01<00:06, 46.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 19%|█▉ | 64.8M/335M [00:01<00:06, 46.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 19%|█▉ | 64.8M/335M [00:01<00:06, 46.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 19%|█▉ | 64.9M/335M [00:01<00:06, 46.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 19%|█▉ | 64.8M/335M [00:01<00:06, 46.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 19%|█▉ | 65.0M/335M [00:01<00:06, 46.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 19%|█▉ | 65.0M/335M [00:01<00:06, 46.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 19%|█▉ | 65.0M/335M [00:01<00:06, 46.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 19%|█▉ | 65.0M/335M [00:01<00:06, 46.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 20%|█▉ | 65.6M/335M [00:01<00:06, 46.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 20%|█▉ | 65.7M/335M [00:01<00:05, 47.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 20%|█▉ | 65.8M/335M [00:01<00:05, 48.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 20%|█▉ | 65.9M/335M [00:01<00:05, 47.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 21%|██ | 69.7M/335M [00:01<00:05, 49.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 21%|██ | 70.1M/335M [00:01<00:05, 49.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 21%|██ | 70.1M/335M [00:01<00:05, 49.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 21%|██ | 70.2M/335M [00:01<00:05, 49.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 21%|██ | 70.2M/335M [00:01<00:05, 49.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 21%|██ | 70.3M/335M [00:01<00:05, 49.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 21%|██ | 70.3M/335M [00:01<00:05, 49.0MB/s][1,mpirank:8,algo-2]:#015 21%|██ | 70.3M/335M [00:01<00:05, 49.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 
21%|██ | 70.4M/335M [00:01<00:05, 49.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 21%|██ | 70.4M/335M [00:01<00:05, 49.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 21%|██ | 70.4M/335M [00:01<00:05, 49.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 21%|██ | 70.5M/335M [00:01<00:05, 49.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 21%|██▏ | 71.3M/335M [00:01<00:05, 50.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 21%|██▏ | 71.6M/335M [00:01<00:05, 51.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 21%|██▏ | 71.4M/335M [00:01<00:05, 50.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 21%|██▏ | 71.9M/335M [00:01<00:05, 51.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 22%|██▏ | 75.2M/335M [00:01<00:05, 51.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 23%|██▎ | 75.6M/335M [00:01<00:05, 51.3MB/s][1,mpirank:10,algo-2]:#015 23%|██▎ | 75.6M/335M [00:01<00:05, 51.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 23%|██▎ | 75.7M/335M [00:01<00:05, 51.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 23%|██▎ | 75.7M/335M [00:01<00:05, 51.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 23%|██▎ | 75.8M/335M [00:01<00:05, 51.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 23%|██▎ | 75.8M/335M [00:01<00:05, 51.4MB/s][1,mpirank:9,algo-2]:#015 23%|██▎ | 75.8M/335M [00:01<00:05, 51.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 23%|██▎ | 75.9M/335M [00:01<00:05, 51.6MB/s][1,mpirank:0,algo-1]:#015 23%|██▎ | 75.9M/335M [00:01<00:05, 51.5MB/s][1,mpirank:5,algo-1]:#015 23%|██▎ | 75.9M/335M [00:01<00:05, 51.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 23%|██▎ | 76.0M/335M [00:01<00:05, 51.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 23%|██▎ | 76.8M/335M [00:01<00:05, 52.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 23%|██▎ | 77.0M/335M [00:01<00:05, 52.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 
23%|██▎ | 77.0M/335M [00:01<00:05, 52.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 23%|██▎ | 77.2M/335M [00:01<00:05, 52.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 24%|██▍ | 81.7M/335M [00:01<00:04, 55.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 24%|██▍ | 82.1M/335M [00:01<00:04, 55.8MB/s][1,mpirank:10,algo-2]:#015 24%|██▍ | 82.0M/335M [00:01<00:04, 56.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 24%|██▍ | 82.1M/335M [00:01<00:04, 56.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 24%|██▍ | 82.1M/335M [00:01<00:04, 55.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 25%|██▍ | 82.2M/335M [00:01<00:04, 55.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 25%|██▍ | 82.3M/335M [00:01<00:04, 55.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 25%|██▍ | 82.2M/335M [00:01<00:04, 55.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 25%|██▍ | 82.4M/335M [00:01<00:04, 56.1MB/s][1,mpirank:0,algo-1]:#015 25%|██▍ | 82.4M/335M [00:01<00:04, 56.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 25%|██▍ | 82.3M/335M [00:01<00:04, 55.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 25%|██▍ | 82.3M/335M [00:01<00:04, 54.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 25%|██▍ | 83.0M/335M [00:01<00:04, 55.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 25%|██▍ | 83.3M/335M [00:01<00:04, 56.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 25%|██▍ | 83.3M/335M [00:01<00:04, 56.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 25%|██▍ | 83.5M/335M [00:01<00:04, 56.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 26%|██▌ | 87.8M/335M [00:01<00:04, 58.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 26%|██▋ | 88.1M/335M [00:01<00:04, 57.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 26%|██▋ | 88.1M/335M [00:01<00:04, 57.8MB/s][1,mpirank:13,algo-2]:#015 26%|██▋ | 88.1M/335M [00:01<00:04, 57.5MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:11,algo-2]:#015 26%|██▋ | 88.1M/335M [00:01<00:04, 57.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 26%|██▋ | 88.1M/335M [00:01<00:04, 57.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 26%|██▋ | 88.1M/335M [00:01<00:04, 57.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 26%|██▋ | 88.2M/335M [00:01<00:04, 57.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 26%|██▋ | 88.3M/335M [00:01<00:04, 57.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 26%|██▋ | 88.2M/335M [00:01<00:04, 57.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 26%|██▋ | 88.3M/335M [00:01<00:04, 57.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 26%|██▋ | 88.7M/335M [00:01<00:04, 57.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 26%|██▋ | 88.8M/335M [00:01<00:04, 57.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 27%|██▋ | 89.0M/335M [00:01<00:04, 57.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 27%|██▋ | 89.0M/335M [00:01<00:04, 57.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 27%|██▋ | 89.1M/335M [00:01<00:04, 56.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 28%|██▊ | 93.6M/335M [00:01<00:04, 58.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 28%|██▊ | 94.0M/335M [00:02<00:04, 59.0MB/s][1,mpirank:10,algo-2]:#015 28%|██▊ | 94.0M/335M [00:02<00:04, 59.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 28%|██▊ | 94.1M/335M [00:02<00:04, 59.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 28%|██▊ | 94.1M/335M [00:02<00:04, 59.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 28%|██▊ | 94.0M/335M [00:01<00:04, 59.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 28%|██▊ | 94.1M/335M [00:02<00:04, 59.0MB/s][1,mpirank:8,algo-2]:#015 28%|██▊ | 94.2M/335M [00:02<00:04, 59.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 28%|██▊ | 94.2M/335M [00:01<00:04, 59.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 28%|██▊ | 
94.2M/335M [00:01<00:04, 58.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 28%|██▊ | 94.3M/335M [00:01<00:04, 59.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 28%|██▊ | 94.9M/335M [00:01<00:04, 59.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 28%|██▊ | 95.1M/335M [00:02<00:04, 59.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 28%|██▊ | 95.4M/335M [00:01<00:04, 60.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 28%|██▊ | 95.4M/335M [00:01<00:04, 60.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 29%|██▊ | 95.7M/335M [00:02<00:04, 60.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 30%|██▉ | 100M/335M [00:02<00:03, 62.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 30%|███ | 101M/335M [00:02<00:03, 62.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 30%|███ | 101M/335M [00:02<00:03, 62.4MB/s] [1,mpirank:13,algo-2]:#015 30%|███ | 101M/335M [00:02<00:03, 62.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 30%|███ | 101M/335M [00:02<00:03, 62.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 30%|███ | 101M/335M [00:02<00:03, 62.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 30%|███ | 101M/335M [00:02<00:03, 62.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 30%|███ | 101M/335M [00:02<00:03, 62.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 30%|███ | 101M/335M [00:02<00:03, 62.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 30%|███ | 101M/335M [00:02<00:03, 62.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 30%|███ | 101M/335M [00:02<00:03, 62.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 30%|███ | 102M/335M [00:02<00:03, 63.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 30%|███ | 102M/335M [00:02<00:03, 63.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 30%|███ | 102M/335M [00:02<00:03, 62.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 30%|███ | 102M/335M [00:02<00:03, 
63.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 31%|███ | 102M/335M [00:02<00:03, 62.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 32%|███▏ | 107M/335M [00:02<00:03, 63.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 32%|███▏ | 107M/335M [00:02<00:03, 63.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 32%|███▏ | 107M/335M [00:02<00:03, 62.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 32%|███▏ | 107M/335M [00:02<00:03, 62.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 32%|███▏ | 107M/335M [00:02<00:03, 62.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 32%|███▏ | 107M/335M [00:02<00:03, 62.6MB/s][1,mpirank:11,algo-2]:#015 32%|███▏ | 107M/335M [00:02<00:03, 62.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 32%|███▏ | 107M/335M [00:02<00:03, 62.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 32%|███▏ | 107M/335M [00:02<00:03, 62.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 32%|███▏ | 107M/335M [00:02<00:03, 62.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 32%|███▏ | 107M/335M [00:02<00:03, 62.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 32%|███▏ | 108M/335M [00:02<00:03, 62.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 32%|███▏ | 108M/335M [00:02<00:03, 62.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 32%|███▏ | 108M/335M [00:02<00:03, 62.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 32%|███▏ | 108M/335M [00:02<00:03, 62.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 32%|███▏ | 109M/335M [00:02<00:03, 63.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 34%|███▎ | 113M/335M [00:02<00:03, 61.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 34%|███▎ | 113M/335M [00:02<00:03, 61.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 34%|███▎ | 113M/335M [00:02<00:03, 61.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 34%|███▎ | 113M/335M [00:02<00:03, 61.7MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:10,algo-2]:#015 34%|███▎ | 113M/335M [00:02<00:03, 61.9MB/s][1,mpirank:4,algo-1]:#015 34%|███▎ | 113M/335M [00:02<00:03, 61.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 34%|███▎ | 113M/335M [00:02<00:03, 61.7MB/s][1,mpirank:11,algo-2]:#015 34%|███▎ | 113M/335M [00:02<00:03, 61.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 34%|███▎ | 113M/335M [00:02<00:03, 61.8MB/s][1,mpirank:9,algo-2]:#015 34%|███▎ | 113M/335M [00:02<00:03, 61.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 34%|███▍ | 113M/335M [00:02<00:03, 61.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 34%|███▍ | 114M/335M [00:02<00:03, 62.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 34%|███▍ | 114M/335M [00:02<00:03, 61.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 34%|███▍ | 114M/335M [00:02<00:03, 62.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 34%|███▍ | 114M/335M [00:02<00:03, 62.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 34%|███▍ | 115M/335M [00:02<00:03, 62.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 35%|███▌ | 119M/335M [00:02<00:03, 61.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 35%|███▌ | 119M/335M [00:02<00:03, 61.9MB/s][1,mpirank:13,algo-2]:#015 36%|███▌ | 119M/335M [00:02<00:03, 61.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 36%|███▌ | 119M/335M [00:02<00:03, 61.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 35%|███▌ | 119M/335M [00:02<00:03, 61.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 36%|███▌ | 119M/335M [00:02<00:03, 61.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 36%|███▌ | 119M/335M [00:02<00:03, 61.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 35%|███▌ | 119M/335M [00:02<00:03, 61.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 35%|███▌ | 119M/335M [00:02<00:03, 61.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 36%|███▌ | 119M/335M [00:02<00:03, 61.3MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:5,algo-1]:#015 36%|███▌ | 119M/335M [00:02<00:03, 62.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 36%|███▌ | 120M/335M [00:02<00:03, 62.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 36%|███▌ | 120M/335M [00:02<00:03, 62.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 36%|███▌ | 120M/335M [00:02<00:03, 62.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 36%|███▌ | 120M/335M [00:02<00:03, 62.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 36%|███▌ | 121M/335M [00:02<00:03, 62.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 37%|███▋ | 125M/335M [00:02<00:03, 64.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 37%|███▋ | 126M/335M [00:02<00:03, 64.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 37%|███▋ | 126M/335M [00:02<00:03, 64.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 37%|███▋ | 126M/335M [00:02<00:03, 64.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 37%|███▋ | 126M/335M [00:02<00:03, 64.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 38%|███▊ | 126M/335M [00:02<00:03, 64.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 37%|███▋ | 126M/335M [00:02<00:03, 64.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 38%|███▊ | 126M/335M [00:02<00:03, 64.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 38%|███▊ | 126M/335M [00:02<00:03, 64.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 38%|███▊ | 126M/335M [00:02<00:03, 64.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 38%|███▊ | 126M/335M [00:02<00:03, 64.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 38%|███▊ | 127M/335M [00:02<00:03, 64.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 38%|███▊ | 127M/335M [00:02<00:03, 63.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 38%|███▊ | 127M/335M [00:02<00:03, 63.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 38%|███▊ | 127M/335M [00:02<00:03, 
63.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 38%|███▊ | 127M/335M [00:02<00:03, 63.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 39%|███▉ | 132M/335M [00:02<00:03, 61.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 39%|███▉ | 132M/335M [00:02<00:03, 61.9MB/s][1,mpirank:13,algo-2]:#015 39%|███▉ | 132M/335M [00:02<00:03, 61.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 39%|███▉ | 132M/335M [00:02<00:03, 61.9MB/s][1,mpirank:14,algo-2]:#015 39%|███▉ | 132M/335M [00:02<00:03, 61.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 39%|███▉ | 132M/335M [00:02<00:03, 61.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 39%|███▉ | 132M/335M [00:02<00:03, 62.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 39%|███▉ | 132M/335M [00:02<00:03, 61.7MB/s][1,mpirank:15,algo-2]:#015 39%|███▉ | 132M/335M [00:02<00:03, 62.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 39%|███▉ | 132M/335M [00:02<00:03, 61.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 39%|███▉ | 132M/335M [00:02<00:03, 62.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 40%|███▉ | 133M/335M [00:02<00:03, 62.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 40%|███▉ | 133M/335M [00:02<00:03, 62.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 40%|███▉ | 133M/335M [00:02<00:03, 62.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 40%|███▉ | 133M/335M [00:02<00:03, 62.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 40%|███▉ | 133M/335M [00:02<00:03, 62.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 41%|████ | 138M/335M [00:02<00:03, 63.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 41%|████ | 138M/335M [00:02<00:03, 62.9MB/s][1,mpirank:0,algo-1]:#015 41%|████ | 138M/335M [00:02<00:03, 62.8MB/s][1,mpirank:8,algo-2]:#015 41%|████ | 138M/335M [00:02<00:03, 62.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 41%|████ | 138M/335M [00:02<00:03, 62.7MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:14,algo-2]:#015 41%|████ | 138M/335M [00:02<00:03, 62.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 41%|████ | 138M/335M [00:02<00:03, 62.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 41%|████ | 138M/335M [00:02<00:03, 62.8MB/s][1,mpirank:4,algo-1]:#015 41%|████ | 138M/335M [00:02<00:03, 62.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 41%|████▏ | 138M/335M [00:02<00:03, 62.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 41%|████▏ | 138M/335M [00:02<00:03, 62.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 41%|████▏ | 139M/335M [00:02<00:03, 62.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 41%|████▏ | 139M/335M [00:02<00:03, 62.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 41%|████▏ | 139M/335M [00:02<00:03, 62.9MB/s][1,mpirank:2,algo-1]:#015 41%|████▏ | 139M/335M [00:02<00:03, 62.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 42%|████▏ | 139M/335M [00:02<00:03, 62.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 43%|████▎ | 144M/335M [00:02<00:03, 64.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 43%|████▎ | 145M/335M [00:02<00:03, 64.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 43%|████▎ | 145M/335M [00:02<00:03, 64.5MB/s][1,mpirank:8,algo-2]:#015 43%|████▎ | 145M/335M [00:02<00:03, 64.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 43%|████▎ | 145M/335M [00:02<00:03, 64.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 43%|████▎ | 145M/335M [00:02<00:03, 64.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 43%|████▎ | 145M/335M [00:02<00:03, 64.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 43%|████▎ | 145M/335M [00:02<00:03, 64.6MB/s][1,mpirank:4,algo-1]:#015 43%|████▎ | 145M/335M [00:02<00:03, 64.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 43%|████▎ | 144M/335M [00:02<00:03, 63.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 43%|████▎ | 145M/335M [00:02<00:03, 
64.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 43%|████▎ | 145M/335M [00:02<00:03, 64.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 43%|████▎ | 146M/335M [00:02<00:03, 64.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 43%|████▎ | 146M/335M [00:02<00:03, 64.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 43%|████▎ | 146M/335M [00:02<00:03, 64.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 43%|████▎ | 146M/335M [00:02<00:03, 64.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 45%|████▍ | 151M/335M [00:02<00:02, 64.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 45%|████▌ | 151M/335M [00:02<00:02, 64.8MB/s][1,mpirank:0,algo-1]:#015 45%|████▌ | 151M/335M [00:02<00:02, 64.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 45%|████▌ | 151M/335M [00:02<00:02, 64.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 45%|████▌ | 151M/335M [00:02<00:02, 64.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 45%|████▍ | 151M/335M [00:02<00:02, 64.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 45%|████▌ | 151M/335M [00:02<00:02, 64.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 45%|████▌ | 151M/335M [00:02<00:02, 64.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 45%|████▌ | 151M/335M [00:02<00:02, 64.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 45%|████▌ | 151M/335M [00:02<00:02, 65.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 45%|████▌ | 151M/335M [00:02<00:02, 64.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 45%|████▌ | 152M/335M [00:02<00:02, 64.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 45%|████▌ | 152M/335M [00:02<00:02, 64.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 45%|████▌ | 152M/335M [00:02<00:02, 64.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 45%|████▌ | 152M/335M [00:02<00:02, 64.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 45%|████▌ | 152M/335M [00:02<00:02, 
64.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 47%|████▋ | 157M/335M [00:03<00:02, 64.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 47%|████▋ | 157M/335M [00:03<00:02, 64.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 47%|████▋ | 157M/335M [00:03<00:02, 64.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 47%|████▋ | 157M/335M [00:03<00:02, 64.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 47%|████▋ | 157M/335M [00:03<00:02, 64.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 47%|████▋ | 157M/335M [00:03<00:02, 64.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 47%|████▋ | 157M/335M [00:03<00:02, 64.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 47%|████▋ | 157M/335M [00:03<00:02, 64.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 47%|████▋ | 157M/335M [00:03<00:02, 64.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 47%|████▋ | 158M/335M [00:03<00:02, 65.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 47%|████▋ | 157M/335M [00:03<00:02, 64.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 47%|████▋ | 158M/335M [00:03<00:02, 65.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 47%|████▋ | 158M/335M [00:03<00:02, 64.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 47%|████▋ | 158M/335M [00:02<00:02, 65.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 47%|████▋ | 158M/335M [00:02<00:02, 65.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 47%|████▋ | 158M/335M [00:03<00:02, 65.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 49%|████▊ | 163M/335M [00:03<00:03, 57.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 49%|████▊ | 163M/335M [00:03<00:03, 56.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 49%|████▊ | 163M/335M [00:03<00:03, 56.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 49%|████▊ | 163M/335M [00:03<00:03, 56.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 49%|████▊ | 
163M/335M [00:03<00:03, 56.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 49%|████▊ | 163M/335M [00:03<00:03, 56.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 49%|████▊ | 163M/335M [00:03<00:03, 56.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 49%|████▊ | 163M/335M [00:03<00:03, 56.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 49%|████▊ | 163M/335M [00:03<00:03, 56.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 49%|████▉ | 164M/335M [00:03<00:03, 55.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 49%|████▉ | 164M/335M [00:03<00:03, 55.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 49%|████▉ | 164M/335M [00:03<00:03, 54.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 49%|████▉ | 164M/335M [00:03<00:03, 54.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 49%|████▉ | 165M/335M [00:03<00:03, 53.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 49%|████▉ | 165M/335M [00:03<00:03, 53.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 49%|████▉ | 165M/335M [00:03<00:03, 52.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 50%|█████ | 169M/335M [00:03<00:03, 50.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 50%|█████ | 169M/335M [00:03<00:03, 50.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 50%|█████ | 169M/335M [00:03<00:03, 50.4MB/s][1,mpirank:0,algo-1]:#015 50%|█████ | 169M/335M [00:03<00:03, 50.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 50%|█████ | 169M/335M [00:03<00:03, 50.3MB/s][1,mpirank:10,algo-2]:#015 50%|█████ | 169M/335M [00:03<00:03, 50.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 50%|█████ | 169M/335M [00:03<00:03, 50.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 50%|█████ | 169M/335M [00:03<00:03, 50.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 50%|█████ | 169M/335M [00:03<00:03, 50.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 50%|█████ | 169M/335M [00:03<00:03, 
50.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 50%|█████ | 169M/335M [00:03<00:03, 50.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 51%|█████ | 170M/335M [00:03<00:03, 50.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 51%|█████ | 170M/335M [00:03<00:03, 50.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 51%|█████ | 170M/335M [00:03<00:03, 51.4MB/s][1,mpirank:2,algo-1]:#015 51%|█████ | 170M/335M [00:03<00:03, 51.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 51%|█████ | 170M/335M [00:03<00:03, 51.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 52%|█████▏ | 175M/335M [00:03<00:03, 55.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 52%|█████▏ | 175M/335M [00:03<00:03, 55.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 52%|█████▏ | 176M/335M [00:03<00:02, 55.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 52%|█████▏ | 176M/335M [00:03<00:02, 55.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 52%|█████▏ | 176M/335M [00:03<00:03, 55.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 52%|█████▏ | 176M/335M [00:03<00:02, 55.8MB/s][1,mpirank:10,algo-2]:#015 52%|█████▏ | 176M/335M [00:03<00:02, 55.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 52%|█████▏ | 176M/335M [00:03<00:02, 56.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 52%|█████▏ | 176M/335M [00:03<00:02, 56.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 52%|█████▏ | 176M/335M [00:03<00:02, 55.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 53%|█████▎ | 176M/335M [00:03<00:02, 56.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 53%|█████▎ | 176M/335M [00:03<00:02, 56.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 53%|█████▎ | 176M/335M [00:03<00:02, 56.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 53%|█████▎ | 177M/335M [00:03<00:02, 56.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 53%|█████▎ | 177M/335M [00:03<00:02, 
56.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 53%|█████▎ | 177M/335M [00:03<00:02, 56.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 54%|█████▍ | 181M/335M [00:03<00:02, 56.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 54%|█████▍ | 181M/335M [00:03<00:02, 56.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 54%|█████▍ | 181M/335M [00:03<00:02, 56.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 54%|█████▍ | 181M/335M [00:03<00:02, 56.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 54%|█████▍ | 181M/335M [00:03<00:02, 56.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 54%|█████▍ | 181M/335M [00:03<00:02, 56.5MB/s][1,mpirank:15,algo-2]:#015 54%|█████▍ | 181M/335M [00:03<00:02, 56.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 54%|█████▍ | 182M/335M [00:03<00:02, 56.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 54%|█████▍ | 182M/335M [00:03<00:02, 56.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 54%|█████▍ | 182M/335M [00:03<00:02, 56.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 54%|█████▍ | 182M/335M [00:03<00:02, 55.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 54%|█████▍ | 182M/335M [00:03<00:02, 55.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 54%|█████▍ | 182M/335M [00:03<00:02, 54.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 54%|█████▍ | 182M/335M [00:03<00:02, 53.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 54%|█████▍ | 182M/335M [00:03<00:03, 53.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 54%|█████▍ | 182M/335M [00:03<00:03, 53.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 56%|█████▌ | 187M/335M [00:03<00:03, 51.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 56%|█████▌ | 187M/335M [00:03<00:03, 51.7MB/s][1,mpirank:8,algo-2]:#015 56%|█████▌ | 187M/335M [00:03<00:02, 51.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 56%|█████▌ | 187M/335M [00:03<00:03, 
51.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 56%|█████▌ | 187M/335M [00:03<00:03, 51.7MB/s][1,mpirank:0,algo-1]:#015 56%|█████▌ | 187M/335M [00:03<00:03, 51.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 56%|█████▌ | 187M/335M [00:03<00:03, 51.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 56%|█████▌ | 187M/335M [00:03<00:03, 51.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 56%|█████▌ | 187M/335M [00:03<00:02, 51.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 56%|█████▌ | 187M/335M [00:03<00:03, 51.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 56%|█████▌ | 187M/335M [00:03<00:02, 52.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 56%|█████▌ | 187M/335M [00:03<00:02, 52.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 56%|█████▌ | 187M/335M [00:03<00:02, 52.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 56%|█████▌ | 188M/335M [00:03<00:02, 52.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 56%|█████▌ | 188M/335M [00:03<00:02, 52.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 56%|█████▌ | 188M/335M [00:03<00:02, 53.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 57%|█████▋ | 193M/335M [00:03<00:02, 53.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 57%|█████▋ | 193M/335M [00:03<00:02, 53.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 57%|█████▋ | 193M/335M [00:03<00:02, 53.4MB/s][1,mpirank:15,algo-2]:#015 57%|█████▋ | 193M/335M [00:03<00:02, 53.5MB/s][1,mpirank:0,algo-1]:#015 57%|█████▋ | 193M/335M [00:03<00:02, 53.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 57%|█████▋ | 193M/335M [00:03<00:02, 53.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 57%|█████▋ | 193M/335M [00:03<00:02, 53.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 57%|█████▋ | 193M/335M [00:03<00:02, 53.6MB/s][1,mpirank:5,algo-1]:#015 57%|█████▋ | 193M/335M [00:03<00:02, 53.7MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:8,algo-2]:#015 57%|█████▋ | 192M/335M [00:03<00:02, 52.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 58%|█████▊ | 193M/335M [00:03<00:02, 53.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 58%|█████▊ | 193M/335M [00:03<00:02, 53.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 58%|█████▊ | 193M/335M [00:03<00:02, 54.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 58%|█████▊ | 193M/335M [00:03<00:02, 54.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 58%|█████▊ | 193M/335M [00:03<00:02, 54.4MB/s][1,mpirank:6,algo-1]:#015 58%|█████▊ | 193M/335M [00:03<00:02, 54.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 59%|█████▉ | 199M/335M [00:03<00:02, 55.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 59%|█████▉ | 199M/335M [00:03<00:02, 56.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 59%|█████▉ | 199M/335M [00:03<00:02, 56.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 59%|█████▉ | 199M/335M [00:03<00:02, 56.1MB/s][1,mpirank:14,algo-2]:#015 59%|█████▉ | 199M/335M [00:03<00:02, 55.6MB/s][1,mpirank:15,algo-2]:#015 59%|█████▉ | 199M/335M [00:03<00:02, 55.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 59%|█████▉ | 199M/335M [00:03<00:02, 55.5MB/s][1,mpirank:0,algo-1]:#015 59%|█████▉ | 199M/335M [00:03<00:02, 55.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 59%|█████▉ | 199M/335M [00:03<00:02, 55.8MB/s][1,mpirank:9,algo-2]:#015 59%|█████▉ | 199M/335M [00:03<00:02, 55.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 59%|█████▉ | 199M/335M [00:03<00:02, 55.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 59%|█████▉ | 199M/335M [00:03<00:02, 55.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 59%|█████▉ | 199M/335M [00:03<00:02, 55.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 59%|█████▉ | 199M/335M [00:03<00:02, 56.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 59%|█████▉ | 199M/335M [00:03<00:02, 
56.0MB/s][1,mpirank:6,algo-1]:#015 59%|█████▉ | 199M/335M [00:03<00:02, 56.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 61%|██████ | 205M/335M [00:03<00:02, 58.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 61%|██████ | 205M/335M [00:03<00:02, 59.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 61%|██████ | 205M/335M [00:03<00:02, 59.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 61%|██████ | 205M/335M [00:03<00:02, 59.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 61%|██████ | 205M/335M [00:03<00:02, 59.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 61%|██████ | 205M/335M [00:03<00:02, 59.4MB/s][1,mpirank:4,algo-1]:#015 61%|██████ | 205M/335M [00:03<00:02, 59.2MB/s][1,mpirank:12,algo-2]:#015 61%|██████ | 205M/335M [00:03<00:02, 59.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 61%|██████ | 205M/335M [00:03<00:02, 58.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 61%|██████ | 205M/335M [00:03<00:02, 58.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 61%|██████ | 205M/335M [00:03<00:02, 59.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 61%|██████ | 205M/335M [00:03<00:02, 59.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 61%|██████ | 205M/335M [00:03<00:02, 59.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 61%|██████▏ | 206M/335M [00:03<00:02, 59.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 61%|██████▏ | 206M/335M [00:03<00:02, 59.7MB/s][1,mpirank:2,algo-1]:#015 61%|██████▏ | 206M/335M [00:03<00:02, 59.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 63%|██████▎ | 210M/335M [00:04<00:02, 57.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 63%|██████▎ | 210M/335M [00:04<00:02, 57.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 63%|██████▎ | 211M/335M [00:04<00:02, 56.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 63%|██████▎ | 211M/335M [00:04<00:02, 57.0MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:10,algo-2]:#015 63%|██████▎ | 211M/335M [00:04<00:02, 56.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 63%|██████▎ | 211M/335M [00:04<00:02, 56.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 63%|██████▎ | 211M/335M [00:04<00:02, 56.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 63%|██████▎ | 211M/335M [00:04<00:02, 56.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 63%|██████▎ | 211M/335M [00:04<00:02, 56.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 63%|██████▎ | 211M/335M [00:04<00:02, 56.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 63%|██████▎ | 211M/335M [00:04<00:02, 56.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 63%|██████▎ | 211M/335M [00:04<00:02, 56.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 63%|██████▎ | 211M/335M [00:04<00:02, 55.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 63%|██████▎ | 211M/335M [00:04<00:02, 55.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 63%|██████▎ | 211M/335M [00:04<00:02, 55.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 63%|██████▎ | 211M/335M [00:03<00:02, 54.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 64%|██████▍ | 216M/335M [00:04<00:02, 48.2MB/s][1,mpirank:14,algo-2]:#015 64%|██████▍ | 216M/335M [00:04<00:02, 48.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 64%|██████▍ | 216M/335M [00:04<00:02, 48.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 64%|██████▍ | 216M/335M [00:04<00:02, 48.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 64%|██████▍ | 216M/335M [00:04<00:02, 48.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 65%|██████▍ | 216M/335M [00:04<00:02, 48.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 65%|██████▍ | 216M/335M [00:04<00:02, 48.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 65%|██████▍ | 217M/335M [00:04<00:02, 48.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 65%|██████▍ | 
216M/335M [00:04<00:02, 48.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 65%|██████▍ | 216M/335M [00:04<00:02, 48.4MB/s][1,mpirank:5,algo-1]:#015 65%|██████▍ | 216M/335M [00:04<00:02, 48.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 65%|██████▍ | 217M/335M [00:04<00:02, 48.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 65%|██████▍ | 217M/335M [00:04<00:02, 48.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 65%|██████▍ | 217M/335M [00:04<00:02, 48.7MB/s][1,mpirank:2,algo-1]:#015 65%|██████▍ | 217M/335M [00:04<00:02, 48.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 65%|██████▍ | 217M/335M [00:04<00:02, 46.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 66%|██████▌ | 222M/335M [00:04<00:02, 50.4MB/s][1,mpirank:11,algo-2]:#015 66%|██████▌ | 222M/335M [00:04<00:02, 50.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 66%|██████▌ | 222M/335M [00:04<00:02, 50.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 66%|██████▌ | 222M/335M [00:04<00:02, 50.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 66%|██████▌ | 222M/335M [00:04<00:02, 50.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 66%|██████▌ | 222M/335M [00:04<00:02, 50.7MB/s][1,mpirank:15,algo-2]:#015 66%|██████▌ | 222M/335M [00:04<00:02, 50.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 66%|██████▌ | 222M/335M [00:04<00:02, 50.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 66%|██████▌ | 222M/335M [00:04<00:02, 50.6MB/s][1,mpirank:5,algo-1]:#015 66%|██████▌ | 222M/335M [00:04<00:02, 49.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 66%|██████▌ | 222M/335M [00:04<00:02, 50.9MB/s][1,mpirank:9,algo-2]:#015 66%|██████▌ | 222M/335M [00:04<00:02, 50.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 66%|██████▋ | 222M/335M [00:04<00:02, 51.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 66%|██████▋ | 222M/335M [00:04<00:02, 51.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 
66%|██████▋ | 222M/335M [00:04<00:02, 50.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 66%|██████▋ | 223M/335M [00:04<00:02, 50.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 68%|██████▊ | 228M/335M [00:04<00:02, 54.8MB/s][1,mpirank:11,algo-2]:#015 68%|██████▊ | 228M/335M [00:04<00:02, 54.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 68%|██████▊ | 228M/335M [00:04<00:02, 54.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 68%|██████▊ | 228M/335M [00:04<00:02, 54.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 68%|██████▊ | 228M/335M [00:04<00:02, 54.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 68%|██████▊ | 228M/335M [00:04<00:02, 54.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 68%|██████▊ | 228M/335M [00:04<00:02, 54.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 68%|██████▊ | 228M/335M [00:04<00:02, 54.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 68%|██████▊ | 228M/335M [00:04<00:02, 54.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 68%|██████▊ | 228M/335M [00:04<00:02, 55.0MB/s][1,mpirank:5,algo-1]:#015 68%|██████▊ | 228M/335M [00:04<00:02, 55.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 68%|██████▊ | 228M/335M [00:04<00:02, 54.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 68%|██████▊ | 229M/335M [00:04<00:02, 55.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 68%|██████▊ | 229M/335M [00:04<00:02, 55.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 68%|██████▊ | 229M/335M [00:04<00:02, 55.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 69%|██████▊ | 230M/335M [00:04<00:01, 56.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 70%|██████▉ | 234M/335M [00:04<00:01, 58.5MB/s][1,mpirank:11,algo-2]:#015 70%|██████▉ | 234M/335M [00:04<00:01, 58.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 70%|██████▉ | 235M/335M [00:04<00:01, 58.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 
70%|██████▉ | 235M/335M [00:04<00:01, 58.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 70%|██████▉ | 235M/335M [00:04<00:01, 58.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 70%|███████ | 235M/335M [00:04<00:01, 58.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 70%|███████ | 235M/335M [00:04<00:01, 58.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 70%|███████ | 235M/335M [00:04<00:01, 58.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 70%|███████ | 235M/335M [00:04<00:01, 58.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 70%|███████ | 235M/335M [00:04<00:01, 58.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 70%|███████ | 235M/335M [00:04<00:01, 58.7MB/s][1,mpirank:5,algo-1]:#015 70%|███████ | 235M/335M [00:04<00:01, 58.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 70%|███████ | 235M/335M [00:04<00:01, 58.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 70%|███████ | 235M/335M [00:04<00:01, 58.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 70%|███████ | 235M/335M [00:04<00:01, 59.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 71%|███████ | 237M/335M [00:04<00:01, 60.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 72%|███████▏ | 240M/335M [00:04<00:01, 55.6MB/s][1,mpirank:11,algo-2]:#015 72%|███████▏ | 240M/335M [00:04<00:01, 55.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 72%|███████▏ | 240M/335M [00:04<00:01, 55.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 72%|███████▏ | 240M/335M [00:04<00:01, 55.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 72%|███████▏ | 240M/335M [00:04<00:01, 55.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 72%|███████▏ | 241M/335M [00:04<00:01, 55.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 72%|███████▏ | 241M/335M [00:04<00:01, 55.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 72%|███████▏ | 241M/335M [00:04<00:01, 55.5MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:4,algo-1]:#015 72%|███████▏ | 241M/335M [00:04<00:01, 55.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 72%|███████▏ | 241M/335M [00:04<00:01, 55.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 72%|███████▏ | 241M/335M [00:04<00:01, 55.6MB/s][1,mpirank:9,algo-2]:#015 72%|███████▏ | 241M/335M [00:04<00:01, 55.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 72%|███████▏ | 241M/335M [00:04<00:01, 55.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 72%|███████▏ | 241M/335M [00:04<00:01, 55.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 72%|███████▏ | 241M/335M [00:04<00:01, 55.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 72%|███████▏ | 243M/335M [00:04<00:01, 56.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 74%|███████▎ | 247M/335M [00:04<00:01, 59.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 74%|███████▎ | 247M/335M [00:04<00:01, 59.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 74%|███████▎ | 247M/335M [00:04<00:01, 59.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 74%|███████▎ | 247M/335M [00:04<00:01, 59.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 74%|███████▎ | 247M/335M [00:04<00:01, 59.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 74%|███████▍ | 247M/335M [00:04<00:01, 59.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 74%|███████▍ | 247M/335M [00:04<00:01, 59.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 74%|███████▍ | 247M/335M [00:04<00:01, 59.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 74%|███████▍ | 248M/335M [00:04<00:01, 59.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 74%|███████▍ | 247M/335M [00:04<00:01, 59.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 74%|███████▍ | 247M/335M [00:04<00:01, 59.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 74%|███████▍ | 247M/335M [00:04<00:01, 59.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 
74%|███████▍ | 248M/335M [00:04<00:01, 59.8MB/s][1,mpirank:1,algo-1]:#015 74%|███████▍ | 248M/335M [00:04<00:01, 59.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 74%|███████▍ | 248M/335M [00:04<00:01, 59.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 74%|███████▍ | 249M/335M [00:04<00:01, 60.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 76%|███████▌ | 254M/335M [00:04<00:01, 62.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 76%|███████▌ | 254M/335M [00:04<00:01, 62.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 76%|███████▌ | 254M/335M [00:04<00:01, 62.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 76%|███████▌ | 254M/335M [00:04<00:01, 62.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 76%|███████▌ | 254M/335M [00:04<00:01, 62.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 76%|███████▌ | 254M/335M [00:04<00:01, 62.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 76%|███████▌ | 254M/335M [00:04<00:01, 62.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 76%|███████▌ | 254M/335M [00:04<00:01, 62.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 76%|███████▌ | 254M/335M [00:04<00:01, 62.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 76%|███████▌ | 254M/335M [00:04<00:01, 62.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 76%|███████▌ | 254M/335M [00:04<00:01, 62.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 76%|███████▌ | 254M/335M [00:04<00:01, 62.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 76%|███████▌ | 254M/335M [00:04<00:01, 62.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 76%|███████▌ | 254M/335M [00:04<00:01, 62.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 76%|███████▌ | 254M/335M [00:04<00:01, 62.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 76%|███████▋ | 256M/335M [00:04<00:01, 62.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 78%|███████▊ | 260M/335M 
[00:04<00:01, 64.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 78%|███████▊ | 260M/335M [00:04<00:01, 64.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 78%|███████▊ | 260M/335M [00:04<00:01, 64.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 78%|███████▊ | 261M/335M [00:04<00:01, 64.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 78%|███████▊ | 261M/335M [00:04<00:01, 64.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 78%|███████▊ | 261M/335M [00:04<00:01, 64.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 78%|███████▊ | 261M/335M [00:04<00:01, 64.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 78%|███████▊ | 261M/335M [00:04<00:01, 64.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 78%|███████▊ | 261M/335M [00:04<00:01, 64.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 78%|███████▊ | 261M/335M [00:04<00:01, 64.7MB/s][1,mpirank:9,algo-2]:#015 78%|███████▊ | 261M/335M [00:04<00:01, 64.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 78%|███████▊ | 261M/335M [00:04<00:01, 64.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 78%|███████▊ | 261M/335M [00:04<00:01, 64.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 78%|███████▊ | 261M/335M [00:04<00:01, 64.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 78%|███████▊ | 261M/335M [00:04<00:01, 64.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 78%|███████▊ | 262M/335M [00:04<00:01, 62.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 79%|███████▉ | 266M/335M [00:05<00:01, 59.4MB/s][1,mpirank:14,algo-2]:#015 79%|███████▉ | 266M/335M [00:05<00:01, 59.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 80%|███████▉ | 267M/335M [00:05<00:01, 59.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 80%|███████▉ | 267M/335M [00:05<00:01, 59.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 80%|███████▉ | 267M/335M [00:05<00:01, 59.4MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:15,algo-2]:#015 80%|███████▉ | 267M/335M [00:05<00:01, 59.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 80%|███████▉ | 267M/335M [00:05<00:01, 59.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 80%|███████▉ | 267M/335M [00:05<00:01, 59.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 80%|███████▉ | 267M/335M [00:05<00:01, 59.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 80%|███████▉ | 267M/335M [00:05<00:01, 59.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 80%|███████▉ | 267M/335M [00:05<00:01, 59.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 80%|███████▉ | 267M/335M [00:05<00:01, 59.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 80%|███████▉ | 267M/335M [00:04<00:01, 59.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 80%|███████▉ | 267M/335M [00:05<00:01, 59.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 80%|███████▉ | 267M/335M [00:05<00:01, 59.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 80%|████████ | 268M/335M [00:05<00:01, 59.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 81%|████████ | 272M/335M [00:05<00:01, 59.4MB/s][1,mpirank:11,algo-2]:#015 81%|████████ | 272M/335M [00:05<00:01, 59.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 81%|████████ | 272M/335M [00:05<00:01, 59.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 81%|████████▏ | 273M/335M [00:05<00:01, 59.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 81%|████████▏ | 273M/335M [00:05<00:01, 59.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 81%|████████▏ | 273M/335M [00:05<00:01, 59.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 81%|████████▏ | 273M/335M [00:05<00:01, 59.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 81%|████████▏ | 273M/335M [00:05<00:01, 59.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 81%|████████▏ | 273M/335M [00:05<00:01, 59.3MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:9,algo-2]:#015 81%|████████▏ | 273M/335M [00:05<00:01, 59.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 81%|████████▏ | 273M/335M [00:05<00:01, 59.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 81%|████████▏ | 273M/335M [00:05<00:01, 59.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 81%|████████▏ | 273M/335M [00:05<00:01, 59.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 81%|████████▏ | 273M/335M [00:05<00:01, 59.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 81%|████████▏ | 273M/335M [00:05<00:01, 59.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 82%|████████▏ | 274M/335M [00:05<00:01, 59.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 83%|████████▎ | 278M/335M [00:05<00:00, 61.1MB/s][1,mpirank:11,algo-2]:#015 83%|████████▎ | 278M/335M [00:05<00:00, 61.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 83%|████████▎ | 279M/335M [00:05<00:00, 61.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 83%|████████▎ | 279M/335M [00:05<00:00, 61.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 83%|████████▎ | 279M/335M [00:05<00:00, 61.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 83%|████████▎ | 279M/335M [00:05<00:00, 61.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 83%|████████▎ | 279M/335M [00:05<00:00, 60.0MB/s][1,mpirank:2,algo-1]:#015 83%|████████▎ | 279M/335M [00:05<00:00, 59.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 83%|████████▎ | 279M/335M [00:05<00:00, 60.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 83%|████████▎ | 279M/335M [00:05<00:00, 59.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 83%|████████▎ | 279M/335M [00:05<00:00, 59.9MB/s][1,mpirank:4,algo-1]:#015 83%|████████▎ | 279M/335M [00:05<00:00, 60.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 83%|████████▎ | 279M/335M [00:05<00:00, 60.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 83%|████████▎ | 279M/335M 
[00:05<00:00, 59.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 83%|████████▎ | 279M/335M [00:05<00:00, 59.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 83%|████████▎ | 280M/335M [00:05<00:00, 59.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 85%|████████▍ | 284M/335M [00:05<00:00, 60.8MB/s][1,mpirank:11,algo-2]:#015 85%|████████▍ | 284M/335M [00:05<00:00, 60.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 85%|████████▍ | 285M/335M [00:05<00:00, 59.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 85%|████████▍ | 285M/335M [00:05<00:00, 59.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 85%|████████▍ | 285M/335M [00:05<00:00, 60.2MB/s][1,mpirank:10,algo-2]:#015 85%|████████▍ | 285M/335M [00:05<00:00, 59.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 85%|████████▍ | 285M/335M [00:05<00:00, 60.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 85%|████████▍ | 285M/335M [00:05<00:00, 60.1MB/s][1,mpirank:8,algo-2]:#015 85%|████████▍ | 285M/335M [00:05<00:00, 60.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 85%|████████▍ | 285M/335M [00:05<00:00, 60.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 85%|████████▍ | 285M/335M [00:05<00:00, 60.2MB/s][1,mpirank:15,algo-2]:#015 85%|████████▍ | 285M/335M [00:05<00:00, 59.8MB/s][1,mpirank:5,algo-1]:#015 85%|████████▍ | 285M/335M [00:05<00:00, 60.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 85%|████████▍ | 285M/335M [00:05<00:00, 60.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 85%|████████▍ | 285M/335M [00:05<00:00, 60.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 85%|████████▌ | 286M/335M [00:05<00:00, 60.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 87%|████████▋ | 291M/335M [00:05<00:00, 61.7MB/s][1,mpirank:11,algo-2]:#015 87%|████████▋ | 291M/335M [00:05<00:00, 61.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 87%|████████▋ | 291M/335M [00:05<00:00, 62.1MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:13,algo-2]:#015 87%|████████▋ | 291M/335M [00:05<00:00, 62.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 87%|████████▋ | 291M/335M [00:05<00:00, 62.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 87%|████████▋ | 292M/335M [00:05<00:00, 62.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 87%|████████▋ | 291M/335M [00:05<00:00, 61.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 87%|████████▋ | 292M/335M [00:05<00:00, 62.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 87%|████████▋ | 292M/335M [00:05<00:00, 62.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 87%|████████▋ | 292M/335M [00:05<00:00, 62.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 87%|████████▋ | 292M/335M [00:05<00:00, 62.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 87%|████████▋ | 292M/335M [00:05<00:00, 62.6MB/s][1,mpirank:4,algo-1]:#015 87%|████████▋ | 292M/335M [00:05<00:00, 62.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 87%|████████▋ | 292M/335M [00:05<00:00, 62.5MB/s][1,mpirank:9,algo-2]:#015 87%|████████▋ | 292M/335M [00:05<00:00, 62.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 87%|████████▋ | 292M/335M [00:05<00:00, 62.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 88%|████████▊ | 297M/335M [00:05<00:00, 62.8MB/s][1,mpirank:11,algo-2]:#015 88%|████████▊ | 297M/335M [00:05<00:00, 62.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 89%|████████▊ | 298M/335M [00:05<00:00, 63.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 89%|████████▉ | 298M/335M [00:05<00:00, 63.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 89%|████████▉ | 298M/335M [00:05<00:00, 63.4MB/s][1,mpirank:0,algo-1]:#015 89%|████████▉ | 298M/335M [00:05<00:00, 63.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 89%|████████▉ | 298M/335M [00:05<00:00, 63.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 89%|████████▉ | 298M/335M [00:05<00:00, 
63.5MB/s][1,mpirank:1,algo-1]:#015 89%|████████▉ | 298M/335M [00:05<00:00, 63.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 89%|████████▉ | 298M/335M [00:05<00:00, 63.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 89%|████████▉ | 298M/335M [00:05<00:00, 63.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 89%|████████▉ | 298M/335M [00:05<00:00, 63.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 89%|████████▉ | 298M/335M [00:05<00:00, 63.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 89%|████████▉ | 298M/335M [00:05<00:00, 63.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 89%|████████▉ | 298M/335M [00:05<00:00, 63.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 89%|████████▉ | 299M/335M [00:05<00:00, 63.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 90%|█████████ | 303M/335M [00:05<00:00, 63.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 90%|█████████ | 303M/335M [00:05<00:00, 63.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 91%|█████████ | 304M/335M [00:05<00:00, 63.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 91%|█████████ | 304M/335M [00:05<00:00, 63.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 91%|█████████ | 304M/335M [00:05<00:00, 63.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 91%|█████████ | 304M/335M [00:05<00:00, 63.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 91%|█████████ | 304M/335M [00:05<00:00, 63.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 91%|█████████ | 304M/335M [00:05<00:00, 63.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 91%|█████████ | 304M/335M [00:05<00:00, 63.6MB/s][1,mpirank:4,algo-1]:#015 91%|█████████ | 304M/335M [00:05<00:00, 63.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 91%|█████████ | 304M/335M [00:05<00:00, 63.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 91%|█████████ | 304M/335M [00:05<00:00, 63.2MB/s]\u001b[0m\n", 
"\u001b[34m[1,mpirank:5,algo-1]:#015 91%|█████████ | 304M/335M [00:05<00:00, 63.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 91%|█████████ | 304M/335M [00:05<00:00, 63.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 91%|█████████ | 304M/335M [00:05<00:00, 62.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 91%|█████████ | 305M/335M [00:05<00:00, 63.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 92%|█████████▏| 309M/335M [00:05<00:00, 59.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 92%|█████████▏| 309M/335M [00:05<00:00, 60.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 92%|█████████▏| 310M/335M [00:05<00:00, 60.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 92%|█████████▏| 310M/335M [00:05<00:00, 60.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 92%|█████████▏| 310M/335M [00:05<00:00, 60.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 92%|█████████▏| 310M/335M [00:05<00:00, 60.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 92%|█████████▏| 310M/335M [00:05<00:00, 60.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 92%|█████████▏| 310M/335M [00:05<00:00, 60.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 92%|█████████▏| 310M/335M [00:05<00:00, 60.2MB/s][1,mpirank:8,algo-2]:#015 92%|█████████▏| 310M/335M [00:05<00:00, 60.3MB/s][1,mpirank:4,algo-1]:#015 92%|█████████▏| 310M/335M [00:05<00:00, 60.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 92%|█████████▏| 310M/335M [00:05<00:00, 60.3MB/s][1,mpirank:12,algo-2]:#015 92%|█████████▏| 310M/335M [00:05<00:00, 60.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 92%|█████████▏| 310M/335M [00:05<00:00, 60.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 92%|█████████▏| 310M/335M [00:05<00:00, 60.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 93%|█████████▎| 311M/335M [00:05<00:00, 60.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 94%|█████████▍| 315M/335M 
[00:05<00:00, 61.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 94%|█████████▍| 315M/335M [00:05<00:00, 61.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 94%|█████████▍| 316M/335M [00:05<00:00, 62.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 94%|█████████▍| 316M/335M [00:05<00:00, 62.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 94%|█████████▍| 316M/335M [00:05<00:00, 62.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 94%|█████████▍| 316M/335M [00:05<00:00, 62.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 94%|█████████▍| 316M/335M [00:05<00:00, 62.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 94%|█████████▍| 316M/335M [00:05<00:00, 62.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 94%|█████████▍| 316M/335M [00:05<00:00, 62.2MB/s][1,mpirank:8,algo-2]:#015 94%|█████████▍| 316M/335M [00:05<00:00, 62.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 94%|█████████▍| 316M/335M [00:05<00:00, 62.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 94%|█████████▍| 316M/335M [00:05<00:00, 62.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 94%|█████████▍| 316M/335M [00:05<00:00, 62.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 94%|█████████▍| 317M/335M [00:05<00:00, 62.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 94%|█████████▍| 317M/335M [00:05<00:00, 62.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 95%|█████████▍| 317M/335M [00:05<00:00, 62.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 96%|█████████▌| 322M/335M [00:06<00:00, 63.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 96%|█████████▌| 322M/335M [00:06<00:00, 63.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 96%|█████████▌| 322M/335M [00:06<00:00, 62.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 96%|█████████▋| 323M/335M [00:05<00:00, 64.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 96%|█████████▋| 323M/335M [00:06<00:00, 
63.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 96%|█████████▋| 323M/335M [00:06<00:00, 63.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 96%|█████████▋| 323M/335M [00:05<00:00, 64.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 96%|█████████▋| 323M/335M [00:06<00:00, 64.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015 96%|█████████▋| 323M/335M [00:06<00:00, 64.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 96%|█████████▋| 323M/335M [00:06<00:00, 64.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 96%|█████████▋| 323M/335M [00:06<00:00, 64.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 96%|█████████▋| 323M/335M [00:06<00:00, 63.7MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 96%|█████████▋| 323M/335M [00:06<00:00, 64.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 96%|█████████▋| 323M/335M [00:06<00:00, 64.0MB/s][1,mpirank:5,algo-1]:#015 96%|█████████▋| 323M/335M [00:06<00:00, 64.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 96%|█████████▋| 324M/335M [00:06<00:00, 63.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015 98%|█████████▊| 328M/335M [00:06<00:00, 64.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015 98%|█████████▊| 328M/335M [00:06<00:00, 64.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015 98%|█████████▊| 329M/335M [00:06<00:00, 64.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015 98%|█████████▊| 328M/335M [00:06<00:00, 62.6MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015 98%|█████████▊| 329M/335M [00:06<00:00, 64.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015 98%|█████████▊| 329M/335M [00:06<00:00, 64.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015 98%|█████████▊| 329M/335M [00:06<00:00, 64.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015 98%|█████████▊| 329M/335M [00:06<00:00, 63.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015 98%|█████████▊| 329M/335M [00:06<00:00, 
64.0MB/s][1,mpirank:4,algo-1]:#015 98%|█████████▊| 329M/335M [00:06<00:00, 64.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015 98%|█████████▊| 329M/335M [00:06<00:00, 64.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015 98%|█████████▊| 329M/335M [00:06<00:00, 64.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015 98%|█████████▊| 329M/335M [00:06<00:00, 64.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015 98%|█████████▊| 329M/335M [00:06<00:00, 64.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015 98%|█████████▊| 329M/335M [00:06<00:00, 64.0MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015 98%|█████████▊| 330M/335M [00:06<00:00, 64.1MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015100%|█████████▉| 335M/335M [00:06<00:00, 65.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015100%|█████████▉| 335M/335M [00:06<00:00, 65.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:#015100%|██████████| 335M/335M [00:06<00:00, 57.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:#015100%|██████████| 335M/335M [00:06<00:00, 56.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:#015100%|██████████| 335M/335M [00:06<00:00, 56.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:#015100%|██████████| 335M/335M [00:06<00:00, 56.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:#015100%|██████████| 335M/335M [00:06<00:00, 56.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:#015100%|██████████| 335M/335M [00:06<00:00, 56.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:#015100%|██████████| 335M/335M [00:06<00:00, 56.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:#015100%|██████████| 335M/335M [00:06<00:00, 56.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:#015100%|██████████| 335M/335M [00:06<00:00, 56.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:#015100%|██████████| 335M/335M [00:06<00:00, 56.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:#015100%|██████████| 335M/335M [00:06<00:00, 
56.2MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:#015100%|██████████| 335M/335M [00:06<00:00, 56.3MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:#015100%|██████████| 335M/335M [00:06<00:00, 56.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:#015100%|██████████| 335M/335M [00:06<00:00, 56.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015100%|█████████▉| 335M/335M [00:06<00:00, 62.8MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:#015100%|██████████| 335M/335M [00:06<00:00, 57.4MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015100%|█████████▉| 335M/335M [00:06<00:00, 55.5MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:#015100%|██████████| 335M/335M [00:06<00:00, 55.9MB/s]\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:__main__:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:__main__:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:__main__:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:__main__:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]: local_rank : 0, local_batch_size : 40\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:__main__:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]: local_rank : 3, local_batch_size : 40\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:__main__:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:Processes 369/5899 (6%) of train data\u001b[0m\n", 
"\u001b[34m[1,mpirank:1,algo-1]:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:__main__:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:__main__:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]: local_rank : 1, local_batch_size : 40\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:__main__:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:__main__:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:__main__:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:INFO:__main__:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:INFO:__main__:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:__main__:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]: local_rank : 2, local_batch_size : 40\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:INFO:__main__:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]: local_rank : 5, local_batch_size : 40\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:INFO:__main__:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:INFO:__main__:Get test data 
loader\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]: local_rank : 6, local_batch_size : 40\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:INFO:__main__:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:INFO:__main__:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:INFO:__main__:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:INFO:__main__:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:INFO:__main__:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]: local_rank : 4, local_batch_size : 40\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:INFO:__main__:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]: local_rank : 7, local_batch_size : 40\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:INFO:__main__:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:INFO:__main__:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:INFO:__main__:Get 
test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:INFO:__main__:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:INFO:__main__:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:INFO:__main__:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:INFO:__main__:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:INFO:__main__:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:INFO:__main__:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:INFO:__main__:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:INFO:__main__:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:INFO:__main__:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:INFO:__main__:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:INFO:__main__:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:INFO:__main__:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]: local_rank : 6, 
local_batch_size : 40\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:INFO:__main__:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:INFO:__main__:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]: local_rank : 3, local_batch_size : 40\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]: local_rank : 7, local_batch_size : 40[1,mpirank:12,algo-2]:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:INFO:__main__:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:INFO:__main__:Processes 369/5899 (6%) of train data\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:INFO:__main__:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]: local_rank : 1, local_batch_size : 40\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]: local_rank : 0, local_batch_size : 40\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:INFO:__main__:Get test data loader\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:INFO:__main__:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:INFO:__main__:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]: local_rank : 2, local_batch_size : 40\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:INFO:__main__:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]: local_rank : 5, local_batch_size : 40\u001b[0m\n", 
"\u001b[34m[1,mpirank:12,algo-2]:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]: local_rank : 4, local_batch_size : 40\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:INFO:__main__:Processes 734/734 (100%) of test data\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-07-02 09:05:27.948 algo-1:669 INFO utils.py:28] RULE_JOB_STOP_SIGNAL_FILENAME: None\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:[2023-07-02 09:05:27.951 algo-1:208 INFO utils.py:28] RULE_JOB_STOP_SIGNAL_FILENAME: None\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:[2023-07-02 09:05:27.957 algo-1:339 INFO utils.py:28] RULE_JOB_STOP_SIGNAL_FILENAME: None\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:[2023-07-02 09:05:27.962 algo-1:342 INFO utils.py:28] RULE_JOB_STOP_SIGNAL_FILENAME: None\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:[2023-07-02 09:05:27.965 algo-1:343 INFO utils.py:28] RULE_JOB_STOP_SIGNAL_FILENAME: None\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:[2023-07-02 09:05:27.965 algo-1:206 INFO utils.py:28] RULE_JOB_STOP_SIGNAL_FILENAME: None\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:[2023-07-02 09:05:27.970 algo-1:210 INFO utils.py:28] RULE_JOB_STOP_SIGNAL_FILENAME: None\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:[2023-07-02 09:05:27.979 algo-1:344 INFO utils.py:28] RULE_JOB_STOP_SIGNAL_FILENAME: None\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:[2023-07-02 09:05:27.979 algo-2:208 INFO utils.py:28] RULE_JOB_STOP_SIGNAL_FILENAME: None\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:[2023-07-02 09:05:27.979 algo-2:217 INFO utils.py:28] RULE_JOB_STOP_SIGNAL_FILENAME: None\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:[2023-07-02 09:05:27.979 algo-2:212 INFO utils.py:28] RULE_JOB_STOP_SIGNAL_FILENAME: None\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:[2023-07-02 09:05:27.980 algo-2:213 INFO utils.py:28] RULE_JOB_STOP_SIGNAL_FILENAME: None\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:[2023-07-02 09:05:27.982 algo-2:605 INFO utils.py:28] 
RULE_JOB_STOP_SIGNAL_FILENAME: None\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:[2023-07-02 09:05:27.983 algo-2:209 INFO utils.py:28] RULE_JOB_STOP_SIGNAL_FILENAME: None\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:[2023-07-02 09:05:27.983 algo-2:216 INFO utils.py:28] RULE_JOB_STOP_SIGNAL_FILENAME: None\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:[2023-07-02 09:05:27.993 algo-2:142 INFO utils.py:28] RULE_JOB_STOP_SIGNAL_FILENAME: None\u001b[0m\n", "\u001b[34m[1,mpirank:5,algo-1]:[2023-07-02 09:05:28.430 algo-1:342 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:[2023-07-02 09:05:28.430 algo-1:343 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:[2023-07-02 09:05:28.430 algo-1:339 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:[2023-07-02 09:05:28.430 algo-1:210 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:[2023-07-02 09:05:28.430 algo-1:344 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:[2023-07-02 09:05:28.436 algo-1:669 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:[2023-07-02 09:05:28.441 algo-1:208 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. 
Profiler is disabled.\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:[2023-07-02 09:05:28.445 algo-2:217 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:[2023-07-02 09:05:28.446 algo-2:208 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:[2023-07-02 09:05:28.447 algo-2:605 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:[2023-07-02 09:05:28.447 algo-2:209 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:[2023-07-02 09:05:28.453 algo-2:213 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:[2023-07-02 09:05:28.454 algo-1:206 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:[2023-07-02 09:05:28.454 algo-2:216 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:[2023-07-02 09:05:28.461 algo-2:212 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:[2023-07-02 09:05:28.503 algo-2:142 INFO profiler_config_parser.py:111] Unable to find config at /opt/ml/input/config/profilerconfig.json. 
Profiler is disabled.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [1][0/10] Train_Time=2.080: avg-2.080, Train_Speed=307.712 (307.712), Train_Loss=13.2123851776:(13.2124), Train_Prec@1=0.000:(0.000), Train_Prec@5=0.000:(0.000)\u001b[0m\n", "\u001b[34m[1,mpirank:13,algo-2]:INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.\u001b[0m\n", "\u001b[34m[1,mpirank:12,algo-2]:INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.\u001b[0m\n", "\u001b[34m[1,mpirank:4,algo-1]:INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.\u001b[0m\n", "\u001b[34m[1,mpirank:6,algo-1]:INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.\u001b[0m\n", "\u001b[34m[1,mpirank:2,algo-1]:INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.\u001b[0m\n", "\u001b[34m[1,mpirank:14,algo-2]:INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.\u001b[0m\n", "\u001b[34m[1,mpirank:9,algo-2]:INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.\u001b[0m\n", "\u001b[34m[1,mpirank:11,algo-2]:INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.\u001b[0m\n", "\u001b[34m[1,mpirank:1,algo-1]:INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.\u001b[0m\n", "\u001b[34m[1,mpirank:10,algo-2]:INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.\u001b[0m\n", "\u001b[34m[1,mpirank:3,algo-1]:INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.\u001b[0m\n", "\u001b[34m[1,mpirank:8,algo-2]:INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.\u001b[0m\n", 
"\u001b[34m[1,mpirank:5,algo-1]:INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.\u001b[0m\n", "\u001b[34m[1,mpirank:15,algo-2]:INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.\u001b[0m\n", "\u001b[34m[1,mpirank:7,algo-1]:INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [1][1/10] Train_Time=0.260: avg-1.170, Train_Speed=2463.922 (547.098), Train_Loss=10.5491075516:(11.8807), Train_Prec@1=0.000:(0.000), Train_Prec@5=0.000:(0.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [1][2/10] Train_Time=0.249: avg-0.863, Train_Speed=2573.658 (741.802), Train_Loss=8.6122722626:(10.7913), Train_Prec@1=0.000:(0.000), Train_Prec@5=0.000:(0.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [1][3/10] Train_Time=0.310: avg-0.724, Train_Speed=2067.538 (883.418), Train_Loss=7.2212743759:(9.8988), Train_Prec@1=0.000:(0.000), Train_Prec@5=0.000:(0.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [1][4/10] Train_Time=0.271: avg-0.634, Train_Speed=2357.742 (1009.692), Train_Loss=5.9680967331:(9.1126), Train_Prec@1=0.000:(0.000), Train_Prec@5=10.000:(2.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [1][5/10] Train_Time=0.192: avg-0.560, Train_Speed=3338.320 (1142.518), Train_Loss=5.2399535179:(8.4672), Train_Prec@1=5.000:(0.833), Train_Prec@5=20.000:(5.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [1][6/10] Train_Time=0.082: avg-0.492, Train_Speed=7800.581 (1301.175), Train_Loss=4.8414807320:(7.9492), Train_Prec@1=0.000:(0.714), Train_Prec@5=15.000:(6.429)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [1][7/10] Train_Time=0.077: avg-0.440, Train_Speed=8324.561 (1454.578), Train_Loss=4.6641449928:(7.5386), Train_Prec@1=2.500:(0.938), Train_Prec@5=22.500:(8.438)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [1][8/10] 
Train_Time=0.074: avg-0.399, Train_Speed=8637.064 (1602.661), Train_Loss=4.6902103424:(7.2221), Train_Prec@1=2.500:(1.111), Train_Prec@5=7.500:(8.333)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [1][9/10] Train_Time=0.070: avg-0.366, Train_Speed=9100.722 (1746.560), Train_Loss=4.8944625854:(7.1653), Train_Prec@1=0.000:(1.084), Train_Prec@5=0.000:(8.130)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [0/4] Test_Time=5.561:(5.561), Test_Speed=115.085:(115.085), Test_Loss=4.4823:(4.4823), Test_Prec@1=2.500:(2.500), Test_Prec@5=13.000:(13.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [1/4] Test_Time=7.731:(6.646), Test_Speed=82.779:(96.294), Test_Loss=3.6749:(4.0786), Test_Prec@1=3.500:(3.000), Test_Prec@5=22.500:(17.750)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [2/4] Test_Time=4.289:(5.860), Test_Speed=149.223:(109.206), Test_Loss=3.6397:(3.9323), Test_Prec@1=12.500:(6.167), Test_Prec@5=23.500:(19.667)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [3/4] Test_Time=1.988:(4.892), Test_Speed=321.992:(130.819), Test_Loss=4.1083:(3.9644), Test_Prec@1=0.746:(5.177), Test_Prec@5=11.194:(18.120)[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:util:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [2][0/10] Train_Time=0.203: avg-0.203, Train_Speed=3151.282 (3151.282), Train_Loss=4.1077394485:(4.1077), Train_Prec@1=5.000:(5.000), Train_Prec@5=20.000:(20.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [2][1/10] Train_Time=0.078: avg-0.141, Train_Speed=8164.508 (4547.392), Train_Loss=4.0614018440:(4.0846), Train_Prec@1=5.000:(5.000), Train_Prec@5=15.000:(17.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [2][2/10] Train_Time=0.084: avg-0.122, Train_Speed=7603.598 (5250.913), Train_Loss=3.9333519936:(4.0342), Train_Prec@1=10.000:(6.667), 
Train_Prec@5=15.000:(16.667)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [2][3/10] Train_Time=0.077: avg-0.111, Train_Speed=8302.229 (5782.195), Train_Loss=3.8538944721:(3.9891), Train_Prec@1=7.500:(6.875), Train_Prec@5=12.500:(15.625)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [2][4/10] Train_Time=0.078: avg-0.104, Train_Speed=8206.099 (6145.229), Train_Loss=3.7976939678:(3.9508), Train_Prec@1=0.000:(5.500), Train_Prec@5=15.000:(15.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [2][5/10] Train_Time=0.085: avg-0.101, Train_Speed=7510.482 (6337.225), Train_Loss=3.6345515251:(3.8981), Train_Prec@1=5.000:(5.417), Train_Prec@5=12.500:(15.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [2][6/10] Train_Time=0.086: avg-0.099, Train_Speed=7413.710 (6471.464), Train_Loss=3.7581901550:(3.8781), Train_Prec@1=0.000:(4.643), Train_Prec@5=7.500:(13.929)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [2][7/10] Train_Time=0.086: avg-0.097, Train_Speed=7436.499 (6578.170), Train_Loss=3.7341594696:(3.8601), Train_Prec@1=0.000:(4.062), Train_Prec@5=22.500:(15.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [2][8/10] Train_Time=0.083: avg-0.096, Train_Speed=7696.368 (6686.105), Train_Loss=3.7751414776:(3.8507), Train_Prec@1=2.500:(3.889), Train_Prec@5=15.000:(15.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [2][9/10] Train_Time=0.084: avg-0.095, Train_Speed=7608.033 (6768.120), Train_Loss=3.7229185104:(3.8476), Train_Prec@1=0.000:(3.794), Train_Prec@5=0.000:(14.634)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [0/4] Test_Time=5.592:(5.592), Test_Speed=114.441:(114.441), Test_Loss=3.4037:(3.4037), Test_Prec@1=11.000:(11.000), Test_Prec@5=33.000:(33.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [1/4] Test_Time=4.327:(4.960), Test_Speed=147.912:(129.041), Test_Loss=3.5968:(3.5003), Test_Prec@1=2.000:(6.500), Test_Prec@5=11.000:(22.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [2/4] 
Test_Time=5.739:(5.219), Test_Speed=111.515:(122.618), Test_Loss=3.5669:(3.5225), Test_Prec@1=7.000:(6.667), Test_Prec@5=30.500:(24.833)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [3/4] Test_Time=3.903:(4.890), Test_Speed=163.973:(130.869), Test_Loss=3.6831:(3.5518), Test_Prec@1=0.000:(5.450), Test_Prec@5=5.970:(21.390)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:util:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [3][0/10] Train_Time=0.206: avg-0.206, Train_Speed=3105.980 (3105.980), Train_Loss=3.6086375713:(3.6086), Train_Prec@1=2.500:(2.500), Train_Prec@5=17.500:(17.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [3][1/10] Train_Time=0.082: avg-0.144, Train_Speed=7841.112 (4449.463), Train_Loss=3.7509589195:(3.6798), Train_Prec@1=2.500:(2.500), Train_Prec@5=20.000:(18.750)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [3][2/10] Train_Time=0.085: avg-0.124, Train_Speed=7528.754 (5151.837), Train_Loss=3.7196998596:(3.6931), Train_Prec@1=2.500:(2.500), Train_Prec@5=12.500:(16.667)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [3][3/10] Train_Time=0.081: avg-0.113, Train_Speed=7941.093 (5647.772), Train_Loss=3.5043473244:(3.6459), Train_Prec@1=5.000:(3.125), Train_Prec@5=27.500:(19.375)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [3][4/10] Train_Time=0.084: avg-0.108, Train_Speed=7598.080 (5953.401), Train_Loss=3.5085463524:(3.6184), Train_Prec@1=2.500:(3.000), Train_Prec@5=20.000:(19.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [3][5/10] Train_Time=0.079: avg-0.103, Train_Speed=8096.580 (6228.169), Train_Loss=3.4984855652:(3.5984), Train_Prec@1=10.000:(4.167), Train_Prec@5=32.500:(21.667)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [3][6/10] Train_Time=0.077: avg-0.099, Train_Speed=8289.461 (6457.564), Train_Loss=3.6021206379:(3.5990), Train_Prec@1=5.000:(4.286), 
Train_Prec@5=20.000:(21.429)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [3][7/10] Train_Time=0.077: avg-0.096, Train_Speed=8361.324 (6646.735), Train_Loss=3.4643046856:(3.5821), Train_Prec@1=5.000:(4.375), Train_Prec@5=32.500:(22.812)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [3][8/10] Train_Time=0.078: avg-0.094, Train_Speed=8192.400 (6789.057), Train_Loss=3.3196282387:(3.5530), Train_Prec@1=12.500:(5.278), Train_Prec@5=35.000:(24.167)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [3][9/10] Train_Time=0.078: avg-0.093, Train_Speed=8166.052 (6905.501), Train_Loss=3.6429014206:(3.5552), Train_Prec@1=0.000:(5.149), Train_Prec@5=22.222:(24.119)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [0/4] Test_Time=5.524:(5.524), Test_Speed=115.858:(115.858), Test_Loss=3.0949:(3.0949), Test_Prec@1=23.500:(23.500), Test_Prec@5=57.500:(57.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [1/4] Test_Time=10.058:(7.791), Test_Speed=63.629:(82.145), Test_Loss=3.4141:(3.2545), Test_Prec@1=9.500:(16.500), Test_Prec@5=26.500:(42.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [2/4] Test_Time=4.780:(6.787), Test_Speed=133.888:(94.292), Test_Loss=3.3243:(3.2777), Test_Prec@1=13.500:(15.500), Test_Prec@5=30.500:(38.167)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [3/4] Test_Time=1.204:(5.392), Test_Speed=531.433:(118.702), Test_Loss=3.3187:(3.2852), Test_Prec@1=7.463:(14.033), Test_Prec@5=41.791:(38.828)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:util:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [4][0/10] Train_Time=0.192: avg-0.192, Train_Speed=3325.753 (3325.753), Train_Loss=3.3506069183:(3.3506), Train_Prec@1=5.000:(5.000), Train_Prec@5=37.500:(37.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [4][1/10] Train_Time=0.083: avg-0.138, Train_Speed=7734.443 (4651.427), Train_Loss=3.6223335266:(3.4865), Train_Prec@1=10.000:(7.500), 
Train_Prec@5=15.000:(26.250)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [4][2/10] Train_Time=0.075: avg-0.117, Train_Speed=8511.973 (5479.881), Train_Loss=3.2405631542:(3.4045), Train_Prec@1=20.000:(11.667), Train_Prec@5=37.500:(30.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [4][3/10] Train_Time=0.081: avg-0.108, Train_Speed=7865.042 (5929.422), Train_Loss=3.4384264946:(3.4130), Train_Prec@1=12.500:(11.875), Train_Prec@5=40.000:(32.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [4][4/10] Train_Time=0.087: avg-0.104, Train_Speed=7349.197 (6167.728), Train_Loss=3.4443652630:(3.4193), Train_Prec@1=5.000:(10.500), Train_Prec@5=30.000:(32.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [4][5/10] Train_Time=0.082: avg-0.100, Train_Speed=7850.395 (6396.224), Train_Loss=3.4348251820:(3.4219), Train_Prec@1=5.000:(9.583), Train_Prec@5=32.500:(32.083)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [4][6/10] Train_Time=0.075: avg-0.097, Train_Speed=8491.697 (6629.946), Train_Loss=3.2087364197:(3.3914), Train_Prec@1=12.500:(10.000), Train_Prec@5=40.000:(33.214)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [4][7/10] Train_Time=0.076: avg-0.094, Train_Speed=8401.483 (6809.426), Train_Loss=3.3682034016:(3.3885), Train_Prec@1=2.500:(9.062), Train_Prec@5=32.500:(33.125)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [4][8/10] Train_Time=0.078: avg-0.092, Train_Speed=8184.008 (6938.921), Train_Loss=3.1167869568:(3.3583), Train_Prec@1=12.500:(9.444), Train_Prec@5=55.000:(35.556)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [4][9/10] Train_Time=0.082: avg-0.091, Train_Speed=7812.259 (7017.369), Train_Loss=2.6225409508:(3.3404), Train_Prec@1=22.222:(9.756), Train_Prec@5=55.556:(36.043)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [0/4] Test_Time=5.441:(5.441), Test_Speed=117.627:(117.627), Test_Loss=2.8117:(2.8117), Test_Prec@1=23.500:(23.500), Test_Prec@5=63.000:(63.000)\u001b[0m\n", 
"\u001b[34m[1,mpirank:0,algo-1]:Test: [1/4] Test_Time=4.669:(5.055), Test_Speed=137.073:(126.608), Test_Loss=2.8677:(2.8397), Test_Prec@1=14.000:(18.750), Test_Prec@5=58.000:(60.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [2/4] Test_Time=3.009:(4.373), Test_Speed=212.674:(146.350), Test_Loss=2.5927:(2.7574), Test_Prec@1=29.000:(22.167), Test_Prec@5=62.000:(61.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [3/4] Test_Time=2.827:(3.986), Test_Speed=226.412:(160.542), Test_Loss=3.4538:(2.8845), Test_Prec@1=6.716:(19.346), Test_Prec@5=28.358:(55.041)[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:util:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [5][0/10] Train_Time=0.193: avg-0.193, Train_Speed=3317.923 (3317.923), Train_Loss=2.9986789227:(2.9987), Train_Prec@1=15.000:(15.000), Train_Prec@5=47.500:(47.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [5][1/10] Train_Time=0.089: avg-0.141, Train_Speed=7174.844 (4537.522), Train_Loss=2.9379010201:(2.9683), Train_Prec@1=10.000:(12.500), Train_Prec@5=47.500:(47.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [5][2/10] Train_Time=0.090: avg-0.124, Train_Speed=7141.547 (5165.335), Train_Loss=2.9515042305:(2.9627), Train_Prec@1=15.000:(13.333), Train_Prec@5=60.000:(51.667)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [5][3/10] Train_Time=0.084: avg-0.114, Train_Speed=7656.743 (5622.726), Train_Loss=2.8524458408:(2.9351), Train_Prec@1=17.500:(14.375), Train_Prec@5=52.500:(51.875)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [5][4/10] Train_Time=0.083: avg-0.108, Train_Speed=7679.959 (5941.010), Train_Loss=2.9669632912:(2.9415), Train_Prec@1=30.000:(17.500), Train_Prec@5=60.000:(53.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [5][5/10] Train_Time=0.077: avg-0.103, Train_Speed=8358.965 (6241.939), Train_Loss=2.8268237114:(2.9224), 
Train_Prec@1=22.500:(18.333), Train_Prec@5=47.500:(52.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [5][6/10] Train_Time=0.078: avg-0.099, Train_Speed=8235.803 (6465.552), Train_Loss=3.0546443462:(2.9413), Train_Prec@1=17.500:(18.214), Train_Prec@5=52.500:(52.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [5][7/10] Train_Time=0.085: avg-0.097, Train_Speed=7567.727 (6585.441), Train_Loss=2.6283802986:(2.9022), Train_Prec@1=27.500:(19.375), Train_Prec@5=62.500:(53.750)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [5][8/10] Train_Time=0.077: avg-0.095, Train_Speed=8361.340 (6744.609), Train_Loss=2.7966251373:(2.8904), Train_Prec@1=22.500:(19.722), Train_Prec@5=57.500:(54.167)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [5][9/10] Train_Time=0.071: avg-0.093, Train_Speed=8998.583 (6917.889), Train_Loss=2.9459862709:(2.8918), Train_Prec@1=0.000:(19.241), Train_Prec@5=55.556:(54.201)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [0/4] Test_Time=5.519:(5.519), Test_Speed=115.972:(115.972), Test_Loss=1.9735:(1.9735), Test_Prec@1=44.500:(44.500), Test_Prec@5=79.000:(79.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [1/4] Test_Time=5.225:(5.372), Test_Speed=122.480:(119.137), Test_Loss=2.5305:(2.2520), Test_Prec@1=26.000:(35.250), Test_Prec@5=68.000:(73.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [2/4] Test_Time=8.564:(6.436), Test_Speed=74.727:(99.439), Test_Loss=1.8235:(2.1092), Test_Prec@1=52.500:(41.000), Test_Prec@5=80.500:(75.833)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [3/4] Test_Time=1.558:(5.217), Test_Speed=410.657:(122.683), Test_Loss=2.3916:(2.1607), Test_Prec@1=24.627:(38.011), Test_Prec@5=76.866:(76.022)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:util:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [6][0/10] Train_Time=0.199: avg-0.199, Train_Speed=3208.670 (3208.670), 
Train_Loss=2.6435601711:(2.6436), Train_Prec@1=20.000:(20.000), Train_Prec@5=65.000:(65.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [6][1/10] Train_Time=0.085: avg-0.142, Train_Speed=7548.294 (4503.126), Train_Loss=2.2736904621:(2.4586), Train_Prec@1=32.500:(26.250), Train_Prec@5=75.000:(70.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [6][2/10] Train_Time=0.084: avg-0.123, Train_Speed=7605.666 (5211.801), Train_Loss=2.3207097054:(2.4127), Train_Prec@1=30.000:(27.500), Train_Prec@5=72.500:(70.833)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [6][3/10] Train_Time=0.090: avg-0.115, Train_Speed=7101.012 (5583.147), Train_Loss=2.6631219387:(2.4753), Train_Prec@1=30.000:(28.125), Train_Prec@5=62.500:(68.750)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [6][4/10] Train_Time=0.076: avg-0.107, Train_Speed=8367.318 (5981.188), Train_Loss=2.0550198555:(2.3912), Train_Prec@1=50.000:(32.500), Train_Prec@5=77.500:(70.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [6][5/10] Train_Time=0.084: avg-0.103, Train_Speed=7649.394 (6206.787), Train_Loss=2.1495366096:(2.3509), Train_Prec@1=40.000:(33.750), Train_Prec@5=75.000:(71.250)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [6][6/10] Train_Time=0.085: avg-0.101, Train_Speed=7518.847 (6365.472), Train_Loss=2.2757160664:(2.3402), Train_Prec@1=37.500:(34.286), Train_Prec@5=70.000:(71.071)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [6][7/10] Train_Time=0.075: avg-0.097, Train_Speed=8487.562 (6570.830), Train_Loss=1.9579169750:(2.2924), Train_Prec@1=47.500:(35.938), Train_Prec@5=75.000:(71.562)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [6][8/10] Train_Time=0.077: avg-0.095, Train_Speed=8342.402 (6729.617), Train_Loss=2.3541994095:(2.2993), Train_Prec@1=30.000:(35.278), Train_Prec@5=72.500:(71.667)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [6][9/10] Train_Time=0.071: avg-0.093, Train_Speed=8999.065 (6903.720), Train_Loss=2.4439833164:(2.3028), 
Train_Prec@1=22.222:(34.959), Train_Prec@5=66.667:(71.545)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [0/4] Test_Time=5.673:(5.673), Test_Speed=112.824:(112.824), Test_Loss=1.3090:(1.3090), Test_Prec@1=59.000:(59.000), Test_Prec@5=91.500:(91.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [1/4] Test_Time=6.410:(6.041), Test_Speed=99.841:(105.936), Test_Loss=2.3080:(1.8085), Test_Prec@1=23.000:(41.000), Test_Prec@5=76.500:(84.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [2/4] Test_Time=6.697:(6.260), Test_Speed=95.558:(102.235), Test_Loss=1.2497:(1.6222), Test_Prec@1=68.000:(50.000), Test_Prec@5=90.000:(86.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [3/4] Test_Time=2.555:(5.334), Test_Speed=250.455:(119.988), Test_Loss=1.3643:(1.5751), Test_Prec@1=69.403:(53.542), Test_Prec@5=91.045:(86.921)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:util:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [7][0/10] Train_Time=0.199: avg-0.199, Train_Speed=3211.219 (3211.219), Train_Loss=2.1009678841:(2.1010), Train_Prec@1=40.000:(40.000), Train_Prec@5=80.000:(80.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [7][1/10] Train_Time=0.085: avg-0.142, Train_Speed=7544.692 (4504.994), Train_Loss=1.7594921589:(1.9302), Train_Prec@1=42.500:(41.250), Train_Prec@5=82.500:(81.250)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [7][2/10] Train_Time=0.081: avg-0.122, Train_Speed=7924.584 (5261.853), Train_Loss=1.9830576181:(1.9478), Train_Prec@1=47.500:(43.333), Train_Prec@5=77.500:(80.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [7][3/10] Train_Time=0.084: avg-0.112, Train_Speed=7598.489 (5700.064), Train_Loss=1.8839476109:(1.9319), Train_Prec@1=50.000:(45.000), Train_Prec@5=72.500:(78.125)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [7][4/10] Train_Time=0.083: avg-0.106, 
Train_Speed=7740.907 (6017.352), Train_Loss=1.6288344860:(1.8713), Train_Prec@1=57.500:(47.500), Train_Prec@5=87.500:(80.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [7][5/10] Train_Time=0.084: avg-0.103, Train_Speed=7651.043 (6239.397), Train_Loss=1.4133325815:(1.7949), Train_Prec@1=62.500:(50.000), Train_Prec@5=90.000:(81.667)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [7][6/10] Train_Time=0.075: avg-0.099, Train_Speed=8484.129 (6484.492), Train_Loss=1.8177483082:(1.7982), Train_Prec@1=50.000:(50.000), Train_Prec@5=92.500:(83.214)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [7][7/10] Train_Time=0.074: avg-0.096, Train_Speed=8601.798 (6690.343), Train_Loss=1.4221055508:(1.7512), Train_Prec@1=65.000:(51.875), Train_Prec@5=85.000:(83.438)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [7][8/10] Train_Time=0.076: avg-0.093, Train_Speed=8411.408 (6845.983), Train_Loss=1.6067453623:(1.7351), Train_Prec@1=55.000:(52.222), Train_Prec@5=87.500:(83.889)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [7][9/10] Train_Time=0.070: avg-0.091, Train_Speed=9143.545 (7022.441), Train_Loss=0.9219682217:(1.7153), Train_Prec@1=77.778:(52.846), Train_Prec@5=77.778:(83.740)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [0/4] Test_Time=5.529:(5.529), Test_Speed=115.754:(115.754), Test_Loss=0.9726:(0.9726), Test_Prec@1=70.000:(70.000), Test_Prec@5=97.500:(97.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [1/4] Test_Time=7.165:(6.347), Test_Speed=89.321:(100.834), Test_Loss=1.2605:(1.1165), Test_Prec@1=63.000:(66.500), Test_Prec@5=89.000:(93.250)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [2/4] Test_Time=5.711:(6.135), Test_Speed=112.056:(104.317), Test_Loss=0.8266:(1.0199), Test_Prec@1=76.000:(69.667), Test_Prec@5=94.000:(93.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [3/4] Test_Time=1.909:(5.079), Test_Speed=335.250:(126.018), Test_Loss=0.9211:(1.0018), Test_Prec@1=76.866:(70.981), 
Test_Prec@5=93.284:(93.460)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:util:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [8][0/10] Train_Time=0.191: avg-0.191, Train_Speed=3355.531 (3355.531), Train_Loss=1.5858459473:(1.5858), Train_Prec@1=55.000:(55.000), Train_Prec@5=82.500:(82.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [8][1/10] Train_Time=0.092: avg-0.141, Train_Speed=6958.418 (4527.691), Train_Loss=1.1652669907:(1.3756), Train_Prec@1=67.500:(61.250), Train_Prec@5=95.000:(88.750)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [8][2/10] Train_Time=0.083: avg-0.122, Train_Speed=7695.097 (5247.699), Train_Loss=1.4644672871:(1.4052), Train_Prec@1=62.500:(61.667), Train_Prec@5=87.500:(88.333)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [8][3/10] Train_Time=0.086: avg-0.113, Train_Speed=7455.302 (5667.233), Train_Loss=1.4803739786:(1.4240), Train_Prec@1=55.000:(60.000), Train_Prec@5=87.500:(88.125)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [8][4/10] Train_Time=0.077: avg-0.106, Train_Speed=8338.608 (6055.205), Train_Loss=1.4431465864:(1.4278), Train_Prec@1=47.500:(57.500), Train_Prec@5=92.500:(89.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [8][5/10] Train_Time=0.081: avg-0.102, Train_Speed=7922.016 (6302.743), Train_Loss=1.4755251408:(1.4358), Train_Prec@1=62.500:(58.333), Train_Prec@5=87.500:(88.750)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [8][6/10] Train_Time=0.082: avg-0.099, Train_Speed=7817.259 (6482.151), Train_Loss=1.1890274286:(1.4005), Train_Prec@1=70.000:(60.000), Train_Prec@5=90.000:(88.929)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [8][7/10] Train_Time=0.075: avg-0.096, Train_Speed=8558.254 (6684.856), Train_Loss=1.1216810942:(1.3657), Train_Prec@1=60.000:(60.000), Train_Prec@5=95.000:(89.688)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [8][8/10] Train_Time=0.076: avg-0.094, 
Train_Speed=8444.574 (6843.305), Train_Loss=1.3731523752:(1.3665), Train_Prec@1=67.500:(60.833), Train_Prec@5=92.500:(90.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [8][9/10] Train_Time=0.068: avg-0.091, Train_Speed=9365.578 (7032.705), Train_Loss=1.2134242058:(1.3628), Train_Prec@1=66.667:(60.976), Train_Prec@5=88.889:(89.973)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [0/4] Test_Time=5.504:(5.504), Test_Speed=116.271:(116.271), Test_Loss=0.7213:(0.7213), Test_Prec@1=76.500:(76.500), Test_Prec@5=98.000:(98.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [1/4] Test_Time=8.114:(6.809), Test_Speed=78.878:(93.992), Test_Loss=0.6891:(0.7052), Test_Prec@1=78.500:(77.500), Test_Prec@5=97.500:(97.750)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [2/4] Test_Time=6.000:(6.539), Test_Speed=106.672:(97.870), Test_Loss=0.7161:(0.7088), Test_Prec@1=81.500:(78.833), Test_Prec@5=95.000:(96.833)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [3/4] Test_Time=1.422:(5.260), Test_Speed=450.133:(121.675), Test_Loss=0.6422:(0.6967), Test_Prec@1=78.358:(78.747), Test_Prec@5=94.776:(96.458)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:util:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [9][0/10] Train_Time=0.202: avg-0.202, Train_Speed=3170.809 (3170.809), Train_Loss=1.1237343550:(1.1237), Train_Prec@1=67.500:(67.500), Train_Prec@5=87.500:(87.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [9][1/10] Train_Time=0.083: avg-0.142, Train_Speed=7732.304 (4497.369), Train_Loss=1.4102339745:(1.2670), Train_Prec@1=65.000:(66.250), Train_Prec@5=77.500:(82.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [9][2/10] Train_Time=0.078: avg-0.121, Train_Speed=8256.687 (5302.055), Train_Loss=0.7826581597:(1.1055), Train_Prec@1=82.500:(71.667), Train_Prec@5=95.000:(86.667)\u001b[0m\n", 
"\u001b[34m[1,mpirank:0,algo-1]:Epoch: [9][3/10] Train_Time=0.075: avg-0.109, Train_Speed=8582.355 (5862.210), Train_Loss=1.3019566536:(1.1546), Train_Prec@1=62.500:(69.375), Train_Prec@5=92.500:(88.125)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [9][4/10] Train_Time=0.080: avg-0.103, Train_Speed=8050.111 (6199.179), Train_Loss=1.3896020651:(1.2016), Train_Prec@1=65.000:(68.500), Train_Prec@5=90.000:(88.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [9][5/10] Train_Time=0.076: avg-0.099, Train_Speed=8436.400 (6485.839), Train_Loss=1.2484924793:(1.2094), Train_Prec@1=60.000:(67.083), Train_Prec@5=90.000:(88.750)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [9][6/10] Train_Time=0.076: avg-0.095, Train_Speed=8469.417 (6710.352), Train_Loss=1.2255108356:(1.2117), Train_Prec@1=52.500:(65.000), Train_Prec@5=97.500:(90.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [9][7/10] Train_Time=0.075: avg-0.093, Train_Speed=8477.045 (6889.841), Train_Loss=1.1420165300:(1.2030), Train_Prec@1=65.000:(65.000), Train_Prec@5=90.000:(90.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [9][8/10] Train_Time=0.076: avg-0.091, Train_Speed=8456.466 (7034.643), Train_Loss=1.1703811884:(1.1994), Train_Prec@1=67.500:(65.278), Train_Prec@5=90.000:(90.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [9][9/10] Train_Time=0.070: avg-0.089, Train_Speed=9127.210 (7199.708), Train_Loss=0.4490440786:(1.1811), Train_Prec@1=88.889:(65.854), Train_Prec@5=100.000:(90.244)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [0/4] Test_Time=5.628:(5.628), Test_Speed=113.714:(113.714), Test_Loss=0.6146:(0.6146), Test_Prec@1=78.500:(78.500), Test_Prec@5=99.000:(99.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [1/4] Test_Time=4.553:(5.091), Test_Speed=140.552:(125.717), Test_Loss=0.5520:(0.5833), Test_Prec@1=83.000:(80.750), Test_Prec@5=97.500:(98.250)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [2/4] Test_Time=5.734:(5.305), 
Test_Speed=111.615:(120.636), Test_Loss=0.4670:(0.5445), Test_Prec@1=87.500:(83.000), Test_Prec@5=97.500:(98.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [3/4] Test_Time=2.198:(4.528), Test_Speed=291.126:(141.327), Test_Loss=0.4645:(0.5299), Test_Prec@1=85.075:(83.379), Test_Prec@5=99.254:(98.229)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:util:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [10][0/10] Train_Time=0.209: avg-0.209, Train_Speed=3068.576 (3068.576), Train_Loss=0.9989653826:(0.9990), Train_Prec@1=67.500:(67.500), Train_Prec@5=92.500:(92.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [10][1/10] Train_Time=0.081: avg-0.145, Train_Speed=7917.329 (4422.926), Train_Loss=0.6035019159:(0.8012), Train_Prec@1=80.000:(73.750), Train_Prec@5=100.000:(96.250)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [10][2/10] Train_Time=0.086: avg-0.125, Train_Speed=7473.243 (5119.454), Train_Loss=1.2590620518:(0.9538), Train_Prec@1=65.000:(70.833), Train_Prec@5=95.000:(95.833)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [10][3/10] Train_Time=0.085: avg-0.115, Train_Speed=7485.559 (5558.717), Train_Loss=1.3455102444:(1.0518), Train_Prec@1=55.000:(66.875), Train_Prec@5=85.000:(93.125)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [10][4/10] Train_Time=0.087: avg-0.110, Train_Speed=7352.023 (5843.801), Train_Loss=1.0824329853:(1.0579), Train_Prec@1=70.000:(67.500), Train_Prec@5=92.500:(93.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [10][5/10] Train_Time=0.075: avg-0.104, Train_Speed=8482.590 (6163.353), Train_Loss=1.1350269318:(1.0707), Train_Prec@1=62.500:(66.667), Train_Prec@5=92.500:(92.917)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [10][6/10] Train_Time=0.084: avg-0.101, Train_Speed=7621.266 (6336.517), Train_Loss=0.7468171120:(1.0245), Train_Prec@1=77.500:(68.214), 
Train_Prec@5=95.000:(93.214)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [10][7/10] Train_Time=0.082: avg-0.099, Train_Speed=7785.856 (6487.472), Train_Loss=1.1302279234:(1.0377), Train_Prec@1=75.000:(69.062), Train_Prec@5=90.000:(92.812)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [10][8/10] Train_Time=0.077: avg-0.096, Train_Speed=8339.172 (6651.581), Train_Loss=1.0281229019:(1.0366), Train_Prec@1=65.000:(68.611), Train_Prec@5=97.500:(93.333)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [10][9/10] Train_Time=0.077: avg-0.094, Train_Speed=8263.402 (6783.904), Train_Loss=1.3782483339:(1.0450), Train_Prec@1=33.333:(67.751), Train_Prec@5=88.889:(93.225)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [0/4] Test_Time=5.526:(5.526), Test_Speed=115.808:(115.808), Test_Loss=0.6036:(0.6036), Test_Prec@1=78.500:(78.500), Test_Prec@5=99.000:(99.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [1/4] Test_Time=7.225:(6.376), Test_Speed=88.581:(100.381), Test_Loss=0.4806:(0.5421), Test_Prec@1=83.500:(81.000), Test_Prec@5=98.000:(98.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [2/4] Test_Time=4.251:(5.667), Test_Speed=150.568:(112.928), Test_Loss=0.3729:(0.4857), Test_Prec@1=91.000:(84.333), Test_Prec@5=99.000:(98.667)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [3/4] Test_Time=2.291:(4.823), Test_Speed=279.344:(132.690), Test_Loss=0.3230:(0.4560), Test_Prec@1=91.045:(85.559), Test_Prec@5=100.000:(98.910)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:util:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [11][0/10] Train_Time=0.201: avg-0.201, Train_Speed=3189.490 (3189.490), Train_Loss=0.8286902308:(0.8287), Train_Prec@1=72.500:(72.500), Train_Prec@5=97.500:(97.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [11][1/10] Train_Time=0.084: 
avg-0.142, Train_Speed=7627.433 (4498.067), Train_Loss=1.1078059673:(0.9682), Train_Prec@1=65.000:(68.750), Train_Prec@5=92.500:(95.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [11][2/10] Train_Time=0.086: avg-0.123, Train_Speed=7468.054 (5185.475), Train_Loss=0.7389863729:(0.8918), Train_Prec@1=72.500:(70.000), Train_Prec@5=97.500:(95.833)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [11][3/10] Train_Time=0.087: avg-0.114, Train_Speed=7316.981 (5592.782), Train_Loss=1.2803102732:(0.9889), Train_Prec@1=67.500:(69.375), Train_Prec@5=85.000:(93.125)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [11][4/10] Train_Time=0.090: avg-0.109, Train_Speed=7146.874 (5847.072), Train_Loss=0.9544256926:(0.9820), Train_Prec@1=72.500:(70.000), Train_Prec@5=90.000:(92.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [11][5/10] Train_Time=0.085: avg-0.105, Train_Speed=7501.680 (6070.218), Train_Loss=0.9359269142:(0.9744), Train_Prec@1=72.500:(70.417), Train_Prec@5=92.500:(92.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [11][6/10] Train_Time=0.076: avg-0.101, Train_Speed=8407.341 (6321.250), Train_Loss=0.8768768311:(0.9604), Train_Prec@1=80.000:(71.786), Train_Prec@5=95.000:(92.857)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [11][7/10] Train_Time=0.075: avg-0.098, Train_Speed=8518.661 (6531.864), Train_Loss=0.9064676166:(0.9537), Train_Prec@1=75.000:(72.188), Train_Prec@5=95.000:(93.125)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [11][8/10] Train_Time=0.081: avg-0.096, Train_Speed=7902.715 (6660.233), Train_Loss=0.7273988128:(0.9285), Train_Prec@1=82.500:(73.333), Train_Prec@5=92.500:(93.056)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [11][9/10] Train_Time=0.070: avg-0.093, Train_Speed=9180.111 (6848.212), Train_Loss=0.9379640222:(0.9288), Train_Prec@1=66.667:(73.171), Train_Prec@5=100.000:(93.225)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [0/4] Test_Time=5.533:(5.533), Test_Speed=115.663:(115.663), 
Test_Loss=0.5710:(0.5710), Test_Prec@1=81.000:(81.000), Test_Prec@5=99.000:(99.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [1/4] Test_Time=4.310:(4.922), Test_Speed=148.501:(130.041), Test_Loss=0.4356:(0.5033), Test_Prec@1=86.500:(83.750), Test_Prec@5=98.500:(98.750)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [2/4] Test_Time=4.128:(4.657), Test_Speed=155.049:(137.429), Test_Loss=0.2413:(0.4160), Test_Prec@1=94.500:(87.333), Test_Prec@5=99.500:(99.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [3/4] Test_Time=4.924:(4.724), Test_Speed=129.963:(135.484), Test_Loss=0.3322:(0.4007), Test_Prec@1=89.552:(87.738), Test_Prec@5=100.000:(99.183)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:util:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [12][0/10] Train_Time=0.192: avg-0.192, Train_Speed=3333.138 (3333.138), Train_Loss=0.8345602751:(0.8346), Train_Prec@1=80.000:(80.000), Train_Prec@5=95.000:(95.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [12][1/10] Train_Time=0.085: avg-0.139, Train_Speed=7488.730 (4613.061), Train_Loss=0.7747018933:(0.8046), Train_Prec@1=77.500:(78.750), Train_Prec@5=92.500:(93.750)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [12][2/10] Train_Time=0.076: avg-0.118, Train_Speed=8366.625 (5424.228), Train_Loss=0.7131794095:(0.7741), Train_Prec@1=82.500:(80.000), Train_Prec@5=97.500:(95.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [12][3/10] Train_Time=0.079: avg-0.108, Train_Speed=8079.531 (5909.784), Train_Loss=0.6930959225:(0.7539), Train_Prec@1=80.000:(80.000), Train_Prec@5=95.000:(95.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [12][4/10] Train_Time=0.080: avg-0.103, Train_Speed=8034.089 (6239.757), Train_Loss=0.7309237719:(0.7493), Train_Prec@1=77.500:(79.500), Train_Prec@5=97.500:(95.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [12][5/10] Train_Time=0.079: avg-0.099, 
Train_Speed=8059.227 (6483.720), Train_Loss=0.9301735163:(0.7794), Train_Prec@1=72.500:(78.333), Train_Prec@5=92.500:(95.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [12][6/10] Train_Time=0.075: avg-0.095, Train_Speed=8578.032 (6718.034), Train_Loss=0.6907287836:(0.7668), Train_Prec@1=77.500:(78.214), Train_Prec@5=90.000:(94.286)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [12][7/10] Train_Time=0.078: avg-0.093, Train_Speed=8182.955 (6871.809), Train_Loss=0.7705169916:(0.7672), Train_Prec@1=75.000:(77.812), Train_Prec@5=95.000:(94.375)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [12][8/10] Train_Time=0.075: avg-0.091, Train_Speed=8482.777 (7019.938), Train_Loss=0.8436209559:(0.7757), Train_Prec@1=72.500:(77.222), Train_Prec@5=92.500:(94.167)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [12][9/10] Train_Time=0.071: avg-0.089, Train_Speed=9011.736 (7178.601), Train_Loss=0.8510024548:(0.7776), Train_Prec@1=77.778:(77.236), Train_Prec@5=88.889:(94.038)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [0/4] Test_Time=5.628:(5.628), Test_Speed=113.716:(113.716), Test_Loss=0.4730:(0.4730), Test_Prec@1=82.000:(82.000), Test_Prec@5=99.500:(99.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [1/4] Test_Time=3.759:(4.693), Test_Speed=170.260:(136.359), Test_Loss=0.3683:(0.4207), Test_Prec@1=86.000:(84.000), Test_Prec@5=99.500:(99.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [2/4] Test_Time=4.316:(4.568), Test_Speed=148.278:(140.113), Test_Loss=0.2196:(0.3536), Test_Prec@1=94.500:(87.500), Test_Prec@5=99.000:(99.333)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [3/4] Test_Time=4.551:(4.564), Test_Speed=140.615:(140.238), Test_Loss=0.1971:(0.3251), Test_Prec@1=94.030:(88.692), Test_Prec@5=100.000:(99.455)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:util:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Saving the model.\u001b[0m\n", 
"\u001b[34m[1,mpirank:0,algo-1]:Epoch: [13][0/10] Train_Time=0.239: avg-0.239, Train_Speed=2682.513 (2682.513), Train_Loss=0.7501503229:(0.7502), Train_Prec@1=85.000:(85.000), Train_Prec@5=90.000:(90.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [13][1/10] Train_Time=0.078: avg-0.158, Train_Speed=8168.001 (4038.660), Train_Loss=0.9136956334:(0.8319), Train_Prec@1=72.500:(78.750), Train_Prec@5=92.500:(91.250)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [13][2/10] Train_Time=0.077: avg-0.131, Train_Speed=8353.009 (4878.596), Train_Loss=0.5808140039:(0.7482), Train_Prec@1=75.000:(77.500), Train_Prec@5=97.500:(93.333)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [13][3/10] Train_Time=0.080: avg-0.118, Train_Speed=7989.897 (5404.755), Train_Loss=0.4589876533:(0.6759), Train_Prec@1=80.000:(78.125), Train_Prec@5=97.500:(94.375)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [13][4/10] Train_Time=0.088: avg-0.112, Train_Speed=7280.996 (5698.441), Train_Loss=0.7718492746:(0.6951), Train_Prec@1=70.000:(76.500), Train_Prec@5=95.000:(94.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [13][5/10] Train_Time=0.087: avg-0.108, Train_Speed=7360.216 (5921.256), Train_Loss=0.5784369111:(0.6757), Train_Prec@1=80.000:(77.083), Train_Prec@5=95.000:(94.583)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [13][6/10] Train_Time=0.085: avg-0.105, Train_Speed=7569.208 (6111.334), Train_Loss=0.4758160710:(0.6471), Train_Prec@1=80.000:(77.500), Train_Prec@5=100.000:(95.357)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [13][7/10] Train_Time=0.081: avg-0.102, Train_Speed=7905.424 (6289.762), Train_Loss=0.8234942555:(0.6692), Train_Prec@1=72.500:(76.875), Train_Prec@5=92.500:(95.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [13][8/10] Train_Time=0.075: avg-0.099, Train_Speed=8501.449 (6476.986), Train_Loss=0.6405639648:(0.6660), Train_Prec@1=75.000:(76.667), Train_Prec@5=97.500:(95.278)\u001b[0m\n", 
"\u001b[34m[1,mpirank:0,algo-1]:Epoch: [13][9/10] Train_Time=0.069: avg-0.096, Train_Speed=9287.132 (6679.085), Train_Loss=0.8634245396:(0.6708), Train_Prec@1=77.778:(76.694), Train_Prec@5=88.889:(95.122)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [0/4] Test_Time=5.551:(5.551), Test_Speed=115.293:(115.293), Test_Loss=0.4177:(0.4177), Test_Prec@1=84.000:(84.000), Test_Prec@5=99.500:(99.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [1/4] Test_Time=7.227:(6.389), Test_Speed=88.554:(100.170), Test_Loss=0.3299:(0.3738), Test_Prec@1=89.000:(86.500), Test_Prec@5=99.000:(99.250)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [2/4] Test_Time=3.267:(5.348), Test_Speed=195.897:(119.661), Test_Loss=0.2432:(0.3303), Test_Prec@1=94.500:(89.167), Test_Prec@5=98.500:(99.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [3/4] Test_Time=2.443:(4.622), Test_Speed=262.017:(138.469), Test_Loss=0.1863:(0.3040), Test_Prec@1=92.537:(89.782), Test_Prec@5=100.000:(99.183)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:util:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [14][0/10] Train_Time=0.208: avg-0.208, Train_Speed=3083.479 (3083.479), Train_Loss=0.6051532030:(0.6052), Train_Prec@1=82.500:(82.500), Train_Prec@5=97.500:(97.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [14][1/10] Train_Time=0.097: avg-0.152, Train_Speed=6591.323 (4201.472), Train_Loss=0.7911351323:(0.6981), Train_Prec@1=70.000:(76.250), Train_Prec@5=92.500:(95.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [14][2/10] Train_Time=0.080: avg-0.128, Train_Speed=8017.565 (4993.760), Train_Loss=0.8843487501:(0.7602), Train_Prec@1=75.000:(75.833), Train_Prec@5=90.000:(93.333)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [14][3/10] Train_Time=0.076: avg-0.115, Train_Speed=8390.858 (5556.118), 
Train_Loss=0.6116586924:(0.7231), Train_Prec@1=85.000:(78.125), Train_Prec@5=95.000:(93.750)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [14][4/10] Train_Time=0.079: avg-0.108, Train_Speed=8109.137 (5929.476), Train_Loss=0.9750863910:(0.7735), Train_Prec@1=70.000:(76.500), Train_Prec@5=90.000:(93.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [14][5/10] Train_Time=0.075: avg-0.102, Train_Speed=8528.497 (6246.755), Train_Loss=0.7230252624:(0.7651), Train_Prec@1=77.500:(76.667), Train_Prec@5=97.500:(93.750)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [14][6/10] Train_Time=0.078: avg-0.099, Train_Speed=8196.748 (6466.523), Train_Loss=0.9424335361:(0.7904), Train_Prec@1=65.000:(75.000), Train_Prec@5=95.000:(93.929)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [14][7/10] Train_Time=0.077: avg-0.096, Train_Speed=8313.650 (6651.244), Train_Loss=0.5744396448:(0.7634), Train_Prec@1=82.500:(75.938), Train_Prec@5=100.000:(94.688)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [14][8/10] Train_Time=0.077: avg-0.094, Train_Speed=8334.212 (6803.905), Train_Loss=0.7431421280:(0.7612), Train_Prec@1=70.000:(75.278), Train_Prec@5=97.500:(95.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [14][9/10] Train_Time=0.070: avg-0.092, Train_Speed=9160.932 (6983.586), Train_Loss=0.2147655785:(0.7478), Train_Prec@1=100.000:(75.881), Train_Prec@5=100.000:(95.122)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [0/4] Test_Time=5.520:(5.520), Test_Speed=115.943:(115.943), Test_Loss=0.4138:(0.4138), Test_Prec@1=86.500:(86.500), Test_Prec@5=100.000:(100.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [1/4] Test_Time=5.114:(5.317), Test_Speed=125.144:(120.368), Test_Loss=0.2462:(0.3300), Test_Prec@1=92.500:(89.500), Test_Prec@5=99.500:(99.750)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [2/4] Test_Time=5.327:(5.320), Test_Speed=120.141:(120.292), Test_Loss=0.2575:(0.3059), 
Test_Prec@1=94.000:(91.000), Test_Prec@5=98.500:(99.333)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [3/4] Test_Time=3.602:(4.891), Test_Speed=177.672:(130.857), Test_Loss=0.2758:(0.3004), Test_Prec@1=88.060:(90.463), Test_Prec@5=99.254:(99.319)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:util:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [15][0/10] Train_Time=0.200: avg-0.200, Train_Speed=3194.864 (3194.864), Train_Loss=0.7077129483:(0.7077), Train_Prec@1=77.500:(77.500), Train_Prec@5=100.000:(100.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [15][1/10] Train_Time=0.084: avg-0.142, Train_Speed=7654.957 (4508.193), Train_Loss=0.7298768759:(0.7188), Train_Prec@1=72.500:(75.000), Train_Prec@5=97.500:(98.750)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [15][2/10] Train_Time=0.079: avg-0.121, Train_Speed=8122.652 (5293.347), Train_Loss=0.6259239912:(0.6878), Train_Prec@1=77.500:(75.833), Train_Prec@5=95.000:(97.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [15][3/10] Train_Time=0.076: avg-0.110, Train_Speed=8398.156 (5832.409), Train_Loss=0.2863280177:(0.5875), Train_Prec@1=95.000:(80.625), Train_Prec@5=100.000:(98.125)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [15][4/10] Train_Time=0.076: avg-0.103, Train_Speed=8396.816 (6211.830), Train_Loss=0.3539518118:(0.5408), Train_Prec@1=90.000:(82.500), Train_Prec@5=97.500:(98.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [15][5/10] Train_Time=0.075: avg-0.098, Train_Speed=8494.105 (6503.047), Train_Loss=0.4953709245:(0.5332), Train_Prec@1=85.000:(82.917), Train_Prec@5=97.500:(97.917)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [15][6/10] Train_Time=0.076: avg-0.095, Train_Speed=8379.239 (6717.934), Train_Loss=0.7050105333:(0.5577), Train_Prec@1=77.500:(82.143), Train_Prec@5=95.000:(97.500)\u001b[0m\n", 
"\u001b[34m[1,mpirank:0,algo-1]:Epoch: [15][7/10] Train_Time=0.079: avg-0.093, Train_Speed=8060.554 (6860.781), Train_Loss=0.7342107296:(0.5798), Train_Prec@1=77.500:(81.562), Train_Prec@5=97.500:(97.500)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [15][8/10] Train_Time=0.076: avg-0.091, Train_Speed=8421.156 (7005.000), Train_Loss=0.5453255773:(0.5760), Train_Prec@1=75.000:(80.833), Train_Prec@5=100.000:(97.778)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Epoch: [15][9/10] Train_Time=0.073: avg-0.090, Train_Speed=8734.198 (7146.487), Train_Loss=0.9097739458:(0.5841), Train_Prec@1=66.667:(80.488), Train_Prec@5=100.000:(97.832)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [0/4] Test_Time=5.628:(5.628), Test_Speed=113.707:(113.707), Test_Loss=0.4443:(0.4443), Test_Prec@1=87.000:(87.000), Test_Prec@5=99.000:(99.000)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [1/4] Test_Time=6.238:(5.933), Test_Speed=102.593:(107.864), Test_Loss=0.2860:(0.3651), Test_Prec@1=90.000:(88.500), Test_Prec@5=100.000:(99.500)[1,mpirank:0,algo-1]:\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [2/4] Test_Time=3.799:(5.222), Test_Speed=168.459:(122.559), Test_Loss=0.2269:(0.3190), Test_Prec@1=95.500:(90.833), Test_Prec@5=98.500:(99.167)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Test: [3/4] Test_Time=2.757:(4.606), Test_Speed=232.135:(138.957), Test_Loss=0.1984:(0.2970), Test_Prec@1=91.791:(91.008), Test_Prec@5=100.000:(99.319)\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:INFO:util:Saving the model.\u001b[0m\n", "\u001b[34m[1,mpirank:0,algo-1]:Saving the model.\u001b[0m\n", "\u001b[34m2023-07-02 09:15:42,211 sagemaker-training-toolkit INFO Waiting for the process to finish and give a return code.\u001b[0m\n", "\u001b[34m2023-07-02 09:15:42,211 sagemaker-training-toolkit INFO Done waiting for a return code. 
Received 0 from exiting process.\u001b[0m\n", "\u001b[34m2023-07-02 09:15:42,212 sagemaker-training-toolkit INFO Begin writing status file from leader node to worker nodes\u001b[0m\n", "\u001b[34m2023-07-02 09:15:42,212 sagemaker-training-toolkit INFO Start writing mpirun finished status to algo-2\u001b[0m\n", "\u001b[34m2023-07-02 09:15:42,369 sagemaker-training-toolkit INFO output from subprocess run CompletedProcess(args=['ssh', 'algo-2', 'touch', '/tmp/done.algo-1'], returncode=0, stdout='', stderr='')\u001b[0m\n", "\u001b[34m2023-07-02 09:15:42,369 sagemaker-training-toolkit INFO Finished writing status file\u001b[0m\n", "\u001b[35m2023-07-02 09:15:42,244 sagemaker-training-toolkit INFO Invoked on_terminate from psutil.wait_for_procs\u001b[0m\n", "\u001b[35m2023-07-02 09:15:42,244 sagemaker-training-toolkit INFO process psutil.Process(pid=68, name='orted', status='terminated', started='09:05:04') terminated with exit code None\u001b[0m\n", "\u001b[35m2023-07-02 09:15:42,244 sagemaker-training-toolkit INFO Reporting status for ORTEd process. gone: [psutil.Process(pid=68, name='orted', status='terminated', started='09:05:04')] alive: []\u001b[0m\n", "\u001b[35m2023-07-02 09:15:42,244 sagemaker-training-toolkit INFO Orted process exited\u001b[0m\n", "\u001b[34m2023-07-02 09:16:12,399 sagemaker-training-toolkit INFO Finished writing status file from leader node to worker nodes\u001b[0m\n", "\u001b[34m2023-07-02 09:16:12,400 sagemaker-training-toolkit INFO Reporting training SUCCESS\u001b[0m\n", "\u001b[35m2023-07-02 09:16:12,274 sagemaker-training-toolkit INFO Begin looking for status file on algo-2\u001b[0m\n", "\u001b[35m2023-07-02 09:16:12,275 sagemaker-training-toolkit INFO MPI training job status file found. 
Exit gracefully\u001b[0m\n", "\u001b[35m2023-07-02 09:16:12,275 sagemaker-training-toolkit INFO End looking for status file\u001b[0m\n", "\u001b[35m2023-07-02 09:16:12,275 sagemaker-training-toolkit INFO MPI process finished.\u001b[0m\n", "\u001b[35m2023-07-02 09:16:12,275 sagemaker-training-toolkit INFO Reporting training SUCCESS\u001b[0m\n", "\n", "2023-07-02 09:16:31 Uploading - Uploading generated training model\n", "2023-07-02 09:18:37 Completed - Training job completed\n", "Training seconds: 2182\n", "Billable seconds: 2182\n" ] } ], "source": [ "estimator.logs()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Checking the results" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "tags": [] }, "outputs": [], "source": [ "model_dir = './result/model'" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "s3://sagemaker-us-west-2-322537213286/oxford-ml-p3-16xlarge-0702-08121688285551/output/\n", "2023-07-02 08:30:57 1.8 GiB model.tar.gz\n" ] } ], "source": [ "# S3 prefix that holds the training job's output artifacts\n", "artifacts_dir = estimator.model_data.replace('model.tar.gz', '')\n", "print(artifacts_dir)\n", "!aws s3 ls --human-readable {artifacts_dir}" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "download: s3://sagemaker-us-west-2-322537213286/oxford-ml-p3-16xlarge-0702-08121688285551/output/model.tar.gz to result/model/model.tar.gz\n" ] } ], "source": [ "# Download the model archive and extract it into model_dir\n", "!aws s3 cp {artifacts_dir}model.tar.gz {model_dir}/model.tar.gz\n", "!tar -xzf {model_dir}/model.tar.gz -C {model_dir}" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "tags": [] }, "outputs": [], "source": [ "import json\n", "import os\n", "\n", "# Load the training history extracted from the model archive (read as JSON)\n", "with open(os.path.join(model_dir, 'model_history.p'), \"r\") as f:\n", "    model_history = json.load(f)" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "tags": [] }, "outputs": [ { "data": { 
"image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "def plot_training_curves(history):\n", "    # Plot train vs. validation curves for loss, top-1 accuracy and top-5 accuracy\n", "    fig, axes = plt.subplots(1, 3, figsize=(18, 4), sharex=True)\n", "\n", "    ax = axes[0]\n", "    ax.plot(history['epoch'], history['losses'], label='train')\n", "    ax.plot(history['val_avg_epoch'], history['val_avg_losses'], label='validation')\n", "    ax.set(\n", "        title='model loss',\n", "        ylabel='loss',\n", "        xlabel='epoch')\n", "    ax.legend()\n", "\n", "    # ax = axes[1]\n", "    # ax.plot(history['epoch'], history['batch_time'], label='train')\n", "    # ax.plot(history['val_avg_epoch'], history['val_avg_batch_time'], label='validation')\n", "    # ax.set(\n", "    #     title='model batch_time',\n", "    #     ylabel='batch_time',\n", "    #     xlabel='epoch')\n", "    # ax.legend()\n", "\n", "    ax = axes[1]\n", "    ax.plot(history['epoch'], history['top1'], label='train')\n", "    ax.plot(history['val_avg_epoch'], history['val_avg_top1'], label='validation')\n", "    ax.set(\n", "        title='top1 accuracy',\n", "        ylabel='accuracy',\n", "        xlabel='epoch')\n", "    ax.legend()\n", "\n", "    ax = axes[2]\n", "    ax.plot(history['epoch'], history['top5'], label='train')\n", "    ax.plot(history['val_avg_epoch'], history['val_avg_top5'], label='validation')\n", "    ax.set(\n", "        title='top5 accuracy',\n", "        ylabel='accuracy',\n", "        xlabel='epoch')\n", "    ax.legend()\n", "    fig.tight_layout()\n", "\n", "plot_training_curves(model_history)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "instance_type": "ml.m5.large", "kernelspec": { "display_name": "conda_pytorch_p310", "language": "python", "name": "conda_pytorch_p310" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.10" }, "notice": "Copyright 2018 Amazon.com, Inc. 
or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License." }, "nbformat": 4, "nbformat_minor": 4 }