+ POD_NAME=tf-resnet50-horovod-job-worker-0
+ shift
+ /opt/kube/kubectl exec tf-resnet50-horovod-job-worker-0 -- /bin/sh -c PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ; /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "300417024" -mca ess_base_vpid 1 -mca ess_base_num_procs "3" -mca orte_node_regex "tf-resnet[2:50]-horovod-job-launcher-qkqc4,tf-resnet[2:50]-horovod-job-worker-0,tf-resnet[2:50]-horovod-job-worker-1@0(3)" -mca orte_hnp_uri "300417024.0;tcp://192.168.24.125:33923" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "300417024.0;tcp://192.168.24.125:33923" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca pmix "^s1,s2,cray,isolated"
+ POD_NAME=tf-resnet50-horovod-job-worker-1
+ shift
+ /opt/kube/kubectl exec tf-resnet50-horovod-job-worker-1 -- /bin/sh -c PATH=/usr/local/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/usr/local/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ; /usr/local/bin/orted -mca ess "env" -mca ess_base_jobid "300417024" -mca ess_base_vpid 2 -mca ess_base_num_procs "3" -mca orte_node_regex "tf-resnet[2:50]-horovod-job-launcher-qkqc4,tf-resnet[2:50]-horovod-job-worker-0,tf-resnet[2:50]-horovod-job-worker-1@0(3)" -mca orte_hnp_uri "300417024.0;tcp://192.168.24.125:33923" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "300417024.0;tcp://192.168.24.125:33923" -mca plm_rsh_agent "/etc/mpi/kubexec.sh" -mca orte_default_hostfile "/etc/mpi/hostfile" -mca pmix "^s1,s2,cray,isolated"
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/horovod/tensorflow/__init__.py:91: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
2019-07-15 02:16:25.182578: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-15 02:16:26.649049: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-15 02:16:26.732274: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5f88db0 executing computations on platform CUDA. Devices:
2019-07-15 02:16:26.732308: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Tesla V100-SXM2-16GB, Compute Capability 7.0
2019-07-15 02:16:26.732318: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): Tesla V100-SXM2-16GB, Compute Capability 7.0
2019-07-15 02:16:26.732325: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): Tesla V100-SXM2-16GB, Compute Capability 7.0
2019-07-15 02:16:26.732332: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): Tesla V100-SXM2-16GB, Compute Capability 7.0
2019-07-15 02:16:26.753778: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300065000 Hz
2019-07-15 02:16:26.755697: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x60878f0 executing computations on platform Host. Devices:
2019-07-15 02:16:26.755726: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2019-07-15 02:16:26.756831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53 pciBusID: 0000:00:1d.0 totalMemory: 15.75GiB freeMemory: 14.55GiB
2019-07-15 02:16:26.756859: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 2
2019-07-15 02:16:26.762432: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53 pciBusID: 0000:00:1c.0 totalMemory: 15.75GiB freeMemory: 14.55GiB
2019-07-15 02:16:26.762479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1
2019-07-15 02:16:26.770777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53 pciBusID: 0000:00:1e.0 totalMemory: 15.75GiB freeMemory: 14.55GiB
2019-07-15 02:16:26.770817: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3
2019-07-15 02:16:26.791705: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53 pciBusID: 0000:00:1b.0 totalMemory: 15.75GiB freeMemory: 14.55GiB
2019-07-15 02:16:26.791749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-07-15 02:16:26.903910: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-15 02:16:26.903957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 2
2019-07-15 02:16:26.903967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2: N
2019-07-15 02:16:26.904766: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14135 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1d.0, compute capability: 7.0)
2019-07-15 02:16:26.913364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-15 02:16:26.913408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 3
2019-07-15 02:16:26.913423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3: N
2019-07-15 02:16:26.914284: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14135 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1e.0, compute capability: 7.0)
2019-07-15 02:16:26.925790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-15 02:16:26.925828: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-07-15 02:16:26.925838: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-07-15 02:16:26.927228: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14135 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1b.0, compute capability: 7.0)
2019-07-15 02:16:26.951385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-15 02:16:26.951423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 1
2019-07-15 02:16:26.951433: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: N
2019-07-15 02:16:26.952200: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14135 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1c.0, compute capability: 7.0)
PY3.5.2 (default, Nov 12 2018, 13:43:14) [GCC 5.4.0 20160609]
TF1.13.1
Horovod size: 8
Using a learning rate of 0.8
Checkpointing every 1000 steps
Saving summary every 625 steps
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Using preprocessing threads per GPU: 5
WARNING:tensorflow:From models/resnet/tensorflow/train_imagenet_resnet_hvd.py:87: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
WARNING:tensorflow:From models/resnet/tensorflow/train_imagenet_resnet_hvd.py:136: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.
WARNING:tensorflow:From models/resnet/tensorflow/train_imagenet_resnet_hvd.py:168: average_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.average_pooling2d instead.
WARNING:tensorflow:From models/resnet/tensorflow/train_imagenet_resnet_hvd.py:673: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-07-15 02:16:35.850031: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3
2019-07-15 02:16:35.850089: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-15 02:16:35.850100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 3
2019-07-15 02:16:35.850108: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3: N
2019-07-15 02:16:35.850441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14135 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1e.0, compute capability: 7.0)
tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-07-15 02:16:35.968194: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 2 2019-07-15 02:16:35.968201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 2: N 2019-07-15 02:16:35.968452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14135 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1d.0, compute capability: 7.0) 2019-07-15 02:16:36.016115: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2019-07-15 02:16:36.016173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-07-15 02:16:36.016184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 2019-07-15 02:16:36.016192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N 2019-07-15 02:16:36.016444: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14135 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1b.0, compute capability: 7.0) 2019-07-15 02:16:36.066308: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 1 2019-07-15 02:16:36.066376: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-07-15 02:16:36.066389: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 1 2019-07-15 02:16:36.066398: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: N 2019-07-15 02:16:36.066649: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14135 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1c.0, compute capability: 7.0) 2019-07-15 02:16:36.078162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 3 2019-07-15 02:16:36.078210: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-07-15 02:16:36.078222: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 3 2019-07-15 02:16:36.078230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 3: N 2019-07-15 02:16:36.078503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14135 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1e.0, compute capability: 7.0) Step Epoch Speed Loss FinLoss LR 2019-07-15 02:16:51.022757: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-07-15 02:16:51.081849: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-07-15 02:16:51.120253: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-07-15 02:16:51.140793: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-07-15 02:16:51.141236: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally 2019-07-15 02:16:51.175727: I tensorflow/stream_executor/dso_loader.cc:152] successfully 
2019-07-15 02:16:54.295180: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.38GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
[... this 3.38GiB allocator warning is repeated dozens of times across the eight ranks between 02:16:54.295 and 02:16:54.888; it is informational, not an error ...]
tf-resnet50-horovod-job-worker-0:19:160 [0] NCCL INFO NET/Socket : Using [0]eth0:192.168.16.199<0>
tf-resnet50-horovod-job-worker-0:19:160 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
tf-resnet50-horovod-job-worker-0:19:160 [0] misc/ibvwrap.cu:63 NCCL WARN Failed to open libibverbs.so[.1]
NCCL version 2.4.2+cuda10.0
[... the same NET/Socket, NET/Plugin and libibverbs messages are printed by every rank on both workers; the worker-1 ranks report eth0:192.168.31.30 ...]
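The BFC allocator warnings are benign: each process has already reserved roughly 14135 MB on its V100, and the allocator simply cannot satisfy an extra 3.38 GiB request (typically a cuDNN workspace probe), which TensorFlow reports as a performance hint rather than an error. If the noise is unwanted, GPU memory use can be configured explicitly; a hedged TF 1.x sketch, not taken from the job shown here:

    import tensorflow as tf

    config = tf.ConfigProto()
    # Option A: grow the GPU allocation on demand instead of reserving nearly all
    # memory up front.
    config.gpu_options.allow_growth = True
    # Option B (alternative): reserve only a fraction of each GPU, leaving headroom
    # for cuDNN/NCCL workspaces.
    # config.gpu_options.per_process_gpu_memory_fraction = 0.9

    sess = tf.Session(config=config)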
tf-resnet50-horovod-job-worker-0:19:160 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff
[... each of the eight ranks sets its GPU affinity the same way ...]
tf-resnet50-horovod-job-worker-0:19:160 [0] NCCL INFO comm 0x7f04c8382d50 rank 0 nranks 8 cudaDev 0 nvmlDev 0
tf-resnet50-horovod-job-worker-0:20:161 [1] NCCL INFO comm 0x7f6ce831a860 rank 1 nranks 8 cudaDev 1 nvmlDev 1
tf-resnet50-horovod-job-worker-0:21:159 [2] NCCL INFO comm 0x7fa8b431fee0 rank 2 nranks 8 cudaDev 2 nvmlDev 2
tf-resnet50-horovod-job-worker-0:22:164 [3] NCCL INFO comm 0x7f02ac31d600 rank 3 nranks 8 cudaDev 3 nvmlDev 3
tf-resnet50-horovod-job-worker-1:19:168 [0] NCCL INFO comm 0x7f0a903288f0 rank 4 nranks 8 cudaDev 0 nvmlDev 0
tf-resnet50-horovod-job-worker-1:20:159 [1] NCCL INFO comm 0x7fbd34323870 rank 5 nranks 8 cudaDev 1 nvmlDev 1
tf-resnet50-horovod-job-worker-1:21:160 [2] NCCL INFO comm 0x7f15e031f2d0 rank 6 nranks 8 cudaDev 2 nvmlDev 2
tf-resnet50-horovod-job-worker-1:22:161 [3] NCCL INFO comm 0x7f53d432a300 rank 7 nranks 8 cudaDev 3 nvmlDev 3
[... every rank then reports "Could not find real path of /sys/class/net/eth0/device", "include/net.h:24 -> 2" and "CUDA Dev n[n], Socket NIC distance : SOC" ...]
tf-resnet50-horovod-job-worker-0:19:160 [0] NCCL INFO Channel 00 : 0 1 2 3 4 5 6 7
tf-resnet50-horovod-job-worker-0:19:160 [0] NCCL INFO Channel 01 : 0 1 2 3 4 5 6 7
tf-resnet50-horovod-job-worker-0:19:160 [0] NCCL INFO Ring 00 : 0[0] -> 1[1] via P2P/IPC
tf-resnet50-horovod-job-worker-0:20:161 [1] NCCL INFO Ring 00 : 1[1] -> 2[2] via P2P/IPC
tf-resnet50-horovod-job-worker-0:21:159 [2] NCCL INFO Ring 00 : 2[2] -> 3[3] via P2P/IPC
tf-resnet50-horovod-job-worker-0:22:164 [3] NCCL INFO Ring 00 : 3 -> 4 [send] via NET/Socket/0
tf-resnet50-horovod-job-worker-1:19:168 [0] NCCL INFO Ring 00 : 3 -> 4 [receive] via NET/Socket/0
tf-resnet50-horovod-job-worker-1:19:168 [0] NCCL INFO Ring 00 : 4[0] -> 5[1] via P2P/IPC
tf-resnet50-horovod-job-worker-1:20:159 [1] NCCL INFO Ring 00 : 5[1] -> 6[2] via P2P/IPC
tf-resnet50-horovod-job-worker-1:21:160 [2] NCCL INFO Ring 00 : 6[2] -> 7[3] via P2P/IPC
tf-resnet50-horovod-job-worker-1:22:161 [3] NCCL INFO Ring 00 : 7 -> 0 [send] via NET/Socket/0
tf-resnet50-horovod-job-worker-0:19:160 [0] NCCL INFO Ring 00 : 7 -> 0 [receive] via NET/Socket/0
[... Ring 01 and the reverse-direction connections are wired up the same way: P2P/IPC between the four GPUs inside each worker pod, NET/Socket/0 for the two hops between worker-0 and worker-1; each rank also logs its "Trees [...]" parent/child assignments ...]
tf-resnet50-horovod-job-worker-0:19:160 [0] NCCL INFO Using 256 threads, Min Comp Cap 7, Trees enabled for all sizes
tf-resnet50-horovod-job-worker-0:19:160 [0] NCCL INFO comm 0x7f04c8382d50 rank 0 nranks 8 cudaDev 0 nvmlDev 0 - Init COMPLETE
[... all eight communicators report "- Init COMPLETE" ...]
tf-resnet50-horovod-job-worker-0:19:160 [0] NCCL INFO Launch mode Parallel
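The NCCL lines explain how the eight ranks talk to each other: libibverbs cannot be opened inside the pods, so there is no InfiniBand/RDMA path, and cross-node traffic falls back to TCP sockets on eth0 (192.168.16.199 for worker-0, 192.168.31.30 for worker-1), while the four GPUs inside a worker use P2P/IPC. The verbose output itself comes from NCCL_DEBUG=INFO. These settings are plain environment variables read by each rank; they can be forwarded with mpirun -x or, as sketched below, set at the top of the training script before the first Horovod collective runs (the values shown are illustrative, not taken from this job's manifest):

    import os

    # NCCL_DEBUG=INFO produces the "NCCL INFO ..." lines seen in this log.
    os.environ.setdefault("NCCL_DEBUG", "INFO")
    # Pin NCCL's socket transport to a specific interface; eth0 is what it
    # auto-detected here, so this is only needed when pods have several NICs.
    os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth0")

    import horovod.tensorflow as hvd
    hvd.init()  # NCCL reads the variables when the first allreduce builds its communicator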
2019-07-15 02:16:55.430474: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.72GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
[... similar allocator warnings for 3.72GiB, 3.48GiB and 3.34GiB allocations recur between 02:16:55.430 and 02:16:56.770 as the first training step runs; like the earlier ones, they are informational only ...]

Step  Epoch  Speed   Loss   FinLoss  LR
0     0.0    239.7   6.686  8.023    0.80000
1     0.0    589.7   0.000  1.379    0.79840
50    0.1    5371.7  0.000  2.306    0.72200
100   0.2    5430.7  0.000  2.161    0.64800
150   0.2    5468.8  0.000  2.032    0.57800
200   0.3    5460.4  0.000  1.924    0.51200
250   0.4    5412.5  0.000  1.833    0.45000
300   0.5    5435.5  0.000  1.757    0.39201
350   0.6    5424.6  0.000  1.694    0.33801
400   0.6    5430.3  0.000  1.641    0.28801
450   0.7    5504.9  0.000  1.598    0.24201
500   0.8    5482.8  0.000  1.563    0.20001
550   0.9    5462.4  0.000  1.535    0.16201
600   1.0    5400.1  0.000  1.513    0.12801
650   1.0    5422.7  0.000  1.496    0.09801
700   1.1    5392.7  0.000  1.483    0.07201
750   1.2    5479.8  0.000  1.474    0.05001
800   1.3    5450.0  0.000  1.468    0.03201
850   1.4    5428.7  0.000  1.465    0.01801
900   1.4    5431.5  0.000  1.463    0.00801
950   1.5    5444.5  0.000  1.462    0.00201

Finished in 410.1348099708557
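The Speed column appears to be aggregate images/sec across all eight V100s: after the warm-up steps it settles around 5,400-5,500, roughly 680 images/sec per GPU, and the 950 logged steps (about 1.5 epochs by the Epoch column) finish in roughly 410 seconds. A small sketch for pulling the rows back out of a captured log and averaging the steady-state throughput; the filename and the 8-GPU divisor are assumptions matching this particular run:

    import re

    # Matches the per-step rows printed by the training script:
    # Step Epoch Speed Loss FinLoss LR
    row = re.compile(r"^\s*(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$")

    speeds = []
    with open("training.log") as f:          # placeholder path for the saved launcher output
        for line in f:
            m = row.match(line)
            if m and int(m.group(1)) >= 50:  # skip the warm-up rows (steps 0 and 1)
                speeds.append(float(m.group(3)))

    if speeds:
        mean = sum(speeds) / len(speeds)
        print("steady-state throughput: %.1f img/s total, %.1f img/s per GPU" % (mean, mean / 8))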