INFO|2019-02-13T22:59:36|/opt/launcher.py|48| Launcher started.
INFO|2019-02-13T22:59:36|/opt/launcher.py|73| Command to run: python tf_cnn_benchmarks.py --batch_size=32 --model=resnet50 --variable_update=parameter_server --flush_stdout=true --num_gpus=2 --local_parameter_device=gpu --device=gpu --data_format=NHWC --job_name=worker --ps_hosts=tf-cnn-training-ps-0.kubeflow.svc:2222 --worker_hosts=tf-cnn-training-worker-0.kubeflow.svc:2222 --task_index=0
INFO|2019-02-13T22:59:36|/opt/launcher.py|15| Running python tf_cnn_benchmarks.py --batch_size=32 --model=resnet50 --variable_update=parameter_server --flush_stdout=true --num_gpus=2 --local_parameter_device=gpu --device=gpu --data_format=NHWC --job_name=worker --ps_hosts=tf-cnn-training-ps-0.kubeflow.svc:2222 --worker_hosts=tf-cnn-training-worker-0.kubeflow.svc:2222 --task_index=0
INFO|2019-02-13T22:59:37|/opt/launcher.py|27| 2019-02-13 22:59:37.489630: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
INFO|2019-02-13T22:59:38|/opt/launcher.py|27| 2019-02-13 22:59:38.177179: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
INFO|2019-02-13T22:59:38|/opt/launcher.py|27| 2019-02-13 22:59:38.177921: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1064] Found device 0 with properties:
INFO|2019-02-13T22:59:38|/opt/launcher.py|27| name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
INFO|2019-02-13T22:59:38|/opt/launcher.py|27| pciBusID: 0000:00:17.0
INFO|2019-02-13T22:59:38|/opt/launcher.py|27| totalMemory: 15.78GiB freeMemory: 15.35GiB
INFO|2019-02-13T22:59:38|/opt/launcher.py|27| 2019-02-13 22:59:38.485360: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
INFO|2019-02-13T22:59:38|/opt/launcher.py|27| 2019-02-13 22:59:38.486253: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1064] Found device 1 with properties:
INFO|2019-02-13T22:59:38|/opt/launcher.py|27| name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
INFO|2019-02-13T22:59:38|/opt/launcher.py|27| pciBusID: 0000:00:19.0
INFO|2019-02-13T22:59:38|/opt/launcher.py|27| totalMemory: 15.78GiB freeMemory: 15.37GiB
INFO|2019-02-13T22:59:38|/opt/launcher.py|27| 2019-02-13 22:59:38.486294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1079] Device peer to peer matrix
INFO|2019-02-13T22:59:38|/opt/launcher.py|27| 2019-02-13 22:59:38.486317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1085] DMA: 0 1
INFO|2019-02-13T22:59:38|/opt/launcher.py|27| 2019-02-13 22:59:38.486325: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1095] 0:   Y Y
INFO|2019-02-13T22:59:38|/opt/launcher.py|27| 2019-02-13 22:59:38.486331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1095] 1:   Y Y
INFO|2019-02-13T22:59:38|/opt/launcher.py|27| 2019-02-13 22:59:38.486352: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1154] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:17.0, compute capability: 7.0)
INFO|2019-02-13T22:59:38|/opt/launcher.py|27| 2019-02-13 22:59:38.486360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1154] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:19.0, compute capability: 7.0)
INFO|2019-02-13T23:02:11|/opt/launcher.py|27| 2019-02-13 23:02:11.314241: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> tf-cnn-training-ps-0.kubeflow.svc:2222}
INFO|2019-02-13T23:02:11|/opt/launcher.py|27| 2019-02-13 23:02:11.314280: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> localhost:2222}
INFO|2019-02-13T23:02:11|/opt/launcher.py|27| 2019-02-13 23:02:11.317209: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:324] Started server with target: grpc://localhost:2222
INFO|2019-02-13T23:02:11|/opt/launcher.py|27| TensorFlow:  1.5
INFO|2019-02-13T23:02:11|/opt/launcher.py|27| Model:       resnet50
INFO|2019-02-13T23:02:11|/opt/launcher.py|27| Mode:        training
INFO|2019-02-13T23:02:11|/opt/launcher.py|27| SingleSess:  False
INFO|2019-02-13T23:02:11|/opt/launcher.py|27| Batch size:  64 global
INFO|2019-02-13T23:02:11|/opt/launcher.py|27| 32 per device
INFO|2019-02-13T23:02:11|/opt/launcher.py|27| Devices:     ['/job:worker/task:0/gpu:0', '/job:worker/task:0/gpu:1']
INFO|2019-02-13T23:02:11|/opt/launcher.py|27| Data format: NHWC
INFO|2019-02-13T23:02:11|/opt/launcher.py|27| Optimizer:   sgd
INFO|2019-02-13T23:02:11|/opt/launcher.py|27| Variables:   parameter_server
INFO|2019-02-13T23:02:11|/opt/launcher.py|27| Sync:        True
INFO|2019-02-13T23:02:11|/opt/launcher.py|27| ==========
INFO|2019-02-13T23:02:11|/opt/launcher.py|27| Generating model
INFO|2019-02-13T23:02:12|/opt/launcher.py|27| WARNING:tensorflow:From /opt/tf-benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:372: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
INFO|2019-02-13T23:02:12|/opt/launcher.py|27| Instructions for updating:
INFO|2019-02-13T23:02:12|/opt/launcher.py|27| keep_dims is deprecated, use keepdims instead
INFO|2019-02-13T23:02:18|/opt/launcher.py|27| 2019-02-13 23:02:18.452487: I tensorflow/core/distributed_runtime/master_session.cc:1011] Start master session 5a288b32a61167f7 with config: intra_op_parallelism_threads: 1 gpu_options { force_gpu_compatible: true } allow_soft_placement: true
INFO|2019-02-13T23:02:20|/opt/launcher.py|27| Running warm up
INFO|2019-02-13T23:06:30|/opt/launcher.py|27| Done warm up
INFO|2019-02-13T23:06:30|/opt/launcher.py|27| Step	Img/sec	loss
INFO|2019-02-13T23:06:31|/opt/launcher.py|27| 1	images/sec: 217.5 +/- 0.0 (jitter = 0.0)	10.611
INFO|2019-02-13T23:06:33|/opt/launcher.py|27| 10	images/sec: 215.8 +/- 3.1 (jitter = 2.6)	8.493
INFO|2019-02-13T23:06:36|/opt/launcher.py|27| 20	images/sec: 217.1 +/- 1.6 (jitter = 3.0)	8.229
INFO|2019-02-13T23:06:39|/opt/launcher.py|27| 30	images/sec: 216.8 +/- 1.1 (jitter = 3.1)	8.105
INFO|2019-02-13T23:06:42|/opt/launcher.py|27| 40	images/sec: 216.8 +/- 0.9 (jitter = 3.1)	7.947
INFO|2019-02-13T23:06:45|/opt/launcher.py|27| 50	images/sec: 216.9 +/- 0.7 (jitter = 3.3)	7.916
INFO|2019-02-13T23:06:48|/opt/launcher.py|27| 60	images/sec: 217.8 +/- 0.7 (jitter = 3.5)	8.252
INFO|2019-02-13T23:06:51|/opt/launcher.py|27| 70	images/sec: 218.0 +/- 0.6 (jitter = 3.3)	7.880
INFO|2019-02-13T23:06:54|/opt/launcher.py|27| 80	images/sec: 217.8 +/- 0.6 (jitter = 3.2)	7.885
INFO|2019-02-13T23:06:57|/opt/launcher.py|27| 90	images/sec: 217.6 +/- 0.5 (jitter = 3.5)	7.823
INFO|2019-02-13T23:07:00|/opt/launcher.py|27| 100	images/sec: 217.7 +/- 0.5 (jitter = 3.3)	7.926
INFO|2019-02-13T23:07:00|/opt/launcher.py|27| ----------------------------------------------------------------
INFO|2019-02-13T23:07:00|/opt/launcher.py|27| total images/sec: 217.95
INFO|2019-02-13T23:07:00|/opt/launcher.py|27| ----------------------------------------------------------------
INFO|2019-02-13T23:07:01|/opt/launcher.py|80| Finished: python tf_cnn_benchmarks.py --batch_size=32 --model=resnet50 --variable_update=parameter_server --flush_stdout=true --num_gpus=2 --local_parameter_device=gpu --device=gpu --data_format=NHWC --job_name=worker --ps_hosts=tf-cnn-training-ps-0.kubeflow.svc:2222 --worker_hosts=tf-cnn-training-worker-0.kubeflow.svc:2222 --task_index=0
INFO|2019-02-13T23:07:01|/opt/launcher.py|84| Command ran successfully sleep for ever.