TensorFlow-Slim multi-GPU training does not work
I am currently using TensorFlow-Slim to train a model from scratch. If I just follow the instructions at https://github.com/tensorflow/models/tree/master/slim#training-a-model-from-scratch , everything works.
However, I want to use multiple GPUs, so I set --num_clones=2 or --num_clones=4, and neither works. In both cases training gets stuck at global_step/sec: 0 and never advances. The command I run is:
DATASET_DIR=/tmp/imagenet
TRAIN_DIR=/tmp/train_logs
python train_image_classifier.py \
--num_clones=4 \
--train_dir=${TRAIN_DIR} \
--dataset_name=imagenet \
--dataset_split_name=train \
--dataset_dir=${DATASET_DIR} \
--model_name=inception_v3
I hope someone can help; thanks in advance. By the way, I am using TensorFlow 1.1 and Python 3.5 on Ubuntu 16.04. If you need more information, please let me know.
Your issue resembles what I ran into after switching from a single-GPU to a multi-GPU configuration with tf-slim. I observed that the parameter server job took the name 'localhost', which conflicted with the default job name that model_deploy assigned to my CPU device.
I suggest inspecting the device names by following the "Logging Device placement" section on tensorflow.org. It explains how to print the device name of each operation to the console. You can then pass the actual job name to the ps_job_name parameter of DeployConfig() and proceed with training.
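To make the "inspect the device names" step a bit more concrete, here is a minimal sketch in plain Python that pulls the job name out of device placement log lines. It assumes you have already captured those lines (in TF 1.x they are printed when the session is created with tf.ConfigProto(log_device_placement=True)). The job_names helper and the sample lines are my own illustration, not part of TensorFlow or tf-slim, and the exact log format may differ between versions.

```python
import re

# Matches the job component of a TensorFlow device string such as
# "/job:localhost/replica:0/task:0/device:GPU:0".
_JOB_RE = re.compile(r"/job:([^/\s]+)")


def job_names(log_lines):
    """Return the distinct job names found in the given lines, in first-seen order."""
    seen = []
    for line in log_lines:
        for name in _JOB_RE.findall(line):
            if name not in seen:
                seen.append(name)
    return seen


# Hypothetical sample lines, shaped like device placement log output.
sample_log = [
    "MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0",
    "a: (Const): /job:localhost/replica:0/task:0/device:GPU:0",
    "global_step: (VariableV2): /job:localhost/replica:0/task:0/device:CPU:0",
]

print(job_names(sample_log))
```

If the only name that shows up is something like localhost, that is the value I would try for ps_job_name in DeployConfig().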