Distributed training using Keras model and tf.Estimator
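Following the example here, one can create a tf.Estimator from an existing keras model. The page starts by stating that doing so lets you take advantage of the benefits of tf.Estimator, such as faster training through distributed training. Sadly, when I run the code, only one GPU in my system is used for computation, so there is no speedup. How can I use distributed training with an estimator built from a keras model?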
I stumbled upon this method:
distributed_model = tf.keras.utils.multi_gpu_model(model, gpus=2)
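As far as I understand it, multi_gpu_model does data parallelism: every input batch is cut into one slice per GPU, each GPU runs a replica of the model on its slice, and the outputs are concatenated again. Below is a minimal pure-Python sketch of that slicing arithmetic only. The name slice_bounds is hypothetical, and this is an illustration of the idea, not the TensorFlow code.

```python
def slice_bounds(batch_size, i, parts):
    """Return (start, size) of the i-th of `parts` slices of a batch.

    Hypothetical helper for illustration: the batch is cut into equal
    steps, and the last slice absorbs any remainder.
    """
    step = batch_size // parts
    start = step * i
    if i == parts - 1:
        size = batch_size - start  # last slice takes the leftover samples
    else:
        size = step
    return start, size


# Splitting a batch of 11 samples across 2 GPUs.
batch = list(range(11))
slices = []
for gpu_index in range(2):
    start, size = slice_bounds(len(batch), gpu_index, 2)
    slices.append(batch[start:start + size])
```

With gpus=2, each replica would therefore see roughly half of every batch.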
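This method sounds as if it could solve the problem. But it currently does not work, because it creates a graph that uses the get_slice(..) method defined in tensorflow/python/keras/_impl/keras/utils/training_utils.py, and this method fails with the following error message: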
Traceback (most recent call last):
  File "hub.py", line 75, in <module>
    estimator = create_model_estimator()
  File "hub.py", line 67, in create_model_estimator
    estimator = tf.keras.estimator.model_to_estimator(keras_model=new_model, custom_objects={'tf': tf}, model_dir=model_dir, config=run_config)
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/estimator.py", line 302, in model_to_estimator
    _save_first_checkpoint(keras_model, est, custom_objects, keras_weights)
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/estimator.py", line 231, in _save_first_checkpoint
    custom_objects)
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/estimator.py", line 109, in _clone_and_build_model
    model = models.clone_model(keras_model, input_tensors=input_tensors)
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/models.py", line 1557, in clone_model
    return _clone_functional_model(model, input_tensors=input_tensors)
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/models.py", line 1451, in _clone_functional_model
    output_tensors = topology._to_list(layer(computed_tensor, **kwargs))
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/topology.py", line 258, in __call__
    output = super(Layer, self).__call__(inputs, **kwargs)
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 696, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/layers/core.py", line 630, in call
    return self.function(inputs, **arguments)
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/utils/training_utils.py", line 156, in get_slice
    shape = array_ops.shape(data)
NameError: name 'array_ops' is not defined
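My guess at the mechanism (an assumption, not something I have confirmed in the TensorFlow source): the traceback passes through clone_model, and if the cloned layer rebuilds get_slice from its code object in a namespace where the module-level name array_ops does not exist, the lookup fails exactly like this. The pure-Python sketch below reproduces that kind of failure without TensorFlow. The names circle_area and rebuild_with_globals are hypothetical and stand in for the real functions.

```python
import math
import types


def circle_area(radius):
    # The name `math` is looked up in the function's globals at call time.
    return math.pi * radius * radius


def rebuild_with_globals(func, new_globals):
    """Recreate `func` from its code object, bound to another globals dict."""
    return types.FunctionType(func.__code__, new_globals, func.__name__)


# Rebuilt in a namespace that lacks `math`: calling it raises NameError.
broken = rebuild_with_globals(circle_area, {})
try:
    broken(1.0)
    error_message = None
except NameError as exc:
    error_message = str(exc)

# Rebuilt with the missing name supplied: the call works again.
fixed = rebuild_with_globals(circle_area, {"math": math})
fixed_result = fixed(1.0)
```

If this reading is right, it would also explain why custom_objects={'tf': tf} makes the name tf available but does nothing for array_ops.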
So, how can I use two GPUs to train a model with a tf.Estimator object?
Edit: By switching to a different version/build of tensorflow, I was able to get rid of the previous error message, but now I get this one:
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1312, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "/root/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value res2a_branch2c/bias
     [[Node: res2a_branch2c/bias/_482 = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1142_res2a_branch2c/bias", _device="/job:localhost/replica:0/task:0/device:GPU:0"](res2a_branch2c/bias)]]
     [[Node: bn4a_branch2a/beta/_219 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_878_bn4a_branch2a/beta", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Maybe this is related to this issue?