
Tensorflow Multi-GPU loss

Asked 2019-02-14 11:05:54 · Tags: python, tensorflow, multi-gpu

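I am studying how to implement multi-GPU training in Tensorflow. I am currently reading this source, as suggested in the docs. As far as I understand, at line 178 the variable loss accounts for the loss of one GPU only (as the comment says). So at the end of the loop, say at line 192, loss holds the loss value of the last GPU considered. The variable loss is not used again until line 243, where it is passed to Session.run() to be computed. The loss value printed at line 255 is therefore only the loss of the last GPU, not the total. It seems strange to me that Google engineers would get something this simple wrong, so what am I missing? Thanks!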

1 answer

It seems you are not missing anything. They considered it sufficient to print the loss value and report summaries for a single tower.

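Usually, when you start running a new model on multiple GPUs, you track the loss and summaries for every GPU and/or compute an average loss, purely for debugging. After that, tracking a single tower is enough, because every tower holds an identical copy of the model.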
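The difference between "the loss of the last tower" and "the average over towers" can be sketched in plain Python. This is only an illustration of the loop pattern described in the question, not the code of the script itself: `tower_loss`, `build_towers`, and the toy data are hypothetical names, and plain floats stand in for the TensorFlow ops so the sketch runs without TensorFlow installed.

```python
def tower_loss(gpu_index, shards):
    """Hypothetical stand-in for one tower's loss: mean squared error
    of the shard assigned to this GPU against a target of 1.0."""
    shard = shards[gpu_index]
    return sum((x - 1.0) ** 2 for x in shard) / len(shard)


def build_towers(shards):
    """Mimic the per-GPU loop: `loss` is rebound on every iteration,
    so after the loop it refers to the last tower only."""
    tower_losses = []
    loss = None
    for i in range(len(shards)):
        loss = tower_loss(i, shards)   # overwrites the previous tower's value
        tower_losses.append(loss)      # keep every tower for debugging
    mean_loss = sum(tower_losses) / len(tower_losses)
    return loss, tower_losses, mean_loss


# Two "GPUs", each with its own shard of the batch (toy numbers).
shards = [[1.0, 2.0], [1.0, 3.0]]
last_loss, per_tower, mean_loss = build_towers(shards)

print("last tower loss:", last_loss)
print("per-tower losses:", per_tower)
print("mean loss:", mean_loss)
```

With identical model copies and shards drawn from the same data, the per-tower values tend to follow each other closely over training, which is why reporting one of them is usually representative. Keeping the list around costs little if you want the average while debugging.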

By the way, I find multi-GPU training easier with tf.estimators, using tf.contrib.estimator.replicate_model_fn(...) and tf.contrib.estimator.TowerOptimizer(...) to distribute the model and the optimization.


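Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.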
