
Tensorflow Multi-GPU loss

Original post: 2019-02-14 11:05:54 | Tags: python, tensorflow, multi-gpu

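I am studying how to implement multi-GPU training in TensorFlow. Right now I am reading this source, as suggested in the documentation. As far as I understand, at line 178 the variable loss accounts for the loss of one GPU only (as the comment says). So at the end of the loop, say line 192, loss holds the loss value of the last GPU considered. The variable loss is not used again until line 243, where it is passed to Session.run() to be computed. So the loss value printed at line 255 is only the loss of the last GPU, not the total. It seems strange to me that Google engineers would get such a simple thing wrong. What am I missing? Thanks!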

1 Answer

It seems you are not missing anything. They consider it sufficient to print the loss value and report summaries for a single tower.
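The pattern described in the question can be reproduced without TensorFlow. The following is a minimal plain-Python sketch, not the referenced script: `tower_loss` and the shard values are hypothetical stand-ins for one GPU's loss computation. It only illustrates that a name rebound inside a loop ends up referring to the value from the last iteration.

```python
# Plain-Python stand-in for a per-tower loop; no TensorFlow involved.
# `tower_loss` and the shard values below are hypothetical.

def tower_loss(shard):
    """Mean squared value of one shard, standing in for one GPU's loss."""
    return sum(x * x for x in shard) / len(shard)

# One data shard per (simulated) GPU.
shards = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

for gpu_index, shard in enumerate(shards):
    # `loss` is rebound on every iteration, so the values
    # computed for earlier towers are no longer reachable through it.
    loss = tower_loss(shard)

# After the loop, `loss` refers only to the last tower's loss.
last_tower_loss = loss
print("loss after the loop:", last_tower_loss)
```

Whatever is evaluated or printed through `loss` afterwards therefore reflects one tower only, which matches the reading given in the question.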

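Usually, when you start running a new model on multiple GPUs, you track the loss and summaries of every GPU and/or compute the average loss, purely for debugging. After that, tracking a single tower is enough, because each tower holds an identical copy of the model.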
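For the debugging phase described above, one simple approach is to collect every tower's loss in a list and average the list. This is a plain-Python sketch under stated assumptions: `tower_loss`, `per_tower_and_mean`, and the shard values are hypothetical and stand in for real per-GPU computations.

```python
# Plain-Python sketch of tracking every tower's loss plus their mean.
# All names and values here are hypothetical; no TensorFlow involved.

def tower_loss(shard):
    """Mean squared value of one shard, standing in for one GPU's loss."""
    return sum(x * x for x in shard) / len(shard)

def per_tower_and_mean(shards):
    """Return the list of per-tower losses and their unweighted mean."""
    losses = [tower_loss(shard) for shard in shards]
    mean_loss = sum(losses) / len(losses)
    return losses, mean_loss

# One data shard per (simulated) GPU.
shards = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
losses, mean_loss = per_tower_and_mean(shards)

for gpu_index, value in enumerate(losses):
    print("tower", gpu_index, "loss:", value)
print("mean loss over towers:", mean_loss)
```

In a TensorFlow 1.x graph the same idea would presumably mean keeping a Python list of per-tower loss tensors and reducing it, for example with `tf.reduce_mean(tf.stack(...))`. That is an assumption about how the sketch might be adapted, not something the referenced script is known to do.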

By the way, I find it easier to do multi-GPU training with tf.estimators, using tf.contrib.estimator.replicate_model_fn(...) together with tf.contrib.estimator.TowerOptimizer(...) to distribute the model and the optimization.

