
Tensorflow Mirror Strategy and Horovod Distribution Strategy

Original post: 2019-03-05 17:15:21. Tags: tensorflow, deep-learning, mpi, distributed-tensorflow, horovod

I am trying to understand the basic difference between Tensorflow Mirror Strategy and Horovod Distribution Strategy.

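From the documentation and the source code, I found that Horovod ( https://github.com/horovod/horovod ) uses the Message Passing Interface (MPI) to communicate between multiple nodes. Specifically, it uses MPI's all_reduce and all_gather.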
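To make the two collectives concrete, here is a minimal pure-Python sketch of their semantics, assuming each list entry stands in for one rank's local buffer. It does not call MPI or Horovod, and the function names `allreduce_sum` and `allgather` are hypothetical. In data-parallel training, an all-reduce of the gradients followed by division by the number of ranks gives every rank the same averaged gradient; all-gather is the tool when each rank contributes a different piece that everyone needs.

```python
# Pure-Python simulation of the *semantics* of MPI Allreduce and Allgather.
# Each inner list plays the role of one rank's local buffer; no real MPI or
# Horovod calls are made, and the function names are hypothetical.
from typing import List

Vector = List[float]


def allreduce_sum(buffers: List[Vector]) -> List[Vector]:
    """Every rank ends up with the element-wise sum of all ranks' buffers."""
    total = [sum(values) for values in zip(*buffers)]
    # Each rank receives its own copy of the same reduced result.
    return [list(total) for _ in buffers]


def allgather(buffers: List[Vector]) -> List[Vector]:
    """Every rank ends up with all ranks' buffers concatenated in rank order."""
    gathered = [x for buf in buffers for x in buf]
    return [list(gathered) for _ in buffers]


if __name__ == "__main__":
    # Three simulated ranks, each holding a local "gradient" of length 2.
    local = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(allreduce_sum(local))  # [[9.0, 12.0], [9.0, 12.0], [9.0, 12.0]]
    print(allgather(local))      # each rank gets [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```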

From my observation (I may be wrong), Mirror Strategy also uses an all_reduce algorithm ( https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/distribute ).
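"all_reduce algorithm" covers several implementations. One widely used, bandwidth-efficient choice is ring all-reduce: a reduce-scatter phase followed by an all-gather phase, each taking p - 1 steps in which every rank sends one chunk of roughly n/p elements to its right-hand neighbor. The sketch below simulates it in-process, assuming p simulated ranks with equal-length buffers. It illustrates the general technique and is not a claim about which algorithm Mirror Strategy or Horovod selects internally; `ring_allreduce` and `chunk_bounds` are hypothetical names.

```python
# In-process simulation of ring all-reduce (reduce-scatter + all-gather).
# Ranks are simulated as lists; names are hypothetical, not a library API.
from typing import List

Vector = List[float]


def chunk_bounds(n: int, p: int, c: int) -> range:
    """Index range of chunk c when a length-n buffer is split into p chunks."""
    return range(c * n // p, (c + 1) * n // p)


def ring_allreduce(buffers: List[Vector]) -> List[Vector]:
    p = len(buffers)
    n = len(buffers[0])
    data = [list(buf) for buf in buffers]  # copies; the inputs are left untouched

    # Phase 1, reduce-scatter: at step s, rank r sends chunk (r - s) % p to
    # rank (r + 1) % p, which adds it in. After p - 1 steps, rank r owns the
    # fully reduced chunk (r + 1) % p. All messages of a step are built before
    # any is applied, mimicking simultaneous exchanges.
    for step in range(p - 1):
        msgs = []
        for r in range(p):
            c = (r - step) % p
            payload = [data[r][i] for i in chunk_bounds(n, p, c)]
            msgs.append(((r + 1) % p, c, payload))
        for dst, c, payload in msgs:
            for i, v in zip(chunk_bounds(n, p, c), payload):
                data[dst][i] += v

    # Phase 2, all-gather: at step s, rank r forwards chunk (r + 1 - s) % p,
    # and the receiver overwrites its copy, so the reduced chunks circulate.
    for step in range(p - 1):
        msgs = []
        for r in range(p):
            c = (r + 1 - step) % p
            payload = [data[r][i] for i in chunk_bounds(n, p, c)]
            msgs.append(((r + 1) % p, c, payload))
        for dst, c, payload in msgs:
            for i, v in zip(chunk_bounds(n, p, c), payload):
                data[dst][i] = v
    return data


if __name__ == "__main__":
    local = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], [100.0, 200.0, 300.0, 400.0]]
    print(ring_allreduce(local))  # every rank: [111.0, 222.0, 333.0, 444.0]
```

Because each rank transmits about 2(p - 1)/p · n elements in total, per-rank traffic stays below 2n no matter how many ranks participate, which is why this family of algorithms scales well in bandwidth.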

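Both of them use a data-parallel, synchronous training approach, so I am a bit confused about how they differ. Is the difference only in the implementation, or are there other (theoretical) differences?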
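The shared theoretical core is the synchronous data-parallel step: each worker computes a gradient on its own shard, the gradients are averaged with an all-reduce, and every replica applies the identical update. A toy sketch, assuming a 1-D linear model y = w·x trained with mean-squared error; `local_gradient` and `sync_data_parallel_step` are hypothetical helpers, and the all-reduce is replaced by a plain Python average.

```python
# Toy synchronous data-parallel SGD. The all-reduce is simulated by averaging
# the per-worker gradients in plain Python; helper names are hypothetical.
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def local_gradient(w: float, shard: List[Sample]) -> float:
    """d/dw of mean((w*x - y)^2) over one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def sync_data_parallel_step(weights: List[float], shards: List[List[Sample]], lr: float) -> List[float]:
    """One synchronous step: local grads -> averaged (all-reduce) -> same update everywhere."""
    grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
    avg = sum(grads) / len(grads)  # what all-reduce(sum) / num_workers would produce
    return [w - lr * avg for w in weights]


if __name__ == "__main__":
    data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]  # y = 3x
    shards = [data[:2], data[2:]]  # 2 workers with equal-size shards
    weights = [0.0, 0.0]           # every replica starts from the same weight
    for _ in range(100):
        weights = sync_data_parallel_step(weights, shards, lr=0.01)
    print(weights)  # both replicas identical and close to 3.0
```

With equal-sized shards, the average of the shard gradients equals the full-batch gradient, so the replicas stay in lockstep and match single-worker training; in this simplified view, what differs between the two frameworks is how the averaging step is communicated.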

And how does the performance of Mirror Strategy compare to Horovod?

2 Answers

Mirror Strategy has its own all_reduce algorithm, which uses remote procedure calls (gRPC) under the hood.

As you mentioned, Horovod uses MPI/Gloo to communicate between multiple processes.
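MPI and Gloo both build collectives on top of point-to-point messages between processes. As a minimal illustration, assuming Python threads stand in for processes and one `queue.Queue` per rank stands in for its network inbox, here is the simplest all-reduce pattern: every rank sends its buffer to rank 0, which sums them (a reduce) and sends the total back (a broadcast). `run_reduce_bcast_allreduce` is a hypothetical name, and real MPI and Gloo implementations use more efficient algorithms.

```python
# Reduce-then-broadcast all-reduce built from point-to-point messages.
# Threads stand in for processes and queues for network links (assumption).
import queue
import threading
from typing import List

Vector = List[float]


def run_reduce_bcast_allreduce(local: List[Vector]) -> List[Vector]:
    p = len(local)
    mailboxes = [queue.Queue() for _ in range(p)]  # one inbox per simulated process
    results: List[Vector] = [[] for _ in range(p)]

    def worker(rank: int) -> None:
        buf = list(local[rank])
        if rank == 0:
            # Root: receive every other rank's buffer and accumulate ("reduce").
            for _ in range(p - 1):
                _src, payload = mailboxes[0].get()
                buf = [a + b for a, b in zip(buf, payload)]
            # Send the total back to every other rank ("broadcast").
            for dst in range(1, p):
                mailboxes[dst].put((0, list(buf)))
        else:
            mailboxes[0].put((rank, buf))   # send my contribution to the root
            _src, buf = mailboxes[rank].get()  # block until the total arrives
        results[rank] = buf

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_reduce_bcast_allreduce([[1.0, 2.0], [3.0, 4.0]]))  # [[4.0, 6.0], [4.0, 6.0]]
```

The weakness of this pattern is that rank 0 receives (p - 1)·n elements and becomes a bottleneck, which is why ring- and tree-based algorithms are preferred at scale.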

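Regarding performance, a colleague of mine previously ran experiments on 4 Tesla V100 GPUs using the code from here. The results showed that three settings worked best: replicated with all_reduce_spec=nccl, collective_all_reduce with a properly tuned allreduce_merge_scope (e.g. 32), and horovod. I did not see significant differences among these three.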
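The allreduce_merge_scope setting hints at why grouping matters: each collective call pays a fixed latency, so launching one call per gradient tensor is wasteful when a model has many small tensors. Horovod applies a similar idea in its Tensor Fusion feature. The sketch below is a generic illustration, not the benchmark's implementation: it packs up to `merge_scope` gradient tensors into one flat buffer, reduces that buffer once, and unpacks the result. `fused_allreduce` and `allreduce_calls` are hypothetical helpers, and ranks are simulated as lists.

```python
# Generic gradient-fusion sketch: group tensors before reducing them.
# Not the benchmark's code; helper names are hypothetical.
import math
from typing import List

Vector = List[float]


def allreduce_calls(num_tensors: int, merge_scope: int) -> int:
    """Collective calls needed if gradients are grouped merge_scope at a time."""
    return math.ceil(num_tensors / merge_scope)


def fused_allreduce(per_rank_grads: List[List[Vector]], merge_scope: int) -> List[Vector]:
    """Sum gradients across ranks, fusing up to merge_scope tensors per reduction.

    per_rank_grads[r][t] is rank r's gradient for tensor t. The summed result
    is identical on every rank, so a single copy is returned.
    """
    if merge_scope < 1:
        raise ValueError("merge_scope must be >= 1")
    num_tensors = len(per_rank_grads[0])
    reduced: List[Vector] = []
    for start in range(0, num_tensors, merge_scope):
        group = range(start, min(start + merge_scope, num_tensors))
        # Pack this group's tensors into one flat buffer per rank...
        flat = [[x for t in group for x in grads[t]] for grads in per_rank_grads]
        # ...perform a single reduction over the packed buffers...
        total = [sum(vals) for vals in zip(*flat)]
        # ...and unpack the result back into per-tensor gradients.
        offset = 0
        for t in group:
            size = len(per_rank_grads[0][t])
            reduced.append(total[offset:offset + size])
            offset += size
    return reduced


if __name__ == "__main__":
    for scope in (1, 8, 32):
        print(scope, allreduce_calls(100, scope))  # 100, 13 and 4 calls for 100 tensors
```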

Related questions

Why is gradient clipping not supported with a distribution strategy in Tensorflow?

Tensorflow, Keras: 'Model' object has no attribute '_get_distribution_strategy'

Tensorflow code optimization strategy

Accumulate gradients in Estimator with distribution strategy

Compile-time distribution strategy issue

Tensorflow model quantization best strategy

Accumulate gradients with distributed strategy in Tensorflow 2

Parameter Server Strategy with estimators (Tensorflow)

Variable was not created in the distribution strategy scope with custom Layer

Distribution Strategy that leverages all CPUs and all GPUs


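Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.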


Guangdong ICP filing No. 18138465 © 2020-2024 STACKOOM.COM