
TensorFlow MirroredStrategy and Horovod Distribution Strategy

I am trying to understand the basic differences between TensorFlow's MirroredStrategy and Horovod's distribution strategy.

From the documentation and from investigating the source code, I found that Horovod ( https://github.com/horovod/horovod ) uses the Message Passing Interface (MPI) to communicate between multiple nodes. Specifically, it uses the allreduce and allgather operations of MPI.
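For reference, here is a minimal sketch (my own, not from the linked repository) of how Horovod hooks its MPI/Gloo allreduce into an ordinary Keras training setup; hvd.init(), hvd.local_rank(), hvd.size() and hvd.DistributedOptimizer are documented Horovod APIs, while the toy model and learning rate are placeholders:

import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()  # one process per GPU, usually launched with horovodrun/mpirun

# Pin each process to a single local GPU.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
opt = tf.keras.optimizers.SGD(0.01 * hvd.size())  # scale LR with worker count

# DistributedOptimizer averages gradients across all processes with
# allreduce before they are applied, i.e. synchronous data parallelism.
opt = hvd.DistributedOptimizer(opt)
model.compile(loss='mse', optimizer=opt)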

From my observation (I may be wrong), MirroredStrategy also uses an all-reduce algorithm ( https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/distribute ).
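As a point of comparison, a minimal sketch of the equivalent single-machine setup with MirroredStrategy (assuming the current tf.distribute API, which replaced the contrib module linked above); the toy model is again a placeholder:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # one replica per visible GPU
print('Number of replicas in sync:', strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(loss='mse', optimizer=tf.keras.optimizers.SGD(0.01))

# model.fit(...) now runs one model copy per GPU and all-reduces the
# gradients each step, the same synchronous data-parallel pattern.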

Both of them use a data-parallel, synchronous training approach, so I am a bit confused about how they differ. Is the difference only in the implementation, or are there other (theoretical) differences?

And how does the performance of MirroredStrategy compare to Horovod?

MirroredStrategy has its own all-reduce algorithm, which uses remote procedure calls (gRPC) under the hood.
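To illustrate (this sketch is mine, not part of the original answer): in the current tf.distribute API, the all-reduce implementation behind MirroredStrategy is pluggable via the cross_device_ops argument, with NcclAllReduce and HierarchicalCopyAllReduce as documented options:

import tensorflow as tf

# NCCL ring all-reduce between the local GPUs (the usual choice on Linux).
nccl_strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.NcclAllReduce())

# Copy-based alternative for setups where NCCL is unavailable.
copy_strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())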

As you mentioned, Horovod uses MPI/Gloo to communicate between multiple processes.

Regarding performance, one of my colleagues performed experiments using 4 Tesla V100 GPUs with the code from here. The results suggested that three settings work best: replicated with all_reduce_spec=nccl, collective_all_reduce with a properly tuned allreduce_merge_scope (e.g. 32), and horovod. I did not see significant differences among these three.
