
TensorFlow distributed training hybrid with multi-GPU methodology

Original post: 2016-09-20 13:41:06 · Tags: machine-learning, tensorflow, deep-learning, distributed, multi-gpu

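After using the current distributed training implementation for a while, I think it treats each GPU as a separate worker. However, it is now common to have 2 to 4 GPUs installed in one box. Wouldn't it be better to adopt a single-box multi-GPU methodology, where each box first computes the average gradient locally and only then synchronizes with the other nodes? That would greatly reduce I/O traffic, which is always the bottleneck in data parallelism.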
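The arithmetic behind this proposal can be checked with a small sketch. It is plain Python, with lists of made-up numbers standing in for gradient tensors. The helper names (flat_average, hierarchical_average) are hypothetical and are not part of TensorFlow.

The sketch shows two things. First, averaging inside each box and then across boxes gives the same result as averaging over every GPU. Second, only one gradient vector per box has to cross the network.

```python
from typing import List, Tuple

Vector = List[float]


def mean_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized gradient vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def flat_average(boxes: List[List[Vector]]) -> Vector:
    """Every GPU acts as its own worker: all per-GPU gradients are averaged together."""
    return mean_vectors([g for box in boxes for g in box])


def hierarchical_average(boxes: List[List[Vector]]) -> Tuple[Vector, int]:
    """Average inside each box first, then combine one vector per box.

    Returns the global average and how many gradient vectors crossed the
    network (one per box instead of one per GPU).
    """
    box_means = [mean_vectors(box) for box in boxes]
    counts = [len(box) for box in boxes]
    total = sum(counts)
    # Weight each box by its GPU count so that boxes with different GPU
    # counts still reproduce the flat average.
    global_mean = [
        sum(c * m[i] for c, m in zip(counts, box_means)) / total
        for i in range(len(box_means[0]))
    ]
    return global_mean, len(boxes)


if __name__ == "__main__":
    # Hypothetical gradients: 2 boxes x 4 GPUs, 2 parameters each.
    boxes = [
        [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
        [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0], [6.0, 6.0]],
    ]
    print(flat_average(boxes))          # → [3.5, 4.0]
    print(hierarchical_average(boxes))  # → ([3.5, 4.0], 2)
```

The per-box weighting only matters when boxes hold different numbers of GPUs. With equal boxes, a plain mean of the box means gives the same answer. In this example, 2 vectors travel over the network instead of 8.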

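I was told that the current implementation can treat all the GPUs in one box as a single worker. However, I can't figure out how to hook the averaged gradients up to SyncReplicasOptimizer, because SyncReplicasOptimizer takes an optimizer directly as its input.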
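In TensorFlow 1.x, tf.train.SyncReplicasOptimizer is itself an Optimizer subclass. One common pattern therefore works as follows:

1. Compute gradients on each GPU.
2. Average them inside the worker.
3. Pass the averaged (gradient, variable) pairs to the wrapper's apply_gradients() instead of calling minimize().

In that setup, replicas_to_aggregate counts workers (boxes), not GPUs.

The toy model below does not use TensorFlow and only illustrates that data flow. ToySyncAggregator and elementwise_mean are hypothetical names. The real class also has to coordinate workers, for example by dropping stale gradients.

```python
from typing import List

Vector = List[float]


def elementwise_mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean; used for local GPU towers and for replicas alike."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


class ToySyncAggregator:
    """Hypothetical stand-in for synchronous replica aggregation (NOT TensorFlow's class).

    It collects one gradient per replica until `replicas_to_aggregate` have
    arrived, averages them, and applies a single SGD step.
    """

    def __init__(self, params: Vector, lr: float, replicas_to_aggregate: int) -> None:
        self.params = list(params)
        self.lr = lr
        self.replicas_to_aggregate = replicas_to_aggregate
        self.steps_applied = 0
        self._pending: List[Vector] = []

    def apply_gradients(self, grad: Vector) -> bool:
        """Queue one replica's gradient; return True if an update was applied."""
        self._pending.append(grad)
        if len(self._pending) < self.replicas_to_aggregate:
            return False
        mean = elementwise_mean(self._pending)
        self.params = [p - self.lr * g for p, g in zip(self.params, mean)]
        self._pending.clear()
        self.steps_applied += 1
        return True


if __name__ == "__main__":
    # Two boxes (replicas), each with two GPUs; hypothetical gradients.
    agg = ToySyncAggregator(params=[1.0, 1.0], lr=0.5, replicas_to_aggregate=2)
    grad_box_a = elementwise_mean([[1.0, 2.0], [3.0, 2.0]])  # box A's local average
    grad_box_b = elementwise_mean([[0.0, 1.0], [2.0, 1.0]])  # box B's local average
    print(agg.apply_gradients(grad_box_a))  # → False
    print(agg.apply_gradients(grad_box_b))  # → True
    print(agg.params)                       # → [0.25, 0.25]
```

Each box contributes exactly one pre-averaged gradient, so the aggregator never sees individual GPUs.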

Does anyone have any ideas?

1 answer

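Distributed TensorFlow supports multiple GPUs in the same worker task. A common way to perform distributed training for image models is to use synchronous training across the GPUs in the same worker, and asynchronous training across workers (although other configurations are also possible). This way, you only need to pull the model parameters to each worker once. They can then be distributed among the local GPUs, which reduces network bandwidth usage.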
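The "synchronous inside a worker, asynchronous across workers" layout can be sketched with a toy parameter server. This is a pure-Python simulation under simplifying assumptions:

- a one-parameter model;
- hypothetical data;
- workers that take turns, so the output is deterministic.

In a real cluster, worker steps overlap, and some gradients are computed from slightly stale parameters. ParameterServer, gpu_gradient, and worker_step are illustrative names, not TensorFlow APIs.

```python
from typing import List


class ParameterServer:
    """Toy shared parameter store standing in for a PS task (not a TensorFlow API)."""

    def __init__(self, w: float) -> None:
        self.w = w
        self.pulls = 0

    def pull(self) -> float:
        self.pulls += 1
        return self.w

    def push(self, grad: float, lr: float) -> None:
        # Asynchronous across workers: the update is applied immediately,
        # with no barrier waiting for the other workers.
        self.w -= lr * grad


def gpu_gradient(w: float, shard: List[float]) -> float:
    """Gradient of the mean of 0.5 * (w - x)^2 over one GPU's shard."""
    return sum(w - x for x in shard) / len(shard)


def worker_step(ps: ParameterServer, batch: List[float], num_gpus: int, lr: float) -> None:
    w = ps.pull()  # one network pull per worker step, shared by all local GPUs
    shards = [batch[i::num_gpus] for i in range(num_gpus)]
    # Synchronous within the worker: every local GPU's gradient is averaged.
    local_grad = sum(gpu_gradient(w, s) for s in shards) / num_gpus
    ps.push(local_grad, lr)


if __name__ == "__main__":
    ps = ParameterServer(w=0.0)
    batch = [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 3.0, 4.0]  # hypothetical data, mean 3.0
    for _ in range(20):
        for _worker in range(2):  # two workers take turns; neither waits for the other
            worker_step(ps, batch, num_gpus=4, lr=0.5)
    print(round(ps.w, 6), ps.pulls)  # → 3.0 40
```

The pull counter ends at one pull per worker step (40 here), not one per GPU step (160). That difference is the bandwidth saving the answer describes.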

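To do this kind of training, many users perform "in-graph replication" across the GPUs in a single worker. This can use an explicit loop over the local GPU devices, as in the CIFAR-10 example model, or higher-level library support, such as the model_deploy() utility in TF-Slim.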
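The explicit-loop style of in-graph replication can be outlined as follows. The sketch mirrors the shape of the average_gradients(tower_grads) helper in the CIFAR-10 multi-GPU example. There, each tower returns a list of (gradient, variable) pairs, and zip(*tower_grads) groups the pairs that belong to the same variable.

Here, some simplifications apply:

- plain Python floats and variable names stand in for tensors;
- the device strings are only labels;
- the linear model and the data are hypothetical.

```python
from typing import Dict, List, Tuple

GradVar = Tuple[float, str]  # (gradient, variable name), one pair per variable


def tower_gradients(params: Dict[str, float], shard: List[Tuple[float, float]]) -> List[GradVar]:
    """Hypothetical tower: gradients of mean 0.5 * (w*x + b - y)^2 over one shard."""
    w, b = params["w"], params["b"]
    n = len(shard)
    dw = sum((w * x + b - y) * x for x, y in shard) / n
    db = sum(w * x + b - y for x, y in shard) / n
    return [(dw, "w"), (db, "b")]


def average_gradients(tower_grads: List[List[GradVar]]) -> List[GradVar]:
    """Average each shared variable's gradient over all towers."""
    averaged = []
    # zip(*tower_grads) yields, per variable, that variable's pair from every tower.
    for grad_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grad_and_vars]
        averaged.append((sum(grads) / len(grads), grad_and_vars[0][1]))
    return averaged


def train_step(params: Dict[str, float], batch: List[Tuple[float, float]],
               devices: List[str], lr: float) -> None:
    tower_grads = []
    for i, device in enumerate(devices):
        # In TensorFlow this body would sit under a device scope for `device`;
        # here the string is only a label. Every tower reads the same params.
        shard = batch[i::len(devices)]
        tower_grads.append(tower_gradients(params, shard))
    for grad, var in average_gradients(tower_grads):
        params[var] -= lr * grad  # one update applied to the shared variables


if __name__ == "__main__":
    params = {"w": 0.0, "b": 0.0}
    batch = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # y = 2x + 1
    for _ in range(1000):
        train_step(params, batch, devices=["/gpu:0", "/gpu:1"], lr=0.1)
    print(round(params["w"], 4), round(params["b"], 4))  # → 2.0 1.0
```

Because the towers share one set of variables and their gradients are averaged before a single update, the result matches full-batch training on the whole batch whenever the shards are equal in size.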


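Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the address of the original post. For any questions, contact: yoyou2525@163.com.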


Guangdong ICP filing No. 18138465 © 2020-2024 STACKOOM.COM