

Tensorflow low accuracy when training with multi GPU

Is it normal to get lower accuracy when training on multiple GPUs? For example, when I train on a single GPU with batch size n, I get 63% accuracy. However, when I train with 4 GPUs using batch size n per GPU, I get only 58% accuracy. Both cases were trained for 100 epochs.

I guess the gradient averaging somehow makes things harder for the optimizer. Has anybody experienced the same thing?
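One detail worth spelling out: with 4 GPUs and a per-replica batch size of n, data-parallel training (e.g. tf.distribute.MirroredStrategy) averages gradients over a global batch of 4n, so the optimizer effectively sees a batch four times larger than in the single-GPU run. Below is a minimal sketch of that setup, assuming MirroredStrategy and Keras; the model, dataset, and PER_REPLICA_BATCH value are placeholders, not the poster's actual code.

```python
import tensorflow as tf

# Hypothetical values -- the original post does not give the real model or batch size.
PER_REPLICA_BATCH = 32
EPOCHS = 100

strategy = tf.distribute.MirroredStrategy()        # one replica per visible GPU
num_replicas = strategy.num_replicas_in_sync       # 4 in the multi-GPU case
global_batch = PER_REPLICA_BATCH * num_replicas    # gradients are averaged over this

# Toy dataset standing in for the real one.
(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()
train_ds = (tf.data.Dataset.from_tensor_slices((x_train / 255.0, y_train))
            .shuffle(10_000)
            .batch(global_batch))                  # batch with the GLOBAL size

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

model.fit(train_ds, epochs=EPOCHS)
```

When a tf.data.Dataset is passed to model.fit under a strategy scope, it should be batched with the global size; MirroredStrategy then hands each replica a slice of size PER_REPLICA_BATCH.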

Since the model has already been trained for 100 epochs, you can run inference and compute accuracy on the CPU, because evaluation is not computationally heavy. But if you want to monitor accuracy during training, it can be hard to collect accuracy from each GPU and average it, and that average might not reflect how good the model being trained really is.
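For reference, a minimal sketch of the CPU-side evaluation described above; saved_model_dir, the test dataset, and the assumption that the model was saved in a Keras-loadable format are all placeholders, since the original post does not show how the model was saved.

```python
import tensorflow as tf

# Hypothetical path and data -- the post does not say how the model was saved
# or what the test set is.
saved_model_dir = "checkpoints/my_model"
(_, _), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
test_ds = tf.data.Dataset.from_tensor_slices((x_test / 255.0, y_test)).batch(128)

# Pin evaluation to the CPU: running inference over the test set once is cheap
# compared to training.
with tf.device("/CPU:0"):
    model = tf.keras.models.load_model(saved_model_dir)
    # Assumes the model was compiled with a single "accuracy" metric.
    loss, accuracy = model.evaluate(test_ds, verbose=0)

print(f"test accuracy: {accuracy:.2%}")
```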
