
TensorFlow GPU Epoch Optimization?

So this code works, and it gives me a 2x boost over CPU only, but I think it's possible to get it faster. I think the issue boils down to this area...

for i in tqdm(range(epochs), ascii=True):
    sess.run(train_step, feed_dict={x: train, y_:labels})

I think what happens is that every epoch, control returns to the CPU to decide what to do next (the for loop), and the for loop then pushes work back to the GPU. The GPU can fit the entire data set, and more, into memory.

Is it possible, and if so how, to just have it continually crunch 1000 epochs on the GPU without coming back to the CPU to report its status? Or perhaps control how often it reports status? It would be nice to crunch 1000 epochs on the GPU, then see my train vs. validation numbers, then crunch again. Doing it between every epoch is not really helpful.
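For what it's worth, the reporting side of this can be approximated in plain Python by moving status output out of the inner loop. It doesn't remove the per-step `session.run` call, but it does give the "crunch N epochs, then look at metrics" cadence. A sketch, with a made-up linear model standing in for the real one (written against the `tf.compat.v1` API so it also runs under TF 2.x):

```python
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Hypothetical toy model standing in for the one in the question.
x = tf.placeholder(tf.float32, [None, 3])
y_ = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([3, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y_))
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# Made-up data standing in for `train` / `labels`.
train = np.random.rand(100, 3).astype(np.float32)
labels = np.random.rand(100, 1).astype(np.float32)

epochs, report_every = 200, 50
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for start in range(0, epochs, report_every):
        # Inner loop: no tqdm, no printing, no metric fetches.
        for _ in range(report_every):
            sess.run(train_step, feed_dict={x: train, y_: labels})
        # Report only once every `report_every` epochs.
        final_loss = sess.run(loss, feed_dict={x: train, y_: labels})
        print("epoch %d, loss %.5f" % (start + report_every, final_loss))
```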

Thanks,

~David

The overhead of session.run is around 100 usec, so if you do 10k steps, this overhead adds around 1 second. If this is significant, then you are doing many small iterations and are incurring extra overhead in other places too. For instance, GPU kernel launch overhead is 5x larger than CPU (5 usec vs 1 usec).
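You can measure that per-call overhead yourself with a trivially small graph, so the time measured is almost entirely `session.run` dispatch rather than compute. The exact numbers are machine-dependent; this sketch uses the `tf.compat.v1` API so it also runs under TF 2.x:

```python
import time
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# A near-empty graph: fetching a constant measures run-call
# overhead, not computation.
c = tf.constant(1.0)

with tf.Session() as sess:
    sess.run(c)  # warm-up call, excluded from timing
    n = 1000
    start = time.time()
    for _ in range(n):
        sess.run(c)
    per_call = (time.time() - start) / n
    print("session.run overhead: %.1f usec" % (per_call * 1e6))
```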

Using feed_dict is probably a bigger problem, and you could speed things up by using queues/input pipelines.
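For illustration, here is a minimal input pipeline using the `tf.data` API (available since TF 1.4), which delivers batches as graph tensors instead of copying arrays through feed_dict on every step. The shapes and data here are made up, and it is written against `tf.compat.v1` so it also runs under TF 2.x:

```python
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Made-up data standing in for `train` / `labels` from the question.
train = np.random.rand(1000, 10).astype(np.float32)
labels = np.random.rand(1000, 1).astype(np.float32)

# Build a pipeline; batches are produced inside the runtime, so each
# training step avoids the Python-side feed_dict copy.
dataset = tf.data.Dataset.from_tensor_slices((train, labels))
dataset = dataset.shuffle(1000).batch(100).repeat()
iterator = tf.data.make_one_shot_iterator(dataset)
x_batch, y_batch = iterator.get_next()

# The model would consume x_batch / y_batch directly instead of
# placeholders; here we just pull one batch to show the shapes.
with tf.Session() as sess:
    xb, yb = sess.run([x_batch, y_batch])
    print(xb.shape, yb.shape)
```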

Also, a robust way to figure out where you are spending time is to profile. For instance, to figure out what fraction of time is due to your for loop, you can run cProfile as follows.

python -m cProfile -o timing.prof myscript.py
snakeviz  timing.prof

To figure out where the time goes inside of a TensorFlow run call, you can do timeline profiling as described here.
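The timeline approach boils down to passing a `RunOptions` with `FULL_TRACE` plus a `RunMetadata` object into `session.run`, then converting the collected step stats into a Chrome trace you can open at chrome://tracing. A sketch using a throwaway matmul graph (`tf.compat.v1` API, so it also runs under TF 2.x):

```python
import tensorflow.compat.v1 as tf
from tensorflow.python.client import timeline
tf.disable_eager_execution()

# Any graph will do; a matmul gives the trace something to show.
a = tf.random_normal([500, 500])
b = tf.matmul(a, a)

# Ask the runtime to record per-op timing for this run call.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

with tf.Session() as sess:
    sess.run(b, options=run_options, run_metadata=run_metadata)

# Convert the step stats into Chrome trace JSON.
tl = timeline.Timeline(run_metadata.step_stats)
trace = tl.generate_chrome_trace_format()
with open("timeline.json", "w") as f:
    f.write(trace)
```

Load timeline.json in chrome://tracing to see which ops dominate and whether they ran on CPU or GPU.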
