
Tensorflow first epoch is extremely slow (maybe related to pool_allocator)

I am training a model built with TF. During the first epoch, TF is slower than the following epochs by a factor of ~100, and I am seeing messages like:

I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 958 to 1053

As suggested here, I tried to use tcmalloc by setting LD_PRELOAD="/usr/lib/libtcmalloc.so", but it didn't help.
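For reference, a minimal sketch of how LD_PRELOAD is typically applied to a single training run (the library path is the one from the question; adjust it to wherever libtcmalloc.so is installed on your system, and `train.py` is a hypothetical training script):

```shell
# Preload tcmalloc for this process only, without exporting it shell-wide.
# /usr/lib/libtcmalloc.so is the path from the question; train.py is a
# placeholder for your own training entry point.
LD_PRELOAD="/usr/lib/libtcmalloc.so" python train.py

# To confirm the variable actually reaches the child process:
LD_PRELOAD="/usr/lib/libtcmalloc.so" sh -c 'echo "$LD_PRELOAD"'
```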

Any idea on how to make the first epoch run faster?

It seems that it is a hardware issue. During the first epoch, TF (like other DL libraries, e.g. PyTorch, as discussed here) caches information about the data, as discussed here by @ppwwyyxx:

If each input has a different size, TF can spend a large amount of time running cuDNN benchmarks for each size and storing the results in its cache.
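One common workaround implied by the quote above is to make every input the same size, so cuDNN's per-shape autotuning runs only once instead of once per distinct input shape. A minimal NumPy-only sketch (the `pad_to_fixed` helper and the 64x64 target shape are illustrative assumptions, not part of any TF API):

```python
import numpy as np

def pad_to_fixed(batch, target_h, target_w):
    """Zero-pad variable-sized HWC images to one fixed shape.

    With a single fixed input shape, cuDNN benchmarks the convolution
    algorithms once and reuses the cached choice for every batch,
    instead of re-benchmarking for each new input size.
    (Hypothetical helper for illustration.)
    """
    channels = batch[0].shape[-1]
    out = np.zeros((len(batch), target_h, target_w, channels),
                   dtype=batch[0].dtype)
    for i, img in enumerate(batch):
        h, w = img.shape[:2]
        out[i, :h, :w, :] = img  # place each image in the top-left corner
    return out

# Example: two images of different sizes padded to a common 64x64 shape.
imgs = [np.ones((32, 48, 3), np.float32), np.ones((50, 20, 3), np.float32)]
fixed = pad_to_fixed(imgs, 64, 64)
print(fixed.shape)  # (2, 64, 64, 3)
```

The same idea applies to bucketing sequences by length: the fewer distinct shapes the model sees, the less time the first epoch spends benchmarking.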
