Why is TensorFlow consuming this much memory?

Original post: 2018-04-30 03:33:54 | Tags: python, memory, tensorflow, convolutional-neural-network

I have a simple CNN (4 conv-pool-lrelu layers and 2 fully connected ones).
I am only using TensorFlow on CPU (no GPU).
I have ~6 GB of available memory.
My batches are composed of 56 images of 640x640 pixels (< 100 MB).

And TensorFlow is consuming more than the available memory (causing the program to crash, obviously).

My question is: why does TensorFlow require this much memory to run my network? I don't understand what is taking this much space (maybe caching the data several times to optimize the convolution computation? Saving all the hidden outputs for backpropagation?). And is there a way to prevent TensorFlow from consuming this much memory?

Side notes:

I cannot reduce the size of the batch: I am trying to do some Multiple Instance Learning, so I need to compute all my patches in one run.
I am using an AdamOptimizer.
All my convolutions are 5x5 windows, 1x1 stride, with (from first to last) 32, 64, 128 and 256 features. I am using leaky ReLUs and 2x2 max pooling. FC layers are composed of 64 and 3 neurons.
Using Ubuntu 16.04 / Python 3.6.4 / TensorFlow 1.6.0

1 answer

As you have mentioned:

All my convolutions are 5x5 windows, 1x1 stride, with (from first to last) 32, 64, 128 and 256 features. I am using leaky ReLUs and 2x2 max pooling. FC layers are composed of 64 and 3 neurons.

So, the memory consumption of your network goes like this (output size of each layer, counting one byte per value):

Input: 640x640x3 = 1200 KB

C1: 636x636x32 = 12.3 MB (computed with stride 1)

P1: 635x635x32 = 12.3 MB (computed with stride 1)

C2: 631x631x64 = 24.3 MB

P2: 630x630x64 = 24.2 MB

C3: 626x626x128 = 47.83 MB

P3: 625x625x128 = 47.68 MB

C4: 621x621x256 = 94.15 MB

P4: 620x620x256 = 93.84 MB

FC1: 64 = 0.0625 KB (negligible)

FC2: 3 = 0.003 KB (negligible)

Total for one image = ~358 MB

For a batch of 56 images = 56 x 358 MB = ~19.6 GB
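
The arithmetic above can be reproduced with a short script. This is a minimal sketch, not the answerer's code: the function names are hypothetical, and it assumes what the figures imply, namely unpadded 5x5 convolutions with stride 1, 2x2 pooling with stride 1, and one byte per stored value. float32 tensors, the usual choice in TensorFlow, take 4 bytes per value, so the last line also prints the batch figure scaled by four.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def activation_shapes(size=640, channels=3, filters=(32, 64, 128, 256),
                      kernel=5, pool=2, pool_stride=1):
    """Yield (name, side, channels) for the input and each conv/pool output."""
    yield "Input", size, channels
    for i, f in enumerate(filters, start=1):
        size = size - kernel + 1                  # unpadded conv, stride 1
        yield f"C{i}", size, f
        size = (size - pool) // pool_stride + 1   # unpadded pooling
        yield f"P{i}", size, f


def values_per_image(**kwargs):
    """Total number of stored values for one image (FC layers are negligible)."""
    return sum(side * side * ch for _, side, ch in activation_shapes(**kwargs))


for name, side, ch in activation_shapes():
    print(f"{name}: {side}x{side}x{ch} = {side * side * ch / MIB:.2f} MiB at 1 byte/value")

per_image = values_per_image()
batch = 56 * per_image
print(f"one image: {per_image / MIB:.0f} MiB at 1 byte/value")
print(f"batch of 56: {batch / GIB:.1f} GiB at 1 byte/value, "
      f"{4 * batch / GIB:.1f} GiB at 4 bytes/value (float32)")
```

These numbers cover layer outputs only. Weights, gradients, and the optimizer's per-weight state come on top, so they are best read as a lower bound under the stated assumptions.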

That's why your network does not run on 6 GB. Try a higher stride or smaller images to fit it into the 6 GB of memory, and it should work.
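
To get a rough feel for how much these two suggestions help, the estimate can be rerun with a pooling stride of 2 or with smaller images. Again, this is a sketch under assumptions: `activation_values` is a hypothetical helper, convolutions and pooling are unpadded, activations are taken to be float32 (4 bytes per value), and only forward-pass layer outputs are counted, so actual usage during training will be higher.

```python
def activation_values(size, pool_stride, channels=3,
                      filters=(32, 64, 128, 256), kernel=5, pool=2):
    """Count the values in the input plus every conv and pool output for one image."""
    total = size * size * channels
    for f in filters:
        size = size - kernel + 1                  # unpadded conv, stride 1
        total += size * size * f
        size = (size - pool) // pool_stride + 1   # unpadded 2x2 pooling
        total += size * size * f
    return total


BATCH = 56
BYTES_PER_VALUE = 4  # assumption: float32 activations
GIB = 1024 ** 3

# Compare the original setup with a higher pooling stride and/or smaller images.
for size, stride in [(640, 1), (640, 2), (320, 1), (320, 2)]:
    gib = BATCH * BYTES_PER_VALUE * activation_values(size, stride) / GIB
    print(f"{size}x{size} images, pooling stride {stride}: ~{gib:.1f} GiB")
```

Both changes shrink every layer after the first convolution, which is where most of the memory goes in the stride-1 breakdown.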