Memory issues with VGG16 on TensorFlow

Question

I have been trying to get a VGG16 Keras model with a TensorFlow backend to classify images for the "Planet: Understanding the Amazon from Space" competition on Kaggle. Unfortunately, I consistently run into memory issues when trying to run the model, even on an AWS g2.8xlarge instance, which has 60 GB of memory.

The model summary and traceback are as follows:

    Layer (type)                 Output Shape              Param #
    =================================================================
    input_1 (InputLayer)         (None, 224, 224, 3)       0
    _________________________________________________________________
    block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792
    _________________________________________________________________
    block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928
    _________________________________________________________________
    block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0
    _________________________________________________________________
    block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856
    _________________________________________________________________
    block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584
    _________________________________________________________________
    block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0
    _________________________________________________________________
    block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168
    _________________________________________________________________
    block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080
    _________________________________________________________________
    block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080
    _________________________________________________________________
    block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0
    _________________________________________________________________
    block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160
    _________________________________________________________________
    block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808
    _________________________________________________________________
    block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808
    _________________________________________________________________
    block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0
    _________________________________________________________________
    block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808
    _________________________________________________________________
    block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808
    _________________________________________________________________
    block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808
    _________________________________________________________________
    block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0
    _________________________________________________________________
    sequential_1 (Sequential)    (None, 1)                 6423041
    =================================================================
    Total params: 21,137,729.0
    Trainable params: 21,137,729.0
    Non-trainable params: 0.0
    _________________________________________________________________
    Traceback (most recent call last):
      File "VGG16_Kg_Kernel.py", line 160, in <module>
        train_datagen.fit(x_train)
      File "/home/ec2-user/src/anaconda3/lib/python3.5/site-packages/keras/preprocessing/image.py", line 648, in fit
        x = np.copy(x)
      File "/home/ec2-user/src/anaconda3/lib/python3.5/site-packages/numpy/lib/function_base.py", line 1497, in copy
        return array(a, order=order, copy=True)
    MemoryError

The entire printout can be found here: https://github.com/jvk/VGG16/blob/master/error_text.txt

From the printout, the GPU appears to be running, but it might not be running correctly.

The data contains ~100K images of about 11.6 KB each. The code I use to run the model can be found here: https://github.com/jvk/VGG16/blob/master/VGG16_Kg_Kernel.py
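For a rough sense of scale, the sketch below estimates how large the decoded image array would be in host memory. The traceback above ends in `np.copy(x)` inside `train_datagen.fit(x_train)`, so the size of `x_train` as a dense array matters more than the on-disk size of the compressed files. Two things here are assumptions, not facts from the post: that every image is resized to 224 x 224 x 3 (matching the `input_1` shape in the summary), and that the array is held as 32-bit floats. The helper name `estimate_array_bytes` is made up for illustration.

```python
def estimate_array_bytes(num_images, image_shape, bytes_per_value):
    """Return the size in bytes of a dense array of shape (num_images, *image_shape)."""
    values_per_image = 1
    for dim in image_shape:
        values_per_image *= dim
    return num_images * values_per_image * bytes_per_value


NUM_IMAGES = 100000          # "~100K" from the question
IMAGE_SHAPE = (224, 224, 3)  # assumption: images resized to the model's input shape
FLOAT32_BYTES = 4            # assumption: array stored as float32

float32_bytes = estimate_array_bytes(NUM_IMAGES, IMAGE_SHAPE, FLOAT32_BYTES)
# np.copy allocates a second array of the same size alongside the original.
with_copy_bytes = 2 * float32_bytes

print("float32 array: %.1f GB" % (float32_bytes / 1e9))
print("array plus one copy: %.1f GB" % (with_copy_bytes / 1e9))
```

Under these assumptions a single float32 array is already about the size of the instance's 60 GB of RAM, and the copy made inside `fit` would need roughly twice that. This is only an estimate; it says nothing about whether GPU memory is also a constraint.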

Please let me know if any more information is needed. Thanks!

Answer 1

The short of it is that you're out of GPU memory, not RAM. G2 instances have GPUs with 4 GB of memory each, and there seems to be an issue with VGG16 on TensorFlow.

I've run the same model with a Theano backend and had no issues. I'd recommend trying that.
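A minimal sketch of switching backends, assuming the multi-backend Keras releases of that period, which read the `KERAS_BACKEND` environment variable (or the `backend` field in `~/.keras/keras.json`) when `keras` is first imported. The helper `select_keras_backend` is a hypothetical name used only for this example, and Theano must already be installed for the switch to work.

```python
import os


def select_keras_backend(name):
    """Set KERAS_BACKEND; this must run before the first `import keras`."""
    os.environ["KERAS_BACKEND"] = name
    return os.environ["KERAS_BACKEND"]


selected = select_keras_backend("theano")
print("KERAS_BACKEND =", selected)

# Import Keras only after the variable is set, for example:
# from keras.applications.vgg16 import VGG16
```

Setting the variable in the shell before launching the script (`KERAS_BACKEND=theano python VGG16_Kg_Kernel.py`) has the same effect without touching the code.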

Answer 1 posted 2017-07-10 19:06:00