
How to properly manage memory and batch size with TensorFlow

I am using TensorFlow to build a simple feed-forward neural network with variable-size batches. I am not using the GPU, I have 8 GB of RAM, and I am running Python 3.5.2.

My problem is that some batches are too big and generate the typical out-of-memory error. I understand why that happens, and that in itself is not the problem. However, if I use Keras with the TF backend I don't have that issue. I have built an example below (with fixed-size batches) that illustrates this.

Is there a problem with my implementation? How should I handle batches that are too big?

TensorFlow example (exhausts memory)


import numpy as np
import tensorflow as tf

n_observations = 100000
n_input = 6
batch_size = 20000
X = np.random.rand(n_observations, n_input)
Y = X[:,0] ** 3 + X[:,1] ** 2 + X[:,2] + X[:,3] + X[:,4] + X[:,5]+ np.random.rand(n_observations)

n_hidden = 16
n_output = 1

def generatebatch(n_observations, batch_size):
    for batch_i in range(n_observations // batch_size):
        start = batch_i*batch_size
        end = start + batch_size
        batch_xs = X[start:end, :]
        batch_ys = Y[start:end]
        yield batch_xs, batch_ys

with tf.Session() as sess:
    # placeholders for input and target
    net_input = tf.placeholder(tf.float32, [None, n_input])
    y_true = tf.placeholder(tf.float32)

    # Hidden Layer
    W1 = tf.Variable(tf.random_normal([n_input, n_hidden]))
    b1 = tf.Variable(tf.random_normal([n_hidden]))
    net_output1 = tf.nn.relu(tf.matmul(net_input, W1) + b1)

    # Yet another Hidden Layer
    yaW1 = tf.Variable(tf.random_normal([n_hidden, n_hidden]))
    yab1 = tf.Variable(tf.random_normal([n_hidden]))
    yanet_output1 = tf.nn.relu(tf.matmul(net_output1, yaW1) + yab1)

    # Output Layer
    W2 = tf.Variable(tf.random_normal([n_hidden, n_output]))
    b2 = tf.Variable(tf.random_normal([n_output]))
    net_output2 = tf.nn.relu(tf.matmul(yanet_output1, W2) + b2)

    # The loss function
    cost = tf.reduce_mean(tf.pow(y_true - net_output2, 2))

    # Configure the optimizer
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    # Initialize variables
    sess.run(tf.global_variables_initializer())

    n_epochs = 100
    for epoch_i in range(n_epochs):
        batchloss = []
        for batch_xs, batch_ys in generatebatch(n_observations, batch_size):
            _, loss = sess.run(
                [optimizer, cost],
                feed_dict={
                    net_input: batch_xs,
                    y_true: batch_ys
            })
            batchloss.append(loss)
        print(np.mean(batchloss))

Keras example (handles the batch size somehow)


import numpy as np
from keras.models import Sequential
from keras.layers import Dense
import logging

#just to hide the deprecation warnings
logging.basicConfig(level=logging.CRITICAL)

n_input = 6
n_observations = 100000
n_hidden = 16
n_epochs = 10
batch_size = 35000

# input data
X = np.random.rand(n_observations, n_input)
Y = X[:,0] ** 3 + X[:,1] ** 2 + X[:,2] + X[:,3] + X[:,4] + X[:,5]+ np.random.rand(n_observations)

# create and fit Multilayer Perceptron model
model = Sequential()
model.add(Dense(n_hidden, input_dim=n_input, activation='relu'))
model.add(Dense(n_hidden, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mse', optimizer='adam')
model.fit(X, Y, nb_epoch=n_epochs, batch_size=batch_size, verbose=1)

Your Y has an incorrect shape, which may cause TensorFlow to infer tensor shapes incorrectly ((20000, 20000) instead of (20000, 1), for example), consuming a lot of memory. Reshape it:

Y = np.reshape(Y, [n_observations, 1])

Your placeholders should then have matching, explicit shapes:

net_input = tf.placeholder(tf.float32, shape=[None, n_input])
y_true = tf.placeholder(tf.float32, shape=[None, 1])
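
To see where the memory goes, here is a minimal NumPy sketch of the broadcast described above (the 20000-row batch size is just illustrative):

import numpy as np

batch = 20000
y_pred = np.zeros((batch, 1), dtype=np.float32)  # shaped like net_output2
y_true = np.zeros((batch,), dtype=np.float32)    # shaped like the un-reshaped Y

# (batch,) - (batch, 1) broadcasts to a (batch, batch) matrix,
# i.e. 20000 x 20000 float32 values, roughly 1.6 GB for this single tensor.
diff = y_true - y_pred
print(diff.shape)  # (20000, 20000)

tf.pow(y_true - net_output2, 2) follows the same broadcasting rules, so reshaping Y (or fixing the placeholder shape) keeps the difference at (batch, 1).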

I think that Keras is overriding the default configuration options in TensorFlow. Your native TensorFlow code runs fine with smaller batch sizes (e.g. 10k, 15k) on the GPU. But with the default configuration, it is going to assume you want the GPU benefits, and the OOM issue happens because there is not enough GPU memory.

Your TensorFlow example works fine when you change that default behavior to CPU (as you indicated in the question). Here are the lines I changed to do that:

config = tf.ConfigProto(
    log_device_placement=True, allow_soft_placement=True
)
config.gpu_options.allow_growth = True


with tf.Session(config=config) as sess, \
        tf.device('cpu:0'):
    # placeholders for input and target
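
For completeness, if the theory is that Keras sets up its own session, you can pin the same configuration on the Keras side as well. This is a sketch assuming the TensorFlow 1.x Keras backend and its keras.backend.set_session helper:

import tensorflow as tf
from keras import backend as K

# Reuse the same ConfigProto so Keras and the native TensorFlow code
# run under identical session settings.
config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))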
