tf.keras OOM even on a small LSTM model with a batch size of 1
Unable to provide effective batch size in tensorflow keras model causing OOM
I am trying to train a tensorflow keras model with the input shapes shown below.
x_train = (729124, 50, 5)
y_train = (729124,)
My model is defined as follows:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv1D(filters=8, kernel_size=2, input_shape=(50,5), activation='relu'))
# model.add(tf.keras.layers.InputLayer(input_shape=input_shape))
model.add(tf.keras.layers.LSTM(256, return_sequences=True, kernel_regularizer=tf.keras.regularizers.l2(0.01)))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.LSTM(256, dropout=0.1, kernel_regularizer=tf.keras.regularizers.l2(0.01)))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(256, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.01)))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(256, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.01)))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(1))
model.compile(loss='mse', optimizer="adam")
I train the model with the following code:
model.fit(x_train, y_train, epochs=50, batch_size=256, validation_split=0.2, shuffle=True)
I always get the following error:
Traceback (most recent call last):
File "C:/Users/VOTAZBZ/Documents/Thesis/Code/ma_sajid/ml_models/RNN_train.py", line 124, in <module>
shuffle=True, callbacks=[callback, tensorboard_callback]) #callbacks=[callback, callback1, tensorboard_callback])
File "C:\Tools\Python\3.6.5\lib\site-packages\tensorflow\python\keras\engine\training.py", line 66, in _method_wrapper
return method(self, *args, **kwargs)
File "C:\Tools\Python\3.6.5\lib\site-packages\tensorflow\python\keras\engine\training.py", line 797, in fit
shuffle=False))
File "C:\Tools\Python\3.6.5\lib\site-packages\tensorflow\python\keras\engine\data_adapter.py", line 1338, in train_validation_split
functools.partial(_split, indices=train_indices), arrays)
File "C:\Tools\Python\3.6.5\lib\site-packages\tensorflow\python\util\nest.py", line 617, in map_structure
structure[0], [func(*x) for x in entries],
File "C:\Tools\Python\3.6.5\lib\site-packages\tensorflow\python\util\nest.py", line 617, in <listcomp>
structure[0], [func(*x) for x in entries],
File "C:\Tools\Python\3.6.5\lib\site-packages\tensorflow\python\keras\engine\data_adapter.py", line 1335, in _split
return array_ops.gather_v2(t, indices)
File "C:\Tools\Python\3.6.5\lib\site-packages\tensorflow\python\util\dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "C:\Tools\Python\3.6.5\lib\site-packages\tensorflow\python\ops\array_ops.py", line 4541, in gather_v2
batch_dims=batch_dims)
File "C:\Tools\Python\3.6.5\lib\site-packages\tensorflow\python\util\dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "C:\Tools\Python\3.6.5\lib\site-packages\tensorflow\python\ops\array_ops.py", line 4524, in gather
return gen_array_ops.gather_v2(params, indices, axis, name=name)
File "C:\Tools\Python\3.6.5\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 3755, in gather_v2
_ops.raise_from_not_ok_status(e, name)
File "C:\Tools\Python\3.6.5\lib\site-packages\tensorflow\python\framework\ops.py", line 6653, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[583299,50,5] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:GatherV2]
I understand this is an out-of-memory condition, so I wanted to reduce my batch size. What surprises me is that the batch size I pass seems to have no effect: the error always shows TensorFlow trying to allocate a tensor of shape [583299,50,5], whereas I would expect something like [batch_size,50,5]. What is wrong with my implementation, and how can I make the batch size actually take effect during training so I avoid running out of memory?
Any help resolving this would be greatly appreciated. Thanks in advance.
The error you see is expected behavior.
You are handing the entire array to the model and asking it to process it in batches of 256 samples. Even though only 256 samples are computed at a time, at some point the whole array still gets allocated on the GPU (here, while fit splits it into train and validation sets with a GatherV2 op, as the traceback shows).
To allocate only a subset at a time, you need a generator that yields one batch of data per call.
This is the link you are looking for: https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence
Once you have a working Sequence object that yields a (batch_size, 50, 5) / (batch_size,) pair per batch, you can call fit as usual and drop the batch_size argument.
model.fit(train_seq, epochs=50, validation_data=val_seq)
Note that validation_split is not supported when fitting on a Sequence, so you pass a separate validation Sequence via validation_data instead.
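A minimal sketch of such a Sequence, assuming the numpy arrays from the question (the class name DataSequence and the manual 80/20 split are illustrative, not from the original post):

```python
import numpy as np
import tensorflow as tf

class DataSequence(tf.keras.utils.Sequence):
    """Yields one batch at a time, so the full array is never
    copied to the GPU in a single allocation."""

    def __init__(self, x, y, batch_size=256):
        super().__init__()
        self.x, self.y = x, y
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch (round up to cover the tail).
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        # Cast to float32: the traceback shows a float64 ("double")
        # tensor, which doubles the memory footprint on the GPU.
        return (self.x[lo:hi].astype(np.float32),
                self.y[lo:hi].astype(np.float32))

# Usage sketch, splitting manually because validation_split
# does not work with a Sequence:
# split = int(0.8 * len(x_train))
# train_seq = DataSequence(x_train[:split], y_train[:split])
# val_seq   = DataSequence(x_train[split:], y_train[split:])
# model.fit(train_seq, epochs=50, validation_data=val_seq)
```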