
Running out of memory when running Tf.Keras model

I'm building a model to predict, for each of 1148 rows of 160000 columns, a number from 1-9. I've done a similar thing before in keras, but am having trouble transferring the code to tensorflow.keras. Running the program produces the following error:

(1) Resource exhausted: OOM when allocating tensor with shape (1148,1,15998,9) and type float ...k:0/device:GPU:0 by allocator GPU_0_bfc ... [[{{node conv1d/conv1d-0-0-TransposeNCHWToNWC-LayoutOptimizer}}]]

This is caused by the following code. It appears to be a memory issue, but I'm unsure why memory would be a problem. Advice would be appreciated.

import tensorflow as tf
from tensorflow.keras.utils import to_categorical

num_classes = 9
y_train = to_categorical(y_train, num_classes)
x_train = x_train.reshape((1148, 160000, 1))
y_train = y_train.reshape((1148, 9))

input_1 = tf.keras.layers.Input(shape=(160000, 1))
conv1 = tf.keras.layers.Conv1D(num_classes, kernel_size=3, activation='relu')(input_1)
flatten_1 = tf.keras.layers.Flatten()(conv1)
output_1 = tf.keras.layers.Dense(num_classes, activation='softmax')(flatten_1)

model = tf.keras.models.Model(input_1, output_1)
my_optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.02)
model.compile(optimizer=my_optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=50, steps_per_epoch=20)
predictions = model.predict(x_test)

Edit: model.summary

Layer (type)           Output Shape         Param #
---------------------------------------------------
input_1 (InputLayer)   (None, 160000, 1)    0
conv1d (Conv1D)        (None, 159998, 9)    36
flatten (Flatten)      (None, 1439982)      0
dense (Dense)          (None, 9)            12959847

Total params: 12,959,883
Trainable params: 12,959,883

Without more information it is hard to give a concrete answer.

  • What hardware are you running on? How much memory do you have available?
  • At which point in the code does the error occur?
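
That said, the shapes you have already posted point at the likely cause. The OOM tensor's leading dimension is 1148, i.e. your entire training set: in older tf.keras versions, passing steps_per_epoch to model.fit together with in-memory numpy arrays (and no batch_size) tends to feed the whole array through as a single batch. A rough back-of-envelope for the Conv1D activation alone, assuming float32 (4 bytes per element):

# Size of the Conv1D output (None, 159998, 9) when all 1148 samples
# go through in one step, at 4 bytes per float32 element
elements = 1148 * 159998 * 9
print(elements * 4 / 1e9)  # ~6.6 GB

That is roughly 6.6 GB for a single activation tensor, before gradients and the layout-transposed copy named in the error message.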

Some things you can try (a sketch combining them follows the list):

  • Change from 32-bit float to 16-bit float, if you haven't already (2x memory reduction).
  • Reduce the batch size by adding batch_size=16 inside model.fit (the default is 32) (2x memory reduction).
  • If that's still not enough, you need to think about applying dimensionality reduction to your feature space, which is very high-dimensional (160,000).
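
Below is a minimal sketch combining all three, assuming TF >= 2.4 (for the mixed-precision API) and the x_train / y_train arrays prepared as in the question; the MaxPool1D layer is just one possible way to reduce the feature dimension, not the only one:

import tensorflow as tf

# 16-bit floats: store activations as float16 (~2x activation memory saved)
tf.keras.mixed_precision.set_global_policy('mixed_float16')

num_classes = 9
input_1 = tf.keras.layers.Input(shape=(160000, 1))
conv1 = tf.keras.layers.Conv1D(num_classes, kernel_size=3, activation='relu')(input_1)
# Dimensionality reduction: pooling shrinks the 159998-step feature map
# before Flatten, cutting activation memory and Dense parameters ~8x
pool1 = tf.keras.layers.MaxPool1D(pool_size=8)(conv1)
flatten_1 = tf.keras.layers.Flatten()(pool1)
# Keep the output layer in float32 so the softmax stays numerically stable
output_1 = tf.keras.layers.Dense(num_classes, activation='softmax',
                                 dtype='float32')(flatten_1)

model = tf.keras.models.Model(input_1, output_1)
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.02),
              loss='categorical_crossentropy', metrics=['accuracy'])

# An explicit batch_size (instead of steps_per_epoch) caps per-step memory
model.fit(x_train, y_train, epochs=50, batch_size=16)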

This might sound very silly, but in my case I was getting the (1) Resource exhausted error because I didn't have enough space on my main hard drive. After cleaning out some space, my training scripts started working again.
