
Allocator ran out of memory - how to clear GPU memory from TensorFlow dataset?

Assuming a NumPy array X_train of shape (4559552, 13, 22), the following code:

train_dataset = tf.data.Dataset \
    .from_tensor_slices((X_train, y_train)) \
    .shuffle(buffer_size=len(X_train) // 10) \
    .batch(batch_size)

works fine exactly once. When I re-run it (after slight modifications to X_train), it triggers an InternalError because the GPU runs out of memory:

2021-12-19 15:36:58.460497: W tensorflow/core/common_runtime/bfc_allocator.cc:457]
Allocator (GPU_0_bfc) ran out of memory trying to allocate 9.71GiB requested by op _EagerConst

It seems that on the first run the GPU memory is 100% free, so everything works, but on subsequent runs the GPU memory is already almost full, hence the error.

From what I understand, simply freeing the GPU memory held by the old train_dataset would be enough to solve the problem, but I couldn't find any way to achieve this in TensorFlow. Currently the only way to re-assign the dataset is to kill the Python kernel and re-run everything from the start.

Is there a way to avoid restarting the Python kernel and instead free the GPU memory so that the new dataset can be loaded into it?

The dataset doesn't need the entire GPU memory, so I would consider switching to a TFRecord-based pipeline a non-ideal solution here (as it comes with additional complications).
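One commonly suggested workaround for this kind of _EagerConst out-of-memory error, sketched below under the assumption that the constant tensors created by from_tensor_slices are what fills the GPU, is to build the dataset inside a CPU device scope so the slices stay in host RAM and only individual batches are copied to the GPU during training (the array shapes and batch_size here are placeholders, not the question's real data):

import numpy as np
import tensorflow as tf

# Placeholder data standing in for the real X_train / y_train.
X_train = np.zeros((10000, 13, 22), dtype=np.float32)
y_train = np.zeros((10000,), dtype=np.float32)
batch_size = 256

# Building the dataset under a CPU device scope keeps the large constant
# tensors in host memory; batches are moved to the GPU only as they are
# consumed during training.
with tf.device('/CPU:0'):
    train_dataset = (
        tf.data.Dataset
        .from_tensor_slices((X_train, y_train))
        .shuffle(buffer_size=len(X_train) // 10)
        .batch(batch_size)
    )

Because the old dataset's tensors then live in ordinary host memory, re-creating train_dataset should no longer compete for GPU memory.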

Try limiting TensorFlow's GPU memory usage, for example by enabling memory growth as shown below:

import tensorflow as tf

# Enable memory growth on the first GPU so TensorFlow allocates memory
# as needed instead of grabbing it all up front; this must run before
# the GPU is first used in the process.
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)
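Note that set_memory_growth only makes TensorFlow allocate GPU memory on demand rather than reserving it all at startup; it is not a hard cap. If an actual hard limit is wanted, one option (a sketch assuming TF 2.x, with an arbitrary 4096 MB limit) is to create a logical device with a fixed memory_limit:

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Cap TensorFlow to 4096 MB on the first GPU; must run before the GPU
    # is initialized in this process (e.g. at the top of the script).
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=4096)]
    )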
