
Allocator ran out of memory - how to clear GPU memory from TensorFlow dataset?

Assuming a NumPy array X_train of shape (4559552, 13, 22), the following code:

train_dataset = tf.data.Dataset \
    .from_tensor_slices((X_train, y_train)) \
    .shuffle(buffer_size=len(X_train) // 10) \
    .batch(batch_size)

works fine exactly once. When I re-run it (after slight modifications to X_train), it triggers an InternalError because the GPU runs out of memory:

2021-12-19 15:36:58.460497: W tensorflow/core/common_runtime/bfc_allocator.cc:457]
Allocator (GPU_0_bfc) ran out of memory trying to allocate 9.71GiB requested by op _EagerConst

It seems that on the first run the GPU memory is 100% free, so everything works, but on subsequent runs the GPU memory is already almost full, hence the error.

From what I understand, simply freeing the GPU memory held by the old train_dataset would be enough to solve the problem, but I couldn't find any way to achieve this in TensorFlow. Currently the only way to re-assign the dataset is to kill the Python kernel and re-run everything from the start.

Is there a way to avoid restarting the Python kernel and instead free the GPU memory so that the new dataset can be loaded into it?

The dataset doesn't need the entire GPU memory, so I would consider switching to a TFRecord-based pipeline a non-ideal solution here (as it comes with additional complications).
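One commonly suggested workaround for this kind of _EagerConst out-of-memory error, sketched below under the assumption that the constant tensors created by from_tensor_slices are what fills the GPU, is to build the dataset inside a CPU device scope so the slices stay in host RAM and only individual batches are copied to the GPU during training (the array shapes and batch_size here are placeholders, not the question's real data):

import numpy as np
import tensorflow as tf

# Placeholder data standing in for the real X_train / y_train.
X_train = np.zeros((10000, 13, 22), dtype=np.float32)
y_train = np.zeros((10000,), dtype=np.float32)
batch_size = 256

# Building the dataset under a CPU device scope keeps the large constant
# tensors in host memory; batches are moved to the GPU only as they are
# consumed during training.
with tf.device('/CPU:0'):
    train_dataset = (
        tf.data.Dataset
        .from_tensor_slices((X_train, y_train))
        .shuffle(buffer_size=len(X_train) // 10)
        .batch(batch_size)
    )

Because the old dataset's tensors then live in ordinary host memory, re-creating train_dataset should no longer compete for GPU memory.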

Try limiting TensorFlow's GPU memory usage, for example by enabling memory growth as shown below:

import tensorflow as tf

# Enable memory growth on the first GPU so TensorFlow allocates memory
# as needed instead of grabbing it all up front; this must run before
# the GPU is first used in the process.
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)
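Note that set_memory_growth only makes TensorFlow allocate GPU memory on demand rather than reserving it all at startup; it is not a hard cap. If an actual hard limit is wanted, one option (a sketch assuming TF 2.x, with an arbitrary 4096 MB limit) is to create a logical device with a fixed memory_limit:

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Cap TensorFlow to 4096 MB on the first GPU; must run before the GPU
    # is initialized in this process (e.g. at the top of the script).
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=4096)]
    )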
