Why am I getting a memory allocation error even with batch size = 1?

I'm (still) trying to implement a simple U-Net using Keras with the TensorFlow 2.0 backend.

My templates and masks are 1536x1536 RGB images (the masks are black and white). According to this article, it is possible to estimate the amount of memory a tensor needs.

My model crashes with a memory allocation error on a tensor of shape [1,16,1536,1536]. Using the equation given in the article above, I calculated the memory needed for this tensor: 1 * 16 * 1536 * 1536 * 4 bytes = 144 MiB. And I have a GTX 1080 Ti with roughly 9 GB available to TensorFlow. What's going wrong? Am I missing something?
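
For reference, that per-tensor figure can be checked directly; it matches the "144.00MiB" in the allocator message below:

# Size of a single float32 tensor of shape [1, 16, 1536, 1536]
elements = 1 * 16 * 1536 * 1536
print(elements * 4 / 2**20, "MiB")  # -> 144.0 MiB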

Here's an almost complete traceback:

2020-03-02 15:59:13.841967: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-03-02 15:59:16.083234: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-03-02 15:59:16.087240: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-03-02 15:59:16.210856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:41:00.0
2020-03-02 15:59:16.210988: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-03-02 15:59:16.211429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-03-02 15:59:16.947775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-02 15:59:16.947868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-03-02 15:59:16.947922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-03-02 15:59:16.948594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8784 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:41:00.0, compute capability: 6.1)
2020-03-02 15:59:16.994676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:41:00.0
2020-03-02 15:59:16.994849: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-03-02 15:59:16.995291: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-03-02 15:59:16.995793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:41:00.0
2020-03-02 15:59:16.995908: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-03-02 15:59:16.996301: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-03-02 15:59:16.996406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-02 15:59:16.996491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-03-02 15:59:16.996541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-03-02 15:59:16.996942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8784 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:41:00.0, compute capability: 6.1)
2020-03-02 15:59:18.191834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:41:00.0
2020-03-02 15:59:18.191964: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-03-02 15:59:18.192383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-03-02 15:59:18.192499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-02 15:59:18.192591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2020-03-02 15:59:18.192644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2020-03-02 15:59:18.193053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 8784 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:41:00.0, compute capability: 6.1)
Epoch 1/100
2020-03-02 15:59:18.421211: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-03-02 15:59:19.577897: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 512.00M (536870912 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:19.616600: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 460.80M (483183872 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:19.638395: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-03-02 15:59:19.644478: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:19.644601: W tensorflow/core/common_runtime/bfc_allocator.cc:305] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.
2020-03-02 15:59:19.653644: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:19.653767: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 259.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-03-02 15:59:19.865828: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:19.874844: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:29.884662: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:29.893593: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:29.893792: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 144.00MiB (rounded to 150994944).  Current allocation summary follows.
2020-03-02 15:59:29.919126: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 1054574080 memory_limit_: 9210949796 available bytes: 8156375716 curr_region_allocation_bytes_: 1073741824
2020-03-02 15:59:29.919304: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats: 
Limit:                  9210949796
InUse:                  1010432000
MaxInUse:               1010432000
NumAllocs:                     594
MaxAllocSize:            283870720

2020-03-02 15:59:29.919520: W tensorflow/core/common_runtime/bfc_allocator.cc:424] *****__****************xxxxxxxxxx***************xxxxxxxxxx******************************xxxxxxxxxxxx
2020-03-02 15:59:29.919696: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at conv_ops.cc:947 : Resource exhausted: OOM when allocating tensor with shape[1,16,1536,1536] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "E:/Explorium/python/unet_trainer.py", line 82, in <module>
    results = model.fit_generator(train_generator, epochs=EPOCHS, steps_per_epoch=STEPS_PER_EPOCH, validation_data=val_generator, validation_steps=VALIDATION_STEPS, callbacks=callbacks)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 1297, in fit_generator
    steps_name='steps_per_epoch')
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_generator.py", line 265, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 973, in train_on_batch
    class_weight=class_weight, reset_metrics=reset_metrics)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 264, in train_on_batch
    output_loss_metrics=model._output_loss_metrics)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 311, in train_on_batch
    output_loss_metrics=output_loss_metrics))
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 252, in _process_single_batch
    training=training))
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 127, in _model_loss
    outs = model(inputs, **kwargs)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 708, in call
    convert_kwargs_to_constants=base_layer_utils.call_context().saving)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 860, in _run_internal_graph
    output_tensors = layer(computed_tensors, **kwargs)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\layers\convolutional.py", line 197, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 1134, in __call__
    return self.conv_op(inp, filter)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 639, in __call__
    return self.call(inp, filter)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 238, in __call__
    name=self.name)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 2010, in conv2d
    name=name)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1031, in conv2d
    data_format=data_format, dilations=dilations, name=name, ctx=_ctx)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1130, in conv2d_eager_fallback
    ctx=_ctx, name=name)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,16,1536,1536] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Conv2D]

Process finished with exit code 1

Here's my model:

import numpy as np
import os
import cv2
import random
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, BatchNormalization, Activation, Dropout
from tensorflow.keras.layers import Conv2D, Conv2DTranspose, MaxPooling2D, concatenate
import tensorflow as tf


# TF1-style session config requesting on-demand GPU memory growth
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)


def data_gen(templates_folder, masks_folder, image_width, batch_size):
    # Endlessly yield (templates, masks) batches, reshuffling the file list after each full pass
    counter = 0
    images_list = os.listdir(templates_folder)
    random.shuffle(images_list)
    while True:
        templates_pack = np.zeros((batch_size, image_width, image_width, 3)).astype('float')
        masks_pack = np.zeros((batch_size, image_width, image_width, 1)).astype('float')
        for i in range(counter, counter + batch_size):
            template = cv2.imread(templates_folder + '/' + images_list[i]) / 255.
            templates_pack[i - counter] = template

            mask = cv2.imread(masks_folder + '/' + images_list[i], cv2.IMREAD_GRAYSCALE) / 255.
            mask = mask.reshape(image_width, image_width, 1) # Add extra dimension for parity with template size [1536 * 1536 * 3]
            masks_pack[i - counter] = mask

        counter += batch_size
        if counter + batch_size >= len(images_list):
            counter = 0
            random.shuffle(images_list)
        yield templates_pack, masks_pack


def get_unet(input_image, n_filters, kernel_size, dropout=0.5):
    # Contracting path: each level applies two Conv2D + BatchNormalization blocks, then 2x2 max pooling and dropout
    conv_1 = Conv2D(filters=n_filters, kernel_size=(kernel_size, kernel_size), data_format="channels_last", activation='relu', kernel_initializer="he_normal", padding="same")(input_image)
    conv_1 = BatchNormalization()(conv_1)
    conv_2 = Conv2D(filters=n_filters, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_1)
    conv_2 = BatchNormalization()(conv_2)
    pool_1 = MaxPooling2D(pool_size=(2, 2))(conv_2)
    pool_1 = Dropout(dropout * 0.5)(pool_1)

    conv_3 = Conv2D(filters=n_filters * 2, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(pool_1)
    conv_3 = BatchNormalization()(conv_3)
    conv_4 = Conv2D(filters=n_filters * 2, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_3)
    conv_4 = BatchNormalization()(conv_4)
    pool_2 = MaxPooling2D(pool_size=(2, 2))(conv_4)
    pool_2 = Dropout(dropout)(pool_2)

    conv_5 = Conv2D(filters=n_filters * 4, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(pool_2)
    conv_5 = BatchNormalization()(conv_5)
    conv_6 = Conv2D(filters=n_filters * 4, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_5)
    conv_6 = BatchNormalization()(conv_6)
    pool_3 = MaxPooling2D(pool_size=(2, 2))(conv_6)
    pool_3 = Dropout(dropout)(pool_3)

    conv_7 = Conv2D(filters=n_filters * 8, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(pool_3)
    conv_7 = BatchNormalization()(conv_7)
    conv_8 = Conv2D(filters=n_filters * 8, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_7)
    conv_8 = BatchNormalization()(conv_8)
    pool_4 = MaxPooling2D(pool_size=(2, 2))(conv_8)
    pool_4 = Dropout(dropout)(pool_4)

    # Bottleneck: two Conv2D + BatchNormalization blocks at the smallest spatial resolution (96x96 for a 1536x1536 input)
    conv_9 = Conv2D(filters=n_filters * 16, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(pool_4)
    conv_9 = BatchNormalization()(conv_9)
    conv_10 = Conv2D(filters=n_filters * 16, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_9)
    conv_10 = BatchNormalization()(conv_10)

    # Expanding path: transposed convolution, concatenation with the matching encoder block (skip connection), then two Conv2D + BatchNormalization blocks
    upconv_1 = Conv2DTranspose(n_filters * 8, (kernel_size, kernel_size), strides=(2, 2), padding='same')(conv_10)
    concat_1 = concatenate([upconv_1, conv_8])
    concat_1 = Dropout(dropout)(concat_1)
    conv_11 = Conv2D(filters=n_filters * 8, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(concat_1)
    conv_11 = BatchNormalization()(conv_11)
    conv_12 = Conv2D(filters=n_filters * 8, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_11)
    conv_12 = BatchNormalization()(conv_12)

    upconv_2 = Conv2DTranspose(n_filters * 4, (kernel_size, kernel_size), strides=(2, 2), padding='same')(conv_12)
    concat_2 = concatenate([upconv_2, conv_6])
    concat_2 = Dropout(dropout)(concat_2)
    conv_13 = Conv2D(filters=n_filters * 4, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(concat_2)
    conv_13 = BatchNormalization()(conv_13)
    conv_14 = Conv2D(filters=n_filters * 4, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_13)
    conv_14 = BatchNormalization()(conv_14)

    upconv_3 = Conv2DTranspose(n_filters * 2, (kernel_size, kernel_size), strides=(2, 2), padding='same')(conv_14)
    concat_3 = concatenate([upconv_3, conv_4])
    concat_3 = Dropout(dropout)(concat_3)
    conv_15 = Conv2D(filters=n_filters * 2, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(concat_3)
    conv_15 = BatchNormalization()(conv_15)
    conv_16 = Conv2D(filters=n_filters * 2, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_15)
    conv_16 = BatchNormalization()(conv_16)

    upconv_4 = Conv2DTranspose(n_filters * 1, (kernel_size, kernel_size), strides=(2, 2), padding='same')(conv_16)
    concat_4 = concatenate([upconv_4, conv_2])
    concat_4 = Dropout(dropout)(concat_4)
    conv_17 = Conv2D(filters=n_filters * 1, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(concat_4)
    conv_17 = BatchNormalization()(conv_17)
    conv_18 = Conv2D(filters=n_filters * 1, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_17)
    conv_18 = BatchNormalization()(conv_18)

    # 1x1 convolution producing the single-channel sigmoid mask
    conv_19 = Conv2D(1, (1, 1), activation='sigmoid')(conv_18)
    model = Model(inputs=input_image, outputs=conv_19)
    return model


callbacks = [EarlyStopping(patience=10, verbose=1),
             ReduceLROnPlateau(factor=0.1, patience=3, min_lr=0.00001, verbose=1),
             ModelCheckpoint("model-prototype.h5", verbose=1, save_best_only=True, save_weights_only=True)
             ]
train_templates_path = "E:/train/templates"
train_masks_path = "E:/train/masks"
valid_templates_path = "E:/valid/templates"
valid_masks_path = "E:/valid/masks"
TRAIN_SET_SIZE = len(os.listdir(train_templates_path))
VALID_SET_SIZE = len(os.listdir(valid_templates_path))
BATCH_SIZE = 1
EPOCHS = 100
STEPS_PER_EPOCH = TRAIN_SET_SIZE // BATCH_SIZE  # integer number of batches per epoch
VALIDATION_STEPS = VALID_SET_SIZE // BATCH_SIZE
IMAGE_WIDTH = 1536

train_generator = data_gen(train_templates_path, train_masks_path, IMAGE_WIDTH, batch_size=BATCH_SIZE)
val_generator = data_gen(valid_templates_path, valid_masks_path, IMAGE_WIDTH, batch_size=BATCH_SIZE)

input_image = Input((IMAGE_WIDTH, IMAGE_WIDTH, 3), name='img')
model = get_unet(input_image, n_filters=16, kernel_size=3, dropout=0.05)

model.compile(optimizer=Adam(lr=0.001), loss="binary_crossentropy", metrics=["accuracy"])

results = model.fit_generator(train_generator, epochs=EPOCHS, steps_per_epoch=STEPS_PER_EPOCH, validation_data=val_generator, validation_steps=VALIDATION_STEPS, callbacks=callbacks)
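
A side note on the tf.compat.v1 session config at the top of the script: under TF 2.x eager execution that session object is not guaranteed to affect the GPU allocator Keras actually uses. The documented TF 2.x way to request on-demand memory growth is via tf.config; a minimal sketch (it must run before the GPU is first touched):

import tensorflow as tf

# TF 2.x idiom for growable GPU memory; raises if the GPU has already been initialized.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

This does not change the conclusion below; it only makes the memory-growth request explicit in the TF 2.x API.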

The problem in your case is the size of the input images.

It is not the size of the model, as others suggested in the comments, but the spatial dimensions of your input images, which require far more GPU memory to process.

The solution in your case is to downsample the images by a factor of two. Divide the width and the height by the same factor to preserve the aspect ratio, so the network can still learn from the smaller images without losing much information or introducing distortion.

You will be able to train with a batch_size of 1 on your GTX 1080 Ti at 768x768 (I have a GTX 1080 Ti and have tested several segmentation networks at several input sizes). If for some reason part of your GPU memory is taken by other processes (YouTube or similar), reducing the input to 512x512 will definitely work (it should work even at 768x768 with batch_size=1).
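
A minimal sketch of how the downsampling could be done on the fly in the data_gen generator from the question (TARGET_SIZE and load_pair are names introduced here for illustration); nearest-neighbour interpolation keeps the mask strictly black and white:

import cv2

TARGET_SIZE = 768  # half of 1536, still divisible by 2**4 for the four pooling stages

def load_pair(template_path, mask_path):
    # Read and downsample one template/mask pair before it is written into the batch arrays.
    template = cv2.imread(template_path) / 255.
    template = cv2.resize(template, (TARGET_SIZE, TARGET_SIZE), interpolation=cv2.INTER_AREA)

    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE) / 255.
    mask = cv2.resize(mask, (TARGET_SIZE, TARGET_SIZE), interpolation=cv2.INTER_NEAREST)
    return template, mask.reshape(TARGET_SIZE, TARGET_SIZE, 1)

With this change, IMAGE_WIDTH and the Input shape in the question's script would be set to TARGET_SIZE as well.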

Sure, one tensor of that shape takes about 144 MiB, but you also have to hold the outputs of every other layer (the backward pass needs them), the model's weights and their gradients, and the optimizer state. This makes calculating the operational requirements complicated (although not impossible). The operating footprint of your network is quite large.
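
To make that concrete, here is a rough back-of-envelope count of just the Conv2D/Conv2DTranspose outputs in the model above (batch 1, n_filters=16, 1536x1536 input, float32); BatchNorm outputs, concatenation tensors, gradients and cuDNN workspaces all come on top of this:

# Approximate forward activation memory of the conv layers in the U-Net above
bytes_per_float = 4
total = 0
size, ch = 1536, 16
for level in range(5):                 # encoder: two conv outputs per level
    total += 2 * size * size * ch * bytes_per_float
    if level < 4:                      # 2x2 pooling halves H and W, filter count doubles
        size //= 2
        ch *= 2
for level in range(4):                 # decoder: one upconv + two conv outputs per level
    size *= 2
    ch //= 2
    total += 3 * size * size * ch * bytes_per_float
print(f"~{total / 2**30:.2f} GiB of conv activations alone")  # roughly 1.3 GiB

So a single 144 MiB tensor is only a small slice of what a training step actually needs, and once fragmentation and the cuDNN workspace requests visible in the log are added, the ~8.8 GB the allocator reports can run out even at batch size 1.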
