
Out Of Memory Error while implementing VGG16 from Scratch

I am currently trying to get into ConvNets and have two different systems to train on. The first is a GPU server I have access to at work, and the second is a PC (Terra workstation) that I want to use to play around with at home. The system components are as follows:

(image: system specifications of both machines)

Now I know that the GPU server has more capacity, but I want to confirm that the issue really is the capacity of my home system and not that I accidentally installed something incorrectly.

To check this, I implemented the VGG16 network from scratch with the following code:

    # Imports this snippet needs (standalone Keras 2.x API, as used in the traceback below)
    import keras
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout
    from keras.optimizers import Adam
    from keras.preprocessing.image import ImageDataGenerator

    # Excerpt from a class method: self.img_height, self.img_width, self.batch_size,
    # self.classes_num and train_path are defined elsewhere in the class.

    # One directory, split 75/25 into training and validation subsets
    train_data_path = train_path
    train_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.25)

    train_generator = train_datagen.flow_from_directory(
        train_data_path,
        target_size=(self.img_height, self.img_width),
        batch_size=self.batch_size,
        class_mode='categorical',
        subset='training')

    validation_generator = train_datagen.flow_from_directory(
        train_data_path,
        target_size=(self.img_height, self.img_width),
        batch_size=self.batch_size,
        class_mode='categorical',
        subset='validation')

    """trainingDataGenerator = ImageDataGenerator()
    train_generator = trainingDataGenerator.flow_from_directory("C:/Users/but/Desktop/dataScratch/Train", target_size=(384, 384))

    testDataGenerator = ImageDataGenerator()
    validation_generator = testDataGenerator.flow_from_directory("C:/Users/but/Desktop/dataScratch/Valid", target_size=(384, 384))"""

    # VGG16-style network on a 384x384x3 input
    model = Sequential()

    # Block 1: 2 x Conv(64) + MaxPool
    model.add(Conv2D(input_shape=(384, 384, 3), filters=64, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

    # Block 2: 2 x Conv(128) + MaxPool
    model.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

    # Block 3: 3 x Conv(256) + MaxPool
    model.add(Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

    # Block 4: 3 x Conv(512) + MaxPool
    model.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

    # Block 5: 3 x Conv(512) + MaxPool
    model.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

    # Classifier head: two large fully connected layers with dropout
    model.add(Flatten())
    model.add(Dense(units=4096, activation="relu"))
    model.add(Dropout(0.30))
    model.add(Dense(units=4096, activation="relu"))
    model.add(Dropout(0.20))
    model.add(Dense(units=self.classes_num, activation="sigmoid"))

    opt = Adam(lr=0.00001)
    # opt = RMSprop(lr=0.00001)
    model.compile(optimizer=opt, loss=keras.losses.binary_crossentropy, metrics=['acc'])

    model.summary()

This implementation works on the GPU server without problems. The number of epochs was 64 and the batch size was at most 128. After trying it out at work, I wanted to try it at home on the Terra workstation, but I immediately ran into an OOM error:

Epoch 1/64
2020-08-20 07:38:57.111158: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-08-20 07:38:57.311523: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-08-20 07:38:58.143037: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-08-20 07:38:58.193876: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.34GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.194023: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.34GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.371176: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.55GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.371345: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.55GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.420423: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.28GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.420580: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.28GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.453153: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 989.13MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.453308: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 989.13MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.497624: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 16.21MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.497789: W tensorflow/core/common_runtime/bfc_allocator.cc:243] Allocator (GPU_0_bfc) ran out of memory trying to allocate 16.21MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-08-20 07:38:58.498100: W tensorflow/core/kernels/gpu_utils.cc:48] Failed to allocate memory for convolution redzone checking; skipping this check. This is benign and only means that we won't check cudnn for out-of-bounds reads and writes. This message will only be printed once.
2020-08-20 07:39:08.560754: W tensorflow/core/common_runtime/bfc_allocator.cc:424] Allocator (GPU_0_bfc) ran out of memory trying to allocate 144.00MiB (rounded to 150994944).  Current allocation summary follows.
2020-08-20 07:39:08.560957: I tensorflow/core/common_runtime/bfc_allocator.cc:894] BFCAllocator dump for GPU_0_bfc
2020-08-20 07:39:08.561063: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (256):   Total Chunks: 134, Chunks in use: 134. 33.5KiB allocated for chunks. 33.5KiB in use in bin. 3.2KiB client-requested in use in bin.
2020-08-20 07:39:08.561254: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (512):   Total Chunks: 10, Chunks in use: 10. 5.0KiB allocated for chunks. 5.0KiB in use in bin. 5.0KiB client-requested in use in bin.
2020-08-20 07:39:08.561444: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (1024):  Total Chunks: 16, Chunks in use: 16. 16.3KiB allocated for chunks. 16.3KiB in use in bin. 16.0KiB client-requested in use in bin.
2020-08-20 07:39:08.561641: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (2048):  Total Chunks: 30, Chunks in use: 30. 61.5KiB allocated for chunks. 61.5KiB in use in bin. 60.0KiB client-requested in use in bin.
2020-08-20 07:39:08.561839: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (4096):  Total Chunks: 6, Chunks in use: 5. 40.5KiB allocated for chunks. 33.8KiB in use in bin. 33.8KiB client-requested in use in bin.
2020-08-20 07:39:08.562030: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (8192):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-08-20 07:39:08.562217: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (16384):     Total Chunks: 10, Chunks in use: 10. 167.3KiB allocated for chunks. 167.3KiB in use in bin. 160.0KiB client-requested in use in bin.
2020-08-20 07:39:08.562418: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (32768):     Total Chunks: 5, Chunks in use: 5. 160.0KiB allocated for chunks. 160.0KiB in use in bin. 160.0KiB client-requested in use in bin.
2020-08-20 07:39:08.562621: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (65536):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-08-20 07:39:08.562810: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (131072):    Total Chunks: 5, Chunks in use: 5. 731.5KiB allocated for chunks. 731.5KiB in use in bin. 720.0KiB client-requested in use in bin.
2020-08-20 07:39:08.563009: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (262144):    Total Chunks: 5, Chunks in use: 5. 1.51MiB allocated for chunks. 1.51MiB in use in bin. 1.41MiB client-requested in use in bin.
2020-08-20 07:39:08.563214: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (524288):    Total Chunks: 5, Chunks in use: 5. 2.81MiB allocated for chunks. 2.81MiB in use in bin. 2.81MiB client-requested in use in bin.
2020-08-20 07:39:08.563406: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (1048576):   Total Chunks: 5, Chunks in use: 5. 5.63MiB allocated for chunks. 5.63MiB in use in bin. 5.63MiB client-requested in use in bin.
2020-08-20 07:39:08.563599: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (2097152):   Total Chunks: 10, Chunks in use: 10. 22.50MiB allocated for chunks. 22.50MiB in use in bin. 22.50MiB client-requested in use in bin.
2020-08-20 07:39:08.564610: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (4194304):   Total Chunks: 5, Chunks in use: 5. 22.50MiB allocated for chunks. 22.50MiB in use in bin. 22.50MiB client-requested in use in bin.
2020-08-20 07:39:08.564904: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (8388608):   Total Chunks: 25, Chunks in use: 25. 229.04MiB allocated for chunks. 229.04MiB in use in bin. 225.00MiB client-requested in use in bin.
2020-08-20 07:39:08.565228: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (16777216):  Total Chunks: 1, Chunks in use: 1. 27.00MiB allocated for chunks. 27.00MiB in use in bin. 27.00MiB client-requested in use in bin.
2020-08-20 07:39:08.565421: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (33554432):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2020-08-20 07:39:08.565594: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (67108864):  Total Chunks: 6, Chunks in use: 6. 451.75MiB allocated for chunks. 451.75MiB in use in bin. 392.00MiB client-requested in use in bin.
2020-08-20 07:39:08.565962: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (134217728):     Total Chunks: 3, Chunks in use: 2. 431.18MiB allocated for chunks. 288.00MiB in use in bin. 288.00MiB client-requested in use in bin.
2020-08-20 07:39:08.566260: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (268435456):     Total Chunks: 9, Chunks in use: 9. 7.31GiB allocated for chunks. 7.31GiB in use in bin. 7.31GiB client-requested in use in bin.
2020-08-20 07:39:08.566552: I tensorflow/core/common_runtime/bfc_allocator.cc:917] Bin for 144.00MiB was 128.00MiB, Chunk State: 
2020-08-20 07:39:08.566667: I tensorflow/core/common_runtime/bfc_allocator.cc:923]   Size: 143.18MiB | Requested Size: 1.13MiB | in_use: 0 | bin_num: 19, prev:   Size: 144.00MiB | Requested Size: 144.00MiB | in_use: 1 | bin_num: -1
2020-08-20 07:39:08.566975: I tensorflow/core/common_runtime/bfc_allocator.cc:930] Next region of size 9104897280
2020-08-20 07:39:08.567135: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe00000 of size 1280 next 1
2020-08-20 07:39:08.567298: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe00500 of size 256 next 2
2020-08-20 07:39:08.567387: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe00600 of size 256 next 5
2020-08-20 07:39:08.567468: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe00700 of size 256 next 4
2020-08-20 07:39:08.567549: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe00800 of size 512 next 10
2020-08-20 07:39:08.567632: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe00a00 of size 512 next 12
2020-08-20 07:39:08.567715: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe00c00 of size 1024 next 16
2020-08-20 07:39:08.567784: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe01000 of size 1024 next 18
2020-08-20 07:39:08.567869: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe01400 of size 1024 next 20
2020-08-20 07:39:08.567943: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe01800 of size 2048 next 24
2020-08-20 07:39:08.567993: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe02000 of size 2048 next 26
2020-08-20 07:39:08.568086: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe02800 of size 2048 next 28
2020-08-20 07:39:08.568166: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe03000 of size 3584 next 3
2020-08-20 07:39:08.568245: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe03e00 of size 6912 next 6
2020-08-20 07:39:08.568325: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe05900 of size 2048 next 31
2020-08-20 07:39:08.568410: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe06100 of size 2048 next 34
2020-08-20 07:39:08.568492: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe06900 of size 16384 next 36
2020-08-20 07:39:08.568573: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0a900 of size 16384 next 38
2020-08-20 07:39:08.568656: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0e900 of size 256 next 41
2020-08-20 07:39:08.568734: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0ea00 of size 256 next 44
2020-08-20 07:39:08.568817: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0eb00 of size 256 next 46
2020-08-20 07:39:08.568899: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0ec00 of size 256 next 47
2020-08-20 07:39:08.568979: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0ed00 of size 256 next 48
2020-08-20 07:39:08.569059: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0ee00 of size 256 next 49
2020-08-20 07:39:08.569143: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0ef00 of size 256 next 50
2020-08-20 07:39:08.569222: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0f000 of size 256 next 51
2020-08-20 07:39:08.569298: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe0f100 of size 6912 next 52
2020-08-20 07:39:08.569383: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe10c00 of size 256 next 53
2020-08-20 07:39:08.569465: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe10d00 of size 256 next 54
2020-08-20 07:39:08.569544: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe10e00 of size 512 next 55
2020-08-20 07:39:08.569622: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe11000 of size 512 next 57
2020-08-20 07:39:08.569700: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 130fe11200 of size 1024 next 58
.
.
.
.
.
.
.
2020-08-20 07:39:08.592846: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 2 Chunks of size 603979776 totalling 1.13GiB
2020-08-20 07:39:08.592924: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 5 Chunks of size 1207959552 totalling 5.63GiB
2020-08-20 07:39:08.593012: I tensorflow/core/common_runtime/bfc_allocator.cc:962] Sum Total of in-use chunks: 8.34GiB
2020-08-20 07:39:08.593092: I tensorflow/core/common_runtime/bfc_allocator.cc:964] total_region_allocated_bytes_: 9104897280 memory_limit_: 9104897474 available bytes: 194 curr_region_allocation_bytes_: 18209795072
2020-08-20 07:39:08.593233: I tensorflow/core/common_runtime/bfc_allocator.cc:970] Stats: 
Limit:                  9104897474
InUse:                  8954752000
MaxInUse:               9104890368
NumAllocs:                     778
MaxAllocSize:           1207959552

2020-08-20 07:39:08.593416: W tensorflow/core/common_runtime/bfc_allocator.cc:429] ***************************************************************************************************_
2020-08-20 07:39:08.593504: W tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at conv_ops.cc:539 : Resource exhausted: OOM when allocating tensor with shape[16,256,96,96] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2020-08-20 07:39:08.593626: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Resource exhausted: OOM when allocating tensor with shape[16,256,96,96] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node conv2d_6/convolution}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Traceback (most recent call last):
  File "C:/Users/but/PycharmProjects/ScratchClassifier/ScratchNet.py", line 147, in <module>
    x.model_create("C:/Users/but/Desktop/dataScratch/Train")
  File "C:/Users/but/PycharmProjects/ScratchClassifier/ScratchNet.py", line 108, in model_create
    hist = model.fit_generator(steps_per_epoch=self.batch_size, generator=train_generator, validation_data=validation_generator, validation_steps=8, epochs=self.epochs, callbacks=[checkpoint,early])
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\keras\engine\training.py", line 1732, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\keras\engine\training_generator.py", line 220, in fit_generator
    reset_metrics=False)
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\keras\engine\training.py", line 1514, in train_on_batch
    outputs = self.train_function(ins)
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\tensorflow_core\python\keras\backend.py", line 3727, in __call__
    outputs = self._graph_fn(*converted_inputs)
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\tensorflow_core\python\eager\function.py", line 1551, in __call__
    return self._call_impl(args, kwargs)
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\tensorflow_core\python\eager\function.py", line 1591, in _call_impl
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\tensorflow_core\python\eager\function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\tensorflow_core\python\eager\function.py", line 545, in call
    ctx=ctx)
  File "C:\Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError:  OOM when allocating tensor with shape[16,256,96,96] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node conv2d_6/convolution (defined at \Users\but\Miniconda3\envs\kerasEnv\lib\site-packages\keras\backend\tensorflow_backend.py:3009) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 [Op:__inference_keras_scratch_graph_3860]

I couldn't even train it with a batch size of 1, which was suggested in other posts as a sanity check. Then I reduced the number of neurons in the dense layers from 4096 to 2024, and then it worked.

My question is: was the number of neurons in the dense layers simply too much for the hardware, or do you think I may have configured something wrong?

You can try different input sizes and a different number of neurons in every layer; that's not a problem, and it's a good way to see how your model behaves with different parameters. But there is a limit imposed by your hardware. If you look at your model summary, your model has about 333 million parameters. Can you imagine how heavy that model is?

The original VGG16 has 138 million parameters.

The problem is the limitation of your hardware. Your GPU doesn't have enough memory to allocate and load the required buffers, even for a single image. You have to reduce the input image size, drop a conv block (not a good idea), or reduce the number of neurons in some layers.
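
To see why the input size matters so much, here is a rough, back-of-the-envelope sketch (not from the original post) of where the ~333 million parameters come from. The exact total depends on classes_num, which the snippet above does not show, so 2 classes are assumed:

    # Approximate parameter count for the model above (assuming classes_num = 2)
    conv_backbone = 14_714_688        # standard VGG16 3x3 conv stack, unchanged by input size
    flat = 12 * 12 * 512              # five 2x2 pools shrink 384x384 to 12x12 -> 73,728 Flatten features
    fc1 = flat * 4096 + 4096          # first Dense(4096): ~302M parameters, the dominant term
    fc2 = 4096 * 4096 + 4096          # second Dense(4096): ~16.8M parameters
    head = 4096 * 2 + 2               # output Dense, assuming 2 classes
    total = conv_backbone + fc1 + fc2 + head
    print(f"total parameters: {total:,}")   # roughly 333 million

    # With a 224x224 input the Flatten output is only 7*7*512 = 25,088, so the first
    # Dense layer drops to ~103M parameters and, with the original 1000-class head,
    # the whole network lands at the ~138M of the original VGG16.

Most of the weight memory sits in that Flatten-to-Dense connection, which is consistent with the model fitting once the dense layers were reduced to 2024 units.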

But if you really want to train this model on your current PC, you can try tensorflow-cpu instead of tensorflow-gpu, provided you have plenty of RAM. Note, however, that training will be far slower than on the GPU.
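
If you want to test the CPU route without changing your environment, a minimal sketch (my suggestion, not from the original post) is to hide the GPU from TensorFlow so it falls back to system RAM:

    import os

    # Hide the GPU so TensorFlow falls back to the CPU allocator.
    # This must run before TensorFlow is imported anywhere in the process.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

    import tensorflow as tf
    print(tf.config.experimental.list_physical_devices("GPU"))  # should print an empty list

Alternatively, installing the tensorflow-cpu package in a separate environment achieves the same thing permanently.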

Your model is also larger than the built-in Keras implementation, because you use a larger input shape: VGG16 normally takes input of shape (224, 224, 3).
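
For comparison, a quick way to check the size of the stock Keras VGG16 at its default 224x224 input (a sketch, assuming the keras.applications module that ships with Keras 2.x):

    from keras.applications import VGG16

    # Reference VGG16: 224x224x3 input, 1000-class head, randomly initialised weights
    reference = VGG16(weights=None)
    print(f"{reference.count_params():,}")  # about 138 million parameters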

I'm afraid hardware questions are off-topic though, so I won't go there.
