
CNN Negative Number of Parameters

I am trying to build a CNN model with Keras. When I add two blocks of Conv2D and MaxPooling2D, everything is normal. However, once the third block is added (as shown in the code), the number of trainable parameters becomes negative. Any idea how this can happen?

import keras
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = keras.models.Sequential()

# First block
model.add(Conv2D(filters=16, kernel_size=(5, 5), padding='valid',
                 input_shape=(157, 462, 14), activation='tanh'))
model.add(MaxPooling2D((2, 2)))

# Second block
model.add(Conv2D(filters=32, kernel_size=(5, 5), padding='valid', activation='tanh'))
model.add(MaxPooling2D((2, 2)))

# Third block
model.add(Conv2D(filters=64, kernel_size=(5, 5), padding='valid', activation='tanh'))
model.add(MaxPooling2D((2, 2)))

model.add(Flatten())
model.add(Dense(157 * 462))
model.compile(loss='mean_squared_error',
              optimizer=keras.optimizers.Adamax(),
              metrics=['mean_absolute_error'])

print(model.summary())

The result of this code is the following:

Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 153, 458, 16)      5616      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 76, 229, 16)       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 72, 225, 32)       12832     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 36, 112, 32)       0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 32, 108, 64)       51264     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 16, 54, 64)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 55296)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 72534)             -284054698
=================================================================
Total params: -283,984,986
Trainable params: -283,984,986
Non-trainable params: 0
_________________________________________________________________
None

Your Dense layer has a weight matrix of size 55296 x 72534, which contains 55296 × 72534 = 4,010,840,064 weights; adding the 72,534 biases gives 4,010,912,598 parameters, roughly 4.01 billion.
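You can double-check that figure by hand: a Dense layer has input_units × output_units weights plus one bias per output unit. Using the shapes from the summary above:

```python
# Parameter count of the final Dense layer, computed from the shapes
# reported by model.summary() above.
flatten_units = 16 * 54 * 64   # output of the last MaxPooling2D, flattened
dense_units = 157 * 462        # 72,534 output units

weights = flatten_units * dense_units  # one weight per (input, output) pair
biases = dense_units                   # one bias per output unit
total = weights + biases

print(flatten_units)  # 55296
print(total)          # 4010912598
```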

Somewhere in the Keras code the number of parameters is stored as a signed 32-bit integer, which limits the largest value it can hold to 2^31 - 1 = 2,147,483,647. Your 4,010,912,598 parameters exceed that limit, so the number overflows and wraps around to the negative side: 4,010,912,598 - 2^32 = -284,054,698, which is exactly the value shown in the summary.
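The wraparound can be reproduced with plain integer arithmetic, simulating two's-complement int32 behavior without any Keras involved:

```python
def to_int32(n):
    """Wrap an arbitrary integer into the signed 32-bit range,
    the same way a two's-complement int32 overflow would."""
    return (n + 2**31) % 2**32 - 2**31

true_params = 4_010_912_598
print(to_int32(true_params))  # -284054698, the value shown in model.summary()
```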

I would recommend not building a model with such a large number of parameters; you would not be able to train it anyway without a huge amount of RAM.
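One possible way to shrink the head of the network (an illustration, not the only option) is to replace Flatten with GlobalAveragePooling2D, which feeds the Dense layer one averaged value per channel instead of every spatial position. Keeping the same 72,534-unit output layer from the question, the same parameter arithmetic shows the difference:

```python
dense_units = 157 * 462  # 72,534 outputs, as in the question

# Flatten head: every spatial position of the last feature map
# (16 x 54 x 64) feeds the Dense layer.
flatten_params = (16 * 54 * 64) * dense_units + dense_units

# GlobalAveragePooling2D head: one averaged value per channel (64).
gap_params = 64 * dense_units + dense_units

print(flatten_params)  # 4010912598
print(gap_params)      # 4714710 -- roughly 850x fewer parameters
```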

The problem is that you are running your code on a CPU, due to which the Keras backend (TensorFlow or Theano) is not able to work properly. I was able to run your code on a GPU in Google Colab, and this is what I got:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 153, 458, 16)      5616      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 76, 229, 16)       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 72, 225, 32)       12832     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 36, 112, 32)       0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 32, 108, 64)       51264     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 16, 54, 64)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 55296)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 72534)             4010912598
=================================================================
Total params: 4,010,982,310
Trainable params: 4,010,982,310
Non-trainable params: 0

I recommend using a GPU for training such a huge network.

Hope this helps!
