
Understanding weights from a convolutional layer

I'm trying to do semantic segmentation on magnetic resonance images, which are single-channel images.

To get the encoder of a U-Net network, I use this function:

# Assuming TensorFlow 2.x Keras
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D

def get_encoder_unet(img_shape, k_init='glorot_uniform', bias_init='zeros'):

    inp = Input(shape=img_shape)
    conv1 = Conv2D(64, (5, 5), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv1_1')(inp)
    conv1 = Conv2D(64, (5, 5), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv1_2')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2), data_format="channels_last", name='pool1')(conv1)
    
    conv2 = Conv2D(96, (3, 3), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv2_1')(pool1)
    conv2 = Conv2D(96, (3, 3), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv2_2')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2), data_format="channels_last", name='pool2')(conv2)

    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv3_1')(pool2)
    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv3_2')(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2), data_format="channels_last", name='pool3')(conv3)

    conv4 = Conv2D(256, (3, 3), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv4_1')(pool3)
    conv4 = Conv2D(256, (4, 4), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv4_2')(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2), data_format="channels_last", name='pool4')(conv4)

    conv5 = Conv2D(512, (3, 3), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv5_1')(pool4)
    conv5 = Conv2D(512, (3, 3), activation='relu', padding='same', data_format="channels_last", kernel_initializer=k_init, bias_initializer=bias_init, name='conv5_2')(conv5)

    return conv5, conv4, conv3, conv2, conv1, inp

And its summary is:

Model: "encoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 200, 200, 1)]     0         
_________________________________________________________________
conv1_1 (Conv2D)             (None, 200, 200, 64)      1664      
_________________________________________________________________
conv1_2 (Conv2D)             (None, 200, 200, 64)      102464    
_________________________________________________________________
pool1 (MaxPooling2D)         (None, 100, 100, 64)      0         
_________________________________________________________________
conv2_1 (Conv2D)             (None, 100, 100, 96)      55392     
_________________________________________________________________
conv2_2 (Conv2D)             (None, 100, 100, 96)      83040     
_________________________________________________________________
pool2 (MaxPooling2D)         (None, 50, 50, 96)        0         
_________________________________________________________________
conv3_1 (Conv2D)             (None, 50, 50, 128)       110720    
_________________________________________________________________
conv3_2 (Conv2D)             (None, 50, 50, 128)       147584    
_________________________________________________________________
pool3 (MaxPooling2D)         (None, 25, 25, 128)       0         
_________________________________________________________________
conv4_1 (Conv2D)             (None, 25, 25, 256)       295168    
_________________________________________________________________
conv4_2 (Conv2D)             (None, 25, 25, 256)       1048832   
_________________________________________________________________
pool4 (MaxPooling2D)         (None, 12, 12, 256)       0         
_________________________________________________________________
conv5_1 (Conv2D)             (None, 12, 12, 512)       1180160   
_________________________________________________________________
conv5_2 (Conv2D)             (None, 12, 12, 512)       2359808   
=================================================================
Total params: 5,384,832
Trainable params: 5,384,832
Non-trainable params: 0
_________________________________________________________________
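The Param # column can be checked by hand: a Conv2D layer with a kh x kw kernel, in_ch input channels, and out_ch filters has kh*kw*in_ch*out_ch weights plus one bias per filter. A minimal sketch (the helper name `conv2d_params` is mine, not from Keras):

```python
def conv2d_params(kh, kw, in_ch, out_ch):
    """Weights (kh * kw * in_ch * out_ch) plus one bias per filter."""
    return kh * kw * in_ch * out_ch + out_ch

# Matches the Param # column of the summary above:
print(conv2d_params(5, 5, 1, 64))     # conv1_1 -> 1664
print(conv2d_params(5, 5, 64, 64))    # conv1_2 -> 102464
print(conv2d_params(4, 4, 256, 256))  # conv4_2 -> 1048832 (note the 4x4 kernel)
print(conv2d_params(3, 3, 512, 512))  # conv5_2 -> 2359808
```

Note that the input channel count grows layer by layer: conv1_1 sees the single image channel, but every later layer sees the previous layer's feature maps as its channels.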

I'm trying to understand how neural networks work, and I have this code to print the shapes of the last layer's weights and biases.

layer_dict = dict([(layer.name, layer) for layer in model.layers])

layer_name = model.layers[-1].name
#layer_name = 'conv5_2'

filter_index = 0 # Which filter in this block would you like to visualise?

# Grab the filters and biases for that layer
filters, biases = layer_dict[layer_name].get_weights()

print("Filters")
print("\tType: ", type(filters))
print("\tShape: ", filters.shape)
print("Biases")
print("\tType: ", type(biases))
print("\tShape: ", biases.shape)

With this output:

Filters
    Type:  <class 'numpy.ndarray'>
    Shape:  (3, 3, 512, 512)
Biases
    Type:  <class 'numpy.ndarray'>
    Shape:  (512,)

I'm trying to understand what the filters' shape (3, 3, 512, 512) means. I think the last 512 is the number of filters in this layer, but what does (3, 3, 512) mean? My images are single-channel, so I don't understand the 3, 3 in the filters' shape (img_shape is (200, 200, 1)).

I think the last 512 is the number of filters in this layer, but what does (3, 3, 512) mean?

It means the overall size of the filters: they are 3D themselves. The input of conv5_2 is a [batch, height', width', channels] tensor, not your original single-channel image. Each filter in this layer has a 3x3 kernel per input channel: you take every 3x3 spatial region of the conv5_2 input, apply the filter across all 512 input channels, and sum the results to get one output value (see an animation of the sliding window, or an illustration for the one-channel case). Those 3x3 kernels are different for every input channel, hence (3, 3, 512) per filter. Finally, you perform this Conv2D number_of_filters times, so you need 512 filters of size 3x3x512, giving the weight shape (3, 3, 512, 512).
A good article for a deeper dive into the intuition behind CNN architectures, and Conv2D in particular (see part 2).
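The shape logic above can be sketched with plain NumPy (a toy illustration of the arithmetic, not the actual Keras implementation; all arrays here are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)

patch = rng.standard_normal((3, 3, 512))         # one 3x3 spatial window, all 512 input channels
weights = rng.standard_normal((3, 3, 512, 512))  # same shape as conv5_2's filter tensor
biases = rng.standard_normal(512)                # one bias per filter

# One 3x3x512 filter applied to the patch -> one scalar of the output feature map:
one_value = np.sum(patch * weights[..., 0]) + biases[0]

# All 512 filters at once -> one spatial position of the (12, 12, 512) output:
one_pixel = np.tensordot(patch, weights, axes=3) + biases

print(one_pixel.shape)  # (512,)
```

Sliding the patch over every spatial position of the 12x12 input (with 'same' padding) would fill in the full (12, 12, 512) output of conv5_2.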
