Training a fully convolutional network with images of arbitrary size

I have built a fully convolutional network whose sub-network A I feed with MFCC coefficients. The WAV files that the MFCCs are computed from have variable duration, so every file yields a list of MFCCs of a different length. I wrote the implementation below and am trying to feed sub-network A with a batch size of 1.

X_audio_images = loadFeaturesFromFiles(trainFilenames, params)

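import numpy as np
from sklearn.model_selection import KFold
from tensorflow.keras.layers import (Activation, BatchNormalization, Concatenate,
                                     Conv2D, Dense, Dropout, GlobalMaxPooling2D,
                                     Input, MaxPooling2D)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam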
n_split=5
f=0
all_acc=[]
for train_index,test_index in KFold(n_split).split(X):
    f=f+1
    print('\n\n\nFold ',f,' of ', n_split)
    for k, l in zip(train_index, test_index):
        xA_train,xA_test=X_audio_images[k],X_audio_images[l]
        xB_train,xB_test=X[k],X[l]
        y_train,y_test=y[k],y[l]
    
    
    xA_test = np.array(xA_test)
    xA_test = np.expand_dims(xA_test, axis=(0, 3))
    xA_train = np.array(xA_train)
    xA_train = np.expand_dims(xA_train, axis=(0, 3))
    xB_test = np.expand_dims(xB_test, axis=(0))
    xB_train = np.expand_dims(xB_train, axis=(0))
    y_train = np.expand_dims(y_train, axis=(0))
    y_test = np.expand_dims(y_test, axis=(0))
     
    inputsA = Input(shape=(None, None, 1))

    xA = Conv2D(filters=32, kernel_size=5, strides=1)(inputsA)
    xA = Dropout(0.5)(xA)
    xA = BatchNormalization()(xA)
    xA = Activation('relu')(xA)

    xA = MaxPooling2D()(xA)

    xA = Conv2D(filters=64, kernel_size=5, strides=1)(xA)
    xA = Dropout(0.5)(xA)
    xA = BatchNormalization()(xA)
    xA = Activation('relu')(xA)

    xA = MaxPooling2D()(xA)

    xA = Conv2D(filters=64, kernel_size=1, strides=1)(xA)
    xA = Dropout(0.3)(xA)
    xA = BatchNormalization()(xA)
    xA = Activation('relu')(xA)

    
    # Fully connected layer 2
    xA = Conv2D(filters=4, kernel_size=1, strides=1)(xA)
    xA = Dropout(0.2)(xA)
    xA = BatchNormalization()(xA)
    xA = GlobalMaxPooling2D()(xA)
    # predictions = tf.keras.layers.Activation('softmax')(x)
    
    inputsB = Input(shape=(input_dim,))
    xB = Dense(128, activation='relu')(inputsB)
    xB = Dropout(0.5)(xB)
    xB = Dense(128, activation='relu')(xB)
    xB = Dropout(0.5)(xB)

    combined =Concatenate()([xA, xB])
    out = Dense(104,  activation='relu')(combined)
    out = Dropout(0.3)(out)
    out = Dense(4,  activation='softmax')(out)


    model = Model(inputs=[inputsA, inputsB], outputs=out)
    model.summary()
    
   
 
    model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.01), metrics=['accuracy'])
    
    # Train the model.
    print(xA_train.shape, xB_train.shape)
    model.fit([xA_train, xB_train], y_train, epochs=500, batch_size=12, validation_data=([xA_test, xB_test], y_test), callbacks=[es])
    loss, acc = model.evaluate([xA_test, xB_test], y_test)
    
    print('\nModel evaluation for fold ',f,' accuracy: ',acc,'\n\n')
    all_acc.append(acc)
#    time.sleep(2)
    

print('\n\nTest Accuracies for all folds: ', all_acc, '\tAverage: ', np.average(all_acc))

but I think it's not actually doing what it is supposed to do. What I am trying to understand is how to feed the network an input of a different length each time, so that the network dimensions are set, changed again for the next input, and the network still actually learns something. What are the steps for training?

If you want a variable input size, this is the way to do it, but it won't work with Dense connections.
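A minimal sketch of one way to feed variable-length inputs, assuming one sample per training step: each MFCC matrix gets a batch axis and a channel axis so that it matches `Input(shape=(None, None, 1))`. The helper name `single_sample_batches` and the toy shapes are hypothetical and not from the question. Only NumPy is used here, and the commented-out `train_on_batch` call marks where a compiled Keras model would consume each batch.

```python
import numpy as np

def single_sample_batches(audio_feats, dense_feats, labels, indices):
    """Yield ((xA, xB), y) for one sample at a time.

    audio_feats: list of 2-D arrays whose shapes may differ between samples
    dense_feats: array of shape (n_samples, input_dim)
    labels:      one-hot array of shape (n_samples, n_classes)
    indices:     sample indices of the current fold (e.g. train_index)
    """
    for i in indices:
        # (frames, coeffs) -> (1, frames, coeffs, 1): add batch and channel axes
        xA = np.asarray(audio_feats[i], dtype="float32")[np.newaxis, ..., np.newaxis]
        # (input_dim,) -> (1, input_dim) and (n_classes,) -> (1, n_classes)
        xB = np.asarray(dense_feats[i], dtype="float32")[np.newaxis, ...]
        y = np.asarray(labels[i], dtype="float32")[np.newaxis, ...]
        yield (xA, xB), y

# Toy data (made up): three recordings with different numbers of frames.
rng = np.random.default_rng(0)
audio_feats = [rng.normal(size=(n, 20)) for n in (40, 55, 31)]
dense_feats = rng.normal(size=(3, 8))
labels = np.eye(4)[[0, 2, 1]]

shapes = []
for (xA, xB), y in single_sample_batches(audio_feats, dense_feats, labels, [0, 1, 2]):
    shapes.append(xA.shape)
    # With a compiled Keras model, one update step per sample would go here:
    # model.train_on_batch([xA, xB], y)

print(shapes)  # [(1, 40, 20, 1), (1, 55, 20, 1), (1, 31, 20, 1)]
```

Note that in the question's code the loop `for k, l in zip(train_index, test_index)` overwrites `xA_train`, `xB_train` and `y_train` on every iteration, so only the last pair of samples reaches `model.fit`. Iterating over all of `train_index`, as the sketch does, uses every sample of the fold.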

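If you want this to work with Dense connections fed directly from the convolutional feature map, you'll need to provide a fixed input size, because the number of weights in a Dense layer depends on how many values come out of the convolutional part, and that depends on the image size. A global pooling layer such as the `GlobalMaxPooling2D` in your sub-network A removes that dependence, since its output length equals the number of channels regardless of the spatial size.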
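The relationship between image size and the number of outgoing connections can be illustrated with plain NumPy. This is a sketch and not the Keras implementation: flattening a feature map gives a vector whose length grows with the input size, while a maximum over the two spatial axes (the operation `GlobalMaxPooling2D` performs) always gives one value per channel. The function names and toy shapes are illustrative assumptions.

```python
import numpy as np

def flatten_features(fmap):
    """Flatten a (batch, height, width, channels) feature map per sample."""
    return fmap.reshape(fmap.shape[0], -1)

def global_max_pool(fmap):
    """Maximum over the two spatial axes, leaving (batch, channels)."""
    return fmap.max(axis=(1, 2))

rng = np.random.default_rng(0)
small = rng.normal(size=(1, 7, 2, 4))    # feature map from a short input
large = rng.normal(size=(1, 19, 2, 4))   # feature map from a longer input

# Flattened length depends on the spatial size.
print(flatten_features(small).shape, flatten_features(large).shape)  # (1, 56) (1, 152)

# Globally pooled length depends only on the number of channels.
print(global_max_pool(small).shape, global_max_pool(large).shape)    # (1, 4) (1, 4)
```

A Dense layer needs a fixed number of inputs, so it can follow the pooled vector but not the flattened one when the input size varies.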

Also, your variable `input_dim` isn't defined anywhere in the code shown. Is that what you want to vary?
