
Multi-modal neural network training loss is not decreasing

I am trying to write a multi-modal network here and I am not sure if I am doing it the right way.

I have two networks: network_1 takes an image as input, and network_2 is a fully connected network that takes as input a 17x1 vector of joint positions (values ranging from -0.7 to 0.7). I concatenate the final fully connected layers of both networks and feed the result to a final layer that outputs 7 classes.

CODE:

    # Legacy Keras API: the Merge layer used below was removed in later Keras releases.
    from keras.models import Sequential
    from keras.layers import Dense, Conv2D, MaxPooling2D, Dropout, Flatten, Merge

    #-------NETWORK 1---------------

    network1 = Sequential()
    # Dense layers - 1st param is the number of output units
    network1.add(Dense(2048, input_shape=(8500,), name="dense_one"))
    network1.add(Dense(2048, activation='sigmoid', name="dense_two"))
    network1.add(Dense(1000, activation='sigmoid', name="dense_three"))
    network1.add(Dense(100, activation='relu', name="dense_four"))

    # Print the input and output shape of every layer
    for l in network1.layers:
        print(l.name, l.input_shape, "=======>", l.output_shape)

    network1.summary()

    #-------- NETWORK 2-----------

    network2 = Sequential()
    network2.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(224, 224, 3)))
    network2.add(Conv2D(64, kernel_size=(3, 3)))
    network2.add(MaxPooling2D(pool_size=(2, 2)))
    network2.add(Dropout(0.5))

    network2.add(Dense(100, activation='sigmoid', name="network2_three"))
    network2.add(Flatten())

    #-------------------MERGED NETWORK------------------#

    model = Sequential()
    model.add(Merge([network1, network2], mode='concat'))

Neither the accuracy nor the loss seems to improve. I am playing with different learning rates right now.

But, is there anything else that I should try? I am not able to find example architectures for multi-modal neural networks. How do I go about experimenting with different architectures?

A few tips:

  • Are you sure it's not a data problem? Visualize or inspect the inputs and target labels right before they go into the network for training, and make sure each input and its corresponding label make sense. It sounds obvious, but it is a common enough error that I wanted to mention it.

  • Try to define the multi-input architecture using the Functional API (instead of multiple Sequential models).

  • Try training on a small subsample of your data and see whether your model overfits it (it should). If it doesn't, something is probably wrong in how you train.
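The three tips above can be combined into one short script. What follows is only a minimal sketch, not the asker's actual setup: it assumes tf.keras, integer class labels in the range 0 to 6, and a 17-value joint vector. The function names (check_inputs, build_model, overfit_small_batch), the layer sizes, the optimizer, and the loss are all illustrative choices. Unlike the code in the question, the image branch here calls Flatten before Dense, and Dropout is left out so that it does not get in the way of the overfitting check.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def check_inputs(images, joints, labels, n_classes=7):
    """Basic sanity checks on the arrays right before training (hypothetical helper)."""
    assert len(images) == len(joints) == len(labels), "sample counts differ"
    assert np.isfinite(images).all() and np.isfinite(joints).all(), "NaN/inf in inputs"
    assert labels.min() >= 0 and labels.max() < n_classes, "label out of range"
    print("image range:", images.min(), images.max())
    print("joint range:", joints.min(), joints.max())
    print("class counts:", np.bincount(labels, minlength=n_classes))


def build_model(image_shape=(224, 224, 3), n_joints=17, n_classes=7):
    """Two-input Functional API model; layer sizes are assumptions."""
    # Image branch: conv features are flattened before the Dense layer.
    image_in = keras.Input(shape=image_shape, name="image")
    x = layers.Conv2D(32, (3, 3), activation="relu")(image_in)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Flatten()(x)
    x = layers.Dense(100, activation="relu")(x)

    # Joint-position branch: a small fully connected network.
    joints_in = keras.Input(shape=(n_joints,), name="joints")
    y = layers.Dense(64, activation="relu")(joints_in)
    y = layers.Dense(100, activation="relu")(y)

    # Concatenate both branches and classify into n_classes.
    merged = layers.Concatenate()([x, y])
    out = layers.Dense(n_classes, activation="softmax")(merged)

    model = keras.Model(inputs=[image_in, joints_in], outputs=out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def overfit_small_batch(model, images, joints, labels, epochs=100):
    """Train on a tiny subset as one full batch and return the loss per epoch."""
    history = model.fit([images, joints], labels,
                        batch_size=len(labels), epochs=epochs, verbose=0)
    return history.history["loss"]


if __name__ == "__main__":
    # Synthetic stand-in data; replace with a small slice of the real dataset.
    rng = np.random.default_rng(0)
    images = rng.random((16, 32, 32, 3)).astype("float32")
    joints = rng.uniform(-0.7, 0.7, size=(16, 17)).astype("float32")
    labels = rng.integers(0, 7, size=16)

    check_inputs(images, joints, labels)
    model = build_model(image_shape=(32, 32, 3))
    losses = overfit_small_batch(model, images, joints, labels)
    print("first loss:", losses[0], "last loss:", losses[-1])
```

If the last loss is not clearly lower than the first one on such a tiny subset, the problem is more likely in the data pipeline or the training setup than in the choice of architecture.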


 