
Tensorflow classification model returns incorrect output shape

I am making a simple binary classification model that takes 30 timestamps with 5 features each and should return the probability of a certain class.

I've run into the problem of the model's loss not decreasing over epochs. Looking at the model's summary and output, I found that instead of producing a single number (the probability of the class), it produces an array of 30 probabilities, which probably prevents it from learning.

The model code is as follows:

print(train['inputs'].shape)  # (3511, 30, 5)
print(train['labels'].shape)  # (3511, 1)

lstm_model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256),
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dense(256),
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dense(256),
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dense(256),
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

lstm_model.compile(
    loss="binary_crossentropy",
    optimizer=tf.optimizers.Adam(learning_rate=0.0001),
    metrics=["accuracy"])

history = lstm_model.fit(x=train['inputs'], y=train['labels'], epochs=1,
                         validation_data=(val['inputs'], val['labels']))

The number of layers doesn't seem to affect the issue (I added this many while trying to overfit the model).

The summary of the model is as follows:

Model: "sequential_108"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_297 (Dense)            (1, 30, 256)              1536      
_________________________________________________________________
activation_128 (Activation)  (1, 30, 256)              0         
_________________________________________________________________
dense_298 (Dense)            (1, 30, 256)              65792     
_________________________________________________________________
activation_129 (Activation)  (1, 30, 256)              0         
_________________________________________________________________
dense_299 (Dense)            (1, 30, 256)              65792     
_________________________________________________________________
activation_130 (Activation)  (1, 30, 256)              0         
_________________________________________________________________
dense_300 (Dense)            (1, 30, 256)              65792     
_________________________________________________________________
activation_131 (Activation)  (1, 30, 256)              0         
_________________________________________________________________
dense_301 (Dense)            (1, 30, 1)                257       
=================================================================
Total params: 199,169
Trainable params: 199,169
Non-trainable params: 0

As you can see, the output layer returns an array of shape (30, 1), and the same happens when making actual predictions with the model.

I've also tried reshaping the labels to (3511,) and (3511, 1, 1), but neither fixed the issue.

What could be causing this behavior?
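The shape comes from how Dense handles higher-rank input: it transforms only the last axis, so with a (batch, 30, 5) tensor, every one of the 30 timesteps is mapped independently. A minimal NumPy sketch of the underlying matrix algebra (the kernel W and bias b here are illustrative random values, not the model's actual weights):

```python
import numpy as np

# Dense applies its kernel to the last axis only, so each of the
# 30 timesteps gets its own output -- hence the (30, 1) shape.
x = np.random.rand(1, 30, 5)   # (batch, timesteps, features)
W = np.random.rand(5, 1)       # kernel of a Dense(1) layer
b = np.zeros(1)

y = x @ W + b                  # matmul broadcasts over the time axis
print(y.shape)                 # (1, 30, 1) -- one output per timestep
```

Nothing in the stack of Dense layers ever collapses the time axis, which is why the final output still has 30 entries.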

I assume you want to use an LSTM layer, since you are working with 3-dimensional time-series input.

All you need to do is set return_sequences to False in the last LSTM layer, for example:

lstm_model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(5, return_sequences=True, dropout=0.2, recurrent_dropout=0.2),
    tf.keras.layers.LSTM(10, return_sequences=True, activation='relu'),
    tf.keras.layers.LSTM(64, return_sequences=False, activation='relu'),
    tf.keras.layers.Dense(256),
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dense(256),
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dense(256),
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dense(256),
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
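To see why return_sequences=False collapses the time axis, here is a toy recurrence in plain NumPy (a deliberately simplified RNN, not the real LSTM equations): with return_sequences=True the layer emits the hidden state at every step, with False it emits only the final one.

```python
import numpy as np

def toy_rnn(x, units, return_sequences):
    """Simplified recurrence illustrating output shapes, not a real LSTM."""
    rng = np.random.default_rng(0)
    Wx = rng.standard_normal((x.shape[1], units))  # input-to-hidden weights
    Wh = rng.standard_normal((units, units))       # hidden-to-hidden weights
    h = np.zeros(units)
    outputs = []
    for t in range(x.shape[0]):          # step through the time axis
        h = np.tanh(x[t] @ Wx + h @ Wh)
        outputs.append(h)
    # True: all per-step states; False: only the last state
    return np.stack(outputs) if return_sequences else h

x = np.random.rand(30, 5)                # (timesteps, features), one sample
print(toy_rnn(x, 64, return_sequences=True).shape)   # (30, 64)
print(toy_rnn(x, 64, return_sequences=False).shape)  # (64,)
```

Once the last LSTM returns only the final state, the Dense layers operate on a rank-2 (batch, units) tensor and the model ends in the expected (batch, 1) output.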

Some explanation of how shapes in LSTM layers work is provided, e.g., in this question:

How to stack multiple lstm in keras?
