Keras LSTM for timeseries prediction: predicting vectors of features

Question

I have a timeseries dataset with N observations and F features. Every feature can either manifest (1) or not manifest (0). So the dataset would look like this:

T    F1    F2    F3    F4    F5 ... F
0    1     0     0     1     0      0
1    0     1     0     0     1      1
2    0     0     0     1     1      0
3    1     1     1     1     0      0
...
N    1     1     0     1     0      0

I am trying to use an LSTM-based architecture to predict which features manifest at time T+1 based on the observations from T-W to T, where W is the width of some time window. If W=4, the LSTM 'sees' 4 timesteps into the past in order to make the prediction. The LSTM expects 3D input, which will be of the form (number_of_samples, W, F). A naive Keras implementation might look like:

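from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# One LSTM layer over windows of W timesteps with F features each
model.add(LSTM(128, stateful=True, batch_input_shape=(batch_size, W, F)))
# One sigmoid output per feature: probability that the feature manifests at T+1
model.add(Dense(F, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size, epochs=250, shuffle=False,
          validation_data=(x_val, y_val))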
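
The windowed input described above can be built from the raw (N, F) table roughly as follows. This is only a sketch: the helper name `make_windows` and the toy array are assumptions for illustration, and each window here holds W consecutive rows with the next row as its target.

```python
import numpy as np

def make_windows(data, W):
    """Slice a (N, F) array into overlapping windows.

    Returns x with shape (N - W, W, F) and y with shape (N - W, F),
    where y[i] is the row that immediately follows window x[i].
    """
    data = np.asarray(data)
    n_samples = data.shape[0] - W
    if n_samples <= 0:
        raise ValueError("need more than W rows")
    x = np.stack([data[i:i + W] for i in range(n_samples)])
    y = data[W:]
    return x, y

# Toy data: 10 timesteps, 3 binary features (hypothetical values).
rng = np.random.RandomState(0)
toy = (rng.rand(10, 3) < 0.2).astype("float32")
x, y = make_windows(toy, W=4)
print(x.shape, y.shape)  # (6, 4, 3) (6, 3)
```

Note that with stateful=True, Keras expects the number of samples to be a multiple of batch_size, so the windowed arrays may need trimming before calling fit.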

The main problem I am having is this: the full dataset has a large number of features (more than 200) and it is relatively rare for features to manifest, i.e. 0 is much more common than 1. The neural net simply learns to set all values to 0 and so achieves a high degree of 'accuracy'.
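
To see why accuracy is misleading here, consider a toy calculation with made-up numbers (the 2% positive rate is an assumption for illustration, not a property of the real dataset). A model that always predicts 0 scores very high accuracy while finding none of the positives.

```python
# Hypothetical targets: 1000 cells, 20 of which are 1 (2% positives).
y_true = [1] * 20 + [0] * 980
# A degenerate model that always predicts 0.
y_pred = [0] * 1000

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall: fraction of the actual positives that were predicted as 1.
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(y_true)

print(accuracy)  # 0.98
print(recall)    # 0.0
```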

In essence, I want to weight every 1 in the input matrix by some value to give it more importance, but I am not sure how to implement this in Keras. I know there is an option sample_weight in Keras, but how does it work? I would not know how to apply it to my example, for instance. Is this a reasonable solution to the problem I have? What optimiser and loss functions are commonly used for this type of problem?

Answer 1

This is a loss function I use for highly unbalanced 2D data, and it works very well. You can replace binary_crossentropy with another kind of loss.

import keras.backend as K

def weightedByBatch(yTrue, yPred):
    # Tensor of ones with the same shape as yTrue; its sum is the element count
    nVec = K.ones_like(yTrue)
    percent = K.sum(yTrue) / K.sum(nVec)  # fraction of ones in the batch
    percent2 = 1 - percent                # fraction of zeros in the batch
    yTrue2 = 1 - yTrue                    # complement of yTrue (yTrue + yTrue2 is all ones)

    # Each element gets the frequency of its own class; dividing the loss by it
    # up-weights the rarer class
    weights = (yTrue2 * percent2) + (yTrue * percent)
    return K.mean(K.binary_crossentropy(yTrue, yPred) / weights)
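
To make the weighting concrete, here is a NumPy re-implementation of the same arithmetic on a tiny batch. It is only an illustrative sketch: the function name `weighted_by_batch_np`, the `eps` clipping constant and the toy arrays are assumptions, and it is meant for inspecting the numbers rather than for use as a Keras loss.

```python
import numpy as np

def weighted_by_batch_np(y_true, y_pred, eps=1e-7):
    y_true = np.asarray(y_true, dtype="float64")
    # Clip predictions so the logarithms below stay finite.
    y_pred = np.clip(np.asarray(y_pred, dtype="float64"), eps, 1 - eps)

    percent = y_true.sum() / y_true.size   # fraction of ones in the batch
    percent2 = 1 - percent                 # fraction of zeros in the batch
    weights = (1 - y_true) * percent2 + y_true * percent

    # Element-wise binary cross-entropy, then divide by the class frequency.
    bce = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return np.mean(bce / weights), weights

# Toy batch (hypothetical): one positive among eight cells.
y_true = np.array([[1, 0, 0, 0],
                   [0, 0, 0, 0]])

all_zero = np.full(y_true.shape, 0.01)   # model that predicts "0" everywhere
hit = all_zero.copy()
hit[0, 0] = 0.99                         # same model, but it finds the positive

loss_zero, weights = weighted_by_batch_np(y_true, all_zero)
loss_hit, _ = weighted_by_batch_np(y_true, hit)
print(weights)
print(loss_zero, loss_hit)
```

In this batch the single positive has weight 0.125 and each zero has weight 0.875, so after the division the positive counts 8 times while each zero counts about 1.14 times. Missing the positive therefore costs far more than it would under plain binary cross-entropy.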

For your 3D data this may work as is, but you could also work per column, creating a pair of weights for each feature instead of pooling all features together.

That could be done like this:

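def weightedByBatch2D(yTrue, yPred):
    # Sums run over axes 0 and 1, which leaves one value per feature for
    # 3D targets of shape (batch, timesteps, features). For 2D targets of shape
    # (batch, features), sum over axis 0 only; otherwise everything collapses
    # to a single global value.
    nVec = K.ones_like(yTrue)
    ones = K.sum(K.sum(yTrue, axis=0, keepdims=True), axis=1, keepdims=True)
    total = K.sum(K.sum(nVec, axis=0, keepdims=True), axis=1, keepdims=True)
    percent = ones / total   # fraction of ones per feature
    percent2 = 1 - percent   # fraction of zeros per feature
    yTrue2 = 1 - yTrue       # complement of yTrue (yTrue + yTrue2 is all ones)

    # Broadcasts the per-feature frequencies back to the shape of yTrue
    weights = (yTrue2 * percent2) + (yTrue * percent)
    return K.mean(K.binary_crossentropy(yTrue, yPred) / weights)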
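
The per-feature idea can be checked in NumPy on a small target matrix. This is a sketch under assumptions: the helper name `per_feature_weights` and the toy targets are hypothetical, and the function reduces over every axis except the last, so it handles both 2D targets of shape (batch, F) and 3D targets of shape (batch, timesteps, F).

```python
import numpy as np

def per_feature_weights(y_true):
    """Class-frequency weights computed separately for each feature."""
    y_true = np.asarray(y_true, dtype="float64")
    # Reduce over all axes except the last (feature) axis.
    reduce_axes = tuple(range(y_true.ndim - 1))
    percent = y_true.mean(axis=reduce_axes, keepdims=True)  # fraction of ones per feature
    # Ones get their feature's fraction of ones, zeros its fraction of zeros.
    return (1 - y_true) * (1 - percent) + y_true * percent

# 2D targets (batch=4, F=3), the shape produced by a Dense(F) output layer.
y2d = np.array([[1, 0, 0],
                [0, 0, 1],
                [0, 0, 1],
                [0, 0, 1]])
w2d = per_feature_weights(y2d)
print(w2d)
```

Here feature 1 is rare (one positive in four rows), so its positive is divided by 0.25, while feature 3 is mostly on, so its single zero receives the small weight instead. A feature that never manifests in the batch simply gets weight 1 everywhere, so the division stays safe.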

1 answer

Answer 1: accepted, 2017-10-08 20:11:11
