
Keras: val_loss & val_accuracy are not changing

I am trying to build an LSTM model to predict whether a stock will go up or down the next day. As you can see, it is a simple classification task that has had me stuck for a couple of days now. I am selecting only 3 features to feed into my network; below I show my pre-processing:

# pre-processing; the last column holds labels of either 1 or 0
len(df.columns) # 32 columns
index_ = len(df.columns) - 1
x = df.iloc[:,:index_]
y = df.iloc[:,index_:].values.astype(int)

Removing any NaN values:

def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df.dropna(inplace=True)
    # keep only rows with no NaN/inf values in any representation
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf, 'NaN', 'nan']).any(axis=1)
    return df[indices_to_keep].astype(np.float64)

df = clean_dataset(df)

Then I take the 3 selected features and show the shapes of x and y:

selected_features = ['feature1', 'feature2', 'feature3']
x = x[selected_features].values.astype(float)
# x.shape (44930, 3)
# y.shape (44930, 1)

Then I split my dataset 80/20:

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=98)

Here I reshape my data:

x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], 1) 
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1], 1) 
y_train = y_train.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)

Here is the new shape of each one:

x_train.shape = (35944, 3, 1)
x_test.shape = (8986, 3, 1)
y_train.shape = (35944, 1)
y_test.shape = (8986, 1)

First sample of the x_train set before reshaping:

x_train[0] => array([8.05977145e-01, 4.92200000e+01, 1.23157152e+08])

First sample of the x_train set after reshaping:

x_train[0] => array([[8.05977145e-01],
                     [4.92200000e+01],
                     [1.23157152e+08]])

Making sure there are no NaN values in my training set, both x_train and y_train:

# check element types (note: NaN is itself a np.float64, so this
# would not catch NaN values -- see the vectorized check below)
for main_index, xx in enumerate(x_train):
  for i, _ in enumerate(xx):
    if type(x_train[main_index][i][0]) != np.float64:
      print("Something wrong here:", main_index, i)
else:
  print("done")  # only prints "done"; nothing wrong found

Finally, here is the LSTM I am training:

def build_nn():
    model = Sequential()    
    model.add(Bidirectional(LSTM(32, return_sequences=True, input_shape=(x_train.shape[1], 1), name="one")))  # also tried: input_shape=(None, *x_train.shape)
    model.add(Dropout(0.20))
    model.add(Bidirectional(LSTM(32, return_sequences=False, name="three")))
    model.add(Dropout(0.10))
    model.add(Dense(32, activation='relu'))
    model.add(Dropout(0.10))
    model.add(Dense(1, activation='sigmoid'))
    opt = Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999, decay=0.01)  # learning_rate replaces the deprecated lr argument
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    return model

filepath = "bilstmv1.h5"
chkp = ModelCheckpoint(filepath=filepath, monitor='val_accuracy', mode='auto', verbose=1, save_best_only=True)


model = build_nn()
model.fit(x_train, y_train, epochs=15, batch_size=32, validation_split=0.1, callbacks=[chkp])

Here is the CNN version:

model = Sequential()
model.add(Conv1D(256, 3, input_shape=(x_train.shape[1], 1), activation='relu', padding="same"))
model.add(BatchNormalization())
model.add(Dropout(0.15))
model.add(Conv1D(128, 3, activation='relu', padding="same"))
model.add(BatchNormalization())
model.add(Dropout(0.15))
model.add(Flatten())  # flatten (timesteps, channels) before the dense head
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.15))
model.add(Dense(1))
model.add(Activation("sigmoid"))
# opt = Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.999, decay=0.01)
# opt = SGD(learning_rate=0.01)
model.compile(loss='binary_crossentropy', optimizer='adamax', metrics=['accuracy'])

Everything seems fine until I start training: both val_loss and val_accuracy are NOT changing during training:

Epoch 1/15
1011/1011 [==============================] - 18s 10ms/step - loss: 0.6803 - accuracy: 0.5849 - val_loss: 0.6800 - val_accuracy: 0.5803

Epoch 00001: val_accuracy improved from -inf to 0.58025, saving model to bilstmv1.h5
Epoch 2/15
1011/1011 [==============================] - 9s 9ms/step - loss: 0.6782 - accuracy: 0.5877 - val_loss: 0.6799 - val_accuracy: 0.5803

Epoch 00002: val_accuracy did not improve from 0.58025
Epoch 3/15
1011/1011 [==============================] - 9s 8ms/step - loss: 0.6793 - accuracy: 0.5844 - val_loss: 0.6799 - val_accuracy: 0.5803

Epoch 00003: val_accuracy did not improve from 0.58025
Epoch 4/15
1011/1011 [==============================] - 9s 9ms/step - loss: 0.6784 - accuracy: 0.5861 - val_loss: 0.6799 - val_accuracy: 0.5803

Epoch 00004: val_accuracy did not improve from 0.58025
Epoch 5/15
1011/1011 [==============================] - 9s 9ms/step - loss: 0.6796 - accuracy: 0.5841 - val_loss: 0.6799 - val_accuracy: 0.5803

Epoch 00005: val_accuracy did not improve from 0.58025
Epoch 6/15
1011/1011 [==============================] - 8s 8ms/step - loss: 0.6792 - accuracy: 0.5842 - val_loss: 0.6798 - val_accuracy: 0.5803

Epoch 00006: val_accuracy did not improve from 0.58025
Epoch 7/15
1011/1011 [==============================] - 8s 8ms/step - loss: 0.6779 - accuracy: 0.5883 - val_loss: 0.6798 - val_accuracy: 0.5803

Epoch 00007: val_accuracy did not improve from 0.58025
Epoch 8/15
1011/1011 [==============================] - 8s 8ms/step - loss: 0.6797 - accuracy: 0.5830 - val_loss: 0.6798 - val_accuracy: 0.5803

Epoch 00008: val_accuracy did not improve from 0.58025

I have tried changing every single thing I saw here and there, and nothing worked. I am sure I have no NaN values in my data, as I removed them in the pre-processing steps. I tried running a CNN to check whether the issue is related to the LSTM, and got the same result (neither of the two values changes). Trying different optimizers changed nothing either. Any help is really appreciated.

Here is a link of the dataset after doing all the pre-processing: https://drive.google.com/file/d/1punYl-f3dFbw1YWtw3M7hVwy5knhqU9Q/view?usp=sharing

Using a decision tree, I was able to get 85%:

from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics

# note: sklearn expects 2D inputs, so this uses the arrays from before the (samples, 3, 1) reshape
decision_tree = DecisionTreeClassifier().fit(x_train, y_train)
dt_predictions = decision_tree.predict(x_test)
score = metrics.accuracy_score(y_test, dt_predictions)  # ~0.85

Note: the predictions have the same value for the entire test set (x_test), which tells us why val_accuracy is not changing.
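This can be seen directly from the predictions themselves (a short check, assuming the model trained above):

import numpy as np

# a collapsed model produces a single unique output for every test sample
preds = model.predict(x_test)
print(np.unique(preds.round(4)))             # one (or very few) distinct values
print(np.unique((preds > 0.5).astype(int)))  # a single predicted class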

There are multiple issues here, so I will try to address them all step by step.

  1. The first is that machine-learning data needs to have a pattern which the model can infer and predict. Stock prediction is highly irregular, nearly random, and I would attribute any accuracy deviation from 50% to statistical variance.

  2. NNs can be very hard to train, and 'there is no free lunch'. To demonstrate, I reproduce your setup below:

import pandas as pd
import numpy as np

import tensorflow as tf

from tensorflow.keras import *
from tensorflow.keras.layers import *
from tensorflow.keras.optimizers import *

file = pd.read_csv('dummy_db.csv')

x_train = np.expand_dims(file[['feature1', 'feature2', 'feature3']].to_numpy(), axis=2)
y_train = file['Label'].to_numpy(bool)  # np.bool was removed from NumPy; use the builtin bool


model = Sequential()
model.add(Bidirectional(LSTM(32, return_sequences=True, input_shape=(x_train.shape[1], 1), name="one")))  # also tried: input_shape=(None, *x_train.shape)
model.add(Dropout(0.20))
model.add(Bidirectional(LSTM(32, return_sequences=False, name="three")))
model.add(Dropout(0.10))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.10))
model.add(Dense(1, activation='sigmoid'))
opt = SGD(learning_rate = 0, momentum = 0.1)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1, batch_size=128, validation_split=0.1)

A zero-LR train step identifies the initial accuracy. You will see that the initial accuracy is 41% (this accuracy is hit or miss, as I will explain later).

316/316 [==============================] - 10s 11ms/step - loss: 0.7006 - accuracy: 0.4321 - val_loss: 0.6997 - val_accuracy: 0.41

I am keeping the LR small (1e-4) so you can see the shift in accuracy happening:

opt = SGD(learning_rate = 1e-4, momentum = 0.1)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
model.fit(x_train, y_train, epochs=15,batch_size=128, validation_split=0.1)

Epoch 1/15 316/316 [==============================] - 7s 9ms/step - loss: 0.6982 - accuracy: 0.4573 - val_loss: 0.6969 - val_accuracy: 0.41

Epoch 2/15 316/316 [==============================] - 2s 5ms/step - loss: 0.6964 - accuracy: 0.4784 - val_loss: 0.6954 - val_accuracy: 0.41

Epoch 3/15 316/316 [==============================] - 2s 6ms/step - loss: 0.6953 - accuracy: 0.4841 - val_loss: 0.6941 - val_accuracy: 0.49

Epoch 4/15 316/316 [==============================] - 2s 6ms/step - loss: 0.6940 - accuracy: 0.4993 - val_loss: 0.6929 - val_accuracy: 0.51

Epoch 5/15 316/316 [==============================] - 2s 6ms/step - loss: 0.6931 - accuracy: 0.5089 - val_loss: 0.6917 - val_accuracy: 0.54

Epoch 6/15 316/316 [==============================] - 2s 6ms/step - loss: 0.6918 - accuracy: 0.5209 - val_loss: 0.6907 - val_accuracy: 0.56

Epoch 7/15 316/316 [==============================] - 2s 6ms/step - loss: 0.6907 - accuracy: 0.5337 - val_loss: 0.6897 - val_accuracy: 0.58

Epoch 8/15 316/316 [==============================] - 2s 6ms/step - loss: 0.6905 - accuracy: 0.5347 - val_loss: 0.6886 - val_accuracy: 0.58

Epoch 9/15 316/316 [==============================] - 2s 6ms/step - loss: 0.6885 - accuracy: 0.5518 - val_loss: 0.6853 - val_accuracy: 0.58

** Rest of the runs left out for brevity **

If you rerun the training, you may see that the model initially has an accuracy of 58% and never improves. This is because it has no features to actually learn from, other than the minimum that is seemingly present at 58%, and one I wouldn't trust for actual cases.

Let me add some more proof of this:

import pandas as pd

file = pd.read_csv('dummy_db.csv')
sum(file['Label'])/len(file)
0.4176496772757623

That's how many Trues there are; correspondingly, about 58% are False. So what is happening is that your model is learning to predict False for all cases and getting the sub-optimal 58% accuracy. We can prove this statement:

sum(model.predict(x_train) < 0.5)

array([44930])

That is the true reason for your recurring 58%, and I don't think it will ever do better.
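To see that 58% is exactly the majority-class baseline, you can compare against a classifier that ignores the features entirely (a sketch using scikit-learn's DummyClassifier on the same arrays; the reshape flattens the extra axis sklearn cannot handle):

from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# always predicts the most frequent label, never looks at the features
x_flat = x_train.reshape(len(x_train), -1)
baseline = DummyClassifier(strategy="most_frequent").fit(x_flat, y_train)
print(accuracy_score(y_train, baseline.predict(x_flat)))  # ~0.58, same as the network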

  3. You seem not to be using LSTMs properly. LSTM inputs have the format [batch, timesteps, features], and I don't think your inputs are actually timesteps (see the sketch below). You can read more here, in a question that explains quite well why an LSTM is a bad choice for your data. There are better ML classifiers, both DL and non-DL, that handle this better than LSTMs. Edit: https://datascience.stackexchange.com/questions/38328/when-does-decision-tree-perform-better-than-the-neural-network explains this even better.
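If you did want to give the LSTM real timesteps, each sample would have to be a window of consecutive days rather than 3 unrelated features. A minimal sketch of that construction (the window size of 30 is illustrative, not from the original post):

import numpy as np

def make_windows(features, labels, window=30):
    """Build (samples, timesteps, features) sequences from consecutive rows;
    each sample is `window` consecutive days, labelled with the following day."""
    xs, ys = [], []
    for i in range(len(features) - window):
        xs.append(features[i:i + window])  # shape (window, n_features)
        ys.append(labels[i + window])      # next day's up/down label
    return np.array(xs), np.array(ys)

# e.g. x_seq, y_seq = make_windows(x, y)  ->  x_seq.shape == (n, 30, 3)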

So what to do now?

  1. Get better data.
  2. Read literature where someone did stock prediction and see what exactly they did.
  1. Why are you using a Bidirectional LSTM while trying to do classification on stock-market data?

  2. You should try scaling your data: the values of feature3 are way out of scale compared with the other features (see the sketch after this list).

  3. I'm not sure feature selection is a good idea here. Your DT may perform better with selected features, but I don't think reducing dimensionality is a great idea when trying to find a manifold that splits a potentially very high-dimensional space into your 2 labels.
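On point 2, a standard scaler fitted on the training split only would put feature3 on the same footing as the other features (a sketch, assuming the 2D x_train/x_test from before the reshape):

from sklearn.preprocessing import StandardScaler

# fit on the training data only, then apply the same transform to the test data
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

# reshape back to (samples, features, 1) before feeding the network
x_train_scaled = x_train_scaled.reshape(len(x_train_scaled), -1, 1)
x_test_scaled = x_test_scaled.reshape(len(x_test_scaled), -1, 1)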
