
Extremely poor prediction: LSTM time-series

I tried to implement an LSTM model for time-series prediction. Below is my trial code. It runs without error. You can also try it without any extra dependencies.

import numpy as np, pandas as pd, matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed, Bidirectional
from sklearn.metrics import mean_squared_error, accuracy_score
from scipy.stats import linregress
from sklearn.utils import shuffle

fi = 'pollution.csv'
raw = pd.read_csv(fi, delimiter=',')
raw = raw.drop('Dates', axis=1)
print (raw.shape)

scaler = MinMaxScaler(feature_range=(-1, 1))
raw = scaler.fit_transform(raw)

time_steps = 7
def create_ds(data, t_steps):
    data = pd.DataFrame(data)
    data_s = data.copy()
    for i in range(time_steps):
        data = pd.concat([data, data_s.shift(-(i+1))], axis = 1)   
    data.dropna(axis=0, inplace=True)
    return data.values

ds = create_ds(raw, time_steps)
print (ds.shape)
n_feats = raw.shape[1]
n_obs = time_steps * n_feats

n_rows = ds.shape[0]
train_size = int(n_rows * 0.8)

train_data = ds[:train_size, :]
train_data = shuffle(train_data)

test_data = ds[train_size:, :]

x_train = train_data[:, :n_obs]
y_train = train_data[:, n_obs:]
x_test = test_data[:, :n_obs]
y_test = test_data[:, n_obs:]

x_train = x_train.reshape(1, x_train.shape[0], x_train.shape[1])
y_train = y_train.reshape(1, y_train.shape[0], y_train.shape[1])
x_test = x_test.reshape(1, x_test.shape[0], x_test.shape[1])

print (x_train.shape)
print (y_train.shape)
print (x_test.shape)
print (y_test.shape)

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(None, x_train.shape[2]), stateful=True, batch_size=1))
model.add(LSTM(32, return_sequences=True, stateful=True))
model.add(LSTM(n_feats, return_sequences=True, stateful=True)) 

model.compile(loss='mse', optimizer='rmsprop')
model.fit(x_train, y_train, epochs=10, batch_size=1, verbose=2)  
y_predict = model.predict(x_test)
y_predict = y_predict.reshape(y_predict.shape[1], y_predict.shape[2])

y_predict = scaler.inverse_transform(y_predict)

y_test = scaler.inverse_transform(y_test)
y_test = y_test[:,0]
y_predict = y_predict[:,0]

print (y_test.shape)
print (y_predict.shape)

plt.plot(y_test, label='True')
plt.plot(y_predict,  label='Predict')
plt.legend()
plt.show()


However, the prediction is extremely poor. How can I improve it? Do you have any ideas?

Any ideas for improving the prediction by re-designing the architecture and/or layers?

If you want to use the model in my code (the link you passed), you need to have the data correctly shaped: (1 sequence, total_time_steps, 5 features).

Important: I don't know if this is the best way or the best model to do this, but this model predicts 7 time steps ahead of the input (time_shift=7).

Data and initial vars

fi = 'pollution.csv'
raw = pd.read_csv(fi, delimiter=',')
raw = raw.drop('Dates', axis=1)
print("raw shape:")
print (raw.shape)
#(1789,5) - 1789 time steps / 5 features

scaler = MinMaxScaler(feature_range=(-1, 1))
raw = scaler.fit_transform(raw)

time_shift = 7 #shift is the number of steps we are predicting ahead
n_rows = raw.shape[0] #n_rows is the number of time steps of our sequence
n_feats = raw.shape[1]
train_size = int(n_rows * 0.8)


#I couldn't understand how "ds" worked, so I simply removed it because in the code below it's not necessary

#getting the train part of the sequence
train_data = raw[:train_size, :] #first train_size steps, all 5 features
test_data = raw[train_size:, :] #I'll use the beginning of the data as state adjuster


#train_data = shuffle(train_data) !!!!!! we cannot shuffle time steps!!! we lose the sequence doing this

x_train = train_data[:-time_shift, :] #the entire train data, except the last shift steps 
x_test = test_data[:-time_shift,:] #the entire test data, except the last shift steps
x_predict = raw[:-time_shift,:] #the entire raw data, except the last shift steps

y_train = train_data[time_shift:, :] 
y_test = test_data[time_shift:,:]
y_predict_true = raw[time_shift:,:]

x_train = x_train.reshape(1, x_train.shape[0], x_train.shape[1]) #ok shape (1,steps,5) - 1 sequence, many steps, 5 features
y_train = y_train.reshape(1, y_train.shape[0], y_train.shape[1])
x_test = x_test.reshape(1, x_test.shape[0], x_test.shape[1])
y_test = y_test.reshape(1, y_test.shape[0], y_test.shape[1])
x_predict = x_predict.reshape(1, x_predict.shape[0], x_predict.shape[1])
y_predict_true = y_predict_true.reshape(1, y_predict_true.shape[0], y_predict_true.shape[1])

print("\nx_train:")
print (x_train.shape)
print("y_train")
print (y_train.shape)
print("x_test")
print (x_test.shape)
print("y_test")
print (y_test.shape)
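
To make the shapes concrete, here is a toy sketch of the same shifting on a small array. The helper name make_shifted_pair and the toy data are mine, not part of the code above; the only point is that x and y have the same number of steps and that y is x moved time_shift steps into the future.

```python
import numpy as np

def make_shifted_pair(series, time_shift):
    """Return (x, y), each shaped (1, steps, features), with y[t] = series[t + time_shift]."""
    x = series[:-time_shift]   # everything except the last time_shift steps
    y = series[time_shift:]    # the same sequence, time_shift steps later
    # add the leading "1 sequence" axis expected by the LSTM
    return x[np.newaxis, ...], y[np.newaxis, ...]

toy = np.arange(20, dtype=float).reshape(10, 2)  # 10 time steps, 2 features
x, y = make_shifted_pair(toy, time_shift=3)
print(x.shape, y.shape)  # (1, 7, 2) (1, 7, 2)
```

The target at position 0 is row 3 of the original array, i.e. the value three steps after the first input.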

Model

Your model wasn't very powerful for this task, so I tried a bigger one (this one, on the other hand, is too powerful).

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(None, x_train.shape[2])))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(256, return_sequences=True))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(n_feats, return_sequences=True)) 

model.compile(loss='mse', optimizer='adam')

Fitting

Notice that I had to train 2000+ epochs for the model to have good results.
I added the validation data so we can compare the loss for train and test.

#notice that I'm predicting from the ENTIRE sequence, including x_train      
#is important for the model to adjust its states before predicting the end
model.fit(x_train, y_train, epochs=1000, batch_size=1, verbose=2, validation_data=(x_test,y_test))  

Predicting

Important: when predicting the end of a sequence based on its beginning, the model needs to see the beginning to adjust its internal states, so I'm predicting on the entire data (x_predict), not only the test data.

y_predict_model = model.predict(x_predict)

print("\ny_predict_true:")
print (y_predict_true.shape)
print("y_predict_model: ")
print (y_predict_model.shape)


def plot(true, predicted, divider):

    predict_plot = scaler.inverse_transform(predicted[0])
    true_plot = scaler.inverse_transform(true[0])

    predict_plot = predict_plot[:,0]
    true_plot = true_plot[:,0]

    plt.figure(figsize=(16,6))
    plt.plot(true_plot, label='True',linewidth=5)
    plt.plot(predict_plot,  label='Predict',color='y')

    if divider > 0:
        maxVal = max(true_plot.max(),predict_plot.max())
        minVal = min(true_plot.min(),predict_plot.min())

        plt.plot([divider,divider],[minVal,maxVal],label='train/test limit',color='k')

    plt.legend()
    plt.show()

test_size = n_rows - train_size
print("test length: " + str(test_size))

plot(y_predict_true,y_predict_model,train_size)
plot(y_predict_true[:,-2*test_size:],y_predict_model[:,-2*test_size:],test_size)
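
Because the prediction covers the whole sequence, the train and test portions can also be scored separately with a number instead of only by eye. This is a rough sketch with toy arrays; split_mse and boundary are names I made up, and the arrays are assumed to have the (1, steps, features) shape used above.

```python
import numpy as np

def split_mse(y_true, y_pred, boundary):
    """Mean squared error before and after `boundary` for (1, steps, features) arrays."""
    squared_error = (y_true[0] - y_pred[0]) ** 2
    train_mse = float(squared_error[:boundary].mean())
    test_mse = float(squared_error[boundary:].mean())
    return train_mse, test_mse

# toy data: the prediction is perfect on the first 6 steps and off by 1.0 on the last 4
y_true = np.zeros((1, 10, 2))
y_pred = y_true.copy()
y_pred[0, 6:] += 1.0

train_mse, test_mse = split_mse(y_true, y_pred, boundary=6)
print(train_mse, test_mse)  # 0.0 1.0
```

With the variables from the code above this would be called on y_predict_true and y_predict_model, with the same divider that is passed to plot.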

Showing entire data

Showing the end portion of it for more detail

Please notice that this model is overfitting: it learns the training data well but gets bad results on the test data.

To solve this you must experiment with smaller models, dropout layers and other techniques to prevent overfitting.

Notice also that this data very probably contains A LOT of random factors, meaning the models will not be able to learn anything useful from it. As you make smaller models to avoid overfitting, you may also find that the model gives worse predictions for the training data.

Finding the perfect model is not an easy task; it's an open question and you must experiment. Maybe LSTM models simply aren't the solution. Maybe your data is simply not predictable, etc. There isn't a definitive answer for this.

How to know the model is good

With the validation data in training, you can compare the loss for train and test data.

Train on 1 samples, validate on 1 samples
Epoch 1/1000
9s - loss: 0.4040 - val_loss: 0.3348
Epoch 2/1000
4s - loss: 0.3332 - val_loss: 0.2651
Epoch 3/1000
4s - loss: 0.2656 - val_loss: 0.2035
Epoch 4/1000
4s - loss: 0.2061 - val_loss: 0.1696
Epoch 5/1000
4s - loss: 0.1761 - val_loss: 0.1601
Epoch 6/1000
4s - loss: 0.1697 - val_loss: 0.1476
Epoch 7/1000
4s - loss: 0.1536 - val_loss: 0.1287
Epoch 8/1000
.....

Both should go down together. When the test loss stops going down but the train loss continues to improve, your model is starting to overfit.
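
The rule above can be written down as a small check on the recorded validation losses. This is only a sketch of the idea in plain Python: early_stopping_epoch, patience and min_delta are illustrative names, and the loss values are made up. Keras also ships an EarlyStopping callback that monitors val_loss during fit, which is probably what you would use in practice.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 0-based.

    Training would stop once the validation loss has not improved
    by more than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # never ran out of patience: the last epoch is the stopping point
    return best_epoch, len(val_losses) - 1

# made-up validation losses: improving until epoch 6, then getting worse
val_losses = [0.33, 0.27, 0.20, 0.17, 0.16, 0.15, 0.13, 0.14, 0.15, 0.16, 0.17]
best, stop = early_stopping_epoch(val_losses, patience=3)
print(best, stop)  # 6 9
```

The weights worth keeping are the ones from the best epoch, not from the epoch where training stopped.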





Trying another model

The best I could do (but I didn't really try much) was using this model:

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(None, x_train.shape[2])))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(n_feats, return_sequences=True)) 

model.compile(loss='mse', optimizer='adam')

When the losses were about:

loss: 0.0389 - val_loss: 0.0437

After this point, the validation loss started going up (so training beyond this point is totally useless).

Result:

This shows that all this model could learn was the very general behaviour, such as zones with higher values.

But the high-frequency variation was either too random or the model wasn't good enough for this...

You may consider changing your model:

import numpy as np, pandas as pd, matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed, Bidirectional
from sklearn.metrics import mean_squared_error, accuracy_score
from scipy.stats import linregress
from sklearn.utils import shuffle

fi = 'pollution.csv'
raw = pd.read_csv(fi, delimiter=',')
raw = raw.drop('Dates', axis=1)
print (raw.shape)

scaler = MinMaxScaler(feature_range=(-1, 1))
raw = scaler.fit_transform(raw)

time_steps = 7
def create_ds(data, t_steps):
    data = pd.DataFrame(data)
    data_s = data.copy()
    for i in range(time_steps):
        data = pd.concat([data, data_s.shift(-(i+1))], axis = 1)   
    data.dropna(axis=0, inplace=True)
    return data.values

ds = create_ds(raw, time_steps)
print (ds.shape)
n_feats = raw.shape[1]
n_obs = time_steps * n_feats

n_rows = ds.shape[0]
train_size = int(n_rows * 0.8)

train_data = ds[:train_size, :]
train_data = shuffle(train_data)

test_data = ds[train_size:, :]

x_train = train_data[:, :n_obs]
y_train = train_data[:, n_obs:]
x_test = test_data[:, :n_obs]
y_test = test_data[:, n_obs:]

print (x_train.shape)
print (x_test.shape)
print (y_train.shape)
print (y_test.shape)

x_train = x_train.reshape(x_train.shape[0], time_steps, n_feats)
x_test = x_test.reshape(x_test.shape[0], time_steps, n_feats)

print (x_train.shape)
print (x_test.shape)
print (y_train.shape)
print (y_test.shape)

model = Sequential()
model.add(LSTM(64, input_shape=(time_steps, n_feats), return_sequences=True))
model.add(LSTM(32, return_sequences=False))
model.add(Dense(n_feats))

model.compile(loss='mse', optimizer='rmsprop')
model.fit(x_train, y_train, epochs=10, batch_size=1, verbose=1, shuffle=False)

y_predict = model.predict(x_test)
print (y_predict.shape)
y_predict = scaler.inverse_transform(y_predict)

y_test = scaler.inverse_transform(y_test)
y_test = y_test[:,0]
y_predict = y_predict[:,0]

print (y_test.shape)
print (y_predict.shape)

plt.plot(y_test, label='True')
plt.plot(y_predict,  label='Predict')
plt.legend()
plt.show()


But I really do not know the merits of your implementation:

* both x and y are 3d (1,steps,features) rather than x in 3d (samples, time-steps, features) and y in 2d (samples, features)
* input_shape=(None, x_train.shape[2])
* last layer - model.add(LSTM(n_feats, return_sequences=True, stateful=True)) 
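
For comparison, the "x in 3d (samples, time-steps, features) and y in 2d (samples, features)" layout can be built directly with NumPy. This is a minimal sketch on toy data; make_windows is a hypothetical helper, not the create_ds function from the question. It also shows why shuffling is harmless here: whole windows are shuffled, so the order inside each window is preserved.

```python
import numpy as np

def make_windows(data, time_steps):
    """Build x of shape (samples, time_steps, features) and y of shape (samples, features).

    Sample i uses rows i .. i+time_steps-1 as input and row i+time_steps as target.
    """
    n_samples = len(data) - time_steps
    x = np.stack([data[i:i + time_steps] for i in range(n_samples)])
    y = data[time_steps:]
    return x, y

toy = np.arange(20, dtype=float).reshape(10, 2)  # 10 time steps, 2 features
x, y = make_windows(toy, time_steps=3)
print(x.shape, y.shape)  # (7, 3, 2) (7, 2)

# shuffle whole samples; x and y are reordered with the same permutation
rng = np.random.default_rng(0)
order = rng.permutation(len(x))
x_shuffled, y_shuffled = x[order], y[order]
```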

Someone may provide a better answer.

I'm not exactly sure what you could do; that data looks as if it has no discernible pattern. If I can't see one, I doubt an LSTM could. Your prediction does look like a good regression line though.

Reading the original code, it seems the author first scales the dataset and then splits it into training and testing subsets. This means that information about the testing subset (e.g., volatility) has "leaked" into the training subset.

The recommended approach is to first perform the training/testing split, calculate the scaling parameters using only the training subset, and then use these parameters to scale the training and testing subsets separately.
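
A minimal sketch of that order of operations, written with plain NumPy so the scaling parameters are visible. The random data stands in for pollution.csv, and fit_minmax / apply_minmax are hypothetical helpers; with scikit-learn the same thing is done by calling fit on the training rows only and transform on both subsets.

```python
import numpy as np

def fit_minmax(train, feature_range=(-1.0, 1.0)):
    """Compute per-feature scale and offset from the training rows only."""
    low, high = feature_range
    data_min = train.min(axis=0)
    data_max = train.max(axis=0)
    # guard against constant columns to avoid dividing by zero
    span = np.where(data_max > data_min, data_max - data_min, 1.0)
    scale = (high - low) / span
    offset = low - data_min * scale
    return scale, offset

def apply_minmax(data, scale, offset):
    """Apply previously fitted parameters to any subset."""
    return data * scale + offset

rng = np.random.default_rng(0)
raw = rng.normal(size=(100, 5))         # placeholder: 100 time steps, 5 features
train_size = int(len(raw) * 0.8)
train, test = raw[:train_size], raw[train_size:]   # split first

scale, offset = fit_minmax(train)       # parameters come from the training subset only
train_scaled = apply_minmax(train, scale, offset)
test_scaled = apply_minmax(test, scale, offset)    # may fall slightly outside [-1, 1]
```

Test values outside the training range are expected with this approach; that is the price of not peeking at the test subset.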

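I am building a model to predict data like this myself. I created a SMOTErnn solution to add to the past data, and I found that using TimeseriesGenerator with a higher batch_size and a higher stride performs much better.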
