Multivariate time series forecasting with LSTM in Keras (on future data)

Question

I have been using Keras to predict a multivariate time series. The dataset is a pollution dataset: the first column is what I want to predict, and the remaining 7 columns are features. The dataset can be found here: https://github.com/sagarmk/Forecasting-on-Air-pollution-with-RNN-LSTM/blob/master/pollution.csv

What I want to do is run the following code on a test set that has no "pollution" column. Let's say there is new data for the features but not for the pollution. So:

How do I train the model without test data? (model.fit())
How do I predict new pollution data without future data on pollution? (model.predict())

To keep it simple, could the dataset be split into a training and a testing dataset at the start, with the "pollution" column removed from the testing dataset?

See the simple code below. Here I simply import and process the dataset.

import numpy as np
import pandas as pd
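import matplotlib as plt
import matplotlib.pyplot  # load the pyplot submodule explicitly so the plt.pyplot.* calls below work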
import seaborn as sns
import plotly.express as px

# Import data
dataset = pd.read_csv("pollution.csv")
dataset = dataset.drop(['date'], axis = 1)
# Encode the wind direction strings as integer codes
label, layer = pd.factorize(dataset['wnd_dir'])
dataset['wnd_dir'] = label
dataset=dataset.fillna(dataset.mean())
dataset.head()
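
A toy illustration of the encoding step above: a minimal sketch of what `pd.factorize` returns. The wind-direction values are made up. The integer codes are assigned in order of first appearance, so they carry no meaningful ordering.

```python
import pandas as pd

# Toy wind-direction values (made up for illustration)
wnd_dir = pd.Series(["SE", "SE", "cv", "NW", "SE", "NE"])

# factorize returns integer codes plus the unique values they map to
codes, uniques = pd.factorize(wnd_dir)

print(codes.tolist())   # one integer code per row, in order of first appearance
print(list(uniques))    # lookup table: uniques[code] gives the original label

# The mapping can be reversed by indexing the uniques with the codes
restored = uniques[codes]
```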

After that I normalize the dataset.

from sklearn.preprocessing import MinMaxScaler

values = dataset.values
scaler = MinMaxScaler()
scaled = scaler.fit_transform(values)
scaled[0]
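
As a small illustration of the scaling step, the sketch below runs `MinMaxScaler` on a made-up array and checks that `inverse_transform` recovers the original values. It also shows a variant that fits the scaler on the earlier rows only; that variant is an assumption about how one might keep later data out of the scaling statistics, not something the code above does.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up values: 4 time steps, 2 columns
values = np.array([[10.0, 1.0],
                   [20.0, 3.0],
                   [30.0, 5.0],
                   [40.0, 9.0]])

scaler = MinMaxScaler()
scaled = scaler.fit_transform(values)        # each column mapped to [0, 1]
restored = scaler.inverse_transform(scaled)  # back to the original units

# Variant: fit on the "training" rows only, then apply to the later rows.
train, later = values[:3], values[3:]
train_scaler = MinMaxScaler().fit(train)
later_scaled = train_scaler.transform(later)  # can fall outside [0, 1]
```

Values that fall outside the range seen during fitting simply map outside [0, 1], as `later_scaled` does here.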

Then I convert the normalized data into supervised form.

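def to_supervised(dataset,dropNa = True,lag = 1):
    # Build rows of [x(t), ..., x(t+lag-1), target(t+lag)]; the target is column 0
    df = pd.DataFrame(dataset)
    column = []
    column.append(df)
    for i in range(1,lag+1):
        # shift(-i) aligns the values from i steps ahead with row t
        column.append(df.shift(-i))
    df = pd.concat(column,axis=1)
    if dropNa:
        # the last `lag` rows have no future values, so drop them
        df.dropna(inplace = True)
    features = dataset.shape[1]
    df = df.values
    supervised_data = df[:,:features*lag]
    supervised_data = np.column_stack( [supervised_data, df[:,features*lag]])
    return supervised_data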

timeSteps = 2

supervised = to_supervised(scaled,lag=timeSteps)
pd.DataFrame(supervised).head()
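
To see what the supervised framing produces, here is a compact sketch of the same idea on a toy series. The data is made up, and the function is a condensed rewrite for illustration rather than the exact code above. With a lag of 2, each row holds two consecutive time steps of all columns, followed by the first column one step later.

```python
import numpy as np
import pandas as pd

def to_supervised(data, lag=1):
    """Rows of [x(t), ..., x(t+lag-1), target(t+lag)], where target is column 0."""
    df = pd.DataFrame(data)
    # shift(-i) brings the values from i steps ahead onto row t
    shifted = [df.shift(-i) for i in range(lag + 1)]
    wide = pd.concat(shifted, axis=1).dropna().values
    n_features = data.shape[1]
    inputs = wide[:, :n_features * lag]
    target = wide[:, n_features * lag]
    return np.column_stack([inputs, target])

# Toy series: column 0 is the target (0, 1, 2, ...), column 1 a feature (10, 11, ...)
toy = np.column_stack([np.arange(5.0), np.arange(10.0, 15.0)])
supervised = to_supervised(toy, lag=2)
print(supervised)
```

Note that the input columns include past values of the target itself, so building such a row for a new time step requires recent pollution readings as well as the other features.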

Now the dataset is split and reshaped so that the LSTM network can handle it.

features = dataset.shape[1]
train_hours = round(dataset.shape[0]*0.7)
X = supervised[:,:features*timeSteps]
y = supervised[:,features*timeSteps]

x_train = X[:train_hours,:]
x_test = X[train_hours:,:]
y_train = y[:train_hours]
y_test = y[train_hours:]

print(x_train.shape,x_test.shape,y_train.shape,y_test.shape)
# convert data to the shape the LSTM expects
# dimensions = (samples, timeSteps (2 here), features)

x_train = x_train.reshape(x_train.shape[0], timeSteps, features)
x_test = x_test.reshape(x_test.shape[0], timeSteps, features)

print(x_train.shape,x_test.shape)
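
The reshape relies on each flat row being laid out time step by time step. A minimal sketch with made-up numbers shows that a row-major reshape keeps the features of one time step together:

```python
import numpy as np

time_steps, n_features = 2, 3

# One flat row per sample: [f0(t), f1(t), f2(t), f0(t+1), f1(t+1), f2(t+1)]
flat = np.array([[1, 2, 3, 4, 5, 6],
                 [7, 8, 9, 10, 11, 12]])

# Row-major reshape: the first n_features values become time step 0, the next time step 1
windows = flat.reshape(flat.shape[0], time_steps, n_features)

print(windows.shape)  # (samples, time_steps, features)
print(windows[0])     # rows are time steps, columns are features
```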

Here the model is trained:

#define the model
from keras.models import Sequential
from keras.layers import Dense,LSTM

model = Sequential()
model.add( LSTM( 50, input_shape = ( timeSteps,x_train.shape[2]) ) )
model.add( Dense(1) )

model.compile( loss = "mae", optimizer = "adam")

history =  model.fit( x_train,y_train, validation_data = (x_test,y_test), epochs = 50 , batch_size = 72, verbose = 0, shuffle = False)
plt.pyplot.plot(history.history['loss'], label='train')
plt.pyplot.plot(history.history['val_loss'], label='test')
plt.pyplot.legend()
#plt.pyplot.yticks([])
#plt.pyplot.xticks([])
plt.pyplot.title("loss during training")
plt.pyplot.show()

Lastly, I plot the pollution data along with the predictions on the test data.

y_pred = model.predict(x_test)
x_test = x_test.reshape(x_test.shape[0],x_test.shape[2]*x_test.shape[1])

inv_new = np.concatenate( (y_pred, x_test[:,-7:] ) , axis =1)
inv_new = scaler.inverse_transform(inv_new)
final_pred = inv_new[:,0]

plt.pyplot.figure(figsize=(20,10))
plt.pyplot.plot(dataset['pollution'])
plt.pyplot.plot([None for i in dataset['pollution']] + [x for x in final_pred])
plt.pyplot.show()
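
The concatenation before `inverse_transform` is needed because the scaler was fitted on all 8 columns and expects that width. The sketch below, on made-up data, compares that padding approach with undoing the min-max formula for the first column directly through the fitted scaler's `data_min_` and `data_range_` attributes. The direct formula assumes the default `feature_range=(0, 1)`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data: column 0 plays the role of "pollution", the rest are features
values = np.array([[100.0, 1.0, 5.0],
                   [200.0, 2.0, 6.0],
                   [300.0, 4.0, 9.0]])
scaler = MinMaxScaler().fit(values)

# Pretend these are scaled predictions from the model, shape (n, 1)
y_pred_scaled = np.array([[0.0], [0.25], [1.0]])

# Option 1: pad with dummy feature columns so the array has the width the
# scaler was fitted on, then keep only column 0.
padded = np.concatenate([y_pred_scaled, np.zeros((3, 2))], axis=1)
via_padding = scaler.inverse_transform(padded)[:, 0]

# Option 2: undo the min-max formula for column 0 directly.
direct = y_pred_scaled[:, 0] * scaler.data_range_[0] + scaler.data_min_[0]
```

Both give the same values, because min-max scaling treats every column independently; the content of the padded columns does not affect column 0.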

Answer 1

Predicting results with your neural network should be as simple as the line of code below: pass in new data that is in the same format as the training data.

prediction = model.predict(features)
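
"Same format" here usually means the same columns, the same scaling, and the same (samples, time steps, features) shape. A minimal sketch of that preparation follows; `prepare_inputs` is a hypothetical helper, the arrays are made up, and it assumes the new rows contain every column the model was trained on.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def prepare_inputs(raw, scaler, time_steps):
    """Hypothetical helper: scale raw rows and cut them into sliding windows
    of shape (samples, time_steps, features)."""
    scaled = scaler.transform(raw)  # reuse the fitted scaler; do not refit
    windows = [scaled[i:i + time_steps]
               for i in range(len(scaled) - time_steps + 1)]
    return np.stack(windows)

# Toy stand-ins for the training data and for newly arrived rows
train_raw = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
new_raw = np.array([[2.0, 12.0], [4.0, 14.0], [6.0, 16.0]])

scaler = MinMaxScaler().fit(train_raw)
x_new = prepare_inputs(new_raw, scaler, time_steps=2)
print(x_new.shape)  # (2, 2, 2): samples, time steps, features

# With a trained model this would be passed on as: model.predict(x_new)
```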

Do you have any code that you can provide? What issue are you running into?

Answer 2

I.e., when the "test" dataset only consists of 8 feature columns and no column for the price?

It looks like you are asking a feature engineering question. Your real dataset has NaN values in different columns, which makes the prediction fail, right?

In this case, you can use a common solution:

Fill the NaN values with the median/mean of the corresponding column in the training set.

For example, you can fill the future price with the median/mean of each product's prices over the most recent 14 days (the aggregation length).
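
A minimal pandas sketch of both fills, on made-up data. The column names and the window of 3 rows (standing in for the 14-day aggregation length) are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame: the last two "pollution" values are unknown (future rows)
df = pd.DataFrame({
    "pollution": [10.0, 20.0, 30.0, 40.0, np.nan, np.nan],
    "temp":      [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
})
train = df.iloc[:4]

# Option 1: fill every gap with the median of that column in the training rows
filled_median = df.fillna(train.median())

# Option 2: fill the target with the mean of the most recent known values
window = 3  # stands in for the 14-day aggregation length
recent_mean = train["pollution"].tail(window).mean()
filled_recent = df["pollution"].fillna(recent_mean)
```

Taking the statistics from the training rows only keeps information from the rows being predicted out of the fill values.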

UPDATE

When making future predictions, many features may only have history (no planned future values). In traditional machine learning, if you want to predict a target that depends on all the features, you first need to predict the future values of those features.

But with an LSTM, you can predict everything at once; see time_series#multi-output_models.
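
One way to read this suggestion: if the network outputs all columns for the next time step (for example a final `Dense(n_features)` layer instead of `Dense(1)`, which is an assumption about how the model in the question would be adapted), its predictions can be fed back in as inputs to forecast several steps ahead without future feature values. The sketch below shows only that feedback loop; `rollout` and `linear_trend` are hypothetical names, and `linear_trend` is a stand-in for a trained model.

```python
import numpy as np

def rollout(predict_next, window, steps):
    """Recursive forecast: feed each prediction back in as the newest time step.

    predict_next: callable mapping (1, time_steps, n_features) -> (1, n_features)
    window:       array of shape (time_steps, n_features) with the latest known rows
    """
    window = np.array(window, dtype=float)
    forecasts = []
    for _ in range(steps):
        nxt = np.asarray(predict_next(window[np.newaxis]))[0]
        forecasts.append(nxt)
        # Slide the window: drop the oldest row, append the prediction
        window = np.vstack([window[1:], nxt])
    return np.array(forecasts)

# Stand-in for a trained network (hypothetical): continue each column's last trend
def linear_trend(batch):
    return 2 * batch[:, -1, :] - batch[:, -2, :]

history = np.array([[1.0, 10.0],
                    [2.0, 12.0]])
future = rollout(linear_trend, history, steps=3)
print(future)
```

With a Keras model whose output has one unit per column, `model.predict` could be passed as `predict_next`. Errors accumulate in this kind of loop, since each step builds on earlier predictions.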

Answer 1: posted 2021-03-01 21:38:19
Answer 2: posted 2021-03-02 07:25:00
