
Multivariate time series forecasting with LSTMs in Keras (on future data)

I have been using Keras to predict a multivariate time series. The dataset is a pollution dataset. The first column is what I want to predict and the remaining 7 are features. The dataset can be found here: https://github.com/sagarmk/Forecasting-on-Air-pollution-with-RNN-LSTM/blob/master/pollution.csv

What I want to do is run the following code on a test set that has no "pollution" column. Say there is new data for the features but not for the pollution. So:

  1. How do I train the model without test data? (model.fit())
  2. How do I predict new pollution data without future data on pollution? (model.predict())

To keep it simple, could the dataset be split into a training and a testing dataset at the start, with the "pollution" column removed from the testing dataset?

See the simple code below. Here I simply import and process the dataset.

import numpy as np
import pandas as pd
import matplotlib as plt
import matplotlib.pyplot  # load the submodule explicitly so the plt.pyplot calls below work
import seaborn as sns
import plotly.express as px

# Import data
dataset = pd.read_csv("pollution.csv")
dataset = dataset.drop(['date'], axis = 1)
# Encode the wind direction strings as integer codes
label, layer = pd.factorize(dataset['wnd_dir'])
dataset['wnd_dir'] = label
# Fill remaining missing values with the column means
dataset = dataset.fillna(dataset.mean())
dataset.head()

After that I normalize the dataset.

from sklearn.preprocessing import MinMaxScaler

values = dataset.values
scaler = MinMaxScaler()
scaled = scaler.fit_transform(values)
scaled[0]
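
A side note on the first question: the code above fits the scaler on every row, including the ones later used for testing. A minimal sketch of fitting on the training rows only, assuming a chronological 70/30 split like the one used further down (the toy `values` array and `n_train` are stand-ins, not part of the original code):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy stand-in for dataset.values: 10 rows, 3 columns (hypothetical data).
values = np.arange(30, dtype=float).reshape(10, 3)

# Hypothetical chronological split using the same 70% ratio as the question.
n_train = round(values.shape[0] * 0.7)

scaler = MinMaxScaler()
# Learn min/max from the training rows only.
scaled_train = scaler.fit_transform(values[:n_train])
# Reuse those statistics for later rows; results may fall outside [0, 1].
scaled_test = scaler.transform(values[n_train:])

print(scaled_train.min(), scaled_train.max(), scaled_test.max())
```

With this layout the later rows never influence the scaling, which is closer to the situation where future data is not available yet.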

Then I convert the normalized data into supervised form.

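def to_supervised(dataset, dropNa=True, lag=1):
    # dataset: 2D array (rows = time steps, columns = variables, target in column 0)
    df = pd.DataFrame(dataset)
    column = [df]
    # Place the rows at t+1 ... t+lag next to the rows at t
    for i in range(1, lag + 1):
        column.append(df.shift(-i))
    df = pd.concat(column, axis=1)
    if dropNa:
        # The last `lag` rows have no complete future window
        df.dropna(inplace=True)
    features = dataset.shape[1]
    df = df.values
    # Inputs: all variables for steps t ... t+lag-1
    supervised_data = df[:, :features*lag]
    # Target: column 0 (pollution) at step t+lag
    supervised_data = np.column_stack([supervised_data, df[:, features*lag]])
    return supervised_data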

timeSteps = 2

supervised = to_supervised(scaled,lag=timeSteps)
pd.DataFrame(supervised).head()
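
To see what this layout looks like, here is a NumPy-only sketch that should produce the same rows on a tiny made-up array. The function name `to_supervised_np` and the `toy` data are illustrative, not from the original code.

```python
import numpy as np

def to_supervised_np(data, lag=1):
    """Lay out `lag` consecutive rows as inputs, then column 0 of the next row as target."""
    data = np.asarray(data, dtype=float)
    n_samples = data.shape[0] - lag
    # Inputs: rows t, t+1, ..., t+lag-1 side by side.
    inputs = np.hstack([data[i:i + n_samples] for i in range(lag)])
    # Target: first column (pollution in the question) at row t+lag.
    target = data[lag:, 0].reshape(-1, 1)
    return np.hstack([inputs, target])

# Two variables per time step; column 0 plays the role of pollution.
toy = np.array([[10, 1], [20, 2], [30, 3], [40, 4], [50, 5]])
result = to_supervised_np(toy, lag=2)
print(result)
```

Each output row holds two consecutive time steps (four numbers) followed by the value of column 0 at the step after them. Note that the inputs still contain past values of the target column.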

Now the dataset is split and transformed so that the LSTM network can handle it.

features = dataset.shape[1]
train_hours = round(dataset.shape[0]*0.7)
X = supervised[:,:features*timeSteps]
y = supervised[:,features*timeSteps]

x_train = X[:train_hours,:]
x_test = X[train_hours:,:]
y_train = y[:train_hours]
y_test = y[train_hours:]

print(x_train.shape,x_test.shape,y_train.shape,y_test.shape)
# reshape the data to fit the LSTM
# dimensions = (samples, timeSteps (2 here), features)

x_train = x_train.reshape(x_train.shape[0], timeSteps, features)
x_test = x_test.reshape(x_test.shape[0], timeSteps, features)

print(x_train.shape,x_test.shape)
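
As a small illustration of what the reshape does, the sketch below uses one flat sample with 2 time steps of 3 made-up features (the numbers and the names `time_steps` and `n_features` are placeholders):

```python
import numpy as np

time_steps, n_features = 2, 3

# One flat sample: the features of step t followed by the features of step t+1.
flat = np.array([[1, 2, 3, 4, 5, 6]])

# Keras LSTM layers expect input shaped (samples, time steps, features).
windows = flat.reshape(flat.shape[0], time_steps, n_features)

print(windows.shape)
print(windows[0])
```

The first row of `windows[0]` is step t and the second row is step t+1, so the reshape only works because the flat columns were already ordered step by step.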

Here the model is trained.

#define the model
from keras.models import Sequential
from keras.layers import Dense,LSTM

model = Sequential()
model.add( LSTM( 50, input_shape = ( timeSteps,x_train.shape[2]) ) )
model.add( Dense(1) )

model.compile( loss = "mae", optimizer = "adam")

history =  model.fit( x_train,y_train, validation_data = (x_test,y_test), epochs = 50 , batch_size = 72, verbose = 0, shuffle = False)
plt.pyplot.plot(history.history['loss'], label='train')
plt.pyplot.plot(history.history['val_loss'], label='test')
plt.pyplot.legend()
#plt.pyplot.yticks([])
#plt.pyplot.xticks([])
plt.pyplot.title("loss during training")
plt.pyplot.show()

Lastly, I plot the training data along with the test data.

y_pred = model.predict(x_test)
x_test = x_test.reshape(x_test.shape[0],x_test.shape[2]*x_test.shape[1])

inv_new = np.concatenate( (y_pred, x_test[:,-7:] ) , axis =1)
inv_new = scaler.inverse_transform(inv_new)
final_pred = inv_new[:,0]

plt.pyplot.figure(figsize=(20,10))
plt.pyplot.plot(dataset['pollution'])
# Pad so the predictions line up with the test rows they correspond to
plt.pyplot.plot([None] * (train_hours + timeSteps) + list(final_pred))
plt.pyplot.show()
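
The concatenation above exists only so that `inverse_transform` receives as many columns as the scaler was fitted on. A possible alternative, sketched here on made-up numbers, is to undo the scaling of column 0 by hand using the fitted `min_` and `scale_` attributes of `MinMaxScaler` (the `values` array and the "pretend" predictions are assumptions for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Column 0 plays the role of pollution; the data is invented.
values = np.array([[10.0, 1.0], [20.0, 3.0], [40.0, 2.0]])
scaler = MinMaxScaler()
scaled = scaler.fit_transform(values)

# Pretend these scaled values came out of model.predict().
y_pred_scaled = scaled[:, 0]

# MinMaxScaler computes X_scaled = X * scale_ + min_, so invert column 0 directly.
y_pred = (y_pred_scaled - scaler.min_[0]) / scaler.scale_[0]

print(y_pred)
```

This avoids needing the feature columns at all when converting predictions back to the original units.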

Predicting with your neural network should be as simple as the line of code below: pass new data in the same format as the training data.

prediction = model.predict(features)
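
"Same format" matters here because the model in the question takes past pollution values as part of its input, so it cannot be fed rows that lack that column. One possible workaround, sketched under the assumption that the target is the pollution at the last row of each window, is to build the input windows from the 7 feature columns only. The helper `make_feature_windows` and the random arrays are hypothetical.

```python
import numpy as np

def make_feature_windows(feature_rows, time_steps):
    """Stack consecutive rows into windows shaped (samples, time_steps, n_features)."""
    feature_rows = np.asarray(feature_rows, dtype=float)
    n_samples = feature_rows.shape[0] - time_steps + 1
    return np.stack([feature_rows[i:i + time_steps] for i in range(n_samples)])

time_steps = 2
rng = np.random.default_rng(0)
train = rng.random((6, 8))         # hypothetical scaled rows: pollution + 7 features
new_features = rng.random((4, 7))  # hypothetical new rows: 7 features, no pollution

x_train = make_feature_windows(train[:, 1:], time_steps)  # inputs exclude pollution
y_train = train[time_steps - 1:, 0]                       # pollution at each window's last row
x_new = make_feature_windows(new_features, time_steps)    # same layout as x_train

print(x_train.shape, y_train.shape, x_new.shape)
```

A model built with `input_shape=(time_steps, 7)` and fitted on `x_train` and `y_train` could then be called as `model.predict(x_new)`, at the cost of no longer seeing past pollution values.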

Do you have any code that you can provide? What issue are you running into?

I.e., when the "test" dataset consists only of 8 feature columns and has no column for the price?

It looks like you are asking a feature engineering question. Your real dataset has NaN values in various columns, which makes prediction fail, right?

In this case, you can use a common solution:

Fill the NaN values with the median/mean of the corresponding column in the training set.

For example, you can fill future prices with the median/mean of each product's prices over the most recent 14 days (the aggregation length).
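
A rough pandas sketch of both variants, using invented column names and a 3-row window standing in for the 14-day aggregation length:

```python
import numpy as np
import pandas as pd

# Hypothetical training and test frames with a few missing values.
train = pd.DataFrame({"temp": [1.0, 2.0, 3.0, 10.0], "press": [5.0, np.nan, 7.0, 9.0]})
test = pd.DataFrame({"temp": [np.nan, 4.0], "press": [8.0, np.nan]})

# Variant 1: fill each column with the median of that column in the training set.
train_medians = train.median()  # NaNs are skipped by default
test_filled = test.fillna(train_medians)

# Variant 2: use the mean of the most recent rows (3 here, 14 days in the example above).
recent_mean = train["temp"].tail(3).mean()

print(test_filled)
print(recent_mean)
```

Taking the statistics from the training set only keeps information from the test rows out of the fill values.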

UPDATE

When making future predictions, many features may only have history (no planned future values). In traditional machine learning, if you want to predict a target that depends on all the features, you first need to predict the future values of those features.

But with an LSTM you can predict everything in one go; see time_series#multi-output_models
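
One way to read "all in one" is a model whose output is the whole next row (target plus features), so each prediction can be fed back in as input for the following step. The loop below sketches that idea with a stand-in function in place of a trained network; `rollout` and `fake_predict` are hypothetical names, and with Keras the callable could be the `predict` method of a model ending in `Dense(n_features)`.

```python
import numpy as np

def rollout(predict_next_row, last_window, n_steps):
    """Forecast n_steps ahead by feeding each predicted row back in as input."""
    window = np.array(last_window, dtype=float)  # shape (time_steps, n_features)
    forecasts = []
    for _ in range(n_steps):
        # The model sees a batch of one window and returns one full row.
        next_row = np.asarray(predict_next_row(window[np.newaxis, ...]))[0]
        forecasts.append(next_row)
        # Drop the oldest step and append the prediction.
        window = np.vstack([window[1:], next_row])
    return np.array(forecasts)

def fake_predict(batch):
    # Stand-in for a trained multi-output model: returns the mean of each window.
    return batch.mean(axis=1)

last_window = np.array([[0.0, 2.0], [2.0, 4.0]])
future = rollout(fake_predict, last_window, n_steps=3)
print(future)
```

Errors accumulate with this kind of recursion, since later steps are predicted from earlier predictions rather than from observed data.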
