Multivariate time series forecasting with LSTMs in Keras (on future data)
I have been using Keras to predict a multivariate time series. The dataset is a pollution dataset: the first column ("pollution") is what I want to predict, and the remaining 7 columns are features. The dataset can be found here: https://github.com/sagarmk/Forecasting-on-Air-pollution-with-RNN-LSTM/blob/master/pollution.csv
What I want to do is run the following code on a test set that has no "pollution" column. Suppose there is new data for the features, but not for the pollution.
To keep it simple, could the dataset be split into a training and a testing set at the beginning, with the "pollution" column removed from the testing set?
Below is a simplified version of the code. Here I import and preprocess the dataset:
import numpy as np
import pandas as pd
import matplotlib as plt
import matplotlib.pyplot  # makes plt.pyplot available for the plots below
import seaborn as sns
import plotly.express as px
# Import data
dataset = pd.read_csv("pollution.csv")
dataset = dataset.drop(['date'], axis=1)
# Encode the categorical wind direction as integer codes
label, layer = pd.factorize(dataset['wnd_dir'])
dataset['wnd_dir'] = label
# Fill any remaining missing values with the column means
dataset = dataset.fillna(dataset.mean())
dataset.head()
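Because the model is later meant to run on new data, the wind-direction encoding learned here has to be reused for that data rather than recomputed. A minimal sketch on made-up toy frames (the names `train` and `new` are hypothetical): `pd.factorize` returns the integer codes plus the unique values, and `pd.Categorical(..., categories=uniques).codes` applies that same mapping to new rows, marking unseen directions as -1.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames standing in for training rows and new (future) rows
train = pd.DataFrame({"pollution": [129.0, np.nan, 159.0],
                      "wnd_dir": ["SE", "cv", "SE"]})
new = pd.DataFrame({"wnd_dir": ["cv", "NW", "SE"]})

# Learn the integer encoding on the training data only (codes follow first appearance)
codes, uniques = pd.factorize(train["wnd_dir"])
train["wnd_dir"] = codes

# Reuse the same mapping for new data; directions never seen in training become -1
new["wnd_dir"] = pd.Categorical(new["wnd_dir"], categories=uniques).codes

# Fill NaNs with the training-set column means
train = train.fillna(train.mean())
print(train)
print(new)
```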
After that, I normalize the dataset to the [0, 1] range:
from sklearn.preprocessing import MinMaxScaler
values = dataset.values
scaler = MinMaxScaler()
scaled = scaler.fit_transform(values)
scaled[0]
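The scaler above is fit on all 8 columns, so `scaler.inverse_transform` expects 8 columns; that is why the prediction step further down pads `y_pred` with 7 feature columns before inverting. `MinMaxScaler` scales each column independently, so with the default `feature_range=(0, 1)` the target column can also be inverted on its own from the fitted `data_min_` and `data_range_` attributes. A toy sketch with made-up numbers, where column 0 plays the role of pollution:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy 3-row, 2-column array: column 0 stands in for "pollution"
values = np.array([[10.0, 1.0],
                   [20.0, 3.0],
                   [30.0, 5.0]])
scaler = MinMaxScaler()            # default feature_range=(0, 1)
scaled = scaler.fit_transform(values)

# Each column is scaled independently: (x - col_min) / (col_max - col_min)
print(scaled[:, 0])

# Invert only column 0 without padding the other columns back in
# (valid for the default feature_range=(0, 1))
pred_scaled = np.array([0.25, 0.75])
pred = pred_scaled * scaler.data_range_[0] + scaler.data_min_[0]
print(pred)
```

For genuinely future data, fitting the scaler on the training rows only and calling `transform` on the new rows keeps the scaling consistent; that is a design choice, not something the code above does.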
Then I convert the normalized data into supervised form:
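def to_supervised(dataset, dropNa=True, lag=1):
    df = pd.DataFrame(dataset)
    column = []
    column.append(df)
    # Append copies shifted up by 1..lag rows, so row t also holds rows t+1..t+lag
    for i in range(1, lag + 1):
        column.append(df.shift(-i))
    df = pd.concat(column, axis=1)
    if dropNa:
        # The last `lag` rows have no future values and contain NaNs
        df.dropna(inplace=True)
    features = dataset.shape[1]
    df = df.values
    # Inputs: all features for rows t .. t+lag-1
    supervised_data = df[:, :features*lag]
    # Target: first column (pollution) at row t+lag
    supervised_data = np.column_stack([supervised_data, df[:, features*lag]])
    return supervised_data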
timeSteps = 2
supervised = to_supervised(scaled,lag=timeSteps)
pd.DataFrame(supervised).head()
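`to_supervised` followed by the reshape further down turns each sample into `timeSteps` consecutive rows of all columns and uses the pollution value one step after the window as the target. The same windows can be built directly in 3-D with NumPy slicing; `make_windows` is a hypothetical helper, shown here on a toy array where column 0 plays the role of pollution:

```python
import numpy as np

def make_windows(data, time_steps):
    """Build LSTM inputs directly in 3-D.

    X[i] holds rows i .. i+time_steps-1 (all columns) and
    y[i] is column 0 (the target) at row i+time_steps.
    """
    n_samples = data.shape[0] - time_steps
    X = np.stack([data[i:i + time_steps] for i in range(n_samples)])
    y = data[time_steps:, 0]
    return X, y

# Toy data: 5 time steps, 3 columns (column 0 = target)
toy = np.arange(15, dtype=float).reshape(5, 3)
X, y = make_windows(toy, time_steps=2)
print(X.shape, y.shape)
```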
Now the dataset is split and reshaped so that the LSTM network can handle it:
features = dataset.shape[1]
train_hours = round(dataset.shape[0]*0.7)
X = supervised[:,:features*timeSteps]
y = supervised[:,features*timeSteps]
x_train = X[:train_hours,:]
x_test = X[train_hours:,:]
y_train = y[:train_hours]
y_test = y[train_hours:]
print(x_train.shape,x_test.shape,y_train.shape,y_test.shape)
# convert data to the 3-D shape the LSTM expects
# dimensions = (samples, timeSteps, features); timeSteps is 2 here
x_train = x_train.reshape(x_train.shape[0], timeSteps, features)
x_test = x_test.reshape(x_test.shape[0], timeSteps, features)
print(x_train.shape,x_test.shape)
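The reshape works because each flattened row stores the feature values for time t first and those for t+1 after them, so a C-order reshape to `(samples, timeSteps, features)` puts one time step per row. A small check with made-up values; swapping the last two dimensions would silently mix time steps:

```python
import numpy as np

time_steps, n_features = 2, 3
# One flattened sample: features at t (10, 11, 12) followed by features at t+1 (20, 21, 22)
flat = np.array([[10, 11, 12, 20, 21, 22]])

# Correct: (samples, timeSteps, features), one time step per row
seq = flat.reshape(flat.shape[0], time_steps, n_features)

# Wrong order: (samples, features, timeSteps) splits the same values into mismatched pairs
wrong = flat.reshape(flat.shape[0], n_features, time_steps)

print(seq[0].tolist())
print(wrong[0].tolist())
```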
Here the model is defined and trained:
#define the model
from keras.models import Sequential
from keras.layers import Dense,LSTM
model = Sequential()
model.add( LSTM( 50, input_shape = ( timeSteps,x_train.shape[2]) ) )
model.add( Dense(1) )
model.compile( loss = "mae", optimizer = "adam")
history = model.fit( x_train,y_train, validation_data = (x_test,y_test), epochs = 50 , batch_size = 72, verbose = 0, shuffle = False)
plt.pyplot.plot(history.history['loss'], label='train')
plt.pyplot.plot(history.history['val_loss'], label='test')
plt.pyplot.legend()
#plt.pyplot.yticks([])
#plt.pyplot.xticks([])
plt.pyplot.title("loss during training")
plt.pyplot.show()
Lastly, I plot the actual pollution data along with the predictions for the test set.
y_pred = model.predict(x_test)
x_test = x_test.reshape(x_test.shape[0],x_test.shape[2]*x_test.shape[1])
inv_new = np.concatenate( (y_pred, x_test[:,-7:] ) , axis =1)
inv_new = scaler.inverse_transform(inv_new)
final_pred = inv_new[:,0]
plt.pyplot.figure(figsize=(20,10))
plt.pyplot.plot(dataset['pollution'])
plt.pyplot.plot([None] * (train_hours + timeSteps) + list(final_pred))  # start predictions where the test targets begin
plt.pyplot.show()
Predicting results with your neural network should be as simple as the line of code below, passing new data in the same format as the training data:
prediction = model.predict(features)
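One point worth checking: in the question's setup every input window contains all 8 columns, including pollution, so `model.predict` needs pollution values for the window, which the future data does not have. One way around this, sketched here under stated assumptions, is to retrain on windows of the 7 feature columns only (input shape `(timeSteps, 7)`) and use the pollution value at each window's last row as the target. `feature_windows` and the random toy arrays are hypothetical, and the new features would still need to be scaled with the parameters fitted on the training data.

```python
import numpy as np

def feature_windows(features, time_steps):
    """Hypothetical helper: X[i] = features[i : i + time_steps] (feature columns only)."""
    n_samples = features.shape[0] - time_steps + 1
    return np.stack([features[i:i + time_steps] for i in range(n_samples)])

time_steps = 2
# Toy scaled training data: column 0 = pollution, columns 1..7 = the 7 features
train = np.random.default_rng(0).random((10, 8))
X_train = feature_windows(train[:, 1:], time_steps)   # pollution column dropped from inputs
y_train = train[time_steps - 1:, 0]                   # assumed target: pollution at each window's last row

# Toy "future" rows: only the 7 feature columns are available
future = np.random.default_rng(1).random((4, 7))
X_future = feature_windows(future, time_steps)

print(X_train.shape, y_train.shape, X_future.shape)
# A model built with input_shape=(time_steps, 7) could be fit on (X_train, y_train)
# and then called as model.predict(X_future).
```

With 4 future rows and `time_steps = 2` only 3 windows exist; prepending the last `time_steps - 1` rows of known features would give every future row a prediction.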
Do you have any code you can provide? What issue are you running into?
I.e., when the "test" dataset only consists of 8 feature columns and no column for the price?
It looks like you are asking a feature engineering question. Your real dataset has NaN values in some columns, which makes prediction fail, right?
In this case, you can use a common solution:
Fill NaN values with the median/mean of the corresponding column in the training set.
For example, you can fill in the future price with the median/mean of each product's prices over the most recent 14 days (the aggregation length).
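Both suggestions can be sketched in pandas. The frames, column names, and dates below are made up for illustration: the first part fills test-set NaNs with the training-set column medians, and the second fills unknown future prices with each product's median price over its most recent 14 days of history.

```python
import numpy as np
import pandas as pd

# Part 1: fill test-set NaNs with the training-set column medians (toy data)
train = pd.DataFrame({"dew": [1.0, 3.0, 5.0], "temp": [0.0, np.nan, 4.0]})
test = pd.DataFrame({"dew": [np.nan, 2.0], "temp": [np.nan, 1.0]})
test_filled = test.fillna(train.median())   # median() skips NaNs by default

# Part 2: hypothetical price history for two products
hist = pd.DataFrame({
    "product": ["A"] * 3 + ["B"] * 3,
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 2),
    "price": [10.0, 12.0, 14.0, 5.0, 7.0, 6.0],
})

# Median of each product's prices over the most recent 14 days (inclusive)
cutoff = hist["date"].max() - pd.Timedelta(days=13)
recent_median = hist[hist["date"] >= cutoff].groupby("product")["price"].median()

# Future rows with unknown price: fill from the per-product recent median
future = pd.DataFrame({"product": ["A", "B"], "price": [np.nan, np.nan]})
future["price"] = future["price"].fillna(future["product"].map(recent_median))
print(test_filled)
print(future)
```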
When making future predictions, many features may only have historical values (no planned future values). In traditional machine learning, if you want to predict a target that depends on all features, you need to predict the future values of those features first.
With an LSTM, however, you can make all of these predictions at once; see time_series#multi-output_models.
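One reading of the multi-output suggestion, sketched in NumPy under assumptions: the targets become every column at the next step (so the Keras model would end in `Dense(8)` instead of `Dense(1)`), and a forecast is rolled forward by appending each predicted row to the input window. `multi_output_targets`, `rollout`, and the stand-in predictor `linear_trend` are hypothetical; in practice `predict_next` would wrap the trained model. Errors can compound over such a rollout, so accuracy typically degrades as the horizon grows.

```python
import numpy as np

def multi_output_targets(data, time_steps):
    """X[i] = rows i..i+time_steps-1; Y[i] = ALL columns at row i+time_steps."""
    n = data.shape[0] - time_steps
    X = np.stack([data[i:i + time_steps] for i in range(n)])
    Y = data[time_steps:]
    return X, Y

def rollout(predict_next, history, n_ahead):
    """Forecast n_ahead rows by feeding each predicted row back into the window.

    predict_next is a hypothetical callable mapping a (time_steps, n_cols)
    window to the next (n_cols,) row.
    """
    window = history.copy()
    preds = []
    for _ in range(n_ahead):
        nxt = predict_next(window)
        preds.append(nxt)
        window = np.vstack([window[1:], nxt])   # slide the window forward one step
    return np.array(preds)

# Stand-in predictor for illustration only: repeat the last step-to-step change
def linear_trend(window):
    return window[-1] + (window[-1] - window[-2])

data = np.arange(20, dtype=float).reshape(10, 2)
X, Y = multi_output_targets(data, time_steps=3)
forecast = rollout(linear_trend, data[-3:], n_ahead=2)
print(X.shape, Y.shape, forecast)
```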