繁体   English   中英

使用 LSTM 将单变量转换为多变量时间序列预测

[英]Transform Univariate to Multivariate Time Series Forecasting with LSTM

我是人工神经网络世界的新手,所以如果我犯了一些错误,请原谅并纠正我,如果可以的话。 我想使用 LSTM 模型来预测市场上比特币的价格。 我知道该模型的实际局限性,但我创建它是为了教育目的。

我不知道是将它定义为多层模型还是多变量模型(如果有人能解释我将不胜感激的差异)基本上是一个在收盘价上训练的模型,称为“收盘价”,可以通过以下方式预测第二天的收盘价观察前 60 天。

我从这里构建模型没有问题,我刚刚和你说过,问题是我想用其他信息来训练模型,比如交易量或当天的最高价格。 重要的是能够决定在模型中插入哪两种类型的信息。 我找到了一个网站,其中详细解释了 Keras 中 LSTM多元时间序列预测,但我无法将其应用于我的特定案例。 你能帮我将“成交量”变量整合到模型中,看看未来“收盘”收盘价的预测能力是提高还是恶化?

数据属于这种类型,可以从 kaggle 在这里下载-->下载在此处输入图片说明

import pandas as pd
import math
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt


#Create a new dataframe with only the 'Close column
data = df.filter(['close'])
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = math.ceil( len(dataset) * .8 )

#scale data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)

#Create the scaled training data set
train_data = scaled_data[0:training_data_len , :]
#Split the data into x_train and y_train data sets
x_train = []
y_train = []

for i in range(60, len(train_data)):
  x_train.append(train_data[i-60:i, 0])
  y_train.append(train_data[i, 0])
  # if i<= 61:
    # print(x_train)
    # print(y_train)
    # print()

#Convert the x_train and y_train to numpy arrays
x_train, y_train = np.array(x_train), np.array(y_train)

#Reshape the data
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
print (x_train.shape)

#Build the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape= (x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences= False))
model.add(Dense(25))
model.add(Dense(1))

#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)

#Create the testing data set
#Create a new array containing scaled values 
test_data = scaled_data[training_data_len - 60: , :]
#Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
  x_test.append(test_data[i-60:i, 0])

#Convert the data to a numpy array
x_test = np.array(x_test)
#Reshape the data

x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1 ))

# print (len(x_test))
# #Get the models predicted price values
predictions = model.predict(x_test)

predictions = scaler.inverse_transform(predictions)
print(predictions)

#Get the root mean squared error (RMSE)
rmse=np.sqrt(np.mean(((predictions- y_test)**2)))
print (rmse)

从代码和评论中,我了解到您正在对单变量数据执行时间序列预测(只有列是Close ),现在想要对多变量数据执行时间序列预测(使用列、关闭)。

代码的重要部分将是函数multivariate_data ,它根据 60 天的过去历史和 1 天的目标日期返回特征和标签。

past_history = 60
future_target = 1

Multi_Variate Data 的完整工作代码(直到训练)如下所示:

import pandas as pd
import math
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt

df = pd.read_csv('datasets_101543_240726.csv')

#Create a new dataframe with only the 'Close column
data = df.filter(['close', 'Volume'])
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = math.ceil( len(dataset) * .8 )

# scale data
data_mean = dataset[:training_data_len].mean(axis=0)
data_std = dataset[:training_data_len].std(axis=0)
dataset = (dataset-data_mean)/data_std

def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size):
  data = []
  labels = []

  start_index = start_index + history_size
  if end_index is None:
    end_index = len(dataset) - target_size

  for i in range(start_index, end_index):
    indices = range(i-history_size, i)
    data.append(dataset[indices])

    labels.append(target[i:i+target_size])

  return np.array(data), np.array(labels)

past_history = 60
future_target = 1

x_train, y_train = multivariate_data(dataset, dataset[:, 0], 0,
                                                   training_data_len, past_history,
                                                   future_target)
x_val, y_val = multivariate_data(dataset, dataset[:, 0],
                                               training_data_len, None, past_history,
                                               future_target)


#Reshape the data
#x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
#print (x_train.shape)

#Build the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape= (x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences= False))
model.add(Dense(25))
model.add(Dense(1))

#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)

代码中的测试数据x_testy_test可以替换为x_valy_val ,您可以对这些数据进行predictions

有关完整代码,请参阅与多变量数据上的时间序列预测相关的Tensorflow 教程

希望这可以帮助。 快乐学习!

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM