Converting univariate to multivariate time series forecasting with LSTM

Question

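I'm new to the world of artificial neural networks, so please forgive any mistakes I make and correct me if you can. I want to use an LSTM model to predict the market price of Bitcoin. I know the practical limitations of such a model, but I'm building it for educational purposes.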

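I don't know whether to call it a multilayer model or a multivariate model (I'd be grateful if someone could explain the difference). Basically, it is a model trained on the closing price, stored in a column called "close", that predicts the next day's closing price by looking at the previous 60 days.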
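On the terminology: a Keras LSTM layer takes input shaped (samples, timesteps, features). "Multivariate" describes the last axis, meaning how many variables are observed at each timestep. "Multilayer" (stacked) describes how many LSTM layers are chained together. The model below is already multilayer, because it has two LSTM layers. The following is a minimal NumPy sketch, using synthetic random values rather than the Kaggle data. It shows how the input tensor changes when a second feature is added. The helper name make_windows is hypothetical.

```python
import numpy as np

def make_windows(series_2d, history=60):
    """Slide a window of `history` rows over a (n_days, n_features) array.

    Returns X with shape (samples, history, n_features) and y holding the
    next-day value of column 0 (assumed here to be the close price).
    """
    X, y = [], []
    for i in range(history, len(series_2d)):
        X.append(series_2d[i - history:i, :])
        y.append(series_2d[i, 0])
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
n_days = 200
close = rng.random((n_days, 1))    # synthetic stand-in for 'close'
volume = rng.random((n_days, 1))   # synthetic stand-in for 'Volume'

X_uni, y_uni = make_windows(close)                             # univariate
X_multi, y_multi = make_windows(np.hstack([close, volume]))    # multivariate

print(X_uni.shape)    # (140, 60, 1)
print(X_multi.shape)  # (140, 60, 2)
```

Only the feature axis grows. The target is still next-day close in both cases, so the same stacked LSTM can be reused once its input_shape matches the new feature count.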

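Building the model this far, as I've just described, is no problem. The problem is that I want to train the model with additional information, such as the trading volume or the day's high price. The important thing is being able to choose which two kinds of information go into the model. I found a website that explains multivariate time series forecasting with LSTMs in Keras in detail, but I couldn't apply it to my specific case. Could you help me integrate the "Volume" variable into the model, so I can see whether its ability to predict future "close" prices improves or gets worse?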
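One way to keep the choice of input columns flexible is to pass the column names as a parameter. The sketch below is minimal and assumes the target column is called 'close', as in the code below. The names 'Volume' and 'high' are assumed column names, and the DataFrame is synthetic random data rather than the Kaggle file. The helper windows_from_columns is hypothetical. Scaling is omitted for brevity; in practice, each column would be scaled with statistics from the training rows only before windowing.

```python
import numpy as np
import pandas as pd

def windows_from_columns(df, feature_cols, target_col="close", history=60):
    """Build LSTM inputs from an arbitrary choice of DataFrame columns.

    The target column is always placed first so the model sees its own
    history; the other chosen columns are appended as extra features.
    """
    cols = [target_col] + [c for c in feature_cols if c != target_col]
    values = df[cols].to_numpy(dtype="float64")
    X = np.stack([values[i - history:i] for i in range(history, len(values))])
    y = values[history:, 0]          # next-day value of the target column
    return X, y, cols

# Synthetic frame; 'close', 'Volume' and 'high' are assumed column names.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.random((100, 3)), columns=["close", "Volume", "high"])

X, y, used = windows_from_columns(df, ["Volume"])
print(used, X.shape, y.shape)   # ['close', 'Volume'] (40, 60, 2) (40,)
X, y, used = windows_from_columns(df, ["Volume", "high"])
print(used, X.shape)            # ['close', 'Volume', 'high'] (40, 60, 3)
```

Here is one way to check whether Volume helps. Train the same network once on windows_from_columns(df, []), which is close only, and once on windows_from_columns(df, ["Volume"]). Then compare the RMSE over the same test days.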

The data looks like this and can be downloaded from Kaggle here --> download

import pandas as pd
import math
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt


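#Load the Kaggle CSV (same file name as used in the answer)
df = pd.read_csv('datasets_101543_240726.csv')

#Create a new dataframe with only the 'close' column
data = df.filter(['close'])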
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = math.ceil( len(dataset) * .8 )

#scale data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)

#Create the scaled training data set
train_data = scaled_data[0:training_data_len , :]
#Split the data into x_train and y_train data sets
x_train = []
y_train = []

for i in range(60, len(train_data)):
  x_train.append(train_data[i-60:i, 0])
  y_train.append(train_data[i, 0])
  # if i<= 61:
    # print(x_train)
    # print(y_train)
    # print()

#Convert the x_train and y_train to numpy arrays
x_train, y_train = np.array(x_train), np.array(y_train)

#Reshape the data
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
print (x_train.shape)

#Build the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape= (x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences= False))
model.add(Dense(25))
model.add(Dense(1))

#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)

#Create the testing data set
#Create a new array containing scaled values 
test_data = scaled_data[training_data_len - 60: , :]
#Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
  x_test.append(test_data[i-60:i, 0])

#Convert the data to a numpy array
x_test = np.array(x_test)
#Reshape the data

x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1 ))

#Get the model's predicted price values
predictions = model.predict(x_test)

predictions = scaler.inverse_transform(predictions)
print(predictions)

#Get the root mean squared error (RMSE)
rmse=np.sqrt(np.mean(((predictions- y_test)**2)))
print (rmse)

Answer 1

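From the code and the comments, I understand that you are currently doing time series forecasting on univariate data (the only column is close). You now want to do time series forecasting on multivariate data, using the columns close and Volume.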

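The important part of the code is the function multivariate_data. It returns the features and labels using a past history of 60 days and a target of 1 day ahead: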

past_history = 60
future_target = 1
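To see concretely what multivariate_data produces, here it is run on a tiny toy array instead of the Bitcoin data. The function body is copied from the answer so the snippet runs on its own. The toy values and the smaller history_size=3 are illustrative choices, not part of the answer.

```python
import numpy as np

# Same windowing function as in the answer, reproduced so this runs on its own.
def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size):
    data, labels = [], []
    start_index = start_index + history_size
    if end_index is None:
        end_index = len(dataset) - target_size
    for i in range(start_index, end_index):
        indices = range(i - history_size, i)   # the past `history_size` rows
        data.append(dataset[indices])
        labels.append(target[i:i + target_size])
    return np.array(data), np.array(labels)

# Toy data: column 0 plays 'close', column 1 plays 'Volume'.
toy = np.column_stack([np.arange(8.0), np.arange(8.0) * 10])
X, y = multivariate_data(toy, toy[:, 0], 0, 6, history_size=3, target_size=1)
print(X.shape, y.shape)   # (3, 3, 2) (3, 1)
print(X[0])               # rows 0..2 of both columns
print(y.ravel())          # [3. 4. 5.]
```

Each window keeps every column, and each label is the close of the day right after the window. Because start_index is shifted by history_size, calling it with start_index=training_data_len uses the first 60 rows of the test span only as history. So no prediction is produced for those 60 days.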

The complete working code for multivariate data (up to training) is shown below:

import pandas as pd
import math
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt

df = pd.read_csv('datasets_101543_240726.csv')

#Create a new dataframe with only the 'close' and 'Volume' columns
data = df.filter(['close', 'Volume'])
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = math.ceil( len(dataset) * .8 )

# Standardize each column with the mean/std of the training rows only
data_mean = dataset[:training_data_len].mean(axis=0)
data_std = dataset[:training_data_len].std(axis=0)
dataset = (dataset-data_mean)/data_std

def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size):
  data = []
  labels = []

  start_index = start_index + history_size
  if end_index is None:
    end_index = len(dataset) - target_size

  for i in range(start_index, end_index):
    indices = range(i-history_size, i)
    data.append(dataset[indices])

    labels.append(target[i:i+target_size])

  return np.array(data), np.array(labels)

past_history = 60
future_target = 1

x_train, y_train = multivariate_data(dataset, dataset[:, 0], 0,
                                     training_data_len, past_history,
                                     future_target)
x_val, y_val = multivariate_data(dataset, dataset[:, 0],
                                 training_data_len, None, past_history,
                                 future_target)


#No reshape needed: multivariate_data already returns
#x_train with shape (samples, past_history, n_features)

#Build the LSTM model
model = Sequential()
#input_shape = (timesteps, n_features); n_features is 2 here (close and Volume)
model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2])))
model.add(LSTM(50, return_sequences= False))
model.add(Dense(25))
model.add(Dense(1))

#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)

In your code, the test data x_test and y_test can be replaced by x_val and y_val, and you can run predictions on them.
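The answer standardizes the data with the training mean and standard deviation instead of MinMaxScaler. As a result, scaler.inverse_transform from the question no longer applies to the predictions. Below is a minimal sketch of undoing the standardization for the close column (column 0) and then computing the RMSE. The helper names are hypothetical. The numbers are illustrative placeholders standing in for data_mean, data_std, model.predict(x_val) and y_val, not results from the Kaggle data.

```python
import numpy as np

def destandardize_close(values_std, data_mean, data_std, col=0):
    """Undo (x - mean) / std for the target column only."""
    return np.asarray(values_std).ravel() * data_std[col] + data_mean[col]

def rmse(a, b):
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Placeholder statistics for (close, Volume) and standardized values
# standing in for model.predict(x_val) and y_val.
data_mean = np.array([9000.0, 1.5e9])
data_std = np.array([500.0, 2.0e8])
pred_std = np.array([[0.2], [0.4], [0.1]])
y_val_std = np.array([[0.0], [0.4], [0.3]])

pred_price = destandardize_close(pred_std, data_mean, data_std)
true_price = destandardize_close(y_val_std, data_mean, data_std)  # y_val is standardized too
print(pred_price)                      # [9100. 9200. 9050.]
print(rmse(pred_price, true_price))
```

To compare fairly with the univariate model, compute its RMSE over exactly the same test days. As noted above, the first 60 days of the validation span have no multivariate prediction.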

For the complete code, see the TensorFlow tutorial on time series forecasting with multivariate data.

Hope this helps. Happy learning!

Solution 1
0 2020-05-28 16:11:06

使用 LSTM 将单变量转换为多变量时间序列预测

问题描述

1 个解决方案

解决方案1 0 2020-05-28 16:11:06

解决方案1
0 2020-05-28 16:11:06