
Transform Univariate to Multivariate Time Series Forecasting with LSTM

I am new to the world of artificial neural networks, so please excuse any mistakes and correct me where you can. I would like to use an LSTM model to predict the market price of bitcoin. I know the practical limitations of such a model, but I am building it for educational purposes.

I don't know whether to call it a multilayer or a multivariate model (if someone could explain the difference, I would be grateful). Basically, it is a model trained on the closing price, called 'close', that predicts the next day's closing price by observing the previous 60 days.

I had no problems building the model I just described. The problem is that I would like to train it with additional information, such as the volume or the day's maximum price. The important thing is to be able to decide which two types of information to feed into the model. I found a site that explains Multivariate Time Series Forecasting with LSTMs in Keras in detail, but I cannot apply it to my specific case. Could you help me integrate the 'volume' variable into the model, so I can see whether the prediction of the future 'close' price improves or worsens?

The data are of this type and can be downloaded from Kaggle.

import pandas as pd
import math
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt


#df is assumed to hold the Kaggle CSV, loaded earlier (e.g. with pd.read_csv)
#Create a new dataframe with only the 'close' column
data = df.filter(['close'])
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = math.ceil( len(dataset) * .8 )

#scale data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)

#Create the scaled training data set
train_data = scaled_data[0:training_data_len , :]
#Split the data into x_train and y_train data sets
x_train = []
y_train = []

for i in range(60, len(train_data)):
  x_train.append(train_data[i-60:i, 0])
  y_train.append(train_data[i, 0])
  # if i<= 61:
    # print(x_train)
    # print(y_train)
    # print()

#Convert the x_train and y_train to numpy arrays
x_train, y_train = np.array(x_train), np.array(y_train)

#Reshape the data
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
print (x_train.shape)

#Build the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape= (x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences= False))
model.add(Dense(25))
model.add(Dense(1))

#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)

#Create the testing data set
#Create a new array containing scaled values 
test_data = scaled_data[training_data_len - 60: , :]
#Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
  x_test.append(test_data[i-60:i, 0])

#Convert the data to a numpy array
x_test = np.array(x_test)
#Reshape the data

x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1 ))

#print(len(x_test))
#Get the model's predicted price values
predictions = model.predict(x_test)

predictions = scaler.inverse_transform(predictions)
print(predictions)

#Get the root mean squared error (RMSE)
rmse=np.sqrt(np.mean(((predictions- y_test)**2)))
print (rmse)

From the code and the comments, I understand that you are currently forecasting a univariate time series (the only column being Close) and now want to forecast a multivariate one (using the columns Close and Volume).

The important part of the code for you is the function multivariate_data, which returns the features and labels for a past history of 60 days and a target horizon of 1 day.

past_history = 60
future_target = 1
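To make the windowing concrete, here is a minimal sketch on synthetic data. The helper make_windows is a hypothetical, simplified stand-in for multivariate_data (it is not the function from the answer, and its end-of-range handling differs slightly), and the random array only imitates a two-column close/volume dataset.

```python
import numpy as np

def make_windows(features, target, history_size, target_size):
    """Hypothetical helper: pair each `history_size`-row window of `features`
    with the `target_size` target values that immediately follow it."""
    xs, ys = [], []
    for i in range(history_size, len(features) - target_size + 1):
        xs.append(features[i - history_size:i])   # rows i-60 .. i-1, all columns
        ys.append(target[i:i + target_size])      # value(s) to predict
    return np.array(xs), np.array(ys)

# Synthetic stand-in for the scaled dataset: 100 days, 2 columns (close, volume)
rng = np.random.default_rng(0)
toy = rng.normal(size=(100, 2))

x, y = make_windows(toy, toy[:, 0], history_size=60, target_size=1)
print(x.shape)  # (40, 60, 2): samples, time steps, features
print(y.shape)  # (40, 1)
```

The last axis of x is the number of features, which is the value the first LSTM layer needs in its input_shape.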

Complete working code (up to training) for the multivariate data is shown below:

import pandas as pd
import math
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt

df = pd.read_csv('datasets_101543_240726.csv')

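#Create a new dataframe with only the close and volume columns
#Column names must match the CSV header exactly; df.filter silently drops names it cannot find
data = df.filter(['close', 'Volume'])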
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = math.ceil( len(dataset) * .8 )

#Standardize each column using the mean and std of the training rows only
data_mean = dataset[:training_data_len].mean(axis=0)
data_std = dataset[:training_data_len].std(axis=0)
dataset = (dataset-data_mean)/data_std

def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size):
  data = []
  labels = []

  start_index = start_index + history_size
  if end_index is None:
    end_index = len(dataset) - target_size

  for i in range(start_index, end_index):
    indices = range(i-history_size, i)
    data.append(dataset[indices])

    labels.append(target[i:i+target_size])

  return np.array(data), np.array(labels)

past_history = 60
future_target = 1

x_train, y_train = multivariate_data(dataset, dataset[:, 0], 0,
                                                   training_data_len, past_history,
                                                   future_target)
x_val, y_val = multivariate_data(dataset, dataset[:, 0],
                                               training_data_len, None, past_history,
                                               future_target)


#No reshape is needed: x_train already has shape (samples, past_history, features)
print(x_train.shape)

#Build the LSTM model
model = Sequential()
#The last input dimension must equal the number of features (2 here), not 1
model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2])))
model.add(LSTM(50, return_sequences= False))
model.add(Dense(25))
model.add(Dense(1))

#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)

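The test data in your code, x_test and y_test, can be replaced with x_val and y_val, and you can run predictions on that data. Note that both are in standardized units, so predictions have to be converted back with data_mean and data_std before they are compared with actual prices.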
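As a rough sketch of that evaluation step: the snippet below converts standardized values back to prices and computes the RMSE. The helper unscale_close is hypothetical, and the arrays are made-up stand-ins; in the real script, predictions would come from model.predict(x_val), and data_mean and data_std from the scaling step.

```python
import numpy as np

def unscale_close(scaled_values, mean, std, column=0):
    """Hypothetical helper: undo (value - mean) / std for a single column.
    Column 0 is assumed to be the close price, as in the code above."""
    return np.asarray(scaled_values) * std[column] + mean[column]

# Made-up stand-ins for the statistics computed on the training rows
data_mean = np.array([9000.0, 2.5e7])   # [close, volume]
data_std = np.array([1500.0, 1.0e7])

# Made-up stand-ins for y_val and model.predict(x_val), both standardized
y_val = np.array([[0.2], [-0.4], [1.0]])
predictions = np.array([[0.1], [-0.3], [1.2]])

pred_price = unscale_close(predictions, data_mean, data_std)
true_price = unscale_close(y_val, data_mean, data_std)

# Root mean squared error, in price units
rmse = np.sqrt(np.mean((pred_price - true_price) ** 2))
print(rmse)
```

Only the close column's mean and std are used here, because the model outputs a single value per sample even though it reads two features.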

Please refer to the TensorFlow tutorial on time series forecasting with multivariate data for the complete code.

Hope this helps. Happy Learning!
