Transform Univariate to Multivariate Time Series Forecasting with LSTM
I am new to artificial neural networks, so please excuse and correct any mistakes I make. I would like to use an LSTM model to predict the market price of bitcoin. I know the practical limitations of such a model; I am building it for educational purposes.
I don't know whether to call it a multilayer or a multivariate model (I would be grateful if someone could explain the difference). Basically, it is a model trained on the closing price, 'close', that predicts the next day's closing price from the previous 60 days.
I had no problems building the model described above. The problem is that I would like to train it on additional information, such as the volume or the day's maximum price. The important thing is to be able to choose which two kinds of information to feed into the model. I found a site that explains Multivariate Time Series Forecasting with LSTMs in Keras in detail, but I cannot apply it to my specific case. Could you help me integrate the 'volume' variable into the model, to see whether the prediction of the future 'close' price improves or worsens?
The data are of this type and can be downloaded from Kaggle --> Download
import pandas as pd
import math
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt
#Create a new dataframe with only the 'close' column
data = df.filter(['close'])
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = math.ceil( len(dataset) * .8 )
#scale data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)
#Create the scaled training data set
train_data = scaled_data[0:training_data_len , :]
#Split the data into x_train and y_train data sets
x_train = []
y_train = []
for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
    # if i<= 61:
    #     print(x_train)
    #     print(y_train)
    #     print()
#Convert the x_train and y_train to numpy arrays
x_train, y_train = np.array(x_train), np.array(y_train)
#Reshape the data
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
print (x_train.shape)
#Build the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape= (x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences= False))
model.add(Dense(25))
model.add(Dense(1))
#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)
#Create the testing data set
#Create a new array containing scaled values
test_data = scaled_data[training_data_len - 60: , :]
#Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])
#Convert the data to a numpy array
x_test = np.array(x_test)
#Reshape the data
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1 ))
# print (len(x_test))
# #Get the models predicted price values
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)
print(predictions)
#Get the root mean squared error (RMSE)
rmse=np.sqrt(np.mean(((predictions- y_test)**2)))
print (rmse)
From the code and the comments, I understand that you are performing time series forecasting on univariate data (the only column being Close) and now want to perform time series forecasting on multivariate data (with the columns Close and Volume).
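On the terminology question: "multivariate" describes how many input variables each time step carries, whereas "multilayer" describes how many layers are stacked in the network (the model in the question already stacks two LSTM layers). A minimal NumPy sketch of the difference in input shape, using made-up numbers rather than the real data:

```python
import numpy as np

# Made-up stand-ins for 60 days of the real columns.
close = np.linspace(100.0, 160.0, 60)
volume = np.linspace(1000.0, 1600.0, 60)

# Univariate: one value per time step -> shape (60, 1)
uni_sample = close.reshape(60, 1)

# Multivariate: two values per time step -> shape (60, 2)
multi_sample = np.column_stack([close, volume])

print(uni_sample.shape)    # (60, 1)
print(multi_sample.shape)  # (60, 2)
```

The last dimension is the number of features per time step, and it is this number that the LSTM's input shape has to match.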
The important part of the code for you is the function multivariate_data, which returns the features and labels based on a past history of 60 days and a target horizon of 1 day.
past_history = 60
future_target = 1
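As a rough illustration of what such a windowing function returns, here is a sketch on a tiny toy array. `make_windows` is a hypothetical helper written only for this example (it is not a library function and differs slightly from multivariate_data in how it handles the last sample):

```python
import numpy as np

def make_windows(features, target, history_size, target_size):
    """Hypothetical helper: pair each block of `history_size` rows
    with the `target_size` target values that immediately follow it."""
    data, labels = [], []
    for i in range(history_size, len(features) - target_size + 1):
        data.append(features[i - history_size:i])   # past rows, all features
        labels.append(target[i:i + target_size])    # next target value(s)
    return np.array(data), np.array(labels)

# 10 time steps, 2 features per step (column 0 plays the role of 'close').
toy = np.arange(20, dtype=float).reshape(10, 2)
x, y = make_windows(toy, toy[:, 0], history_size=3, target_size=1)

print(x.shape)  # (7, 3, 2): samples, time steps, features
print(y.shape)  # (7, 1)
print(y[0])     # [6.]: the 'close' value right after the first window
```

With history_size=60 and target_size=1 on the real data, each sample would have shape (60, 2) and each label would hold a single next-day 'close' value.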
Complete working code (up to training) for the multivariate data is shown below:
import pandas as pd
import math
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt
df = pd.read_csv('datasets_101543_240726.csv')
#Create a new dataframe with the 'close' and 'Volume' columns
#(names must match the CSV header exactly; df.filter silently ignores missing labels)
data = df.filter(['close', 'Volume'])
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = math.ceil( len(dataset) * .8 )
# scale data
data_mean = dataset[:training_data_len].mean(axis=0)
data_std = dataset[:training_data_len].std(axis=0)
dataset = (dataset-data_mean)/data_std
def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size):
    data = []
    labels = []
    start_index = start_index + history_size
    if end_index is None:
        end_index = len(dataset) - target_size
    for i in range(start_index, end_index):
        indices = range(i-history_size, i)
        data.append(dataset[indices])
        labels.append(target[i:i+target_size])
    return np.array(data), np.array(labels)
past_history = 60
future_target = 1
x_train, y_train = multivariate_data(dataset, dataset[:, 0], 0,
                                     training_data_len, past_history,
                                     future_target)
x_val, y_val = multivariate_data(dataset, dataset[:, 0],
                                 training_data_len, None, past_history,
                                 future_target)
#Reshape the data
#x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
#print (x_train.shape)
#Build the LSTM model
model = Sequential()
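#input_shape is (time steps, features); x_train is already 3-D here, so take both from it
model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2])))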
model.add(LSTM(50, return_sequences= False))
model.add(Dense(25))
model.add(Dense(1))
#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)
The test data in your code, x_test and y_test, can be replaced with x_val and y_val, and you can run predictions on that data.
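One point to watch: the multivariate code standardizes the data with the training mean and standard deviation instead of MinMaxScaler, so scaler.inverse_transform no longer applies. Predictions have to be mapped back to prices with the 'close' column's statistics. A minimal sketch with made-up numbers, where `scaled_pred` is only a stand-in for the output of model.predict(x_val) (an assumption for illustration; no model is trained here):

```python
import numpy as np

# Made-up raw data: column 0 = close, column 1 = volume.
raw = np.array([[10.0, 100.0],
                [12.0, 110.0],
                [14.0, 130.0],
                [16.0, 120.0]])
train_len = 3

# Standardize with training-set statistics only, as in the code above.
data_mean = raw[:train_len].mean(axis=0)
data_std = raw[:train_len].std(axis=0)
scaled = (raw - data_mean) / data_std

# Stand-in for model.predict(x_val): pretend the model predicts
# the scaled 'close' of the held-out row exactly.
scaled_pred = scaled[train_len:, 0].reshape(-1, 1)

# Undo the standardization using the statistics of column 0 ('close').
pred_price = scaled_pred * data_std[0] + data_mean[0]
true_price = raw[train_len:, 0].reshape(-1, 1)

# RMSE in price units, comparable with the univariate model's RMSE.
rmse = np.sqrt(np.mean((pred_price - true_price) ** 2))
print(pred_price)
print(rmse)
```

Computing the RMSE in price units for both the univariate and the multivariate model is what makes it possible to judge whether adding volume improves or worsens the forecast.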
Please refer to the TensorFlow tutorial on time series forecasting with multivariate data for the complete code.
Hope this helps. Happy Learning!