Transform Univariate to Multivariate Time Series Forecasting with LSTM
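I'm new to the world of artificial neural networks, so if I make mistakes, please forgive me and correct me if you can. I want to use an LSTM model to predict the price of Bitcoin on the market. I know the practical limitations of this model, but I'm building it for educational purposes.
I don't know whether to call it a multi-layer model or a multivariate model (I'd be grateful if someone could explain the difference). Basically, it is a model trained on the closing price, stored in a column called "close", that predicts the next day's close by looking at the previous 60 days.
I have no problem building that model as I just described it. The problem is that I want to train the model on other information as well, such as the trading volume or the day's high. It's important to be able to decide which two types of information go into the model. I found a site that explains multivariate time series forecasting with LSTMs in Keras in detail, but I couldn't apply it to my case. Can you help me add the "Volume" variable to the model and see whether its ability to predict the future "close" price improves or gets worse?
The data looks like this and can be downloaded from Kaggle here --> download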
import pandas as pd
import math
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt
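#Load the Kaggle CSV (same file name as in the answer below)
df = pd.read_csv('datasets_101543_240726.csv')
#Create a new dataframe with only the 'close' column
data = df.filter(['close'])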
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = math.ceil( len(dataset) * .8 )
#scale data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)
#Create the scaled training data set
train_data = scaled_data[0:training_data_len , :]
#Split the data into x_train and y_train data sets
x_train = []
y_train = []
for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
    # if i<= 61:
    #     print(x_train)
    #     print(y_train)
    #     print()
#Convert the x_train and y_train to numpy arrays
x_train, y_train = np.array(x_train), np.array(y_train)
#Reshape the data
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
print (x_train.shape)
#Build the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape= (x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences= False))
model.add(Dense(25))
model.add(Dense(1))
#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)
#Create the testing data set
#Create a new array containing scaled values
test_data = scaled_data[training_data_len - 60: , :]
#Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])
#Convert the data to a numpy array
x_test = np.array(x_test)
#Reshape the data
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1 ))
# print (len(x_test))
#Get the model's predicted price values
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)
print(predictions)
#Get the root mean squared error (RMSE)
rmse=np.sqrt(np.mean(((predictions- y_test)**2)))
print (rmse)
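From your code and comments, I understand that you are doing time series forecasting on univariate data (the only column is close) and now want to do it on multivariate data (using the columns close and Volume).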
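In Keras terms, an LSTM layer takes input of shape (samples, timesteps, features). Going from univariate to multivariate only changes the last axis: 1 for close alone, 2 for close plus Volume. The following is a minimal NumPy sketch of that difference; the `make_windows` helper and the random toy data are hypothetical, made up for illustration, and column 0 is assumed to be close.

```python
import numpy as np

def make_windows(series, history=60):
    """Hypothetical helper: slide a `history`-day window over a 2-D array
    (rows = days, columns = features).

    Returns X with shape (samples, history, n_features) and y = the next
    day's value of column 0 (assumed to be 'close').
    """
    X, y = [], []
    for i in range(history, len(series)):
        X.append(series[i - history:i])  # all feature columns for the past `history` days
        y.append(series[i, 0])           # next-day value of column 0
    return np.array(X), np.array(y)

# Made-up toy data: 100 days of close prices and volumes.
rng = np.random.default_rng(0)
close = rng.random(100)
volume = rng.random(100)

# Univariate: a single column, so the last axis has size 1.
X_uni, y_uni = make_windows(close.reshape(-1, 1))
# Multivariate: close first, then Volume, so the last axis has size 2.
X_multi, y_multi = make_windows(np.column_stack([close, volume]))

print(X_uni.shape, X_multi.shape)  # (40, 60, 1) (40, 60, 2)
```

The targets are identical in both cases; only the model's `input_shape` has to follow the feature count, `(60, 1)` versus `(60, 2)`.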
The important part of the code is the function multivariate_data, which returns features and labels based on a past history of 60 days and a target 1 day ahead:
past_history = 60
future_target = 1
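To see what multivariate_data returns, here is a small sketch that copies the function from the answer's code and runs it on a hypothetical 100-row, two-column toy array (the toy data is made up; column 0 stands in for close, column 1 for Volume).

```python
import numpy as np

def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size):
    # Same logic as the answer's function.
    data = []
    labels = []
    start_index = start_index + history_size   # first index that has a full history
    if end_index is None:
        end_index = len(dataset) - target_size  # leave room for the label
    for i in range(start_index, end_index):
        indices = range(i - history_size, i)    # the `history_size` rows before day i
        data.append(dataset[indices])           # all feature columns
        labels.append(target[i:i + target_size])  # next `target_size` target values
    return np.array(data), np.array(labels)

# Made-up toy dataset: 100 days x 2 features; column 0 plays the role of 'close'.
toy = np.arange(200, dtype=float).reshape(100, 2)

x, y = multivariate_data(toy, toy[:, 0], 0, None, history_size=60, target_size=1)
print(x.shape, y.shape)    # (39, 60, 2) (39, 1)

# Predicting 3 days ahead instead of 1.
x3, y3 = multivariate_data(toy, toy[:, 0], 0, None, history_size=60, target_size=3)
print(x3.shape, y3.shape)  # (37, 60, 2) (37, 3)
```

With future_target = 3, each label holds three consecutive future values of column 0, so (assuming you want all three days predicted at once) the final layer would need 3 units, Dense(3), instead of Dense(1).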
The complete working code for multivariate data (up to training) is shown below:
import pandas as pd
import math
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt
df = pd.read_csv('datasets_101543_240726.csv')
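#Create a new dataframe with only the 'close' and 'Volume' columns
#('close' comes first, so it is column 0 and is used as the target below;
# names must match the CSV headers exactly, since DataFrame.filter silently skips names it doesn't find)
data = df.filter(['close', 'Volume'])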
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = math.ceil( len(dataset) * .8 )
# standardize each column with the mean and std of the training rows only
data_mean = dataset[:training_data_len].mean(axis=0)
data_std = dataset[:training_data_len].std(axis=0)
dataset = (dataset-data_mean)/data_std
def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size):
    # Build (features, labels) pairs: each sample is `history_size` rows of all
    # columns, each label is the next `target_size` values of `target`
    data = []
    labels = []
    start_index = start_index + history_size
    if end_index is None:
        end_index = len(dataset) - target_size
    for i in range(start_index, end_index):
        indices = range(i-history_size, i)
        data.append(dataset[indices])
        labels.append(target[i:i+target_size])
    return np.array(data), np.array(labels)
past_history = 60
future_target = 1
x_train, y_train = multivariate_data(dataset, dataset[:, 0], 0,
training_data_len, past_history,
future_target)
x_val, y_val = multivariate_data(dataset, dataset[:, 0],
training_data_len, None, past_history,
future_target)
#No reshape needed: multivariate_data already returns x_train with shape (samples, 60, 2)
#Build the LSTM model
model = Sequential()
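#input_shape is (timesteps, features) = (60, 2); a hard-coded 1 would not match the two input columns
model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2])))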
model.add(LSTM(50, return_sequences= False))
model.add(Dense(25))
model.add(Dense(1))
#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)
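In your code, the test data x_test and y_test can be replaced with x_val and y_val, and you can run predictions on this data. Note that y_val is standardized, so predictions and y_val must both be converted back to prices (or both left standardized) before computing the RMSE.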
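Because the answer standardizes with the training mean and standard deviation instead of MinMaxScaler, scaler.inverse_transform no longer applies. Below is a minimal sketch of mapping predictions for column 0 back to prices and computing the RMSE, so the number can be compared with the univariate model's. The names `data_mean`, `data_std`, `predictions` and `y_val` stand for the answer's variables, but the values here are made up, and the helpers `to_price` and `rmse` are hypothetical.

```python
import numpy as np

def to_price(scaled, data_mean, data_std, col=0):
    # Invert (x - mean) / std for one column (column 0 = 'close' in the answer).
    return scaled * data_std[col] + data_mean[col]

def rmse(pred, true):
    # Root mean squared error; both arrays must have the same shape, e.g. (n, 1).
    return float(np.sqrt(np.mean((pred - true) ** 2)))

# Made-up stand-ins for the answer's variables.
data_mean = np.array([5000.0, 1.0e9])   # training mean per column: close, Volume
data_std = np.array([2000.0, 5.0e8])    # training std per column
predictions = np.array([[0.5], [1.0]])  # like model.predict(x_val), shape (n, 1)
y_val = np.array([[0.4], [1.2]])        # standardized labels, shape (n, 1)

pred_price = to_price(predictions, data_mean, data_std)
true_price = to_price(y_val, data_mean, data_std)
print(pred_price.ravel())  # [6000. 7000.]
print(rmse(pred_price, true_price))
```

For a like-for-like comparison with the question's RMSE, both models should be scored on the same days. The answer's validation call starts at start_index = training_data_len, so its first label is day training_data_len + 60. Passing training_data_len - past_history as start_index instead would make the first label fall on day training_data_len, as with the question's x_test.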
For the complete code, see the TensorFlow tutorial on time series forecasting with multivariate data.
Hope this helps. Happy learning!