簡體   English   中英

使用 LSTM 將單變量轉換為多變量時間序列預測

[英]Transform Univariate to Multivariate Time Series Forecasting with LSTM

我是人工神經網絡世界的新手,所以如果我犯了一些錯誤,請原諒並糾正我,如果可以的話。 我想使用 LSTM 模型來預測市場上比特幣的價格。 我知道該模型的實際局限性,但我創建它是為了教育目的。

我不知道是將它定義為多層模型還是多變量模型(如果有人能解釋我將不勝感激的差異)基本上是一個在收盤價上訓練的模型,稱為“收盤價”,可以通過以下方式預測第二天的收盤價觀察前 60 天。

我從這里構建模型沒有問題,我剛剛和你說過,問題是我想用其他信息來訓練模型,比如交易量或當天的最高價格。 重要的是能夠決定在模型中插入哪兩種類型的信息。 我找到了一個網站,其中詳細解釋了 Keras 中 LSTM多元時間序列預測,但我無法將其應用於我的特定案例。 你能幫我將“成交量”變量整合到模型中,看看未來“收盤”收盤價的預測能力是提高還是惡化?

數據屬於這種類型,可以從 kaggle 在這里下載-->下載在此處輸入圖片說明

import pandas as pd
import math
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt


#Create a new dataframe with only the 'Close column
data = df.filter(['close'])
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = math.ceil( len(dataset) * .8 )

#scale data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)

#Create the scaled training data set
train_data = scaled_data[0:training_data_len , :]
#Split the data into x_train and y_train data sets
x_train = []
y_train = []

for i in range(60, len(train_data)):
  x_train.append(train_data[i-60:i, 0])
  y_train.append(train_data[i, 0])
  # if i<= 61:
    # print(x_train)
    # print(y_train)
    # print()

#Convert the x_train and y_train to numpy arrays
x_train, y_train = np.array(x_train), np.array(y_train)

#Reshape the data
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
print (x_train.shape)

#Build the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape= (x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences= False))
model.add(Dense(25))
model.add(Dense(1))

#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)

#Create the testing data set
#Create a new array containing scaled values 
test_data = scaled_data[training_data_len - 60: , :]
#Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
  x_test.append(test_data[i-60:i, 0])

#Convert the data to a numpy array
x_test = np.array(x_test)
#Reshape the data

x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1 ))

# print (len(x_test))
# #Get the models predicted price values
predictions = model.predict(x_test)

predictions = scaler.inverse_transform(predictions)
print(predictions)

#Get the root mean squared error (RMSE)
rmse=np.sqrt(np.mean(((predictions- y_test)**2)))
print (rmse)

從代碼和評論中,我了解到您正在對單變量數據執行時間序列預測(只有列是Close ),現在想要對多變量數據執行時間序列預測(使用列、關閉)。

代碼的重要部分將是函數multivariate_data ,它根據 60 天的過去歷史和 1 天的目標日期返回特征和標簽。

past_history = 60
future_target = 1

Multi_Variate Data 的完整工作代碼(直到訓練)如下所示:

import pandas as pd
import math
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt

df = pd.read_csv('datasets_101543_240726.csv')

#Create a new dataframe with only the 'Close column
data = df.filter(['close', 'Volume'])
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = math.ceil( len(dataset) * .8 )

# scale data
data_mean = dataset[:training_data_len].mean(axis=0)
data_std = dataset[:training_data_len].std(axis=0)
dataset = (dataset-data_mean)/data_std

def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size):
  data = []
  labels = []

  start_index = start_index + history_size
  if end_index is None:
    end_index = len(dataset) - target_size

  for i in range(start_index, end_index):
    indices = range(i-history_size, i)
    data.append(dataset[indices])

    labels.append(target[i:i+target_size])

  return np.array(data), np.array(labels)

past_history = 60
future_target = 1

x_train, y_train = multivariate_data(dataset, dataset[:, 0], 0,
                                                   training_data_len, past_history,
                                                   future_target)
x_val, y_val = multivariate_data(dataset, dataset[:, 0],
                                               training_data_len, None, past_history,
                                               future_target)


#Reshape the data
#x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
#print (x_train.shape)

#Build the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape= (x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences= False))
model.add(Dense(25))
model.add(Dense(1))

#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)

代碼中的測試數據x_testy_test可以替換為x_valy_val ,您可以對這些數據進行predictions

有關完整代碼,請參閱與多變量數據上的時間序列預測相關的Tensorflow 教程

希望這可以幫助。 快樂學習!

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM