Converting univariate to multivariate time series forecasting with LSTM

Question

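I'm new to the world of artificial neural networks, so please forgive and correct me if I make mistakes. I want to use an LSTM model to predict the price of Bitcoin on the market. I'm aware of the model's practical limitations, but I'm building it for educational purposes.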

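I'm not sure whether to call it a multi-layer model or a multivariate model (I'd appreciate it if someone could explain the difference). Basically, it is a model trained on the closing price, the column called "close", that predicts the next day's closing price by looking at the previous 60 days.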

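I had no trouble building the model described so far. The problem is that I want to train the model with additional information, such as the trading volume or the day's highest price. It's important to be able to choose which two types of information go into the model. I found a website that explains multivariate time series forecasting with LSTMs in Keras in detail, but I couldn't apply it to my specific case. Could you help me integrate the "Volume" variable into the model, so I can see whether the prediction of the future "close" price improves or gets worse?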

The data can be downloaded from Kaggle.

import pandas as pd
import math
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt


#Create a new dataframe with only the 'close' column (df is the Kaggle CSV loaded with pd.read_csv)
data = df.filter(['close'])
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = math.ceil( len(dataset) * .8 )

#scale data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)

#Create the scaled training data set
train_data = scaled_data[0:training_data_len , :]
#Split the data into x_train and y_train data sets
x_train = []
y_train = []

for i in range(60, len(train_data)):
  x_train.append(train_data[i-60:i, 0])
  y_train.append(train_data[i, 0])
  # if i<= 61:
    # print(x_train)
    # print(y_train)
    # print()

#Convert the x_train and y_train to numpy arrays
x_train, y_train = np.array(x_train), np.array(y_train)

#Reshape the data
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
print (x_train.shape)

#Build the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape= (x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences= False))
model.add(Dense(25))
model.add(Dense(1))

#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)

#Create the testing data set
#Create a new array containing scaled values 
test_data = scaled_data[training_data_len - 60: , :]
#Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
  x_test.append(test_data[i-60:i, 0])

#Convert the data to a numpy array
x_test = np.array(x_test)
#Reshape the data

x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1 ))

# print (len(x_test))
# #Get the models predicted price values
predictions = model.predict(x_test)

predictions = scaler.inverse_transform(predictions)
print(predictions)

#Get the root mean squared error (RMSE)
rmse=np.sqrt(np.mean(((predictions- y_test)**2)))
print (rmse)

Answer 1

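From the code and comments, I understand that you are currently doing time series forecasting on univariate data (the only column is close) and now want to do it on multivariate data (using the close and Volume columns).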
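On the terminology: "multivariate" is about how many input features there are per time step, while "multi-layer" (stacked) is about how many LSTM layers the network has. The model in the question already stacks two LSTM layers, but it is univariate because it only sees the close column. A minimal NumPy sketch of the two input shapes, using made-up array sizes rather than the real dataset:

```python
import numpy as np

n_samples, window = 5, 60  # illustrative sizes, not taken from the dataset

# Univariate: one feature (close) per time step
x_uni = np.zeros((n_samples, window, 1))
# Multivariate: two features (close, Volume) per time step
x_multi = np.zeros((n_samples, window, 2))

# An LSTM's input_shape is (time steps, features), without the batch dimension
input_shape_uni = x_uni.shape[1:]
input_shape_multi = x_multi.shape[1:]
print(input_shape_uni, input_shape_multi)
```

Only the last dimension changes when a feature is added; the number of layers is independent of it.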

The important part of the code is the function multivariate_data, which returns features and labels based on a past history of 60 days and a target 1 day ahead.

past_history = 60
future_target = 1
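To see what such windowing produces, here is a small sketch on toy data. The helper make_windows is a hypothetical, simplified stand-in (not the multivariate_data function itself), and the window length of 3 is chosen only to keep the arrays readable:

```python
import numpy as np

def make_windows(features, target, history_size, target_size):
    """Hypothetical helper: slice a 2-D feature array into overlapping windows."""
    x, y = [], []
    # i is the index of the first target step; the window covers the
    # history_size rows immediately before it.
    for i in range(history_size, len(features) - target_size + 1):
        x.append(features[i - history_size:i])
        y.append(target[i:i + target_size])
    return np.array(x), np.array(y)

# Toy data: 10 time steps, 2 features (think close and Volume)
toy = np.arange(20, dtype=float).reshape(10, 2)
x, y = make_windows(toy, toy[:, 0], history_size=3, target_size=1)
print(x.shape, y.shape)
```

Each sample in x is a (history_size, features) block, and each label in y is the first column's value at the step right after that block.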

The complete working code for multivariate data (up to training) is shown below:

import pandas as pd
import math
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt

df = pd.read_csv('datasets_101543_240726.csv')

#Create a new dataframe with only the 'close' and 'Volume' columns
data = df.filter(['close', 'Volume'])
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = math.ceil( len(dataset) * .8 )

# scale data
data_mean = dataset[:training_data_len].mean(axis=0)
data_std = dataset[:training_data_len].std(axis=0)
dataset = (dataset-data_mean)/data_std

def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size):
  data = []
  labels = []

  start_index = start_index + history_size
  if end_index is None:
    end_index = len(dataset) - target_size

  for i in range(start_index, end_index):
    indices = range(i-history_size, i)
    data.append(dataset[indices])

    labels.append(target[i:i+target_size])

  return np.array(data), np.array(labels)

past_history = 60
future_target = 1

x_train, y_train = multivariate_data(dataset, dataset[:, 0], 0,
                                                   training_data_len, past_history,
                                                   future_target)
x_val, y_val = multivariate_data(dataset, dataset[:, 0],
                                               training_data_len, None, past_history,
                                               future_target)


#No reshape is needed: x_train already has shape (samples, past_history, features)
print (x_train.shape)

#Build the LSTM model
model = Sequential()
#input_shape is (time steps, features); there are now 2 features per time step
model.add(LSTM(50, return_sequences=True, input_shape= (x_train.shape[1], x_train.shape[2])))
model.add(LSTM(50, return_sequences= False))
model.add(Dense(25))
model.add(Dense(1))

#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)

The test data x_test and y_test in your code can be replaced with x_val and y_val, and you can run predictions on them.
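Because the data was standardized with the training mean and standard deviation, predictions on x_val come out in standardized units and need to be converted back to prices before computing an RMSE. A minimal sketch of that step follows; the helpers unscale_close and rmse are hypothetical names, and the numbers are stand-ins for data_mean[0], data_std[0], model.predict(x_val) and y_val:

```python
import numpy as np

def unscale_close(scaled, close_mean, close_std):
    """Undo (value - mean) / std for the close column."""
    return np.asarray(scaled, dtype=float) * close_std + close_mean

def rmse(pred, actual):
    """Root mean squared error over flattened arrays."""
    pred = np.asarray(pred, dtype=float).ravel()
    actual = np.asarray(actual, dtype=float).ravel()
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

# Stand-in values (assumptions for illustration only)
close_mean, close_std = 9000.0, 500.0    # would be data_mean[0], data_std[0]
scaled_pred = np.array([[0.2], [-0.1]])  # would be model.predict(x_val)
scaled_true = np.array([[0.0], [0.1]])   # would be y_val

pred_price = unscale_close(scaled_pred, close_mean, close_std)
true_price = unscale_close(scaled_true, close_mean, close_std)
error = rmse(pred_price, true_price)
print(error)
```

Computing the same RMSE for a model trained on close alone and for one trained on close plus Volume, over the same validation rows, is one way to check whether the extra variable helps or hurts.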

For the complete code, see the TensorFlow tutorial on time series forecasting with multivariate data.

Hope this helps. Happy learning!


Solution 1
0 2020-05-28 16:11:06
