Forecasting with an LSTM in Keras

Question

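I am predicting Y from X based on past values. Our formatted CSV dataset has three columns (time_stamp, X and Y, where Y is the actual value). A sample looks like this: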

time,X,Y
0.000561,0,10
0.000584,0,10
0.040411,5,10
0.040437,10,10
0.041638,12,10
0.041668,14,10
0.041895,15,10
0.041906,19,10
 ... ... ...

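Before training the prediction model, here is what the plots of X and Y look like, respectively.

This is how I approached the problem with an LSTM recurrent neural network in Python using Keras.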

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error


np.random.seed(7)

# Load data
df = pd.read_csv('test32_C_data.csv')
n_features = 100


def create_sequences(data, window=15, step=1, prediction_distance=15):
    x = []
    y = []

    for i in range(0, len(data) - window - prediction_distance, step):
        x.append(data[i:i + window])
        y.append(data[i + window + prediction_distance][1])

    x, y = np.asarray(x), np.asarray(y)

    return x, y


# Scaling prior to splitting
scaler = MinMaxScaler(feature_range=(0.01, 0.99))
scaled_data = scaler.fit_transform(df.loc[:, ["X", "Y"]].values)

# Build sequences
x_sequence, y_sequence = create_sequences(scaled_data)

# Create test/train split
test_len = int(len(x_sequence) * 0.90)
valid_len = int(len(x_sequence) * 0.90)
train_end = len(x_sequence) - (test_len + valid_len)
x_train, y_train = x_sequence[:train_end], y_sequence[:train_end]
x_valid, y_valid = x_sequence[train_end:train_end + valid_len], y_sequence[train_end:train_end + valid_len]
x_test, y_test = x_sequence[train_end + valid_len:], y_sequence[train_end + valid_len:]

# Initialising the RNN
model = Sequential()

# Adding the input layer and the LSTM layer
model.add(LSTM(15, input_shape=(15, 2)))

# Adding the output layer
model.add(Dense(1))

# Compiling the RNN
model.compile(loss='mse', optimizer='rmsprop')

# Fitting the RNN to the Training set
model.fit(x_train, y_train, epochs=5)

# Getting the predicted values
y_pred = model.predict(x_test)
#y_pred = scaler.inverse_transform(y_pred)

plot_colors = ['#332288', '#3cb44b']

# Plot the results
pd.DataFrame({"Actual": y_test, "Predicted": np.squeeze(y_pred)}).plot(color=plot_colors)
plt.xlabel('Time [Index]')
plt.ylabel('Values')

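Finally, when I run the code, the model seems to capture the signal pattern well, as shown below.

However, one problem with this output is the range of Y. As the first two plots show, the range should be 0-400. To fix this, I tried to use the scaler to inverse-transform the predictions with y_pred = scaler.inverse_transform(y_pred), but that raises an error: ValueError: non-broadcastable output operand with shape (7625,1) doesn't match the broadcast shape (7625,2). How can we fix this broadcast shape error?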
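The error can be reproduced without the model at all. The following is a minimal sketch, assuming a small made-up two-column array in place of the real CSV; the numbers are illustrative only and the name error_message is introduced just for this example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the [X, Y] columns (values are made up)
data = np.array([[0.0, 10.0], [5.0, 200.0], [19.0, 400.0]])
scaler = MinMaxScaler(feature_range=(0.01, 0.99)).fit(data)

# Stand-in for model.predict output: one column, shape (n_samples, 1)
y_pred = np.array([[0.01], [0.5], [0.99]])

try:
    scaler.inverse_transform(y_pred)
    error_message = None
except ValueError as exc:
    # The scaler was fit on 2 columns but received only 1
    error_message = str(exc)

print(error_message)
```

The exact wording of the message may differ between library versions, but the cause is the same: the array passed to inverse_transform has fewer columns than the array the scaler was fit on.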

Answer 1

Basically, the scaler remembers that it was fit on 2 features (columns), so it expects 2 features when inverting the transformation.
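To see what the scaler "remembers", one can inspect its fitted attributes. This is a small sketch on made-up data; it relies on the documented MinMaxScaler relation X_scaled = X * scale_ + min_, and the manual inversion of a single column shown at the end is an illustration of that relation rather than one of the options proposed in this answer.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the [X, Y] columns (values are made up)
data = np.array([[0.0, 10.0], [5.0, 200.0], [19.0, 400.0]])
scaler = MinMaxScaler(feature_range=(0.01, 0.99)).fit(data)

# One scale_ / min_ entry is stored per fitted column
print(scaler.scale_.shape)

# Inverting only the Y column by hand: X = (X_scaled - min_) / scale_
y_col = 1
scaled_y = scaler.transform(data)[:, y_col]
restored_y = (scaled_y - scaler.min_[y_col]) / scaler.scale_[y_col]
print(restored_y)
```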

You have two options here.

1) Make two separate scalers, scaler_x and scaler_y, like this:

# Scaling prior to splitting
scaler_x = MinMaxScaler(feature_range=(0.01, 0.99))
scaler_y = MinMaxScaler(feature_range=(0.01, 0.99))

# A pandas Series has no reshape method, so reshape the underlying NumPy array
scaled_x = scaler_x.fit_transform(df.loc[:, "X"].values.reshape(-1, 1))
scaled_y = scaler_y.fit_transform(df.loc[:, "Y"].values.reshape(-1, 1))

scaled_data = np.column_stack((scaled_x, scaled_y))

Then you will be able to do this:

y_pred = scaler_y.inverse_transform(y_pred)
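A quick way to check the two-scaler approach is a round trip on toy data. This is only a sketch: the DataFrame values are made up, and fake_pred is a hypothetical stand-in for the (n_samples, 1) array that model.predict would return.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy data in place of the real CSV (values are made up)
df = pd.DataFrame({"X": [0, 5, 10, 19], "Y": [10, 150, 300, 400]})

scaler_x = MinMaxScaler(feature_range=(0.01, 0.99))
scaler_y = MinMaxScaler(feature_range=(0.01, 0.99))

# Double brackets keep each column two-dimensional: shape (n_samples, 1)
scaled_x = scaler_x.fit_transform(df[["X"]].values)
scaled_y = scaler_y.fit_transform(df[["Y"]].values)
scaled_data = np.column_stack((scaled_x, scaled_y))

# Hypothetical stand-in for model.predict(x_test)
fake_pred = scaled_y.copy()

# scaler_y only knows about Y, so a single column is accepted
restored = scaler_y.inverse_transform(fake_pred)
print(restored[:, 0])
```

Because scaler_y was fit on one column only, the shapes match and the values come back in the original range of Y.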

2) Fake the X column in the output like this:

# model.predict returns shape (n, 1); put it in the Y column of a two-column array
y_pred_reshaped = np.zeros((len(y_pred), 2))
y_pred_reshaped[:, 1] = y_pred[:, 0]
y_pred = scaler.inverse_transform(y_pred_reshaped)[:, 1]
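The padding trick can be wrapped in a small helper so it works for any column. The function name inverse_transform_column is hypothetical (it is not part of scikit-learn), and the data below is made up; treat this as a sketch of the idea, not a definitive implementation.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def inverse_transform_column(scaler, values, column):
    """Invert scaling for one column of a scaler that was fit on several columns."""
    # Flatten so both (n,) and (n, 1) inputs are accepted
    values = np.asarray(values, dtype=float).reshape(-1)
    # Pad with zeros for the columns we do not care about
    padded = np.zeros((len(values), scaler.scale_.shape[0]))
    padded[:, column] = values
    # Invert all columns, then keep only the requested one
    return scaler.inverse_transform(padded)[:, column]


# Synthetic stand-in for the [X, Y] columns (values are made up)
data = np.array([[0.0, 10.0], [5.0, 200.0], [19.0, 400.0]])
scaler = MinMaxScaler(feature_range=(0.01, 0.99)).fit(data)

# Shape (3, 1), like the output of model.predict
y_pred = scaler.transform(data)[:, [1]]
y_restored = inverse_transform_column(scaler, y_pred, column=1)
print(y_restored)
```

The zeros in the padded X column are inverse-transformed as well, but their results are discarded, so their value does not matter.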

Does that help?

Edit

Here is the complete code:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error


np.random.seed(7)

# Load data
#df = pd.read_csv('test32_C_data.csv')
df = pd.DataFrame(np.random.randint(0,100, size=(100,3)), columns = ['time', 'X', 'Y'])
n_features = 100


def create_sequences(data, window=15, step=1, prediction_distance=15):
    x = []
    y = []

    for i in range(0, len(data) - window - prediction_distance, step):
        x.append(data[i:i + window])
        y.append(data[i + window + prediction_distance][1])

    x, y = np.asarray(x), np.asarray(y)

    return x, y


# Scaling prior to splitting
scaler_x = MinMaxScaler(feature_range=(0.01, 0.99))
scaler_y = MinMaxScaler(feature_range=(0.01, 0.99))

# A pandas Series has no reshape method, so reshape the underlying NumPy array
scaled_x = scaler_x.fit_transform(df.loc[:, "X"].values.reshape(-1, 1))
scaled_y = scaler_y.fit_transform(df.loc[:, "Y"].values.reshape(-1, 1))

scaled_data = np.column_stack((scaled_x, scaled_y))

# Build sequences
x_sequence, y_sequence = create_sequences(scaled_data)

# Chronological split: the last 10% is the test set, the 10% before it is validation
test_len = int(len(x_sequence) * 0.10)
valid_len = int(len(x_sequence) * 0.10)
train_end = len(x_sequence) - (test_len + valid_len)
x_train, y_train = x_sequence[:train_end], y_sequence[:train_end]
x_valid, y_valid = x_sequence[train_end:train_end + valid_len], y_sequence[train_end:train_end + valid_len]
x_test, y_test = x_sequence[train_end + valid_len:], y_sequence[train_end + valid_len:]

# Initialising the RNN
model = Sequential()

# Adding the input layer and the LSTM layer
model.add(LSTM(15, input_shape=(15, 2)))

# Adding the output layer
model.add(Dense(1))

# Compiling the RNN
model.compile(loss='mse', optimizer='rmsprop')

# Fitting the RNN to the Training set
model.fit(x_train, y_train, epochs=5)

# Getting the predicted values
y_pred = model.predict(x_test)
y_pred = scaler_y.inverse_transform(y_pred)
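Note that y_test is still in scaled space at this point, so it needs the same inverse transform before it is plotted or scored against the predictions. The sketch below shows one way to compute an RMSE in the original units of Y; the arrays are made up and merely mimic the shapes that y_test (1-D) and model.predict (2-D) would have.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler

# Made-up Y values standing in for the real column
y_raw = np.array([[10.0], [120.0], [250.0], [400.0]])
scaler_y = MinMaxScaler(feature_range=(0.01, 0.99)).fit(y_raw)

# y_test is 1-D like the targets built by create_sequences
y_test = scaler_y.transform(y_raw)[:, 0]
# Hypothetical predictions: 2-D like model.predict, slightly off target
y_pred = y_test.reshape(-1, 1) + 0.01

# Bring both arrays back to the original units before comparing
y_test_orig = scaler_y.inverse_transform(y_test.reshape(-1, 1))[:, 0]
y_pred_orig = scaler_y.inverse_transform(y_pred)[:, 0]

rmse = np.sqrt(mean_squared_error(y_test_orig, y_pred_orig))
print(rmse)
```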
