
Keras Recurrent Neural Networks For Multivariate Time Series

I have been reading about Keras RNN models (LSTMs and GRUs), and authors seem to largely focus on language data or univariate time series that use training instances composed of previous time steps. The data I have is a bit different.

I have 20 variables measured every year for 10 years for 100,000 persons as input data, and the 20 variables measured for year 11 as output data. What I would like to do is predict the value of one of the variables (not the other 19) for the 11th year.

I have my data structured as X.shape = [persons, years, variables] = [100000, 10, 20] and Y.shape = [persons, variable] = [100000, 1]. Below is my Python code for an LSTM model.

## LSTM model.

from keras import models
from keras import layers

# Define model.

network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(128, activation = 'tanh', 
     input_shape = (X.shape[1], X.shape[2])))
network_lstm.add(layers.Dense(1, activation = None))

# Compile model.

network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fit model.

history_lstm = network_lstm.fit(X, Y, epochs = 25, batch_size = 128)

I have four (related) questions, please:

  1. Have I coded the Keras model correctly for the data structure I have? The performance I get from a fully-connected network (using flattened data) and from LSTM, GRU, and 1D CNN models is nearly identical, and I don't know if I have made an error in Keras or if a recurrent model is simply not helpful in this case.

  2. Should I have Y as a series with shape Y.shape = [persons, years] = [100000, 11], rather than including the variable in X, which would then have shape X.shape = [persons, years, variables] = [100000, 10, 19]? If so, how can I get the RNN to output the predicted sequence? When I use return_sequences = True, Keras returns an error.

  3. Is this the best way to predict with the data I have? Are there better options available in the Keras RNN models, or even other models?

  4. How could I simulate data resembling the data structure I have so that an RNN model would outperform a fully-connected network?

UPDATE:

I have tried a simulation with what I hope is a very simple case where an RNN should be expected to outperform an FNN.

While the LSTM tends to outperform the FNN when both have fewer hidden units (4), the performance becomes identical with more hidden units (8+). Can anyone think of a better simulation where an RNN would be expected to outperform an FNN with a similar data structure?

from keras import models
from keras import layers

from keras.layers import Dense, LSTM

import numpy as np
import matplotlib.pyplot as plt

The code below simulates data for 10,000 instances, 10 time steps, and 2 variables. If the second variable has a 0 in the very first time step, then Y is the value of the first variable for the very last time step multiplied by 3. If the second variable has a 1 in the very first time step, then Y is the value of the first variable for the very last time step multiplied by 9.

My hope was that the RNN would keep the value of the second variable at the very first time step in memory and use that to know which value (3 or 9) to multiply the first variable by at the very last time step.

## Simulate data.

instances = 10000
sequences = 10

# Columns alternate between the two variables: even indices hold the first
# variable, odd indices hold the second variable.
X = np.zeros((instances, sequences * 2))

# Set the second variable to 1 in the very first time step for half the instances.
X[:int(instances / 2), 1] = 1

# Fill the first variable with random values at every time step.
for i in range(instances):
    for j in range(0, sequences * 2, 2):
        X[i, j] = np.random.random()

# Y is the first variable at the last time step, multiplied by 3 or 9
# depending on the second variable at the first time step.
Y = np.zeros((instances, 1))
for i in range(len(Y)):
    if X[i, 1] == 0:
        Y[i] = X[i, -2] * 3
    if X[i, 1] == 1:
        Y[i] = X[i, -2] * 9

Below is code for an FNN:

## Densely connected model.

# Define model.

network_dense = models.Sequential()
network_dense.add(layers.Dense(4, activation = 'relu', 
     input_shape = (X.shape[1],)))
network_dense.add(Dense(1, activation = None))

# Compile model.

network_dense.compile(optimizer = 'rmsprop', loss = 'mean_absolute_error')

# Fit model.

history_dense = network_dense.fit(X, Y, epochs = 100, batch_size = 256, verbose = False)

plt.scatter(Y[X[:, 1] == 0, :], network_dense.predict(X[X[:, 1] == 0, :]), alpha = 0.1)
plt.plot([0, 3], [0, 3], color = 'black', linewidth = 2)
plt.title('FNN, Second Variable has a 0 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')

plt.show()

plt.scatter(Y[X[:, 1] == 1, :], network_dense.predict(X[X[:, 1] == 1, :]), alpha = 0.1)
plt.plot([0, 9], [0, 9], color = 'black', linewidth = 2)
plt.title('FNN, Second Variable has a 1 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')

plt.show()

Below is code for an LSTM:

## Structure X data for LSTM.

X_lstm = X.reshape(X.shape[0], X.shape[1] // 2, 2)

X_lstm.shape

## LSTM model.

# Define model.

network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(4, activation = 'relu', 
     input_shape = (X_lstm.shape[1], 2)))
network_lstm.add(layers.Dense(1, activation = None))

# Compile model.

network_lstm.compile(optimizer = 'rmsprop', loss = 'mean_squared_error')

# Fit model.

history_lstm = network_lstm.fit(X_lstm, Y, epochs = 100, batch_size = 256, verbose = False)

plt.scatter(Y[X[:, 1] == 0, :], network_lstm.predict(X_lstm[X[:, 1] == 0, :]), alpha = 0.1)
plt.plot([0, 3], [0, 3], color = 'black', linewidth = 2)
plt.title('LSTM, Second Variable has a 0 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')

plt.show()

plt.scatter(Y[X[:, 1] == 1, :], network_lstm.predict(X_lstm[X[:, 1] == 1, :]), alpha = 0.1)
plt.plot([0, 9], [0, 9], color = 'black', linewidth = 2)
plt.title('LSTM, Second Variable has a 1 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')

plt.show()
  1. Yes, the code used is correct for what you are trying to do. 10 years is the time window used to predict the following year, so that should be the number of inputs into your model for each of the 20 variables. The sample size of 100,000 observations is not relevant to the input shape of your model.

  2. The way that you had originally shaped the dependent variable Y is correct. You are predicting a window of 1 year for 1 variable, and you have 100,000 observations. The keyword argument return_sequences=True will cause an error to be thrown because you only have a single LSTM layer. Set this parameter to True if you are implementing multiple LSTM layers and the layer in question is followed by another LSTM layer, as in the sketch below.
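Here is a minimal sketch of that stacked setup, assuming your original X with shape [100000, 10, 20]; the layer sizes and the name network_stacked are placeholders, not recommendations:

network_stacked = models.Sequential()

# return_sequences = True makes the first LSTM layer emit its hidden state at
# every time step, giving the next LSTM layer the 3D input it expects.
network_stacked.add(layers.LSTM(64, return_sequences = True, 
     input_shape = (X.shape[1], X.shape[2])))

# The final LSTM layer keeps the default return_sequences = False and emits
# only the last hidden state, which feeds the regression head.
network_stacked.add(layers.LSTM(64))
network_stacked.add(layers.Dense(1, activation = None))

network_stacked.compile(optimizer = 'adam', loss = 'mean_squared_error')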

I wish I could offer some guidance on 3, but without actually having your dataset I don't know if it's possible to answer this with any sort of certainty.

I will say that LSTMs were designed to address what is known as the long-term dependency problem present in regular RNNs. What this problem boils down to is that as the gap grows between when the relevant information is observed and the point where that information becomes useful, a standard RNN will have a harder time learning the relationship between the two. Think of predicting a stock price based on 3 days of activity versus an entire year.
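One rough way to see this on the simulated data from your update (a sketch on my part, not something I have benchmarked; network_rnn is a placeholder name) is to swap the LSTM for a plain SimpleRNN and compare losses as the sequences get longer:

# Plain RNN baseline on the same simulated data. As the gap between the
# step-1 signal and the final-step value grows, its loss should degrade
# relative to the LSTM's.
network_rnn = models.Sequential()
network_rnn.add(layers.SimpleRNN(4, activation = 'relu', 
     input_shape = (X_lstm.shape[1], 2)))
network_rnn.add(layers.Dense(1, activation = None))

network_rnn.compile(optimizer = 'rmsprop', loss = 'mean_squared_error')

history_rnn = network_rnn.fit(X_lstm, Y, epochs = 100, batch_size = 256, verbose = False)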

This leads into number 4. If I use the term 'resembling' loosely and stretch your time window further out to, say, 50 years as opposed to 10, the advantages gained from using an LSTM would become more apparent; a sketch of that change follows below. Although I'm sure that someone more experienced will be able to offer a better answer, and I look forward to seeing it.
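As a sketch of that idea, the simulation from your update can be rerun with a 50-step gap by changing only the sequence length; X_long, Y_long, and X_long_lstm are hypothetical names for the regenerated arrays:

# Regenerate the simulation with 50 time steps instead of 10, so the step-1
# signal must be carried much further before it is used.
sequences = 50

X_long = np.zeros((instances, sequences * 2))
X_long[:int(instances / 2), 1] = 1
for i in range(instances):
    for j in range(0, sequences * 2, 2):
        X_long[i, j] = np.random.random()

# Same rule as before: multiply the last value of the first variable by 3 or 9.
Y_long = np.where(X_long[:, 1] == 0, X_long[:, -2] * 3, X_long[:, -2] * 9).reshape(-1, 1)

X_long_lstm = X_long.reshape(X_long.shape[0], X_long.shape[1] // 2, 2)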

I found this page helpful for understanding LSTMs:

https://colah.github.io/posts/2015-08-Understanding-LSTMs/
