keras lstm incorrect input_shape

I am trying to use an LSTM model to predict the weather (mainly to learn about LSTMs and using Python).

I have a dataset of 500,000 rows, each of which represents a date, and there are 8 columns which are my features.

Below is my model.

from tensorflow.keras.layers import LSTM, Dropout, Dense, Activation
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(LSTM(50, input_shape=(30, 8), return_sequences=True))
model.add(Dropout(0.2))

model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.2))

model.add(LSTM(50, return_sequences=False))
model.add(Dropout(0.2))

model.add(Dense(1))
model.add(Activation('linear'))
model.compile(loss='mse', optimizer='adam')

model.fit(
    X,
    y,
    batch_size=512,
    epochs=100,
    validation_split=0.05)

For the input parameters, as I understand it, the first parameter is the time step, so here I am saying that I think the last 30 observations should be used to predict the next value. The 8, as I understand it, is the number of features: air pressure, temperature, etc.

So I convert my X matrix into a 3D matrix with the line below, and X is now a 500000 x 8 x 1 matrix.

X = np.reshape(X, (X.shape[0], X.shape[1], 1))

When I run the model, though, I get the error below.

ValueError: Error when checking input: expected lstm_3_input to have shape (30, 8) but got array with shape (8, 1)

What am I doing wrong?

Your issue is with data preparation. Find details on data preparation for LSTMs here.

LSTMs map a sequence of past observations as input to an output observation. As such, the sequence of observations must be transformed into multiple samples. Consider a given univariate sequence:

[10, 20, 30, 40, 50, 60, 70, 80, 90]

We can divide the sequence into multiple input/output patterns called samples, where three time steps (n_steps) are used as input and one time step is used as the label for the one-step prediction that is being learned.

X,              y
10, 20, 30      40
20, 30, 40      50
30, 40, 50      60
# ...

So what you want to do is implemented in the split_sequence() function below:

from numpy import array

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

Getting back to our initial example, the following happens:

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])

# [10 20 30] 40
# [20 30 40] 50
# [30 40 50] 60
# [40 50 60] 70
# [50 60 70] 80
# [60 70 80] 90

Takeaway: now your shapes should be what your LSTM model expects them to be, and you should be able to adjust your data shape to your needs. Obviously the same works for multiple input feature rows.
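For the multivariate case in the question (8 feature columns), here is a minimal sketch of the same split, assuming the raw data is a (n_rows, 8) NumPy array; the function name and the target_col parameter are hypothetical:

import numpy as np

# split a multivariate sequence into samples; target_col is the column to predict
def split_multivariate(data, n_steps, target_col=0):
    X, y = list(), list()
    for i in range(len(data)):
        end_ix = i + n_steps
        # stop once the label index would run past the end of the data
        if end_ix > len(data) - 1:
            break
        X.append(data[i:end_ix, :])         # one (n_steps, n_features) window
        y.append(data[end_ix, target_col])  # the next value of the target column
    return np.array(X), np.array(y)

data = np.random.random((500, 8))           # small stand-in for the 500,000 x 8 dataset
X, y = split_multivariate(data, n_steps=30)
print(X.shape, y.shape)                     # (470, 30, 8) (470,)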

I think your input shape is off. The NN does not understand that you want it to take slices of 30 points to predict the 31st. What you need to do is slice your dataset into chunks of length 30 (which means each point is going to be copied 29 times) and train on that, which will have a shape of (499970, 30, 8), assuming that the point after each chunk goes only into y. Also, do not add a dummy dimension at the end; that is needed in conv layers for RGB channels.
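A sketch of that slicing, assuming NumPy >= 1.20 (for sliding_window_view) and, hypothetically, that the value to predict lives in column 0:

import numpy as np

data = np.random.random((500000, 8))  # stand-in for the real dataset

# All length-30 windows along the row axis; sliding_window_view appends the
# window axis last, so transpose to get (n_windows, 30, 8).
windows = np.lib.stride_tricks.sliding_window_view(data, 30, axis=0).transpose(0, 2, 1)

X = windows[:-1]          # drop the last window: its label would be past the end
y = data[30:, 0]          # the value 30 steps after the start of each window
print(X.shape, y.shape)   # (499970, 30, 8) (499970,)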

I think you might need just a simple explanation of how layers work. In particular, note that all Keras layers behave something like this:

NAME(output_dim, input_shape = (...,input_dim))

For example, suppose I have 15000 vectors of length 3 and I would like to change them to vectors of length 5. Then something like this would do that:

import numpy as np, tensorflow as tf

X = np.random.random((15000,3))
Y = np.random.random((15000,5))

M = tf.keras.models.Sequential()
M.add(tf.keras.layers.Dense(5,input_shape=(3,)))

M.compile('sgd','mse')
M.fit(X,Y) # Take note that I provided complete working code here. Good practice. 
           # I even include the imports and random data to check that it works. 

Likewise, if my input looks something like (1000,10,5) and I run it through an LSTM like LSTM(7), then I should know (automatically) that I will get something like (...,7) as my output. Those 5-long vectors will get changed to 7-long vectors. Rule to understand: the last dimension is always the vector you are changing, and the first parameter of the layer is always the dimension to change it to.

Now the second thing to learn about LSTMs: they use a time axis (which is not the last axis, because as we just went over, that is always the "changing dimension axis"), which is removed if return_sequences=False and kept if return_sequences=True. Some examples:

LSTM(7) # (10000,100,5) -> (10000,7)
# Here the LSTM will loop through the 100 5-long vectors (like a time series with memory),
# producing 7-long vectors. Only the last 7-long vector is kept.
LSTM(7,return_sequences=True) # (10000,100,5) -> (10000,100,7)
# Same thing as the layer above, except we keep all the intermediate steps.
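A quick way to check those shapes yourself is to call the layers on a random tensor (a smaller batch here, just to keep it fast):

import tensorflow as tf

x = tf.random.normal((32, 100, 5))  # (batch, time, features)
print(tf.keras.layers.LSTM(7)(x).shape)                         # (32, 7)
print(tf.keras.layers.LSTM(7, return_sequences=True)(x).shape)  # (32, 100, 7)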

You provide a layer that looks like this:

LSTM(50,input_shape=(30,8),return_sequences=True) # (10000,30,8) -> (10000,30,50)

Notice the 30 is the TIME dimension used in your LSTM model. The 8 and the 50 are the INPUT_DIM and OUTPUT_DIM, and have nothing to do with the time axis. Another common misunderstanding: notice that the LSTM expects you to provide each SAMPLE with its own COMPLETE PAST and TIME AXIS. That is, an LSTM does not use previous sample points for the next sample point; each sample is independent and comes with its own complete past data.

So let's take a look at your model. Step one: what is your model doing, and what kind of data is it expecting?

from tensorflow.keras.layers import LSTM, Dropout, Dense, Activation
from tensorflow.keras.models import Sequential

model = Sequential()      
model.add(LSTM(50, input_shape=(30, 8), return_sequences=True))   
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('linear'))
model.compile('sgd','mse')

print(model.input_shape)
model.summary() # Let's see what your model is doing. 

So now I clearly see what your model does: (10000,30,8) -> (10000,30,50) -> (10000,30,100) -> (10000,50) -> (10000,1)

Did you expect that? Did you see that those would be the dimensions of the intermediate steps? Now that I know what input and output your model is expecting, I can easily verify that your model trains and works on that kind of data.

from tensorflow.keras.layers import LSTM, Dropout, Dense, Activation
from tensorflow.keras.models import Sequential
import numpy as np

X = np.random.random((10000,30,8))
Y = np.random.random((10000,1))

model = Sequential()      
model.add(LSTM(50, input_shape=(30, 8), return_sequences=True))   
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('linear'))
model.compile('sgd','mse')

model.fit(X,Y)

Did you notice that your model was expecting inputs like (...,30,8)? Did you know your model was expecting output data that looked like (...,1)? Knowing what your model wants also means you can now change your model to fit the data you're interested in. If you want your data to run over your 8 parameters like a time axis, then your input dimension needs to reflect that: change the 30 to an 8 and change the 8 to a 1. If you do this, notice also that your first layer is expanding each 1-long vector (a single number) into a 50-long vector. Does that sound like what you wanted the model to do? Maybe your LSTM should be an LSTM(2) or LSTM(5) instead of 50, etc. You could spend the next 1000 hours trying to find the right parameters that work with the data you are using.
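A minimal sketch of that change, with random stand-in data and a placeholder LSTM(5) width:

import numpy as np, tensorflow as tf

X = np.random.random((10000, 8, 1))  # the 8 features now play the role of the time axis
Y = np.random.random((10000, 1))

M = tf.keras.models.Sequential()
# Walks the 8-step axis, expanding each 1-long vector (a single number) into a
# 5-long one; return_sequences defaults to False, so only the last output is kept.
M.add(tf.keras.layers.LSTM(5, input_shape=(8, 1)))
M.add(tf.keras.layers.Dense(1))
M.compile('sgd', 'mse')
M.fit(X, Y)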

Maybe you don't want to go over your FEATURE space as a TIME SPACE; maybe try repeating your data into batches of size 10, where each sample has its own history, with dimensions say (10000,10,8). Then an LSTM(50) would use your 8-long feature space and change it into a 50-long feature space while going over the TIME AXIS of 10. Maybe you just want to keep the last one, with return_sequences=False.
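And a sketch of that alternative, again with hypothetical random data shaped (10000, 10, 8):

import numpy as np, tensorflow as tf

X = np.random.random((10000, 10, 8))  # each sample carries its own 10-step history
Y = np.random.random((10000, 1))

M = tf.keras.models.Sequential()
# Changes each 8-long feature vector into a 50-long one while going over the
# 10-step time axis; return_sequences=False keeps only the last 50-long vector.
M.add(tf.keras.layers.LSTM(50, input_shape=(10, 8), return_sequences=False))
M.add(tf.keras.layers.Dense(1))
M.compile('sgd', 'mse')
M.fit(X, Y)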

Let me copy a function I used for preparing my data for an LSTM:

import numpy as np
from itertools import islice

def slice_data_for_lstm(data, lookback):
    # zip together `lookback` staggered row iterators, so each output element
    # is a window of `lookback` consecutive rows
    return np.array(list(zip(*[islice(np.array(data), i, None, 1) for i in range(lookback)])))

X_sliced = slice_data_for_lstm(X, 30)

lookback should be 30 in your case, and the function will create stacks of 30 of your (8, 1) features. The resulting data has shape (N - 29, 30, 8, 1) for N input rows.
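A small check of what that function produces, on a hypothetical toy array:

demo = np.arange(10).reshape(5, 2)          # 5 rows, 2 features
print(slice_data_for_lstm(demo, 3).shape)   # (3, 3, 2): 5 - 3 + 1 windows of 3 rows each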
