
Input of layer sequential is incompatible with the layer: shapes error in LSTM

I'm new to neural networks and I want to compare them with other machine learning methods. I have multivariate time series data covering approximately two years. I want to use an LSTM to predict 'y' for the next few days based on the other variables. The final day of my data is 2020-07-31.

df.tail()

            y      holidays  day_of_month  day_of_week  month  quarter
Date
2020-07-27  32500  0         27            0            7      3
2020-07-28  33280  0         28            1            7      3
2020-07-29  31110  0         29            2            7      3
2020-07-30  37720  0         30            3            7      3
2020-07-31  32240  0         31            4            7      3

To train the LSTM model, I also split the data into train and test sets.

from sklearn.model_selection import train_test_split
split_date = '2020-07-27' #to predict the next 4 days
df_train = df.loc[df.index <= split_date].copy()
df_test = df.loc[df.index > split_date].copy()
X1=df_train[['day_of_month','day_of_week','month','quarter','holidays']]
y1=df_train['y']
X2=df_test[['day_of_month','day_of_week','month','quarter','holidays']]
y2=df_test['y']

X_train, y_train =X1, y1
X_test, y_test = X2,y2

Because I'm working with LSTM, some scaling is needed:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

Now, onto the difficult part: the model.

num_units=50
activation_function = 'sigmoid'
optimizer = 'adam'
loss_function = 'mean_squared_error'
batch_size = 10
num_epochs = 100

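from tensorflow.keras.models import Sequential  # imports omitted in the post; tensorflow.keras assumed
from tensorflow.keras.layers import LSTM, Dense
# Initialize the RNN
regressor = Sequential()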

 # Adding the input layer and the LSTM layer
regressor.add(LSTM(units = num_units, return_sequences=True ,activation = activation_function, 
input_shape=(X_train.shape[1], 1)))

 # Adding the output layer
regressor.add(Dense(units = 1))

 # Compiling the RNN
regressor.compile(optimizer = optimizer, loss = loss_function)

# Using the training set to train the model
regressor.fit(X_train_scaled, y_train, batch_size = batch_size, epochs = num_epochs)

However, I receive the following error:

ValueError: Input 0 of layer sequential_11 is incompatible with the layer: expected ndim=3, found 
ndim=2. Full shape received: [None, 5]

I don't understand how we choose the parameters or the shape of the input. I've seen some videos and read some GitHub pages, and everyone seems to run LSTM in a different way, which makes it even more difficult to implement. The previous error is probably coming from the shape, but other than that, is everything else right? And how can I fix this so it works? Thanks

EDIT: This similar question does not solve my problem. I've tried the solution from there:

x_train = X_train_scaled.reshape(-1, 1, 5)
x_test  = X_test_scaled.reshape(-1, 1, 5)

(My X_test and y_test only have one column.) That solution doesn't seem to work either. I get this error now:

ValueError: Input 0 is incompatible with layer sequential_22: expected shape= 
(None, None, 1), found shape=[None, 1, 5]

INPUT:

The problem is that your model expects a 3D input of shape (batch, sequence, features), but your X_train is actually a slice of a data frame, so it is a 2D array:

X1=df_train[['day_of_month','day_of_week','month','quarter','holidays']]
X_train, y_train =X1, y1

I assume your columns are supposed to be your features, so what you would usually do is "stack slices" of your df so that your X_train looks something like this:

Here is a dummy 2D data set of shape (15,5):

data = np.zeros((15,5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

You can reshape it to add a sequence dimension, for example (15,1,5):

data = data[:,np.newaxis,:] 

array([[[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.]]])

Same data, but presented in a different way. In this example, batch = 15 and sequence = 1. I don't know what the sequence length is in your case, but it can be anything.
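As a minimal sketch of the reshape described above (NumPy only; the window length of 3 is an assumed value chosen for illustration, not something stated in the question), the same 2D array can be turned into a 3D array either with one time step per sample or with overlapping multi-day windows:

```python
import numpy as np

# Dummy scaled features: 15 days, 5 features per day (same shape as the example above).
data = np.arange(15 * 5, dtype=float).reshape(15, 5)

# Option 1: one time step per sample -> (samples, 1, features).
one_step = data[:, np.newaxis, :]
print(one_step.shape)  # (15, 1, 5)

# Option 2: overlapping windows of consecutive days -> (samples, seq_len, features).
seq_len = 3  # assumed window length, for illustration only
windows = np.stack([data[i:i + seq_len] for i in range(len(data) - seq_len + 1)])
print(windows.shape)  # (13, 3, 5)
```

With a window length of 3, only 13 complete windows fit into 15 rows, which is why the sample count drops.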

MODEL:

Now in your model, Keras expects input of shape (batch, sequence, features). The input_shape argument leaves out the batch dimension, so when you pass this:

input_shape=(X_train.shape[1], 1)

This is what your model sees: (None, Sequence = X_train.shape[1], num_features = 1). None is for the batch dimension. I don't think that's what you are trying to do, so once you've reshaped your data you should also correct input_shape to match the new array.
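One way this could look, as a hedged sketch rather than the definitive fix: the random arrays below are stand-ins for X_train_scaled and y_train, the sizes and the 8 LSTM units are arbitrary assumptions, and tensorflow.keras is assumed as the Keras import path. The point is only that input_shape is (timesteps, features) and matches the last two dimensions of the reshaped array.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Input

n_samples, timesteps, n_features = 32, 1, 5  # hypothetical sizes

# Stand-ins for X_train_scaled and y_train from the question.
X = np.random.rand(n_samples, n_features).astype("float32")
y = np.random.rand(n_samples).astype("float32")

# Reshape 2D (samples, features) into 3D (samples, timesteps, features).
X3d = X.reshape(-1, timesteps, n_features)

model = Sequential([
    Input(shape=(timesteps, n_features)),  # batch dimension is implicit
    LSTM(8),                               # returns only the last time step
    Dense(1),
])
model.compile(optimizer="adam", loss="mean_squared_error")
model.fit(X3d, y, batch_size=8, epochs=1, verbose=0)

print(model.predict(X3d, verbose=0).shape)  # (32, 1)
```

Leaving return_sequences at its default of False gives one prediction per sample, which lines up with a 1D target such as y_train.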

You are solving a multivariate regression problem using an LSTM. Before jumping into the code, let's see what that means.

Problem statement:

  • You have 5 features (holidays, day_of_month, day_of_week, month, quarter) per day for k days.
  • For any day n, given the features of, say, the last 'm' days, you want to predict the y of the nth day.

Creating window dataset:

  • We first need to decide on the number of days we want to feed to our model. This is called the sequence length (let's fix it to 3 for this example).
  • We have to split the days into chunks of the sequence length to create the train and test datasets. This is done by using a sliding window where the window size is the sequence length.
  • As you can see, no predictions are available for the last p records, where p is the sequence length.
  • We will create the window dataset using the timeseries_dataset_from_array method.
  • For more advanced material, see the official TensorFlow docs.
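The sliding-window idea above can be sketched in plain NumPy. The helper name make_windows and the dummy sizes are hypothetical and used only for illustration; this is not how timeseries_dataset_from_array is implemented, just the same pairing of each m-day window with the y of its last day.

```python
import numpy as np

def make_windows(features, target, m):
    """Pair each window of m consecutive rows with the target of the window's last row."""
    X = np.stack([features[i:i + m] for i in range(len(features) - m + 1)])
    y = target[m - 1:]
    return X, y

k, m = 10, 3  # hypothetical: 10 days of data, sequence length 3
features = np.arange(k * 5, dtype=float).reshape(k, 5)  # 5 features per day
target = np.arange(k, dtype=float) * 100                # dummy y values

X, y = make_windows(features, target, m)
print(X.shape, y.shape)  # (8, 3, 5) (8,)
```

Here the first window covers days 0 to 2 and is paired with the y of day 2, so k days yield k - m + 1 training samples.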

LSTM Model

Pictorially, what we want to achieve is shown below:

[image not reproduced]

At each step of unrolling the LSTM cell, we pass in the 5 features of one day, and we unroll m times, where m is the sequence length. We are predicting the y of the last day.

Code:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Model
regressor = models.Sequential()
# return_sequences=False: emit only the last time step, since we predict one y per window
regressor.add(layers.LSTM(5, return_sequences=False))
regressor.add(layers.Dense(1))
regressor.compile(optimizer='sgd', loss='mse')

# Dummy data
n = 10000
df = pd.DataFrame(
    {
      'y': np.arange(n),
      'holidays': np.random.randn(n),
      'day_of_month': np.random.randn(n),
      'day_of_week': np.random.randn(n),
      'month': np.random.randn(n),
      'quarter': np.random.randn(n),     
    }
)

# Train test split (shuffle=False keeps the rows in time order for windowing)
train_df, test_df = train_test_split(df, shuffle=False)
print(train_df.shape, test_df.shape)

# Create y to be predicted
# given the last n days, predict today's y

# train data
sequence_length = 3
y_pred = train_df['y'][sequence_length-1:].values
train_df = train_df[:-(sequence_length-1)].copy()  # drop the trailing rows that have no shifted target
train_df['y_pred'] = y_pred

# Validation data
y_pred = test_df['y'][sequence_length-1:].values
test_df = test_df[:-(sequence_length-1)].copy()
test_df['y_pred'] = y_pred

# Create window data generators

# Train data generator
train_X = train_df[['holidays','day_of_month','day_of_week','month','quarter']]
train_y = train_df['y_pred']
train_dataset = tf.keras.preprocessing.timeseries_dataset_from_array(
    train_X, train_y, sequence_length=sequence_length, shuffle=True, batch_size=4)

# Validation data generator
test_X = test_df[['holidays','day_of_month','day_of_week','month','quarter']]
test_y = test_df['y_pred']
test_dataset = tf.keras.preprocessing.timeseries_dataset_from_array(
    test_X, test_y, sequence_length=sequence_length, shuffle=True, batch_size=4)

# Finally fit the model
regressor.fit(train_dataset, validation_data=test_dataset, epochs=3)

Output:

(7500, 6) (2500, 6)
Epoch 1/3
1874/1874 [==============================] - 8s 3ms/step - loss: 9974697.3664 - val_loss: 8242597.5000
Epoch 2/3
1874/1874 [==============================] - 6s 3ms/step - loss: 8367530.7117 - val_loss: 8256667.0000
Epoch 3/3
1874/1874 [==============================] - 6s 3ms/step - loss: 8379048.3237 - val_loss: 8233981.5000
<tensorflow.python.keras.callbacks.History at 0x7f3e94bdd198>
