LSTM: Input 0 of layer sequential is incompatible with the layer

I know there are several questions about this here, but I haven't found one that exactly fits my problem. I'm trying to fit an LSTM with data from Pandas DataFrames but am confused about the format I have to provide. I created a small code snippet that shows what I am trying to do:

import pandas as pd, tensorflow as tf, random
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

targets = pd.DataFrame(index=pd.date_range(start='2019-01-01', periods=300, freq='D'))
targets['A'] = [random.random() for _ in range(len(targets))]
targets['B'] = [random.random() for _ in range(len(targets))] 
features = pd.DataFrame(index=targets.index)
for i in range(len(features)) :
    features[str(i)] = [random.random() for _ in range(len(features))] 

model = Sequential()
model.add(LSTM(units=targets.shape[1], input_shape=features.shape))
model.compile(optimizer='adam', loss='mae')

model.fit(features, targets, batch_size=10, epochs=10)

This results in:

ValueError: Input 0 of layer sequential is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [10, 300]

which I expect relates to the dimensions of the features DataFrame provided. I guess that once this is fixed, the next error will mention the targets DataFrame.

As far as I understand, the 'units' parameter of my first layer defines the output dimensionality of this model. The inputs have to have a 3D shape, but I don't know how to create them out of the 2D world of the DataFrames. I hope you can help me understand the reshape mechanism in Python and how to use it in combination with Pandas DataFrames. (I'm quite new to Python and came from R.)

Thanks in advance.

Let's look at a few popular ways in which LSTMs are used.

Many to Many

Example: You have a sentence (composed of words in sequence). Given this sequence of words, you would like to predict the part of speech (POS) of each word.

So you have n words, and you feed one word per timestep to the LSTM. Each LSTM timestep (also called LSTM unrolling) will produce an output. Each word is represented by a set of features, normally word embeddings. So the input to the LSTM is of size batch_size X time_steps X features.
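To make the batch_size X time_steps X features layout concrete, here is a minimal NumPy sketch. The vocabulary size, the random embedding table and the random word ids are all made up for illustration; in a real model the embeddings would be learned or pretrained.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 50   # hypothetical vocabulary size
features = 3      # embedding dimension per word
batch_size = 4    # number of sentences
time_steps = 10   # words per sentence (assumed padded to a fixed length)

# Hypothetical embedding table: one feature vector per word id.
embedding_table = rng.normal(size=(vocab_size, features))

# Each sentence is a sequence of integer word ids.
word_ids = rng.integers(0, vocab_size, size=(batch_size, time_steps))

# Integer-array indexing looks up one feature vector per word.
X = embedding_table[word_ids]

print(X.shape)  # (4, 10, 3)
```

The resulting array has the same shape as the dummy input X used in the Keras examples in this answer.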

Keras code:

import numpy as np
from tensorflow import keras

# 10 time steps, 3 features per time step
inputs = keras.Input(shape=(10, 3))
# return_sequences=True gives one output per time step
lstm = keras.layers.LSTM(8, return_sequences=True)(inputs)
outputs = keras.layers.TimeDistributed(keras.layers.Dense(5, activation='softmax'))(lstm)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')

# Random dummy data: 4 sequences, one 5-dimensional target per time step
X = np.random.randn(4, 10, 3)
y = np.random.randint(0, 2, size=(4, 10, 5))

model.fit(X, y, epochs=2)
print(model.predict(X).shape)  # (4, 10, 5)

Many to One

Example: Again, you have a sentence (composed of words in sequence). Given this sequence of words, you would like to predict whether the sentiment of the sentence is positive or negative.

Keras code:

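import numpy as np
from tensorflow import keras

# 10 time steps, 3 features per time step
inputs = keras.Input(shape=(10, 3))
# return_sequences=False keeps only the output of the last time step
lstm = keras.layers.LSTM(8, return_sequences=False)(inputs)
outputs = keras.layers.Dense(5, activation='softmax')(lstm)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')

# Random dummy data: 4 sequences, one 5-dimensional target per sequence
X = np.random.randn(4, 10, 3)
y = np.random.randint(0, 2, size=(4, 5))

model.fit(X, y, epochs=2)
print(model.predict(X).shape)  # (4, 5)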
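The main difference between the many-to-many and many-to-one code is the return_sequences flag. As a rough illustration of what that flag changes, here is a NumPy sketch that uses a plain tanh recurrent cell as a stand-in for the LSTM (it is not an LSTM and not how Keras implements it; the helper name simple_rnn and the random weights are made up):

```python
import numpy as np

def simple_rnn(x, units, return_sequences, seed=0):
    """Run a plain tanh recurrent cell over x of shape (batch, time_steps, features)."""
    rng = np.random.default_rng(seed)
    batch, time_steps, features = x.shape
    w_in = rng.normal(scale=0.1, size=(features, units))   # input weights
    w_rec = rng.normal(scale=0.1, size=(units, units))     # recurrent weights
    h = np.zeros((batch, units))                           # initial hidden state
    outputs = []
    for t in range(time_steps):
        # One time step: combine the current input with the previous state.
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)   # (batch, time_steps, units)
    return h                               # (batch, units): last step only

x = np.random.default_rng(1).normal(size=(4, 10, 3))
all_steps = simple_rnn(x, units=8, return_sequences=True)
last_step = simple_rnn(x, units=8, return_sequences=False)
print(all_steps.shape, last_step.shape)  # (4, 10, 8) (4, 8)
```

With the same weights, the many-to-one output is simply the last time step of the many-to-many output.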

Many to Multi-headed

Example: You have a sentence (composed of words in sequence). Given this sequence of words, you would like to predict the sentiment of the sentence as well as the author of the sentence.

This is a multi-headed model, where one head predicts the sentiment and another head predicts the author. Both heads share the same LSTM backbone.

Keras code:

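import numpy as np
from tensorflow import keras

# 10 time steps, 3 features per time step
inputs = keras.Input(shape=(10, 3))
# Shared LSTM backbone; only the last time step is passed on
lstm = keras.layers.LSTM(8, return_sequences=False)(inputs)
# Two independent heads on top of the shared backbone
output_A = keras.layers.Dense(5, activation='softmax')(lstm)
output_B = keras.layers.Dense(5, activation='softmax')(lstm)

model = keras.Model(inputs=inputs, outputs=[output_A, output_B])
# The same loss is applied to both heads
model.compile(loss='categorical_crossentropy', optimizer='adam')

# Random dummy data: 4 sequences, one target per head
X = np.random.randn(4, 10, 3)
y_A = np.random.randint(0, 2, size=(4, 5))
y_B = np.random.randint(0, 2, size=(4, 5))

model.fit(X, [y_A, y_B], epochs=2)
y_hat_A, y_hat_B = model.predict(X)
print(y_hat_A.shape, y_hat_B.shape)  # (4, 5) (4, 5)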

What you are looking for is a many to multi-headed model, where one head makes the predictions for A and another head makes the predictions for B.

The input data for the LSTM has to be 3D.

If you print the shapes of your DataFrames you get:

targets : (300, 2)
features : (300, 300)

The input data has to be reshaped into (samples, time steps, features). This means that the features and the targets must have the same number of samples (the same first dimension).
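As a small sketch of how a 2D DataFrame can become a 3D array, the snippet below shows two simple options. The DataFrame here is a made-up stand-in with 5 feature columns, and neither option is necessarily the right one for your problem; they only illustrate the mechanics of DataFrame.to_numpy(), np.newaxis and reshape.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's data: 300 days, 5 feature columns (made-up values).
index = pd.date_range(start='2019-01-01', periods=300, freq='D')
features = pd.DataFrame(np.random.rand(300, 5), index=index)

values = features.to_numpy()          # 2D: (samples, features) = (300, 5)

# Option 1: treat every row as a sequence of length 1.
one_step = values[:, np.newaxis, :]   # (300, 1, 5)

# Option 2: cut the rows into non-overlapping blocks of 3 consecutive days.
# This only works if the number of rows is divisible by time_steps.
time_steps = 3
blocks = values.reshape(-1, time_steps, values.shape[1])   # (100, 3, 5)

print(values.shape, one_step.shape, blocks.shape)
```

Whichever layout is used, the targets need one entry per sample, so with option 2 there would be 100 targets rather than 300.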

You need to set a number of time steps for your problem, in other words, how many consecutive observations (rows) are used to make one prediction.

For example, if you have 300 days and 2 features, the time step can be 3, so that three days are used to make one prediction (you can choose this arbitrarily). Here is the code for reshaping your data (with a few more changes):

import pandas as pd
import numpy as np
import tensorflow as tf
import random
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

data = pd.DataFrame(index=pd.date_range(start='2019-01-01', periods=300, freq='D'))
data['A'] = [random.random() for _ in range(len(data))]
data['B'] = [random.random() for _ in range(len(data))]

# Choose the time_step size.
time_steps = 3
# Use numpy for the 3D array as it is easier to handle.
data = np.array(data)

def make_x_y(ts, data):
    """
    Parameters
    ts : int
        Number of time steps per input window.
    data : numpy array
        2D array of shape (samples, features).

    This function creates two arrays, x and y.
    x is the input data and y is the target data.
    """
    x, y = [], []
    for offset in range(len(data) - ts):
        x.append(data[offset:ts + offset])   # window of ts consecutive rows
        y.append(data[ts + offset])          # the row right after the window
    return np.array(x), np.array(y)

x, y = make_x_y(time_steps, data)

print(x.shape, y.shape)

nodes = 100  # This is the width of the network.
out_size = 2  # Number of outputs produced by the network. Same size as features.

model = Sequential()
model.add(LSTM(units=nodes, input_shape=(x.shape[1], x.shape[2])))
model.add(Dense(out_size))  # For the output a Dense (fully connected) layer is used.
model.compile(optimizer='adam', loss='mae')
model.fit(x, y, batch_size=10, epochs=10)
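If the Python loop in make_x_y becomes slow on larger data, the same windows can probably be built without a loop. The following is only a sketch, assuming NumPy 1.20 or newer (where sliding_window_view is available); make_x_y_vectorized is a hypothetical name and is meant to return the same arrays as make_x_y.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_x_y_vectorized(ts, data):
    """Return (x, y) where x[i] = data[i:i+ts] and y[i] = data[i+ts]."""
    # sliding_window_view appends the window axis last: (n - ts + 1, features, ts).
    windows = sliding_window_view(data, window_shape=ts, axis=0)
    # Reorder the axes to (windows, time_steps, features).
    windows = windows.transpose(0, 2, 1)
    # Drop the last window, which has no following row to predict.
    x = windows[:-1].copy()
    y = data[ts:]
    return x, y

data = np.random.rand(300, 2)
x, y = make_x_y_vectorized(3, data)
print(x.shape, y.shape)  # (297, 3, 2) (297, 2)
```

The copy() call turns the read-only view into an ordinary array, which costs memory but avoids surprises when the data is passed on.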

Well, just to finalize this issue, I would like to provide one solution I have worked on in the meantime. The class TimeseriesGenerator in tf.keras.... made it quite easy for me to provide the data in the right shape to an LSTM model.

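from keras.preprocessing.sequence import TimeseriesGenerator

# X_train/y_train, X_valid/y_valid and X_test/y_test are assumed to be
# pandas DataFrames that were already split chronologically.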

window_size   = 7
batch_size    = 8
sampling_rate = 1

train_gen = TimeseriesGenerator(X_train.values, y_train.values,
                               length=window_size, sampling_rate=sampling_rate,
                               batch_size=batch_size)

valid_gen = TimeseriesGenerator(X_valid.values, y_valid.values,
                               length=window_size, sampling_rate=sampling_rate,
                               batch_size=batch_size)
test_gen  = TimeseriesGenerator(X_test.values, y_test.values,
                               length=window_size, sampling_rate=sampling_rate,
                               batch_size=batch_size)
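To show what such a generator hands to the model, here is a NumPy-only sketch that roughly mirrors how I understand the windows and targets to be paired: each sample holds the `length` rows before row i, and its target is row i of the targets. window_batches is a hypothetical helper and the data is random; this is an illustration, not the library's implementation.

```python
import numpy as np

def window_batches(data, targets, length, batch_size, sampling_rate=1):
    """Yield (samples, targets) batches of sliding windows.

    samples[k] holds the `length` rows before row i (every
    `sampling_rate`-th row), and targets[k] is targets[i].
    """
    rows = np.arange(length, len(data))   # rows that have a full window before them
    for start in range(0, len(rows), batch_size):
        batch_rows = rows[start:start + batch_size]
        samples = np.array([data[i - length:i:sampling_rate] for i in batch_rows])
        yield samples, targets[batch_rows]

X = np.random.rand(100, 4)   # made-up features: 100 rows, 4 columns
y = np.random.rand(100, 2)   # made-up targets: 100 rows, 2 columns

batches = list(window_batches(X, y, length=7, batch_size=8))
first_samples, first_targets = batches[0]
print(first_samples.shape, first_targets.shape)  # (8, 7, 4) (8, 2)
```

Each batch of samples already has the 3D shape (batch, time steps, features) that the LSTM expects, which is why no manual reshape is needed.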

There are many other ways to implement generators, e.g. using more_itertools, which provides the function windowed, or making use of tf.data.Dataset and its window method. For me, the TimeseriesGenerator was sufficient to feed the tests I did. In case you would like to see an example modeling the DAX based on some stocks, I'm sharing a notebook on GitHub.
