
Keras LSTM incorrect input_shape

I am trying to use an LSTM model to predict the weather (mainly to learn about LSTMs and Python).

I have a dataset of 500,000 rows, each of which represents a date, with 8 columns which are my features.

Below is my model.

 from keras.models import Sequential
 from keras.layers import LSTM, Dropout, Dense, Activation

 model = Sequential()
 model.add(LSTM(50, input_shape=(30, 8), return_sequences=True))   
 model.add(Dropout(0.2))

 model.add(LSTM(100, return_sequences=True))
 model.add(Dropout(0.2))

 model.add(LSTM(50, return_sequences=False))
 model.add(Dropout(0.2))

 model.add(Dense(1))
 model.add(Activation('linear'))

 model.fit(
        X,
        y,
        batch_size=512,
        epochs=100,
        validation_split=0.05)

For the input parameters, as I understand it, the first parameter is the time step, so here I am saying that the last 30 observations should be used to predict the next value. The 8, as I understand it, is the number of features: air pressure, temperature, etc.

So I convert my X matrix into a 3D array with the line below, so X is now a (500000, 8, 1) matrix.

X = np.reshape(X, (X.shape[0], X.shape[1], 1))

When I run the model, though, I get the error below.

ValueError: Error when checking input: expected lstm_3_input to have shape (30, 8) but got array with shape (8, 1)

What am I doing wrong?

Your issue is with data preparation. Find details on data preparation for LSTMs here.

LSTMs map a sequence of past observations, as input, to an output observation. As such, the sequence of observations must be transformed into multiple samples. Consider a given univariate sequence:

[10, 20, 30, 40, 50, 60, 70, 80, 90]

We can divide the sequence into multiple input/output patterns called samples, where n_steps time steps (three here) are used as input and one time step is used as the label for the one-step prediction that is being learned.

X,              y
10, 20, 30      40
20, 30, 40      50
30, 40, 50      60
# ...

So what you want to do is implemented in the split_sequence() function below:

from numpy import array

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + n_steps
        # check if we are beyond the sequence
        if end_ix > len(sequence)-1:
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)

Getting back to our initial example, the following happens:

# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps = 3
# split into samples
X, y = split_sequence(raw_seq, n_steps)
# summarize the data
for i in range(len(X)):
    print(X[i], y[i])

# [10 20 30] 40
# [20 30 40] 50
# [30 40 50] 60
# [40 50 60] 70
# [50 60 70] 80
# [60 70 80] 90

Takeaway: now your shapes should be what your LSTM model expects them to be, and you should be able to adjust your data shape to your needs. The same approach works for multiple input features, as sketched below.
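For instance, here is a minimal sketch of the multivariate case. The split_multivariate helper and the two-feature toy data are illustrative, not part of the original answer; taking the label from the first feature is an arbitrary choice.

import numpy as np

# split a multivariate sequence into samples; the label is the first
# feature of the step right after each window
def split_multivariate(sequence, n_steps):
    X, y = list(), list()
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])     # (n_steps, n_features) window
        y.append(sequence[i + n_steps][0])    # next value of feature 0
    return np.array(X), np.array(y)

# two features per time step
raw = np.array([[10, 1], [20, 2], [30, 3], [40, 4], [50, 5]])
X, y = split_multivariate(raw, n_steps=3)
print(X.shape, y.shape)  # (2, 3, 2) (2,)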

I think your input shape is off. The NN does not understand that you want it to take slices of 30 points to predict the 31st. What you need to do is slice your dataset into chunks of length 30 (which means each point is going to be copied up to 29 times) and train on that, which will have a shape of (499970, 30, 8), assuming the final point of each chunk goes only into y. Also, do not add a dummy dimension at the end; that is needed in conv layers for RGB channels. A sketch of that slicing follows.
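Here is a minimal sketch of that slicing; it assumes NumPy >= 1.20 for sliding_window_view, and the data and target arrays are random stand-ins for the real dataset:

import numpy as np

data = np.random.random((500000, 8))    # 8 features per date
target = np.random.random((500000,))    # the value to predict

n_steps = 30
# every length-30 window along the time axis: (499971, 30, 8)
windows = np.lib.stride_tricks.sliding_window_view(data, n_steps, axis=0).transpose(0, 2, 1)
X = windows[:-1]       # (499970, 30, 8); the last window has no next value to predict
y = target[n_steps:]   # (499970,)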

I think you might need just a simple explanation of how layers work. In particular, note that all Keras layers behave something like this:

NAME(output_dim, input_shape = (...,input_dim))

For example, suppose I have 15,000 vectors of length 3 and I would like to change them to vectors of length 5. Then something like this would do that:

import numpy as np, tensorflow as tf

X = np.random.random((15000,3))
Y = np.random.random((15000,5))

M = tf.keras.models.Sequential()
M.add(tf.keras.layers.Dense(5,input_shape=(3,)))

M.compile('sgd','mse')
M.fit(X,Y) # Take note that I provided complete working code here. Good practice. 
           # I even include the imports and random data to check that it works. 

Likewise, if my input looks something like (1000,10,5) and I run it through an LSTM like LSTM(7), then I should know (automatically) that I will get something like (...,7) as my output. Those vectors of length 5 will get changed to vectors of length 7. The rule to understand: the last dimension is always the vector you are changing, and the first parameter of the layer is always the dimension to change it to.

Now the second thing to learn about LSTMs. They use a time axis (which is not the last axis, because as we just went over, that is always the "changing dimension axis") which is removed if return_sequences=False and kept if return_sequences=True. Some examples:

LSTM(7) # (10000,100,5) -> (10000,7)
# Here the LSTM will loop through the 100 vectors of length 5 (like a time series with memory),
# producing vectors of length 7. Only the last one is kept.
LSTM(7,return_sequences=True) # (10000,100,5) -> (10000,100,7)
# Same thing as the layer above, except we keep all the intermediate steps.
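You can verify these shape rules directly; this is a quick sketch, not from the original answer:

import tensorflow as tf

M = tf.keras.models.Sequential([tf.keras.layers.LSTM(7, input_shape=(100, 5))])
print(M.output_shape)    # (None, 7) -- time axis removed

M2 = tf.keras.models.Sequential([tf.keras.layers.LSTM(7, return_sequences=True, input_shape=(100, 5))])
print(M2.output_shape)   # (None, 100, 7) -- time axis kept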

You provide a layer that looks like this:

LSTM(50,input_shape=(30,8),return_sequences=True) # (10000,30,8) -> (10000,30,50)

Notice the 30 is the TIME dimension used in your LSTM model. The 8 and the 50 are the INPUT_DIM and OUTPUT_DIM, and have nothing to do with the time axis. Another common misunderstanding: the LSTM expects you to provide each SAMPLE with its own COMPLETE PAST and TIME AXIS. That is, an LSTM does not use previous sample points for the next sample point; each sample is independent and comes with its own complete past data.

So let's take a look at your model. Step one. What is your model doing and what kind of data is it expecting?

from tensorflow.keras.layers import LSTM, Dropout, Dense, Activation
from tensorflow.keras.models import Sequential

model = Sequential()      
model.add(LSTM(50, input_shape=(30, 8), return_sequences=True))   
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('linear'))
model.compile('sgd','mse')

print(model.input_shape)
model.summary() # Let's see what your model is doing. 

So, now I clearly see your model does: (10000,30,8) -> (10000,30,50) -> (10000,30,100) -> (10000,50) -> (10000,1)

Did you expect that? Did you see that those would be the dimensions of the intermediate steps? Now that I know what input and output your model is expecting, I can easily verify that your model trains and works on that kind of data.

from tensorflow.keras.layers import LSTM, Dropout, Dense, Activation
from tensorflow.keras.models import Sequential
import numpy as np

X = np.random.random((10000,30,8))
Y = np.random.random((10000,1))

model = Sequential()      
model.add(LSTM(50, input_shape=(30, 8), return_sequences=True))   
model.add(Dropout(0.2))
model.add(LSTM(100, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(50, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('linear'))
model.compile('sgd','mse')

model.fit(X,Y)

Did you notice that your model was expecting inputs like (...,30,8)? Did you know your model was expecting output data that looked like (...,1)? Knowing what your model wants also means you can now change your model to fit the data you're interested in. If you want your data to run over your 8 parameters like a time axis, then your input dimension needs to reflect that. Change the 30 to an 8 and change the 8 to a 1. If you do this, notice also that your first layer is expanding each vector of length 1 (a single number) into a vector of length 50. Does that sound like what you wanted the model to do? Maybe your LSTM should be an LSTM(2) or LSTM(5) instead of 50, etc. You could spend the next 1000 hours trying to find the right parameters that work with your data. A sketch of that change follows.
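For example, here is a hedged sketch of that change; the LSTM width of 5 is a guess, as discussed above, and the random arrays stand in for real data:

from tensorflow.keras.layers import LSTM, Dense, Activation
from tensorflow.keras.models import Sequential
import numpy as np

X = np.random.random((10000, 8, 1))   # the (..., 8, 1) shape from the question
Y = np.random.random((10000, 1))

model = Sequential()
model.add(LSTM(5, input_shape=(8, 1)))   # 30 -> 8 on the time axis, 8 -> 1 on the feature axis
model.add(Dense(1))
model.add(Activation('linear'))
model.compile('sgd', 'mse')
model.fit(X, Y)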

Maybe you don't want to treat your FEATURE space as a TIME SPACE; instead, try slicing your data into windows of length 10, where each sample has its own history, with dimensions say (10000,10,8). Then an LSTM(50) would use your feature space of length 8 and change it into a feature space of length 50 while going over the TIME AXIS of 10. Maybe you just want to keep the last step with return_sequences=False.

Let me copy a function I used for preparing my data for LSTM:

from itertools import islice
import numpy as np

def slice_data_for_lstm(data, lookback):
    # zip together lookback shifted views of the data to build the windows
    return np.array(list(zip(*[islice(np.array(data), i, None, 1) for i in range(lookback)])))

X_sliced = slice_data_for_lstm(X, 30)

lookback should be 30 in your case; each sample stacks 30 consecutive (8, 1) feature matrices. The resulting data has shape (N - lookback + 1, 30, 8, 1), as the quick check below shows.
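A quick check on toy data (the toy array is illustrative):

toy = np.random.random((5, 8, 1))         # 5 time steps of (8, 1) features
print(slice_data_for_lstm(toy, 3).shape)  # (3, 3, 8, 1), i.e. (N - lookback + 1, lookback, 8, 1)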
