
Timeseries input to an LSTM

I have a dataset of water samples collected from different locations. For example, the ABC1 sample is taken from a river in Arizona and the ABC2 sample from a river in Boston. Both are rivers and share the same feature columns (pH, temp, etc.), but because they are in different locations, the changes in their features are specific to each one. My goal is to build a single river model, because I do not have enough data to build a separate model per location. There are 11 columns in total for which I want to predict next month's values. My dataset looks like this:

Date         Sample_Name        pH    temp    etc...

2009-01-01    ABC1              7.2    12
2009-01-02    ABC2              5.5    11
.
.
2009-01-02    ABC1              7.2    10
2009-01-02    ABC2              7.3    10
.
.
2013-06-02    ABC2              6.5    22
2013-06-04    ABC1              6.5    22
.
2015-01-05    ABC1              8.9    13
2015-01-05    ABC4              8.8    13
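The dates above are irregular (several readings in some months, gaps in others), while the stated target is next month's values. One option, which is an assumption and not something stated in the question, is to aggregate each sample to one row per calendar month before building sequences. A minimal pandas sketch, using a few hypothetical rows in the question's layout:

```python
import pandas as pd

# Hypothetical rows in the same layout as the question's dataset
df = pd.DataFrame({
    "Date": pd.to_datetime(["2009-01-01", "2009-01-02", "2009-01-02",
                            "2009-01-02", "2009-02-03"]),
    "Sample_Name": ["ABC1", "ABC2", "ABC1", "ABC2", "ABC1"],
    "pH": [7.2, 5.5, 7.2, 7.3, 8.0],
    "temp": [12.0, 11.0, 10.0, 10.0, 14.0],
})

# One row per (sample, calendar month) holding the mean of each feature column.
# freq="MS" labels each month by its first day.
monthly = (
    df.groupby(["Sample_Name", pd.Grouper(key="Date", freq="MS")])
      .mean()
      .reset_index()
)
print(monthly)
```

Whether a monthly mean is the right aggregate (versus, say, the last reading of the month) depends on the data and is a modelling choice.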

I want to feed every sample and its sequence to an LSTM model. For example, every measurement (row) of ABC1 must be given to the model as one sequence, or batch. Is it possible to prepare the data this way with TimeseriesGenerator? How can I prepare my data so that it is fed to the model as described? Also, does it help to sort the dataset by date and sample name (alphabetically)? I am trying to achieve something like this.
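On the sorting question: windowing takes consecutive rows, so each window should contain rows from a single sample, in chronological order. Sorting by Sample_Name first and Date second makes each sample's rows one contiguous, ordered block; sorting by Date alone would interleave ABC1 and ABC2 rows. A small sketch with hypothetical rows:

```python
import pandas as pd

# Hypothetical rows mirroring the question's table
df = pd.DataFrame({
    "Date": pd.to_datetime(["2009-01-01", "2009-01-02", "2009-01-02",
                            "2009-01-02", "2013-06-02", "2013-06-04"]),
    "Sample_Name": ["ABC1", "ABC2", "ABC1", "ABC2", "ABC2", "ABC1"],
    "pH": [7.2, 5.5, 7.2, 7.3, 6.5, 6.5],
})

# Sample first, then date: each sample becomes one contiguous, chronological block
df_sorted = df.sort_values(["Sample_Name", "Date"]).reset_index(drop=True)
print(df_sorted)
```

Sorting alone does not stop a window from spanning the boundary between the last ABC1 row and the first ABC2 row; that still requires splitting by sample before windowing.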

I want to generate data using:

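from keras.preprocessing.sequence import TimeseriesGenerator
n_timesteps = 2   # timesteps per input window
n_features = 10   # feature columns per timestep
batch_size = 5
# Pass only the numeric feature columns as an array (Date and Sample_Name are not features)
data = df.drop(columns=['Date', 'Sample_Name']).values
# length must equal n_timesteps; sampling_rate=10 would keep only one of the two timesteps
generator = TimeseriesGenerator(data, data, length=n_timesteps, sampling_rate=1, stride=1, batch_size=batch_size)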
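To see what shapes a generator call like this produces, here is a plain NumPy sketch of the slicing TimeseriesGenerator applies, as I understand its Keras source: for target row i, the input is data[i - length:i:sampling_rate] and the target is targets[i]. It ignores batching, start_index/end_index and shuffle, and make_windows is a hypothetical helper. It also shows why length=2 with sampling_rate=10 yields only one timestep per window, which would not match an LSTM with input_shape=(2, n_features).

```python
import numpy as np

def make_windows(data, targets, length, sampling_rate=1, stride=1):
    """Hypothetical helper: all (input, target) windows, without batching.

    The window for row i is data[i - length:i:sampling_rate]; its target is targets[i].
    """
    rows = range(length, len(data), stride)
    x = np.array([data[i - length:i:sampling_rate] for i in rows])
    y = np.array([targets[i] for i in rows])
    return x, y

# 6 timesteps x 3 features (stand-ins for pH, temp, ...)
data = np.arange(18, dtype=float).reshape(6, 3)

x, y = make_windows(data, data, length=2)
print(x.shape, y.shape)  # (4, 2, 3) (4, 3)

x10, _ = make_windows(data, data, length=2, sampling_rate=10)
print(x10.shape)         # (4, 1, 3): only one timestep survives the step of 10
```

Passing the same array as data and targets is what turns this into next-step prediction: each window of two rows is paired with the row that follows it.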

The simple LSTM model I want to feed my data into:

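from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# Input: windows of n_timesteps rows, each with n_features values
model.add(LSTM(n_features, activation='relu', input_shape=(n_timesteps, n_features)))
# Output: one value per predicted column for the next timestep
model.add(Dense(10))
# 'accuracy' is not meaningful for continuous targets; track mean absolute error instead
model.compile(optimizer='adam', loss='mse', metrics=['mae'])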
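A model like this expects inputs of shape (batch, n_timesteps, n_features) and, with Dense(10), targets of shape (batch, 10). For one sample's feature array, NumPy's sliding_window_view (NumPy 1.20+) can build the same next-step windows without a Python loop. Note that it appends the window axis last, so that axis has to be moved to the middle. A sketch using a random stand-in array:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

n_timesteps = 2
n_features = 10

# Stand-in for one sample's rows: 7 timesteps x 10 features
rng = np.random.default_rng(0)
series = rng.normal(size=(7, n_features))

# Shape (7 - n_timesteps + 1, n_features, n_timesteps): the window axis is appended last
windows = sliding_window_view(series, window_shape=n_timesteps, axis=0)
windows = windows.transpose(0, 2, 1)  # -> (n_windows, n_timesteps, n_features)

# Window k covers rows k..k+n_timesteps-1 and predicts row k+n_timesteps,
# so the last window has no target and is dropped
X = windows[:-1]
y = series[n_timesteps:]
print(X.shape, y.shape)  # (5, 2, 10) (5, 10)
```

Arrays shaped like X and y could then be passed directly to model.fit(X, y, batch_size=5) instead of a generator.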

Looking at the docs, tf.keras.preprocessing.sequence.TimeseriesGenerator cannot take a dictionary as its first argument. The 'slice' error is just a manifestation of that: the generator takes slices of its first argument (here a dict) and fails. Again, from the docs:

Arguments: data: Indexable generator (such as list or Numpy array) containing consecutive data points (timesteps).
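The failure can be reproduced without Keras. A dict is indexed by key, so a slice such as d[0:2] is treated as a key rather than a range of rows, while a NumPy array (such as a DataFrame's .values) slices by position. Depending on the Python version, the dict case raises TypeError (unhashable slice) or, from Python 3.12 where slices became hashable, KeyError. The input_dict below is a hypothetical stand-in:

```python
import pandas as pd

# Hypothetical stand-in for an input_dict keyed by sample name
input_dict = {
    "ABC1": pd.DataFrame({"pH": [7.2, 7.2, 6.5], "temp": [12.0, 10.0, 22.0]}),
}

try:
    input_dict[0:2]  # roughly what slicing the generator's first argument amounts to
    slicing_failed = False
except (TypeError, KeyError) as exc:
    slicing_failed = True
    print("dict slicing fails:", type(exc).__name__)

arr = input_dict["ABC1"].values  # 2-D NumPy array: (timesteps, features)
print(arr[0:2])                  # positional slice: the first two timesteps
```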

So perhaps you want to pass input_dict['ABC1'], or possibly input_dict['ABC1'].values.
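Extending that idea to train one model on all rivers: split the sorted frame by Sample_Name, build windows within each sample (so no window mixes ABC1 and ABC2 readings), and concatenate the results into one training set. A minimal sketch, assuming a frame with Date, Sample_Name and numeric feature columns; feature_cols, windows_for and the data are hypothetical. Creating one TimeseriesGenerator per input_dict[name].values would be an alternative to the NumPy windowing used here.

```python
import numpy as np
import pandas as pd

def windows_for(values, length):
    """Hypothetical helper: next-step windows within one sample's rows."""
    x = np.array([values[i - length:i] for i in range(length, len(values))])
    y = values[length:]
    return x, y

df = pd.DataFrame({
    "Date": pd.to_datetime(["2009-01-01", "2009-01-01", "2009-02-01", "2009-02-01",
                            "2009-03-01", "2009-03-01", "2009-04-01"]),
    "Sample_Name": ["ABC1", "ABC2", "ABC1", "ABC2", "ABC1", "ABC2", "ABC1"],
    "pH": [7.2, 5.5, 7.0, 7.3, 6.9, 6.5, 7.1],
    "temp": [12.0, 11.0, 13.0, 10.0, 15.0, 22.0, 18.0],
})
feature_cols = ["pH", "temp"]
n_timesteps = 2

# Chronological order within each sample, then one feature frame per sample
df = df.sort_values(["Sample_Name", "Date"])
input_dict = {name: group[feature_cols] for name, group in df.groupby("Sample_Name")}

xs, ys = [], []
for name, frame in input_dict.items():
    values = frame.values
    if len(values) <= n_timesteps:  # too short for even one (window, target) pair
        continue
    x, y = windows_for(values, n_timesteps)
    xs.append(x)
    ys.append(y)

X = np.concatenate(xs)  # (total_windows, n_timesteps, n_features)
Y = np.concatenate(ys)  # (total_windows, n_features)
print(X.shape, Y.shape)
```

Because every window comes from a single sample, the combined X and Y can be shuffled and batched freely without creating sequences that jump from one river to another.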
