
What does a Keras TimeDistributed layer actually do?

I have a multi-step forecasting task on a time series, where I want to produce as many predictions as there are time steps in a given input sequence. If I have the following model:

from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

input1 = Input(shape=(n_timesteps, n_channels))
lstm = LSTM(units=100, activation='relu')(input1)
outputs = Dense(n_timesteps, activation="softmax")(lstm)
model = Model(inputs=input1, outputs=outputs)
model.compile(loss="mse", optimizer="adam",
              metrics=["accuracy"])

The n_timesteps on the dense layer means that I will have n_timesteps predictions. But if I wrap the dense layer in a TimeDistributed (or, equivalently, set return_sequences=True in the LSTM layer), does the number of units still have to be n_timesteps, or is it 1, since with TimeDistributed I would be applying the dense layer to every time step in the sequence?

Based on the example you posted, the TimeDistributed will essentially apply a Dense layer with a softmax activation function to each timestep:

import tensorflow as tf

n_timesteps = 10
n_channels = 30
input1 = tf.keras.layers.Input(shape=(n_timesteps, n_channels))
lstm = tf.keras.layers.LSTM(units=100, activation='relu', return_sequences=True)(input1)
outputs = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(n_channels, activation="softmax"))(lstm)
model = tf.keras.Model(inputs=input1, outputs=outputs)

Note that the Dense layer's output size is set to n_channels so that, at each timestep n, every channel gets its own entry in the softmax distribution and a fair chance of being predicted.
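As a quick sanity check, you can feed the wrapped Dense layer a tensor shaped like the LSTM output and confirm that the time dimension is preserved and that each timestep carries its own softmax distribution (the batch size of 4 here is an arbitrary choice for illustration):

```python
import tensorflow as tf

# Illustrative shapes: batch of 4, 10 timesteps, 100 LSTM units, 30 channels
batch, n_timesteps, lstm_units, n_channels = 4, 10, 100, 30
x = tf.random.normal((batch, n_timesteps, lstm_units))

# TimeDistributed applies the same Dense weights independently at each timestep
td = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Dense(n_channels, activation="softmax"))
y = td(x)
print(y.shape)  # (4, 10, 30)

# Because of the softmax, each timestep's 30 channel scores sum to 1
print(tf.reduce_sum(y, axis=-1))
```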

If you are working on a multi-label problem, you can try something like this:

import tensorflow as tf

n_timesteps = 10
features = 3
input1 = tf.keras.layers.Input(shape=(n_timesteps, features))
lstm = tf.keras.layers.LSTM(units=100, activation='relu', return_sequences=False)(input1)
outputs = tf.keras.layers.Dense(n_timesteps, activation="sigmoid")(lstm)
model = tf.keras.Model(inputs=input1, outputs=outputs)

x = tf.random.normal((1, n_timesteps, features))
y = tf.random.uniform((1, n_timesteps), dtype=tf.int32, maxval=2)

print(x)
print(y)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(x, y, epochs=2)
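Since each sigmoid output is an independent probability, a sketch of turning the predictions into per-timestep binary labels (rebuilding the same untrained model as above; the 0.5 cutoff is an assumed threshold you may want to tune):

```python
import tensorflow as tf

n_timesteps = 10
features = 3
input1 = tf.keras.layers.Input(shape=(n_timesteps, features))
lstm = tf.keras.layers.LSTM(units=100, activation='relu',
                            return_sequences=False)(input1)
outputs = tf.keras.layers.Dense(n_timesteps, activation="sigmoid")(lstm)
model = tf.keras.Model(inputs=input1, outputs=outputs)

x = tf.random.normal((1, n_timesteps, features))
probs = model(x)                          # shape (1, 10): one probability per timestep
labels = tf.cast(probs > 0.5, tf.int32)   # threshold each label independently
print(labels.shape)  # (1, 10)
```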
