How to interpret the output of an RNN with Keras?

I would like to use an RNN for time series prediction: use 96 backwards steps to predict 96 steps into the future. For this I have the following code:

#Import modules
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

# Define the parameters of the RNN and the training
epochs = 1
batch_size = 50
steps_backwards = 96
steps_forward = 96
split_fraction_trainingData = 0.70
split_fraction_validationData = 0.90
randomSeedNumber = 50
helpValueStrides = int(steps_backwards / steps_forward)

#Read dataset
df = pd.read_csv('C:/Users1/Desktop/TestValues.csv', sep=';', header=0, low_memory=False, infer_datetime_format=True, parse_dates={'datetime':[0]}, index_col=['datetime'])

# standardize data

data = df.values
indexWithYLabelsInData = 0
data_X = data[:, 0:3]
data_Y = data[:, indexWithYLabelsInData].reshape(-1, 1)


scaler_standardized_X = StandardScaler()
data_X = scaler_standardized_X.fit_transform(data_X)
data_X = pd.DataFrame(data_X)
scaler_standardized_Y = StandardScaler()
data_Y = scaler_standardized_Y.fit_transform(data_Y)
data_Y = pd.DataFrame(data_Y)


# Prepare the input data for the RNN

series_reshaped_X = np.array([data_X[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
series_reshaped_Y = np.array([data_Y[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])


timeslot_x_train_end = int(len(series_reshaped_X) * split_fraction_trainingData)
timeslot_x_valid_end = int(len(series_reshaped_X) * split_fraction_validationData)

X_train = series_reshaped_X[:timeslot_x_train_end, :steps_backwards] 
X_valid = series_reshaped_X[timeslot_x_train_end:timeslot_x_valid_end, :steps_backwards] 
X_test = series_reshaped_X[timeslot_x_valid_end:, :steps_backwards] 

   
Y_train = series_reshaped_Y[:timeslot_x_train_end, steps_backwards:] 
Y_valid = series_reshaped_Y[timeslot_x_train_end:timeslot_x_valid_end, steps_backwards:] 
Y_test = series_reshaped_Y[timeslot_x_valid_end:, steps_backwards:]                                
   
   
# Build the model and train it

np.random.seed(randomSeedNumber)
tf.random.set_seed(randomSeedNumber)

model = keras.models.Sequential([
    keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]),
    keras.layers.SimpleRNN(10, return_sequences=True),
    keras.layers.Conv1D(16, helpValueStrides, strides=helpValueStrides),
    keras.layers.TimeDistributed(keras.layers.Dense(1))
])

model.compile(loss="mean_squared_error", optimizer="adam", metrics=['mean_absolute_percentage_error'])
history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, validation_data=(X_valid, Y_valid))

#Predict the test data
Y_pred = model.predict(X_test)

prediction_lastValues_list=[]

for i in range(0, len(Y_pred)):
  prediction_lastValues_list.append((Y_pred[i][0][1 - 1]))

# Create the dataframe for the whole data
wholeDataFrameWithPrediciton = pd.DataFrame((X_test[:,1]))
wholeDataFrameWithPrediciton.rename(columns = {indexWithYLabelsInData:'actual'}, inplace = True)
wholeDataFrameWithPrediciton.rename(columns = {1:'Feature 1'}, inplace = True)
wholeDataFrameWithPrediciton.rename(columns = {2:'Feature 2'}, inplace = True)
wholeDataFrameWithPrediciton['predictions'] = prediction_lastValues_list
wholeDataFrameWithPrediciton['difference'] = (wholeDataFrameWithPrediciton['predictions'] - wholeDataFrameWithPrediciton['actual']).abs()
wholeDataFrameWithPrediciton['difference_percentage'] = ((wholeDataFrameWithPrediciton['difference'])/(wholeDataFrameWithPrediciton['actual']))*100


# Invert the scaling (traInv: transformation inverted)

data_X_traInv = scaler_standardized_X.inverse_transform(data_X)
data_Y_traInv = scaler_standardized_Y.inverse_transform(data_Y)
series_reshaped_X_notTransformed =  np.array([data_X_traInv[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
X_test_notTransformed = series_reshaped_X_notTransformed[timeslot_x_valid_end:, :steps_backwards]
predictions_traInv = scaler_standardized_Y.inverse_transform(wholeDataFrameWithPrediciton['predictions'].values.reshape(-1, 1))

# Create the dataframe for the inverse-transformed data
wholeDataFrameWithPrediciton_traInv = pd.DataFrame((X_test_notTransformed[:,0]))
wholeDataFrameWithPrediciton_traInv.rename(columns = {indexWithYLabelsInData:'actual'}, inplace = True)
wholeDataFrameWithPrediciton_traInv.rename(columns = {1:'Feature 1'}, inplace = True)
wholeDataFrameWithPrediciton_traInv['predictions'] = predictions_traInv
wholeDataFrameWithPrediciton_traInv['difference_absolute'] = (wholeDataFrameWithPrediciton_traInv['predictions'] - wholeDataFrameWithPrediciton_traInv['actual']).abs()
wholeDataFrameWithPrediciton_traInv['difference_percentage'] = ((wholeDataFrameWithPrediciton_traInv['difference_absolute'])/(wholeDataFrameWithPrediciton_traInv['actual']))*100
wholeDataFrameWithPrediciton_traInv['difference'] = (wholeDataFrameWithPrediciton_traInv['predictions'] - wholeDataFrameWithPrediciton_traInv['actual'])

Here you can get some test data (don't care about the actual values, as I made them up; just the shape is important): Download test data

How can the output of the Y_pred data be interpreted? Which of those values yields the predicted values 96 steps into the future? I have attached a screenshot of the Y_pred data, once with 5 output neurons in the last layer and once with only 1. Can anyone tell me how to interpret the Y_pred data, i.e. what exactly the RNN is predicting and which values in the output (last layer) I can use? The Y_pred data always has the shape (batch size of X_test, time sequence, number of output neurons). My question targets the last dimension. I thought these might be the features, but this is not true in my case, as I only have 1 output feature (you can see that in the shape of the Y_train, Y_test and Y_valid data).

[Screenshot of the Y_pred data]

**Reminder**: The bounty is expiring soon and unfortunately I still have not received an answer. So I would like to remind you of the question and the bounty. I'll highly appreciate every comment.

It may be useful to step through the model inputs/outputs in detail.

When using the keras.layers.SimpleRNN layer with return_sequences=True, the output is a 3-D tensor where the 0th axis is the batch size, the 1st axis is the timestep, and the 2nd axis is the number of hidden units (10 for both SimpleRNN layers in your model).

The Conv1D layer produces an output tensor where the last dimension is the number of filters (16 in your model), since each filter is convolved with the input along the time axis.
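
The number of timesteps follows the usual "valid" convolution arithmetic. A quick sanity check (illustrative only, plugging in the 96 steps and helpValueStrides = 96 // 96 = 1 from the question):

# output length of Conv1D with the default "valid" padding
in_steps, kernel_size, strides = 96, 1, 1  # helpValueStrides == 1 here
out_steps = (in_steps - kernel_size) // strides + 1
print(out_steps)  # 96 -- the time dimension is preserved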

With keras.layers.TimeDistributed, the layer supplied (in your model, Dense(1)) is applied to each timestep in the batch independently. So with 96 timesteps, we have 96 outputs for each record in the batch.

So stepping through your model:

model = keras.models.Sequential([
    keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]), # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 10)
    keras.layers.SimpleRNN(10, return_sequences=True), # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 10)
    keras.layers.Conv1D(16, helpValueStrides, strides=helpValueStrides), # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 16)
    keras.layers.TimeDistributed(keras.layers.Dense(1)) # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1)
])
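
If in doubt, these shapes are easy to verify empirically with dummy data (a minimal sketch; the batch size of 50 is an arbitrary placeholder matching your batch_size setting):

import numpy as np
dummy_input = np.random.rand(50, 96, 3).astype("float32")  # (batch, timesteps, features)
print(model.predict(dummy_input).shape)  # expected: (50, 96, 1) when helpValueStrides == 1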

To answer your question, the output tensor from your model contains the predicted values for 96 steps into the future, for each sample. If it's easier to conceptualize, for the case of 1 output you can apply np.squeeze to the result of model.predict, which makes the output 2-D:

Y_pred = model.predict(X_test) # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1)
Y_pred_squeezed = np.squeeze(Y_pred) # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS)

In that way, you have a rectangular matrix where each row corresponds to a sample in the batch, and each column i corresponds to the prediction for timestep i.
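
For example (hypothetical indices, purely to illustrate the layout):

print(Y_pred_squeezed[0, :])   # all 96 predicted steps for the first test sample
print(Y_pred_squeezed[2, 95])  # the 96th (last) predicted step for the third sample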

In the loop after the prediction step, all the timestep predictions are being discarded except for the first one:

for i in range(0, len(Y_pred)):
    prediction_lastValues_list.append((Y_pred[i][0][1 - 1]))

which means the end result is just a list of predictions for the first timestep for each sample in the batch. If you wanted the prediction for the 96th timestep, you could do:

for i in range(0, len(Y_pred)):
    prediction_lastValues_list.append((Y_pred[i][-1][1 - 1]))

Notice the -1 instead of 0 for the second bracket, to ensure we grab the last predicted timestep instead of the first.
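
As an aside, plain NumPy indexing achieves the same result without the loop (equivalent output, just more idiomatic):

prediction_lastValues_list = Y_pred[:, -1, 0].tolist()  # last timestep, only output neuron, per sample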

As a side note, to replicate the results I had to make one change to your code, specifically when creating series_reshaped_X and series_reshaped_Y. I hit an exception when using np.array to create the array from the list (ValueError: cannot copy sequence with size 192 to array axis with dimension 3), but looking at what you were doing (joining tensors along a new axis), I changed it to np.stack, which accomplishes the same goal (https://numpy.org/doc/stable/reference/generated/numpy.stack.html):

series_reshaped_X = np.stack([data_X[i:i + (steps_backwards + steps_forward)].copy() for i in
                              range(len(data) - (steps_backwards + steps_forward))])
series_reshaped_Y = np.stack([data_Y[i:i + (steps_backwards + steps_forward)].copy() for i in
                              range(len(data) - (steps_backwards + steps_forward))])

I disagree with @danielcahall on just one point:

The output tensor from your model contains the predicted values for 96 steps into the future, for each sample

The output does contain 96 time steps, one for each input time step, and you can take an output to mean whatever you want. But this is just not a good model for what you're trying to do. The main reason is that the RNNs you're using are single-direction (they only pass information forward in time).

x   x   x   x   x   x    # input
|   |   |   |   |   | 
x-->x-->x-->x-->x-->x    # SimpleRNN
|   |   |   |   |   | 
x-->x-->x-->x-->x-->x    # SimpleRNN
|  /|\ /|\ /|\ /|\  | 
| / | \ | \ | \ | \ |
x   x   x   x   x   x    # Conv
|   |   |   |   |   | 
x   x   x   x   x   x    # Dense -> output

So the first time index of the output only sees the first 2 input times (thanks to the Conv); it can't see the later times. The first prediction is based only on old data. It's only the last few outputs that can see all the inputs.

use 96 backwards steps to predict 96 steps into the future

Most of the outputs just can't see all the data.

This model would be appropriate if you were trying to predict 1 step into the future from each of the input times.
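
(If one-step-ahead prediction were the goal, the targets would simply be the inputs shifted by one step; a minimal sketch, assuming the series_reshaped_X/series_reshaped_Y arrays from the question:)

X_1step = series_reshaped_X[:, :-1]  # inputs at times 0 .. T-2
Y_1step = series_reshaped_Y[:, 1:]   # targets at times 1 .. T-1, one step ahead of the inputs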

To predict 96 steps into the future it would be much more reasonable to drop the return_sequences=True and the Conv layer. Then expand the Dense layer to make the prediction:

model = keras.models.Sequential([
    keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]), # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 10)
    keras.layers.SimpleRNN(10), # output size is (BATCH_SIZE, 10)
    keras.layers.Dense(96) # output size is (BATCH_SIZE, 96)
])

That way all 96 predictions see all 96 inputs.
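
One caveat with this variant (my assumption, based on the shapes in the question): the targets then need shape (BATCH_SIZE, 96) rather than (BATCH_SIZE, 96, 1), so squeeze the last axis before fitting to avoid unintended broadcasting in the loss:

Y_train_2d = np.squeeze(Y_train, axis=-1)  # (batch, 96, 1) -> (batch, 96)
Y_valid_2d = np.squeeze(Y_valid, axis=-1)
model.compile(loss="mean_squared_error", optimizer="adam")
history = model.fit(X_train, Y_train_2d, epochs=epochs, batch_size=batch_size,
                    validation_data=(X_valid, Y_valid_2d))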

See https://www.tensorflow.org/tutorials/structured_data/time_series for more details.

Also, SimpleRNN is terrible: its gradients vanish over long sequences, so never use it over more than a couple of steps. Prefer LSTM or GRU cells.
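
For example, the same sequence-to-vector architecture with LSTM cells (a sketch of the swap, not a tuned model):

model = keras.models.Sequential([
    keras.layers.LSTM(10, return_sequences=True, input_shape=[None, 3]),
    keras.layers.LSTM(10),  # output size is (BATCH_SIZE, 10)
    keras.layers.Dense(96)  # output size is (BATCH_SIZE, 96)
])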
