
Creating tensor of dynamic shape from python lists to feed tensorflow RNN

I'm creating an end-to-end speech recognition architecture in which my data is a list of segmented spectrograms. My data has shape (batch_size, timesteps, 8, 65, 1), where batch_size is fixed but timesteps varies. I can't figure out how to put this data into a tensor with the appropriate shape to feed my model. Here is a piece of code that shows my problem:

import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dense, Dropout, Flatten, TimeDistributed
from tensorflow.keras.layers import SimpleRNN, LSTM
from tensorflow.keras import Input, layers
from tensorflow.keras import backend as K

segment_width = 8
segment_height = 65
segment_channels = 1

batch_size = 4

segment_lengths = [28, 33, 67, 43]
label_lengths = [16, 18, 42, 32]

TARGET_LABELS = np.arange(35)

# Generating data
X = [np.random.uniform(0,1, size=(segment_lengths[k], segment_width, segment_height, segment_channels))
     for k in range(batch_size)]

y = [np.random.choice(TARGET_LABELS, size=label_lengths[k]) for k in range(batch_size)]

# Model definition
input_segments_data = tf.keras.Input(name='input_segments_data', shape=(None, segment_width, segment_height, segment_channels),
                               dtype='float32')
input_segment_lengths = tf.keras.Input(name='input_segment_lengths', shape=[1], dtype='int64')
input_label_lengths = tf.keras.Input(name='input_label_lengths', shape=[1], dtype='int64')
# More complex architecture comes here
outputs = Flatten()(input_segments_data)

model = tf.keras.Model(inputs=[input_segments_data, input_segment_lengths, input_label_lengths], outputs = outputs)

def dummy_loss(y_true, y_pred):
  return y_pred

model.compile(optimizer="Adam", loss=dummy_loss)
model.summary()

output:

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_segments_data (InputLayer [(None, None, 8, 65, 0                                            
__________________________________________________________________________________________________
input_segment_lengths (InputLay [(None, 1)]          0                                            
__________________________________________________________________________________________________
input_label_lengths (InputLayer [(None, 1)]          0                                            
__________________________________________________________________________________________________
flatten (Flatten)               (None, None)         0           input_segments_data[0][0]        
==================================================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
__________________________________________________________________________________________________

Now when I try to predict from my random data:

model.predict([X, segment_lengths, label_lengths])

I get this error:

ValueError: Error when checking input: expected input_segments_data to have 5 dimensions, but got array with shape (4, 1)

How can I convert X (which is a list of arrays) to a tensor of shape (None, None, 8, 65, 1) and feed it to my model? I don't want to use zero padding!

A Keras model takes a NumPy array (tensor) as input. You cannot have a tensor with a variable number of timesteps. What you can do instead is pad all the samples to the same shape, using e.g. pad_sequences, and then add a Masking layer to your model so that the padded values are ignored.
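As a rough sketch, assuming tf.keras.preprocessing.sequence.pad_sequences (which also pads sequences whose elements are multi-dimensional arrays), the question's list X could be padded along the time axis like this; padding='post' and the zero fill value are just one possible choice:

from tensorflow.keras.preprocessing.sequence import pad_sequences

# Pad every sample to the length of the longest one (67 timesteps here),
# appending zero-filled segments at the end of the shorter sequences.
X_padded = pad_sequences(X, padding='post', dtype='float32', value=0.0)
print(X_padded.shape)  # (4, 67, 8, 65, 1)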

This is a common issue with TensorFlow and other deep learning frameworks that operate on tensors. Unfortunately, there is currently no easy way to do exactly what you asked, besides padding your sequences and then masking.

To do this, you simply have to store your input data in a numpy array with fixed dimensions and feed that to the model. You have to add dummy values to represent the missing timesteps in your sequences (a common value is 0).

Then, you have to add a Masking layer to your model, which tells Keras to ignore the timesteps that contain only the dummy features. From the documentation:

keras.layers.Masking(mask_value=0.0)

If all features for a given sample timestep are equal to mask_value, then the sample timestep will be masked (skipped) in all downstream layers (as long as they support masking).

I've adapted and simplified part of your code to give you an idea of how this works. You can adapt this to your variable-sized labels, as well:

# Generating data (using a dummy zero-array to store padded sequences)
X = np.zeros((batch_size, max(segment_lengths), segment_width, segment_height, segment_channels))
X_true = [np.ones((segment_lengths[k], segment_width, segment_height, segment_channels)) 
          for k in range(batch_size)]

# Populate dummy array
for i, x in enumerate(X_true): 
    X[i, -segment_lengths[i]:, ...] = x

# Model definition
input_segments_data = tf.keras.Input(name='input_segments_data', shape=(max(segment_lengths), segment_width, segment_height, segment_channels))
masked_segments_data = tf.keras.layers.Masking()(input_segments_data)

# More complex architecture comes here: mask-aware layers (e.g. RNNs) should be
# connected to masked_segments_data so that the padded timesteps are skipped.
# Flatten does not consume the mask, so it is applied to the raw input here
# purely as a placeholder.
outputs = tf.keras.layers.Flatten()(input_segments_data)

model = tf.keras.Model(inputs=input_segments_data, outputs = outputs)

def dummy_loss(y_true, y_pred):
  return y_pred

model.compile(optimizer="Adam", loss=dummy_loss)
model.summary()
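With the padded array in place, a quick sanity check along these lines should work (the shapes below follow from the segment lengths in the question):

print(X.shape)                 # (4, 67, 8, 65, 1): fixed shape, 67 is the longest sequence
print(model.predict(X).shape)  # (4, 34840), i.e. (4, 67 * 8 * 65 * 1) after flattening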

A drawback of this approach is that if a "real" timestep happens to look exactly like a dummy one (e.g., all of its features are zero), the model will mask it as well. Choose your masking value appropriately to avoid this.
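For instance, a sketch of the same construction with a padding value that cannot occur in the real features (the spectrogram values in the question are drawn from uniform(0, 1), so -1.0 is safe):

padding_value = -1.0  # never appears in the uniform(0, 1) spectrogram values
X = np.full((batch_size, max(segment_lengths),
             segment_width, segment_height, segment_channels), padding_value)
masked_segments_data = tf.keras.layers.Masking(mask_value=padding_value)(input_segments_data)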

An alternative approach would be to do something similar to what you did, but using batches of size 1. This, however, is likely to make your training procedure unstable, and I would avoid it if possible.

As a final note, TensorFlow 2 added support for RaggedTensors, which are tensors with one or more variable dimensions. Currently RNN layers do not support them, but support will probably be added eventually.
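For reference, a ragged batch can already be built from the variable-length list (a sketch using the X_true list from the code above; this only shows the data structure, not a model that consumes it):

# Stack all segments along the time axis and record how many belong to each sample,
# giving a tensor with a variable time dimension.
values = tf.concat([tf.constant(x, dtype=tf.float32) for x in X_true], axis=0)
X_ragged = tf.RaggedTensor.from_row_lengths(values, row_lengths=segment_lengths)
print(X_ragged.shape)  # (4, None, 8, 65, 1)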

Hope this helps.
