
tf.data.Dataset object as input to tf.Keras model -- ValueError

I am attempting to train a simple 3DCNN for action classification on a subset of the kinetics dataset. I am passing a tf.data.Dataset.from_generator() object as the input in the call to model.fit().

tensorflow version: r1.12

The generator that the tf.data.Dataset is initialized from yields a tuple of np.arrays: the first is a pre-processed video clip with shape (50,45,80,3), and the second is the one-hot encoding of its class with shape (22,).

The code:

import os
import numpy as np
import itertools

import tensorflow as tf
import tensorflow.data as data
from tensorflow.keras.models import Sequential 
from tensorflow.keras.layers import MaxPooling3D, Conv3D, BatchNormalization, Dense 
from tensorflow.keras.layers import Dropout, Activation, Flatten, Input


def train_generator():
    train_dir = '/home/kjd/Storage/kinetics-frames_proc_small'
    classes = os.listdir(train_dir)
    for index, label in enumerate(classes):
        clips = os.listdir(train_dir + '/' + label)
        for clip in clips:
            data = np.load(train_dir + '/' + label + '/' + clip)
            yield data, np.eye(22)[index].astype(int)


EPOCHS = 3
BATCH_SIZE = 32
dataset = data.Dataset.from_generator(train_generator, (tf.int64, tf.int64))



model = Sequential()
model.add(Conv3D(16, (3,3,3), strides=(1,1,1), padding='same', activation='relu',
                 input_shape=(50,45,80,3)))
model.add(Conv3D(32, (3,3,3), strides=(1,1,1), padding='same', activation='relu'))
model.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))
model.add(BatchNormalization())
model.add(Conv3D(64, (3,3,3), strides=(1,1,1), padding='same', activation='relu'))
model.add(Conv3D(128, (3,3,3), strides=(1,1,1), padding='same', activation='relu'))
model.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))
model.add(BatchNormalization())
model.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same', activation='relu'))
model.add(Conv3D(512, (3,3,3), strides=(1,1,1), padding='same', activation='relu'))
model.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(22, activation='softmax'))


model.compile('adam', 'categorical_crossentropy', metrics=['accuracy'])
model.fit(dataset, batch_size=BATCH_SIZE, epochs=EPOCHS, shuffle=False,
          steps_per_epoch=1000) 

The error:

Traceback (most recent call last):
  File "train.py", line 55, in <module>
    steps_per_epoch=1000)
  File "/home/kjd/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1683, in fit
    shuffle=shuffle)
  File "/home/kjd/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1200, in _standardize_user_data
    class_weight, batch_size)
  File "/home/kjd/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1328, in _standardize_weights
    exception_prefix='input')
  File "/home/kjd/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 294, in standardize_input_data
    data = [standardize_single_array(x) for x in data]
  File "/home/kjd/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 294, in <listcomp>
    data = [standardize_single_array(x) for x in data]
  File "/home/kjd/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 228, in standardize_single_array
    if x.shape is not None and len(x.shape) == 1:
  File "/home/kjd/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 745, in __len__
    raise ValueError("Cannot take the length of shape with unknown rank.")
ValueError: Cannot take the length of shape with unknown rank.

It seems tf.keras doesn't like something about the format of my input data. I'm fairly new to tf/keras and not gleaning a whole lot from this error message though. If anyone has any insight into what the problem is, your thoughts would be much appreciated.

I just got stuck with a similar problem while trying to distribute my "<DatasetV1Adapter shapes: <unknown>, types: tf.float32>" dataset using strategy.experimental_distribute_dataset() with tf.distribute.MirroredStrategy(). I got the same error as above (ValueError: Cannot take the length of shape with unknown rank.). For anyone who runs into this, my solution was to take the DatasetV1Adapter dataset and create a new dataset from it using tf.data.Dataset.from_generator, as follows:

def generator(dataset):
    # dataset is the original DatasetV1Adapter whose element shapes are unknown
    for datapoint in dataset:
        yield datapoint

# Wrap the original dataset in a new one that declares the rank of its elements.
# The lambda's default argument binds the original dataset before the name is reassigned.
dataset = tf.data.Dataset.from_generator(lambda ds=dataset: generator(ds),
                                         tf.float32,
                                         output_shapes=[None, None, None, None])

dataset_dist = strategy.experimental_distribute_dataset(dataset)

Worked for me!
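
As a self-contained illustration of the same wrapping trick, here is a minimal sketch, assuming a TF version where datasets can be iterated eagerly (e.g. TF 2.x); the shapes and sizes are made up for the example:

import numpy as np
import tensorflow as tf

# A dataset whose element shapes are unknown: from_generator without
# output_shapes loses the rank information.
unshaped = tf.data.Dataset.from_generator(
    lambda: (np.random.rand(4, 4).astype(np.float32) for _ in range(8)),
    tf.float32)
print(unshaped.element_spec)   # shape is <unknown>

def regenerate(ds):
    # Re-emit the elements of the opaque dataset one by one.
    for element in ds:
        yield element

# Wrap it in a new dataset that declares at least the rank of each element.
reshaped = tf.data.Dataset.from_generator(
    lambda: regenerate(unshaped),
    tf.float32,
    output_shapes=tf.TensorShape([None, None]))
print(reshaped.element_spec)   # shape is (None, None)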

I had this issue recently; you probably need to provide the output_shapes argument:

dataset = data.Dataset.from_generator(train_generator, (tf.int64, tf.int64), output_shapes=(tf.TensorShape([None, None, None, None]), tf.TensorShape([None])))

assuming a 4-dimensional input image and a 1-dimensional output array.
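
Applied to the code in the question, a sketch of the full fix might look like the following. It assumes the clip and label shapes described in the question, keeps the original dtypes, and batches on the dataset itself rather than via the batch_size argument of model.fit():

dataset = data.Dataset.from_generator(
    train_generator,
    (tf.int64, tf.int64),
    output_shapes=(tf.TensorShape([50, 45, 80, 3]),   # pre-processed clip
                   tf.TensorShape([22])))             # one-hot label

# Batch and repeat on the dataset; batch_size is not passed to fit() when the
# input is a tf.data.Dataset.
dataset = dataset.batch(BATCH_SIZE).repeat()

model.fit(dataset, epochs=EPOCHS, steps_per_epoch=1000)

Depending on the model, the clips may also need to be cast to float32 (for example with dataset.map) before they reach the Conv3D layers.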
