
Training a ProGAN: ValueError: Data cardinality is ambiguous (tensorflow/keras)

I am new to Stack Overflow.

I am trying to use a ProGAN notebook by Soon Yau Cheong. I am working through it line by line to understand everything before making it work with my own project.

The notebook is here: https://github.com/PacktPublishing/Hands-On-Image-Generation-with-TensorFlow-2.0/blob/master/Chapter07/ch7_progressive_gan.ipynb

The short version of the error is:

ValueError: Data cardinality is ambiguous:
x sizes: 16, 1
y sizes: 16
Make sure all arrays contain the same number of samples.

This error happens shortly after starting training by running the cell:

gan.train(train_datasets, 20000, 4000)

I think it originates in this part of the notebook:

def train_step(self,  log2_res, data_gen, alpha):
    real_images = next(data_gen)
    self.d_loss = self.train_discriminator_wgan_gp(real_images, alpha)

    real_images = next(data_gen)
    batch_size = real_images.shape[0]
    real_labels = tf.ones(batch_size)
    
    z = tf.random.normal((batch_size, self.z_dim))

    self.g_loss = self.model.train_on_batch([z, alpha], real_labels)

The mayhem then seems to happen inside the Keras scripts, but it's hard for me to understand what's going on there.

Here is what I am passing to Keras through its train_on_batch() method (docs here: https://keras.io/api/models/model_training_apis/ ):

-z is a (batch_size, 512) tensor of latent vectors used as seeds for image generation.

-alpha is a (1, 1) tensor holding the fade-in factor, which goes from 0 to 1 during the fade-in phases in which the output resolution progressively doubles.

-real_labels is a (batch_size,) array of ones representing the "real image" labels for batch_size instances (a minimal reproducer of the shape mismatch is sketched right after this list).
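To make the mismatch concrete, here is a minimal, self-contained sketch (my own toy model, not the ProGAN from the notebook) of a two-input Keras model fed the shapes listed above; under tensorflow 2.6 it reproduces the same "Data cardinality is ambiguous" error because the first dimensions of the inputs disagree (16 vs 1):

    import tensorflow as tf

    batch_size, z_dim = 16, 512

    # Toy stand-in for the generator: two inputs, one output.
    z_in = tf.keras.Input(shape=(z_dim,))
    alpha_in = tf.keras.Input(shape=(1,))
    merged = tf.keras.layers.Concatenate()([z_in, alpha_in])
    out = tf.keras.layers.Dense(1, activation="sigmoid")(merged)
    model = tf.keras.Model([z_in, alpha_in], out)
    model.compile(optimizer="adam", loss="binary_crossentropy")

    z = tf.random.normal((batch_size, z_dim))   # first dim 16
    alpha = tf.constant([[0.5]])                 # first dim 1 -> the mismatch
    real_labels = tf.ones(batch_size)            # first dim 16

    # Raises: ValueError: Data cardinality is ambiguous: x sizes: 16, 1 / y sizes: 16
    model.train_on_batch([z, alpha], real_labels)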

Without success, I tried things such as real_labels = tf.expand_dims(real_labels, axis=-1) (in order to reshape it into (batch_size, 1)).

I wonder what I can do, because I more or less have to pass those variables as they are. Maybe something changed in Keras since the author wrote his code and it's something easy enough to fix...? Please help!

My environment:

-Windows 10

-miniconda venv with Python 3.7

-tensorflow-gpu==2.6.0 installed via conda install

(I know it's not the exact requirements of the notebook from the repo I linked, but I spent hours trying to install tensorflow-gpu 2.2.0 with the matching CUDA/cuDNN and TensorFlow still did not see my GPU. On version 2.6.0 it at least does, thanks to conda making the installation of recent TensorFlow versions easier.)

Here is the complete traceback with references to the Keras scripts:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [22], in <cell line: 1>()
----> 1 gan.train(train_datasets, 200, 40)

Input In [21], in ProgressiveGAN.train(self, datasets, steps_per_phase, tick_interval)
    387             print(msg)
    389             self.checkpoint(self.val_z, log2_res, step)
--> 391         self.train_step(log2_res, data_gen, self.alpha)
    393 if log2_res != self.log2_resolution:
    394     self.grow_model(log2_res+1)

Input In [21], in ProgressiveGAN.train_step(self, log2_res, data_gen, alpha)
    306 print("alpha shape", alpha.shape)
    307 print("real labels", real_labels)
--> 308 self.g_loss = self.model.train_on_batch([z, alpha], real_labels)

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow\python\keras\engine\training.py:1850, in Model.train_on_batch(self, x, y, sample_weight, class_weight, reset_metrics, return_dict)
   1847 _disallow_inside_tf_function('train_on_batch')
   1848 with self.distribute_strategy.scope(), \
   1849      training_utils.RespectCompiledTrainableState(self):
-> 1850   iterator = data_adapter.single_batch_iterator(self.distribute_strategy, x,
   1851                                                 y, sample_weight,
   1852                                                 class_weight)
   1853   self.train_function = self.make_train_function()
   1854   logs = self.train_function(iterator)

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow\python\keras\engine\data_adapter.py:1650, in single_batch_iterator(strategy, x, y, sample_weight, class_weight)
   1647 else:
   1648   data = (x, y, sample_weight)
-> 1650 _check_data_cardinality(data)
   1651 dataset = dataset_ops.DatasetV2.from_tensors(data)
   1652 if class_weight:

File ~\AppData\Roaming\Python\Python39\site-packages\tensorflow\python\keras\engine\data_adapter.py:1666, in _check_data_cardinality(data)
   1663   msg += "  {} sizes: {}\n".format(
   1664       label, ", ".join(str(i.shape[0]) for i in nest.flatten(single_data)))
   1665 msg += "Make sure all arrays contain the same number of samples."
-> 1666 raise ValueError(msg)

ValueError: Data cardinality is ambiguous:
  x sizes: 16, 1
  y sizes: 16
Make sure all arrays contain the same number of samples.

Okay, so I had a deeper look and found a solution. The "x sizes" are simply the first-dimension sizes of my two listed inputs (the latent vector and the alpha factor).

Repeating the alpha factor into a batch-sized array before passing it to train_on_batch() fixed the cardinality issue and allowed training to run.

 # add this:
 alpha = tf.repeat(alpha, repeats=batch_size, axis=None)

 self.g_loss = self.model.train_on_batch([z, alpha], real_labels)
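For reference, tf.repeat with the default axis=None first flattens its input, so repeating the (1, 1) alpha batch_size times yields a (batch_size,) tensor, which matches the first dimension of z and real_labels:

    alpha = tf.constant([[0.5]])          # shape (1, 1), the fade-in factor
    alpha = tf.repeat(alpha, repeats=16)  # axis=None flattens first -> shape (16,)
    print(alpha.shape)                    # (16,)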

I am a bit confused as to why this was not needed for the author when he made his notebook (something must have changed between tf 2.2.0 and 2.6.0, or a line was omitted in the notebook). In his book he does discuss how the alpha fade-in parameter is passed as a model input so it can be applied at runtime, and that it is batch-sized, etc.
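If you would rather keep one alpha value per sample as a column, an equivalent workaround (a sketch under the same assumptions, not taken from the book) is to broadcast the (1, 1) tensor to (batch_size, 1) instead of flattening it:

    # alpha has shape (1, 1); broadcast it so every sample in the batch carries the factor.
    alpha_batched = tf.broadcast_to(alpha, (batch_size, 1))   # shape (batch_size, 1)
    self.g_loss = self.model.train_on_batch([z, alpha_batched], real_labels)

Either way, all arrays handed to train_on_batch() agree on batch_size samples, which is exactly what the cardinality check asks for.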
