
batch size > 1 gives an error using TensorFlow 1.x

I am using this example of a VAE.

The only change I made was switching the loss from binary cross-entropy to MSE, like this:

class OptimizerVAE(object):

    def __init__(self, model, learning_rate=1e-3):
        """
        OptimizerVAE initializer
        :param model: a model object
        :param learning_rate: float, learning rate of the optimizer
        """

        # mean squared error (replaces the original binary cross-entropy)
        self.bce = tf.keras.losses.mse(model.x, model.logits)
        self.reconstruction_loss = tf.reduce_mean(tf.reduce_sum(self.bce, axis=-1))

        if model.distribution == 'normal':
            # KL divergence between normal approximate posterior and standard normal prior
            self.p_z = tf.distributions.Normal(tf.zeros_like(model.z), tf.ones_like(model.z))
            kl = model.q_z.kl_divergence(self.p_z)
            self.kl = tf.reduce_mean(tf.reduce_sum(kl, axis=-1)) * 0.1
        elif model.distribution == 'vmf':
            # KL divergence between vMF approximate posterior and uniform hyper-spherical prior
            self.p_z = HypersphericalUniform(model.z_dim - 1, dtype=model.x.dtype)
            kl = model.q_z.kl_divergence(self.p_z)
            self.kl = tf.reduce_mean(kl) * 0.1
        else:
            raise NotImplementedError

        self.ELBO = -self.reconstruction_loss - self.kl

        self.train_step = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(-self.ELBO)

        self.print = {'recon loss': self.reconstruction_loss, 'ELBO': self.ELBO, 'KL': self.kl}
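
As a side note, tf.keras.losses.mse averages over the last axis only, so its output keeps the batch and time dimensions. A minimal sketch of its shape behavior (not from the repo; the (32, 512, 1) shape is taken from the error further down):

    import tensorflow as tf

    x      = tf.zeros((32, 512, 1))
    logits = tf.zeros((32, 512, 1))
    # mse averages over the last axis, so the result is (32, 512);
    # the reduce_sum in the optimizer then sums out the time axis
    print(tf.keras.losses.mse(x, logits).shape)  # (32, 512)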

When running the original architecture (2 MLP layers), the model runs perfectly, no matter the batch size (specified as None in the GitHub code).

I am trying to change this to a convolutional model, but when I change just the encoder to this:

def _encoder(self, x):
    """
    Encoder network
    :param x: placeholder for input
    :return: tuple `(z_mean, z_var)` with mean and concentration around the mean
    """

    # original 2 hidden layer MLP encoder:
    # h0 = tf.layers.dense(x, units=self.h_dim * 2, activation=self.activation)
    # h1 = tf.layers.dense(h0, units=self.h_dim, activation=self.activation)

    # 3 conv layers followed by a dense bottleneck
    h1 = tf.layers.conv1d(x, filters=32, kernel_size=7, activation=tf.nn.relu)
    h1 = tf.layers.conv1d(h1, filters=64, kernel_size=7, activation=tf.nn.relu)
    h1 = tf.layers.conv1d(h1, filters=64, kernel_size=7, activation=tf.nn.relu)
    h1 = tf.layers.flatten(h1)
    h1 = tf.layers.dense(h1, units=32, activation=tf.nn.relu)

    if self.distribution == 'normal':
        # compute mean and std of the normal distribution
        z_mean = tf.layers.dense(h1, units=self.z_dim, activation=None, name='z_output')
        z_var = tf.layers.dense(h1, units=self.z_dim, activation=tf.nn.softplus)
    elif self.distribution == 'vmf':
        # compute mean and concentration of the von Mises-Fisher
        z_mean = tf.layers.dense(h1, units=self.z_dim, activation=lambda x: tf.nn.l2_normalize(x, axis=-1))
        # the `+ 1` prevents collapsing behaviors
        z_var = tf.layers.dense(h1, units=1, activation=tf.nn.softplus) + 1
    else:
        raise NotImplementedError

    return z_mean, z_var
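
For reference, here is a rough shape trace of this encoder, assuming the input window is 512 samples with one channel (matching the (batch, 512, 1) shape in the error below); tf.layers.conv1d defaults to 'valid' padding and stride 1, so each conv shortens the length by kernel_size - 1 = 6:

    # x                                  -> (batch, 512, 1)   (assumed input shape)
    # conv1d(filters=32, kernel_size=7)  -> (batch, 506, 32)
    # conv1d(filters=64, kernel_size=7)  -> (batch, 500, 64)
    # conv1d(filters=64, kernel_size=7)  -> (batch, 494, 64)
    # flatten                            -> (batch, 31616)
    # dense(32)                          -> (batch, 32)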

When running the model, I get the error:

InvalidArgumentError: Incompatible shapes: [32,1] vs. [32,512,1]
 [[{{node gradients/SquaredDifference_grad/BroadcastGradientArgs}}]]

32 is the batch_size when running the model. What confuses me is that when I run this with batch_size = 1, the model runs!
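
To illustrate why batch_size = 1 slips through, here is a minimal NumPy sketch using the shapes from the error message (that the decoder output comes out as (batch, 1) against an input of (batch, 512, 1) is my reconstruction, not verified against the repo):

    import numpy as np

    # batch_size = 1: (1, 1) silently broadcasts against (1, 512, 1), so no error is raised
    np.zeros((1, 512, 1)) - np.zeros((1, 1))    # works; result has shape (1, 512, 1)

    # batch_size = 32: (32, 1) cannot broadcast against (32, 512, 1)
    np.zeros((32, 512, 1)) - np.zeros((32, 1))  # ValueError: operands could not be broadcast together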

Where is this going wrong? Is it the optimizer and the way it averages?

I solved the issue by reshaping the output from the decoder to (win_size, 1) per sample, since the MLP decoder fails to add that extra dimension on its own (see the sketch below).
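
A minimal sketch of the fix (the names `h` and `win_size` here are placeholders, not the repo's exact code; only the final reshape matters):

    # hypothetical decoder tail: the dense layer emits (batch, win_size),
    # so add the channel dimension back before the MSE against x,
    # which has shape (batch, win_size, 1)
    logits = tf.layers.dense(h, units=win_size, activation=None)
    logits = tf.reshape(logits, (-1, win_size, 1))  # or tf.expand_dims(logits, axis=-1)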
