
Variational autoencoder implementation

I am trying to implement a variational autoencoder using Python and TensorFlow. I have seen various implementations on the internet, and I have managed to create my own by combining the parts I found and adapting them to my specific case. I have ended up with the autoencoder here: my autoencoder on git

Briefly, the autoencoder contains (a rough sketch follows the list):

1) an encoder with 2 convolutional layers and 1 flatten layer,

2) the latent space (of dimension 2),

3) and a decoder that mirrors the encoder.
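
As a rough sketch (my actual code is on the git; the layer sizes here are placeholders, not my exact ones), the encoder side looks something like this, where x_images stands in for the input batch:

conv1 = tf.layers.conv2d(x_images, 32, 3, strides=2, activation=tf.nn.relu)  # first conv layer
conv2 = tf.layers.conv2d(conv1, 64, 3, strides=2, activation=tf.nn.relu)     # second conv layer
flat = tf.layers.flatten(conv2)                                              # flatten layer
z = tf.layers.dense(flat, 2)                                                 # latent space of dimension 2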

My problem comes when I try to implement the variational part of the autoencoder, by which I mean the mathematical procedure in the latent space. At least, that is where I pinpoint the problem.

To make this clearer, I have the following two cases:

Case 1: Without implementing any variational math, I simply set the variables in the latent space and feed them into the decoder, with no math applied. In that case the cost function is just the difference between input and output. You can see the code for this case in these figures on the git (sorry, I cannot post more links): figure1_code_part1.png, figure1_code_part2.png
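
As a minimal sketch of that cost (x_images and x_reconstructed standing in for my input and the decoder output):

cost = tf.reduce_mean(tf.square(x_images - x_reconstructed))  # mean squared difference between input and output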

Case 2: Implementing the math on the latent space variables. You can see the code for this case in these figures: figure_2_code_part1.png, figure_2_code_part2.png

The plots of the latent space I get in the two cases are: figure_1.png and figure_2.png

Something is clearly wrong with the variational implementation, but I can't figure out what. Everyone who implements a variational autoencoder uses the same mathematical formulas (at least in the implementations I found on the internet), so I am probably missing something.
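
For reference, the formulas I mean are the usual reparameterization trick and the closed-form KL term against a standard normal prior:

z = mu + sigma * epsilon,  with epsilon ~ N(0, I)
KL = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)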

Any comments/suggestions are welcome. Thanks!!!

Here is how mu and sigma, together with the KL_term, can be calculated. I am not sure about the linear part of your code, hence I suggest the following:

Please note that here, before the fully connected layers on the encoder side, I have a conv4 layer of shape [7, 7, 256].
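
The conv4_reshaped used below is just that conv4 output flattened into a matrix, i.e. (assuming conv4 has shape [batch, 7, 7, 256]):

conv4_reshaped = tf.reshape(conv4, [-1, 7 * 7 * 256])  # flatten to [batch, 7*7*256]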

import tensorflow as tf  # this snippet uses TensorFlow 1.x APIs

latent_dim = 2  # dimensionality of the latent space (2 in this question)

# These are the weights and biases of the mu and sigma layers on the encoder side
w_c_mu = tf.Variable(tf.truncated_normal([7 * 7 * 256, latent_dim], stddev=0.1), name='weight_fc_mu')
b_c_mu = tf.Variable(tf.constant(0.1, shape=[latent_dim]), name='biases_fc_mu')
w_c_sig = tf.Variable(tf.truncated_normal([7 * 7 * 256, latent_dim], stddev=0.1), name='weight_fc_sig')
b_c_sig = tf.Variable(tf.constant(0.1, shape=[latent_dim]), name='biases_fc_sig')

with tf.variable_scope('mu'):
    # Linear layer producing the mean of q(z|x)
    mu = tf.nn.bias_add(tf.matmul(conv4_reshaped, w_c_mu), b_c_mu)
    tf.summary.histogram('mu', mu)

with tf.variable_scope('stddev'):
    # Linear layer producing the log-variance log(sigma^2) of q(z|x); despite the
    # name, this is not the standard deviation itself (its output can be negative)
    stddev = tf.nn.bias_add(tf.matmul(conv4_reshaped, w_c_sig), b_c_sig)
    tf.summary.histogram('stddev', stddev)

with tf.variable_scope('z'):
    # This formula was adopted from the following paper: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7979344
    # Reparameterization trick: z = mu + sigma * epsilon with sigma = sqrt(exp(log-variance)),
    # drawing an independent noise sample for each example in the batch
    epsilon = tf.random_normal(tf.shape(mu))
    latent_var = mu + tf.multiply(tf.sqrt(tf.exp(stddev)), epsilon)
    tf.summary.histogram('latent_var', latent_var)

...

with tf.name_scope('loss_KL'):
    # Closed-form KL(N(mu, sigma^2) || N(0, I)) per example, with stddev = log(sigma^2):
    # KL = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    temp2 = 1 + stddev - tf.square(mu) - tf.exp(stddev)
    KL_term = -0.5 * tf.reduce_sum(temp2, reduction_indices=1)
    tf.summary.scalar('KL_term', tf.reduce_mean(KL_term))

with tf.name_scope('total_loss'):
    # log_likelihood (defined in the full gist) is the per-example reconstruction
    # loss, i.e. a negative log-likelihood, so this quantity is the negative ELBO
    variational_lower_bound = tf.reduce_mean(log_likelihood + KL_term)
    tf.summary.scalar('loss', variational_lower_bound)

with tf.name_scope('optimizer'):
    # Run any pending update ops (e.g. batch-norm moving averages) with each step
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        optimizer = tf.train.AdamOptimizer(0.00001).minimize(variational_lower_bound)
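
The log_likelihood term is defined in the full gist linked below. As a minimal sketch of what it could look like, assuming x_flat is the flattened input batch and decoder_logits are the pre-sigmoid decoder outputs (both names hypothetical):

with tf.name_scope('loss_likelihood'):
    # Bernoulli negative log-likelihood of the reconstruction: sigmoid
    # cross-entropy summed over all pixels, giving one value per example
    log_likelihood = tf.reduce_sum(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=x_flat, logits=decoder_logits),
        reduction_indices=1)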

For the full code: https://gist.github.com/issa-s-ayoub/5267558c4f5694d479a84d960c265452

Hope that helps!!
