
TensorFlow: reused variables give different results before and after training

While trying to learn a bit about TensorFlow, I have been building a Variational Autoencoder, which works; however, I noticed that, after training, I get different results from two decoders that share the same variables.

I created two decoders: the first is trained against my dataset, and the second is the one I eventually want to feed new Z encodings to in order to produce new values.

My check is that I should be able to send the Z values generated by the encoding process to both decoders and get equal results.

I have two decoders (D, D_new). D_new shares the variable scope of D.

Before training, I can send values into the encoder (E) to generate output values as well as the Z values it produced (Z_gen).

If I use Z_gen as input to D_new before training, its output is identical to the output of D, which is expected.

After a few iterations of training, however, the outputs of D and D_new begin to diverge (although they remain quite similar).

I have pared this down to a simpler version of my code that still reproduces the error. I'm wondering whether others have run into this and where I might be able to correct for it.

The code below can be run in a Jupyter notebook. I'm using TensorFlow r0.11 and Python 3.5.0.



    import numpy as np
    import tensorflow as tf
    import matplotlib
    import matplotlib.pyplot as plt
    import os
    import pylab as pl
    mgc = get_ipython().magic
    mgc(u'matplotlib inline')
    pl.rcParams['figure.figsize'] = (8.0, 5.0)



    ##-- Helper function, just for visualizing the data
    def plot_values(values, file=None):
        t = np.linspace(1.0,len(values[0]),len(values[0]))
        for i in range(len(values)):
            plt.plot(t,values[i])
        if file is None:
            plt.show()
        else:
            plt.savefig(file)
        plt.close()



    def encoder(input, n_hidden, n_z):
        with tf.variable_scope("ENCODER"):
            with tf.name_scope("Hidden"):
                n_layer_inputs = input.get_shape()[1].value
                n_layer_outputs = n_hidden
                with tf.name_scope("Weights"):
                    w = tf.get_variable(name="E_Hidden", shape=[n_layer_inputs, n_layer_outputs], dtype=tf.float32)
                with tf.name_scope("Activation"):
                    a = tf.tanh(tf.matmul(input,w))
                prevLayer = a

            with tf.name_scope("Z"):
                n_layer_inputs = prevLayer.get_shape()[1].value
                n_layer_outputs = n_z
                with tf.name_scope("Weights"):
                    w = tf.get_variable(name="E_Z", shape=[n_layer_inputs, n_layer_outputs], dtype=tf.float32)
                with tf.name_scope("Activation"):
                    Z_gen = tf.matmul(prevLayer,w)
        return Z_gen

    def decoder(input, n_hidden, n_outputs, reuse=False):
        with tf.variable_scope("DECODER", reuse=reuse):
            with tf.name_scope("Hidden"):
                n_layer_inputs = input.get_shape()[1].value
                n_layer_outputs = n_hidden
                with tf.name_scope("Weights"):
                    w = tf.get_variable(name="D_Hidden", shape=[n_layer_inputs, n_layer_outputs], dtype=tf.float32)
                with tf.name_scope("Activation"):
                    a = tf.tanh(tf.matmul(input,w))
                prevLayer = a

            with tf.name_scope("OUTPUT"):
                n_layer_inputs = prevLayer.get_shape()[1].value
                n_layer_outputs = n_outputs
                with tf.name_scope("Weights"):
                    w = tf.get_variable(name="D_Output", shape=[n_layer_inputs, n_layer_outputs], dtype=tf.float32)
                with tf.name_scope("Activation"):
                    out = tf.sigmoid(tf.matmul(prevLayer,w))
        return out

Here is where the TensorFlow graph is set up:



    batch_size = 3
    n_inputs = 100
    n_hidden_nodes = 12
    n_z = 2

    with tf.variable_scope("INPUT_VARS"):
        with tf.name_scope("X"):
            X = tf.placeholder(tf.float32, shape=(None, n_inputs))
        with tf.name_scope("Z"):
            Z = tf.placeholder(tf.float32, shape=(None, n_z))

    Z_gen = encoder(X,n_hidden_nodes,n_z)

    D = decoder(Z_gen, n_hidden_nodes, n_inputs)
    D_new = decoder(Z, n_hidden_nodes, n_inputs, reuse=True)

    with tf.name_scope("COST"):
        loss = -tf.reduce_mean(X * tf.log(1e-6 + D) + (1-X) * tf.log(1e-6 + 1 - D))
        train_step = tf.train.AdamOptimizer(0.001, beta1=0.5).minimize(loss)
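
As a quick sanity check (illustrative only, not from the original post), one way to confirm that D and D_new really do share weights is to list the trainable variables; because decoder() is called the second time with reuse=True, there should be exactly one set of DECODER variables, not two:

    # Illustrative only: list the trainable variables in the graph.
    # With reuse=True on the second decoder, there should be one
    # DECODER variable per layer, not two.
    for v in tf.trainable_variables():
        print(v.name)
    # Expected (roughly):
    #   ENCODER/E_Hidden:0
    #   ENCODER/E_Z:0
    #   DECODER/D_Hidden:0
    #   DECODER/D_Output:0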

I generate a training set of 3 samples of normally distributed noise with 100 data points each, then sort it to make it easier to visualize:



    train_data = (np.random.normal(0,1,(batch_size,n_inputs)) + 3) / 6.0
    train_data.sort()
    plot_values(train_data)

[image: plot of the sorted training data]

Start up the session:



    sess = tf.InteractiveSession()
    sess.run(tf.group(tf.initialize_all_variables(), tf.initialize_local_variables()))

Let's just look at what the network generates before training...



    resultA, Z_vals = sess.run([D, Z_gen], feed_dict={X:train_data})
    plot_values(resultA)

[image: decoder output D before training]

Pulling the generated Z values and feeding them to D_new, which reuses the variables from D:



    resultB = sess.run(D_new, feed_dict={Z:Z_vals})
    plot_values(resultB)

[image: D_new output before training]

Just for sanity, I'll plot the difference between the two to be sure they're the same...

[image: difference between the two outputs before training]
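
The code for this pre-training check isn't shown in the original post; presumably it is the same difference plot used later, i.e. something like:

    # Presumed check (not shown in the original post): before training,
    # the two decoder outputs should match exactly, so this plots zeros.
    plot_values(resultA - resultB)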

Now run 1000 training epochs and plot the result...



    for i in range(1000):
        _, resultA, Z_vals = sess.run([train_step, D, Z_gen], feed_dict={X:train_data})
    plot_values(resultA)

[image: decoder output D after 1000 training iterations]

Now let's feed those same Z values to D_new and plot the results...



    resultB = sess.run(D_new, feed_dict={Z:Z_vals})
    plot_values(resultB)

[image: D_new output for the same Z values]

They look pretty similar. But (I think) they should be exactly the same. Let's look at the difference...



    plot_values(resultA - resultB)

[image: plot of resultA - resultB after training]

You can see there is some variation now. This becomes much more dramatic with a larger network on more complex data, but still shows up in this simple example. Any clues as to what's going on?

Some operations (I don't know which ones specifically) can be supplied with a seed value. Beyond those, I'm not even sure the training process is completely deterministic, especially when a GPU is involved, simply by the nature of parallelization.

See this question.
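
For illustration only (not from the original answer): TensorFlow lets you set a graph-level seed with tf.set_random_seed, and most random initializers also accept an op-level seed argument. A minimal sketch, assuming it runs before the graph above is built (the variable name and shape below are hypothetical):

    # Fix the graph-level seed so variable initializers are repeatable
    # across runs (this alone does not guarantee deterministic training).
    tf.set_random_seed(42)

    # Op-level seed on an individual initializer:
    w_seeded = tf.get_variable(
        name="E_Hidden_seeded",
        shape=[100, 12],
        dtype=tf.float32,
        initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=42))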

While I don't have a full explanation for the reason why, I was able to resolve my issue by changing:

    for i in range(1000):
        _, resultA, Z_vals = sess.run([train_step, D, Z_gen], feed_dict={X:train_data})
    plot_values(resultA)

    resultB = sess.run(D_new, feed_dict={Z:Z_vals})
    plot_values(resultB)
    plot_values(resultA - resultB)

to...

    for i in range(1000):
        _, resultA, Z_vals = sess.run([train_step, D, Z_gen], feed_dict={X:train_data})

    resultA, Z_vals = sess.run([D, Z_gen], feed_dict={X:train_data})

    plot_values(resultA)

    resultB = sess.run(D_new, feed_dict={Z:Z_vals})
    plot_values(resultB)
    plot_values(resultA - resultB)

Note that I simply ran the graph and extracted resultA and Z_vals one last time, without the train_step. Presumably, when train_step is fetched in the same sess.run call, the fetched D and Z_gen values are not guaranteed to correspond to the variable values after that step's update, so they can fall out of sync with what D_new later computes from the updated variables.

The reason I was still seeing problems in my more complex setup was that I had bias variables (even though they were set to 0.0) that were being created with:

    b = tf.Variable(tf.constant(self.bias_k, shape=[n_layer_outputs], dtype=tf.float32))

Variables created with tf.Variable do not participate in reuse under a tf.variable_scope (only tf.get_variable does), so those bias variables were technically not being reused. Why they presented such a problem even when set to 0.0, I'm not sure.
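
A minimal sketch of how such a bias could be created so that it is covered by scope reuse (the variable name and zero initializer below are placeholders, not from the original code):

    # Create the bias with tf.get_variable so that it respects the reuse
    # flag of the enclosing tf.variable_scope and is shared by both decoders.
    b = tf.get_variable(
        name="D_Hidden_bias",
        shape=[n_layer_outputs],
        dtype=tf.float32,
        initializer=tf.constant_initializer(0.0))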
