
TensorFlow: reused variables give different results before and after training

While trying to learn a bit about TensorFlow, I have been building a Variational Autoencoder, which works; however, I noticed that, after training, I get different results from two decoders that share the same variables.

I created two decoders: the first is trained against my dataset, and the second is the one I eventually want to feed new Z encodings to in order to produce new values.

My check is that I should be able to send the Z values generated by the encoding process to both decoders and get equal results.

I have two decoders (D, D_new). D_new shares the variable scope of D.

Before training, I can send values into the encoder (E) to generate output values as well as the Z values it produced (Z_gen).

If I use Z_gen as input to D_new before training, its output is identical to the output of D, which is expected.

After a few iterations of training, however, the outputs of D and D_new begin to diverge (although they remain quite similar).

I have pared this down to a simpler version of my code that still reproduces the error. I'm wondering whether others have run into this and where I might be able to correct for it.

The code below can be run in a Jupyter notebook. I'm using TensorFlow r0.11 and Python 3.5.0.



    import numpy as np
    import tensorflow as tf
    import matplotlib
    import matplotlib.pyplot as plt
    import os
    import pylab as pl
    mgc = get_ipython().magic
    mgc(u'matplotlib inline')
    pl.rcParams['figure.figsize'] = (8.0, 5.0)



    ##-- Helper function, just for visualizing the data
    def plot_values(values, file=None):
        t = np.linspace(1.0,len(values[0]),len(values[0]))
        for i in range(len(values)):
            plt.plot(t,values[i])
        if file is None:
            plt.show()
        else:
            plt.savefig(file)
        plt.close()



    def encoder(input, n_hidden, n_z):
        with tf.variable_scope("ENCODER"):
            with tf.name_scope("Hidden"):
                n_layer_inputs = input.get_shape()[1].value
                n_layer_outputs = n_hidden
                with tf.name_scope("Weights"):
                    w = tf.get_variable(name="E_Hidden", shape=[n_layer_inputs, n_layer_outputs], dtype=tf.float32)
                with tf.name_scope("Activation"):
                    a = tf.tanh(tf.matmul(input,w))
                prevLayer = a

            with tf.name_scope("Z"):
                n_layer_inputs = prevLayer.get_shape()[1].value
                n_layer_outputs = n_z
                with tf.name_scope("Weights"):
                    w = tf.get_variable(name="E_Z", shape=[n_layer_inputs, n_layer_outputs], dtype=tf.float32)
                with tf.name_scope("Activation"):
                    Z_gen = tf.matmul(prevLayer,w)
        return Z_gen

    def decoder(input, n_hidden, n_outputs, reuse=False):
        with tf.variable_scope("DECODER", reuse=reuse):
            with tf.name_scope("Hidden"):
                n_layer_inputs = input.get_shape()[1].value
                n_layer_outputs = n_hidden
                with tf.name_scope("Weights"):
                    w = tf.get_variable(name="D_Hidden", shape=[n_layer_inputs, n_layer_outputs], dtype=tf.float32)
                with tf.name_scope("Activation"):
                    a = tf.tanh(tf.matmul(input,w))
                prevLayer = a

            with tf.name_scope("OUTPUT"):
                n_layer_inputs = prevLayer.get_shape()[1].value
                n_layer_outputs = n_outputs
                with tf.name_scope("Weights"):
                    w = tf.get_variable(name="D_Output", shape=[n_layer_inputs, n_layer_outputs], dtype=tf.float32)
                with tf.name_scope("Activation"):
                    out = tf.sigmoid(tf.matmul(prevLayer,w))
        return out

Here is where the TensorFlow graph is set up:



    batch_size = 3
    n_inputs = 100
    n_hidden_nodes = 12
    n_z = 2

    with tf.variable_scope("INPUT_VARS"):
        with tf.name_scope("X"):
            X = tf.placeholder(tf.float32, shape=(None, n_inputs))
        with tf.name_scope("Z"):
            Z = tf.placeholder(tf.float32, shape=(None, n_z))

    Z_gen = encoder(X,n_hidden_nodes,n_z)

    D = decoder(Z_gen, n_hidden_nodes, n_inputs)
    D_new = decoder(Z, n_hidden_nodes, n_inputs, reuse=True)

    with tf.name_scope("COST"):
        loss = -tf.reduce_mean(X * tf.log(1e-6 + D) + (1-X) * tf.log(1e-6 + 1 - D))
        train_step = tf.train.AdamOptimizer(0.001, beta1=0.5).minimize(loss)
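
As a quick sanity check (illustrative only, not from the original post), one way to confirm that D and D_new really do share weights is to list the trainable variables; because decoder() is called the second time with reuse=True, there should be exactly one set of DECODER variables, not two:

    # Illustrative only: list the trainable variables in the graph.
    # With reuse=True on the second decoder, there should be one
    # DECODER variable per layer, not two.
    for v in tf.trainable_variables():
        print(v.name)
    # Expected (roughly):
    #   ENCODER/E_Hidden:0
    #   ENCODER/E_Z:0
    #   DECODER/D_Hidden:0
    #   DECODER/D_Output:0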

I generate a training set of 3 samples of normally distributed noise with 100 data points each, then sort it to make it easier to visualize:



    train_data = (np.random.normal(0,1,(batch_size,n_inputs)) + 3) / 6.0
    train_data.sort()
    plot_values(train_data)

[image: plot of the sorted training data]

Start up the session:



    sess = tf.InteractiveSession()
    sess.run(tf.group(tf.initialize_all_variables(), tf.initialize_local_variables()))

Let's just look at what the network generates before training...



    resultA, Z_vals = sess.run([D, Z_gen], feed_dict={X:train_data})
    plot_values(resultA)

[image: decoder output D before training]

Pulling the generated Z values and feeding them to D_new, which reuses the variables from D:



    resultB = sess.run(D_new, feed_dict={Z:Z_vals})
    plot_values(resultB)

[image: D_new output before training]

Just for sanity, I'll plot the difference between the two to be sure they're the same...

[image: difference between the two outputs before training]
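
The code for this pre-training check isn't shown in the original post; presumably it is the same difference plot used later, i.e. something like:

    # Presumed check (not shown in the original post): before training,
    # the two decoder outputs should match exactly, so this plots zeros.
    plot_values(resultA - resultB)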

Now run 1000 training epochs and plot the result...



    for i in range(1000):
        _, resultA, Z_vals = sess.run([train_step, D, Z_gen], feed_dict={X:train_data})
    plot_values(resultA)

[image: decoder output D after 1000 training iterations]

Now let's feed those same Z values to D_new and plot the results...



    resultB = sess.run(D_new, feed_dict={Z:Z_vals})
    plot_values(resultB)

[image: D_new output for the same Z values]

They look pretty similar. But (I think) they should be exactly the same. Let's look at the difference...



    plot_values(resultA - resultB)

[image: plot of resultA - resultB after training]

You can see there is some variation now. This becomes much more dramatic with a larger network on more complex data, but still shows up in this simple example. Any clues as to what's going on?

Some operations (I don't know which ones specifically) can be supplied with a seed value. Beyond those, I'm not even sure the training process is completely deterministic, especially when a GPU is involved, simply by the nature of parallelization.

See this question.
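
For illustration only (not from the original answer): TensorFlow lets you set a graph-level seed with tf.set_random_seed, and most random initializers also accept an op-level seed argument. A minimal sketch, assuming it runs before the graph above is built (the variable name and shape below are hypothetical):

    # Fix the graph-level seed so variable initializers are repeatable
    # across runs (this alone does not guarantee deterministic training).
    tf.set_random_seed(42)

    # Op-level seed on an individual initializer:
    w_seeded = tf.get_variable(
        name="E_Hidden_seeded",
        shape=[100, 12],
        dtype=tf.float32,
        initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=42))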

While I don't have a full explanation for the reason why, I was able to resolve my issue by changing:

    for i in range(1000):
        _, resultA, Z_vals = sess.run([train_step, D, Z_gen], feed_dict={X:train_data})
    plot_values(resultA)

    resultB = sess.run(D_new, feed_dict={Z:Z_vals})
    plot_values(resultB)
    plot_values(resultA - resultB)

to...

    for i in range(1000):
        _, resultA, Z_vals = sess.run([train_step, D, Z_gen], feed_dict={X:train_data})

    resultA, Z_vals = sess.run([D, Z_gen], feed_dict={X:train_data})

    plot_values(resultA)

    resultB = sess.run(D_new, feed_dict={Z:Z_vals})
    plot_values(resultB)
    plot_values(resultA - resultB)

Note that I simply ran the graph and extracted resultA and Z_vals one last time, without the train_step. Presumably, when train_step is fetched in the same sess.run call, the fetched D and Z_gen values are not guaranteed to correspond to the variable values after that step's update, so they can fall out of sync with what D_new later computes from the updated variables.

The reason I was still seeing problems in my more complex setup was that I had bias variables (even though they were set to 0.0) that were being created with:

    b = tf.Variable(tf.constant(self.bias_k, shape=[n_layer_outputs], dtype=tf.float32))

Variables created with tf.Variable do not participate in reuse under a tf.variable_scope (only tf.get_variable does), so those bias variables were technically not being reused. Why they presented such a problem even when set to 0.0, I'm not sure.
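
A minimal sketch of how such a bias could be created so that it is covered by scope reuse (the variable name and zero initializer below are placeholders, not from the original code):

    # Create the bias with tf.get_variable so that it respects the reuse
    # flag of the enclosing tf.variable_scope and is shared by both decoders.
    b = tf.get_variable(
        name="D_Hidden_bias",
        shape=[n_layer_outputs],
        dtype=tf.float32,
        initializer=tf.constant_initializer(0.0))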
