
Using a pre-trained model to train a new model in TensorFlow

I am creating a CNN autoencoder to act as a feature extractor, followed by a simple MLP classifier, in TensorFlow. I do the training separately: first I train the autoencoder to encode the data into a lower-dimensional feature space, and then I train the MLP classifier by passing the inputs through the trained autoencoder and then through the MLP.

I am currently having issues connecting the two models. My method is to load the old graph and get handles to its original input placeholder and its output tensor. I then apply a stop gradient to the last layer of the original graph so that only the MLP is trained, not the autoencoder. Finally, I initialize only the variables of the new graph by collecting them under a variable scope.

I am getting multiple errors when I run the code, ranging from uninitialized variables to having too many of them. Is there a better way to do this? I will include the code below.

Code for the working autoencoder training

import tensorflow as tf
import numpy as np
import math

def lrelu(x, leak=0.2, name="lrelu"):
    """Leaky rectifier.
    Parameters
    ----------
    x : Tensor
        The tensor to apply the nonlinearity to.
    leak : float, optional
        Leakage parameter.
    name : str, optional
        Variable scope to use.
    Returns
    -------
    x : Tensor
        Output of the nonlinearity.
    """
    with tf.variable_scope(name):
        f1 = 0.5 * (1 + leak)
        f2 = 0.5 * (1 - leak)
        return f1 * x + f2 * abs(x)

def corrupt(x):
    """Take an input tensor and add uniform masking.
    Parameters
    ----------
    x : Tensor/Placeholder
        Input to corrupt.
    Returns
    -------
    x_corrupted : Tensor
        50 pct of values corrupted.
    """
    return tf.multiply(x, tf.cast(tf.random_uniform(shape=tf.shape(x),
                                               minval=0,
                                               maxval=2,
                                               dtype=tf.int32), tf.float32))

def autoencoder(input_shape = [None, 784],
               n_filters = [1, 10, 10, 10],
               filter_sizes = [3, 3, 3, 3],
               corruption = False):
    """Build a deep denoising autoencoder w/ tied weights.
    Parameters
    ----------
    input_shape : list, optional
        Description
    n_filters : list, optional
        Description
    filter_sizes : list, optional
        Description
    Returns
    -------
    x : Tensor
        Input placeholder to the network
    z : Tensor
        Inner-most latent representation
    y : Tensor
        Output reconstruction of the input
    cost : Tensor
        Overall cost to use for training
    Raises
    ------
    ValueError
        Description
    """

    # Input to network
    x = tf.placeholder(tf.float32, input_shape, name = 'x')
    print(x)

    # Ensure a 2D input is reshaped into a square image tensor
    if len(x.get_shape()) == 2:
        x_dim = np.sqrt(x.get_shape().as_list()[1])
        if x_dim != int(x_dim):
            raise ValueError('Unsupported Input Dimensions')
        x_dim = int(x_dim)
        x_tensor = tf.reshape(x, [-1, x_dim, x_dim, n_filters[0]])
    elif len(x.get_shape()) == 4:
        x_tensor = x
    else:
        raise ValueError('Unsupported Input Dimensions')
    current_input = x_tensor

    # Optionally apply denoising autoencoder
    if corruption:
        current_input = corrupt(current_input)

    # Encoder
    encoder = []
    shapes = []
    for layer_i, n_output in enumerate(n_filters[1:]):
        n_input = current_input.get_shape().as_list()[3]  # number of input channels
        shapes.append(current_input.get_shape().as_list())
        W = tf.Variable(
            tf.random_uniform([
                filter_sizes[layer_i],
                filter_sizes[layer_i],
                n_input, n_output],
                -1.0 / math.sqrt(n_input),
                1.0/math.sqrt(n_input)))  # fan-in scaled uniform init, so no manual initialization is needed
        b = tf.Variable(tf.zeros([n_output]))
        encoder.append(W)
        output = lrelu(
            tf.add(tf.nn.conv2d(
                current_input, W, strides = [1,2,2,1], padding = 'SAME'), b))
        current_input = output
        print(W)
        print(b)
        print(output)

    # Store the latent representation
    z = current_input
    print(z)
    encoder.reverse()
    shapes.reverse()

    for layer_i, shape in enumerate(shapes):
        W = encoder[layer_i]
        b = tf.Variable(tf.zeros([W.get_shape().as_list()[2]]))
        output = lrelu(tf.add(
            tf.nn.conv2d_transpose(
                current_input, W,
                tf.stack([tf.shape(x)[0], shape[1], shape[2], shape[3]]),
                strides = [1,2,2,1], padding = 'SAME'), b))
        current_input = output

    # Now we have a reconstruction
    y = current_input
    cost = tf.reduce_sum(tf.square(y - x_tensor))

    return {'x': x, 'z': z, 'y': y, 'cost': cost}

# %%
def test_mnist():
    """Test the convolutional autoencder using MNIST."""
    # %%
    import tensorflow as tf
    import tensorflow.examples.tutorials.mnist.input_data as input_data
    import matplotlib.pyplot as plt

    # %%
    # load MNIST as before
    mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
    mean_img = np.mean(mnist.train.images, axis=0)
    ae = autoencoder()

    # %%
    learning_rate = 0.01
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(ae['cost'])

    # Create saver
    saver = tf.train.Saver(tf.trainable_variables())

    # %%
    # We create a session to use the graph
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())

    # %%
    # Fit all training data
    batch_size = 100
    n_epochs = 1
    for epoch_i in range(n_epochs):
        for batch_i in range(mnist.train.num_examples // batch_size):
            batch_xs, _ = mnist.train.next_batch(batch_size)
            train = np.array([img - mean_img for img in batch_xs])
            sess.run(optimizer, feed_dict={ae['x']: train})
        print(epoch_i, sess.run(ae['cost'], feed_dict={ae['x']: train}))

    save_path = saver.save(sess, "AutoEncoderCheckpoints/AutoEncoderMNIST.ckpt")
    print("Model saved in path: %s" % save_path)

    # %%
    # Plot example reconstructions
    n_examples = 10
    test_xs, _ = mnist.test.next_batch(n_examples)
    test_xs_norm = np.array([img - mean_img for img in test_xs])
    recon, latent = sess.run([ae['y'], ae['z']], feed_dict={ae['x']: test_xs_norm})
    print(recon.shape)
    print(latent.shape)
    fig, axs = plt.subplots(2, n_examples, figsize=(20, 6))
    for example_i in range(n_examples):
        axs[0][example_i].imshow(
            np.reshape(test_xs[example_i, :], (28, 28)))
        axs[1][example_i].imshow(
            np.reshape(
                np.reshape(recon[example_i, ...], (784,)) + mean_img,
                (28, 28)))
    fig.show()
    plt.draw()
#     plt.waitforbuttonpress()

    new_fig, new_axs = plt.subplots(10, n_examples, figsize = (20,20))
    for chan in range(10):
        for example_i in range(n_examples):
            new_axs[chan][example_i].imshow(
            np.reshape(latent[example_i,...,chan],
            (4,4)))
    new_fig.show()
    plt.draw()

# %%
if __name__ == '__main__':
    test_mnist()

Code that is not working: training the MLP without retraining the autoencoder

aeMLP_saver = tf.train.import_meta_graph('AutoEncoderCheckpoints/AutoEncoderMNIST.ckpt.meta')
aeMLP_graph = tf.get_default_graph()

weights = {
    'h1': tf.Variable(tf.random_normal([160, 320])),
    'h2': tf.Variable(tf.random_normal([320, 640])),
    'out': tf.Variable(tf.random_normal([640, 10]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([320])),
    'b2': tf.Variable(tf.random_normal([640])),
    'out': tf.Variable(tf.random_normal([10]))
}

# with tf.Graph().as_default():
with tf.variable_scope("model2"):
    x_plh = aeMLP_graph.get_tensor_by_name('x:0')
    output_conv = aeMLP_graph.get_tensor_by_name('lrelu_2/add:0')

    output_conv_sg = tf.stop_gradient(output_conv)
    print(output_conv_sg)

    output_conv_shape = output_conv_sg.get_shape().as_list()
    print(output_conv_shape)

    new_input = tf.reshape(output_conv_sg, [-1, 160])

    Y = tf.placeholder("float", [None, 10])
    # Hidden fully connected layer with 320 neurons
    layer_1 = tf.add(tf.matmul(new_input, weights['h1']), biases['b1'])
    # Hidden fully connected layer with 640 neurons
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    # Output fully connected layer with a neuron for each class
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    print(layer_1)
    print(layer_2)
    print(out_layer)
    y_pred = tf.nn.softmax(out_layer)

    correct_prediction = tf.equal(tf.argmax(y_pred,1), tf.argmax(Y,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_layer, labels=Y))
    learning_rate = 0.001
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss_op)


# out_layer_mlp, y_pred = multilayer_perceptron(new_input)

model_2_variables_list = tf.get_collection(
    tf.GraphKeys.GLOBAL_VARIABLES,
    scope="model2"
)

print(model_2_variables_list)

init2 = tf.variables_initializer(model_2_variables_list)

import tensorflow as tf
import tensorflow.examples.tutorials.mnist.input_data as input_data
import matplotlib.pyplot as plt

# %%
# load MNIST as before
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
mean_img = np.mean(mnist.train.images, axis=0)

# Create saver
saver_new = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init2)

    # %%
    # Fit all training data
    batch_size = 100
    n_epochs = 1
    for epoch_i in range(n_epochs):
        for batch_i in range(mnist.train.num_examples // batch_size):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            train = np.array([img - mean_img for img in batch_xs])
            _,c = sess.run([optimizer, loss_op], feed_dict={x_plh: train, Y: batch_ys})
        print(epoch_i, " || ", c)
        batch_xt, batch_yt = mnist.test.next_batch(batch_size)
        test = np.array([img - mean_img for img in batch_xt])
        acc = sess.run(accuracy, feed_dict = {x_plh: test, Y: batch_yt})
        print("Accuracy is: ", acc)

    save_path = saver_new.save(sess, "AutoEncoderCheckpoints/AutoEncoderClassifierMNIST.ckpt")
    print("Model saved in path: %s" % save_path)

Both of the code blocks above are runnable, so you will be able to recreate the errors I am getting. I have read some posts about possibly freezing the graph, but I am not sure if that is the best solution.
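
From what I have read, freezing would look roughly like the sketch below (untested; it uses TF 1.x's tf.graph_util.convert_variables_to_constants, the output filename is my own choice, and 'lrelu_2/add' is the latent-layer op name from my graph above):

import tensorflow as tf

# Restore the trained autoencoder, then bake its variables into constants
# so the encoder can be reused as a fixed feature extractor.
saver = tf.train.import_meta_graph('AutoEncoderCheckpoints/AutoEncoderMNIST.ckpt.meta')
with tf.Session() as sess:
    saver.restore(sess, 'AutoEncoderCheckpoints/AutoEncoderMNIST.ckpt')
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ['lrelu_2/add'])  # keep ops up to the latent code
with tf.gfile.GFile('AutoEncoderCheckpoints/frozen_encoder.pb', 'wb') as f:
    f.write(frozen.SerializeToString())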

This post will be more useful to others if you actually include the error you are getting.

The first obvious issue is that importing the graph with tf.train.import_meta_graph does not initialize the variables. See https://www.tensorflow.org/api_docs/python/tf/train/import_meta_graph for an example of calling restore to actually restore the variable values.
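
For reference, a minimal sketch of that restore pattern, assuming the checkpoint paths from your question (TF 1.x):

import tensorflow as tf

# import_meta_graph only rebuilds the graph structure from the .meta file;
# restore() is what actually loads the trained variable values.
aeMLP_saver = tf.train.import_meta_graph('AutoEncoderCheckpoints/AutoEncoderMNIST.ckpt.meta')
with tf.Session() as sess:
    aeMLP_saver.restore(sess, 'AutoEncoderCheckpoints/AutoEncoderMNIST.ckpt')
    # The autoencoder variables now hold their trained values; only the new
    # MLP variables still need tf.variables_initializer(...).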

At a high level, since you have the code that builds your original training graph, going through save/restore is likely unnecessary. One possible way to go about this is to build the whole graph (AE and MLP), train the AE first (by calling sess.run with the AE's training op), then apply stop_gradient and train the MLP. You can also build separate towers that share variables if you want. The reason I suggest not going through save/restore (unless you have some other use case for it) is that relying on tensor names can be brittle.
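
As a rough sketch of that combined-graph approach (it reuses the autoencoder() function from your question; the 'mlp' scope name and layer sizes are my own choices, not your actual code):

import tensorflow as tf

# Build both models in one graph: the AE plus an MLP head on top of the
# gradient-stopped latent code z (4*4*10 = 160 units for this MNIST setup).
ae = autoencoder()
ae_train_op = tf.train.AdamOptimizer(0.01).minimize(ae['cost'])

features = tf.stop_gradient(tf.reshape(ae['z'], [-1, 160]))
Y = tf.placeholder(tf.float32, [None, 10])
with tf.variable_scope('mlp'):
    hidden = tf.layers.dense(features, 320, activation=tf.nn.relu)
    logits = tf.layers.dense(hidden, 10)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=logits))

# Restricting var_list is belt and braces: stop_gradient already blocks
# gradients into the AE, and var_list keeps the optimizer from updating it.
mlp_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='mlp')
mlp_train_op = tf.train.AdamOptimizer(0.001).minimize(loss, var_list=mlp_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Phase 1: run ae_train_op on image batches to train the autoencoder.
    # Phase 2: run mlp_train_op on (image, label) batches; the AE stays frozen.

Since everything lives in one graph, no tensor names or checkpoints are involved, which avoids the brittleness mentioned above.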
