
Tensorflow Hub Module's trainable variables are not updated during training

My question is quite similar to the one asked here: https://github.com/tensorflow/hub/issues/269. But that question remains unanswered, so I will ask it here. Steps to reproduce:

Environment: tensorflow 1.14.0, tensorflow-hub 0.5.0, Python 3.7.4, Windows 10

Here is the sample notebook with the problem reproduced: https://colab.research.google.com/drive/1PKUyoQRP3othu6cu7v7N7yn8K2pjkuKP

  1. Load a tensorflow_hub Inception V3 module as trainable:

    module_spec = hub.load_module_spec('https://tfhub.dev/google/imagenet/inception_v3/feature_vector/3')
    height, width = hub.get_expected_image_size(module_spec)
    with tf.Graph().as_default() as graph:
        resized_input_tensor = tf.compat.v1.placeholder(tf.float32, [None, height, width, 3])
        module = hub.Module(module_spec, trainable=True, tags={"train"})
        bottleneck_tensor = module(inputs=dict(images=resized_input_tensor, batch_norm_momentum=0.997),
                                   signature="image_feature_vector_with_bn_hparams")
  2. Save all the trainable/model/global variables created at this moment into three separate 'base model' lists. Example of the collected variables:

    base_model trainable_variables vars: 188, ['module/InceptionV3/Conv2d_1a_3x3/weights:0', 'module/InceptionV3/Conv2d_1a_3x3/BatchNorm/beta:0', ...]
    base_model model_variables vars: 188, ['module/InceptionV3/Conv2d_1a_3x3/BatchNorm/moving_mean:0', 'module/InceptionV3/Conv2d_1a_3x3/BatchNorm/moving_variance:0', ...]
    base_model variables vars: 0, []  # empty list
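The notebook's code for collecting these lists is not shown above; a minimal helper for snapshotting the three collections and diffing against an earlier snapshot (the names are my own, just a sketch of the idea) could look like:

    def snapshot_variables():
        # Capture the current contents of the three variable collections.
        return {
            'trainable_variables': list(tf.compat.v1.trainable_variables()),
            'model_variables': list(tf.compat.v1.model_variables()),
            'variables': list(tf.compat.v1.global_variables()),
        }

    def new_variables_since(snapshot):
        # Variables whose names were not present in the given snapshot.
        current = snapshot_variables()
        return {k: [v for v in current[k]
                    if v.name not in {old.name for old in snapshot[k]}]
                for k in current}

    base_model_vars = snapshot_variables()  # taken right after building the hub module

The same diffing is what the 'custom' and 'optimizer' lists below refer to.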

  3. Add a custom classification layer on top of the model:


    # class_count and final_tensor_name are defined elsewhere in the script
    batch_size, previous_tensor_size = bottleneck_tensor.get_shape().as_list()
    ground_truth_input = tf.compat.v1.placeholder(tf.int64, [batch_size], name='GroundTruthInput')
    initial_value = tf.random.truncated_normal([previous_tensor_size, class_count], stddev=0.001)
    layer_weights = tf.Variable(initial_value, name='final_weights')
    layer_biases = tf.Variable(tf.zeros([class_count]), name='final_biases')
    logits = tf.matmul(bottleneck_tensor, layer_weights) + layer_biases
    final_tensor = tf.nn.softmax(logits, name=final_tensor_name)

  4. Again, get all newly added variable names into three new 'custom' lists:

    custom trainable_variables vars: 2, ['final_weights:0', 'final_biases:0']
    custom model_variables vars: 0, []
    custom variables vars: 0, []

  5. Add the train operation. Because the base model has batch normalization, we have to take care of the update ops; that is why I use tf.contrib.training.create_train_op (a sketch of the equivalent manual pattern follows this list):

    cross_entropy_mean = tf.compat.v1.losses.sparse_softmax_cross_entropy(labels=ground_truth_input, logits=logits)
    optimizer = tf.compat.v1.train.AdamOptimizer()

    # The update ops default to the contents of the tf.GraphKeys.UPDATE_OPS collection.
    # The variables to train default to all tf.compat.v1.trainable_variables().
    train_step = tf.contrib.training.create_train_op(cross_entropy_mean, optimizer)
  6. Again, get all newly added variable names into three new 'optimizer' lists:

    optimizer trainable_variables vars: 0, []
    optimizer model_variables vars: 0, []
    optimizer variables vars: 383, ['global_step:0', 'beta1_power:0', 'beta2_power:0', 'module/InceptionV3/Conv2d_1a_3x3/weights/Adam:0', 'module/InceptionV3/Conv2d_1a_3x3/weights/Adam_1:0', 'module/InceptionV3/Conv2d_1a_3x3/BatchNorm/beta/Adam:0', 'module/InceptionV3/Conv2d_1a_3x3/BatchNorm/beta/Adam_1:0', 'module/InceptionV3/Conv2d_2a_3x3/weights/Adam:0', 'module/InceptionV3/Conv2d_2a_3x3/weights/Adam_1:0', 'module/InceptionV3/Conv2d_2a_3x3/BatchNorm/beta/Adam:0', ...]
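As mentioned in step 5, create_train_op wires the batch-norm update ops into the train step. The roughly equivalent manual pattern (not the code from the notebook, shown only for clarity) would be:

    # Run the batch-norm moving-average updates together with each training step.
    update_ops = tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_step = optimizer.minimize(
            cross_entropy_mean,
            global_step=tf.compat.v1.train.get_or_create_global_step())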

Now do the regular training:


    with tf.compat.v1.Session(graph=graph) as sess:
        # Initialize all weights: for the module to their pretrained values,
        # and for the newly added retraining layer to random initial values.
        init = tf.compat.v1.global_variables_initializer()
        sess.run(init)

        #dump the checksum for all the variable lists collected during graph building

        for i in range(1000):
            # Get a batch of input resized images values, calculated fresh
            (train_data, train_ground_truth) = get_random_batch_data(sess, image_lists....)

            #dump the checksum for all the variable lists collected during graph building


            # Feed the input placeholder and ground truth into the graph, and run a training
            # step.
            sess.run([train_step], feed_dict = {
                resized_input_tensor: train_data,
                ground_truth_input: train_ground_truth})

            #dump now again the checksum for all the variable lists collected during graph building
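
The checksum dumps referred to in the comments were produced by a small helper along these lines (my own sketch, not the exact notebook code; note that it evaluates each variable directly with eval(), which turns out to matter later):

    import hashlib
    import numpy as np

    def dump_checksum(name, var_list):
        # Evaluate every variable in the default session and report a numeric
        # sum plus the MD5 digest of the concatenated raw bytes.
        values = [v.eval() for v in var_list]
        total = float(np.sum([np.sum(x) for x in values])) if values else 0
        digest = hashlib.md5(b''.join(np.ascontiguousarray(x).tobytes() for x in values)).hexdigest()
        print(name, total, digest)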

So, after each training step the checksum changes only for two of the variable lists, the custom trainable variables and the optimizer globals:


    base_model trainable_variables, 2697202.0, cf4682249fc1f48e9a346149f84e503d  unchanged
    base_model model_variables, 2936996.0, 6f995f5f0f032604a49a96ceec576cf7  unchanged
    base_model variables, 0, d41d8cd98f00b204e9800998ecf8427e  unchanged
    custom trainable_variables, -0.7915199408307672, 889c333a56b9496d412eacdcbeb3bef1  **changed**
    custom model_variables, 0, d41d8cd98f00b204e9800998ecf8427e  unchanged
    custom variables, 0, d41d8cd98f00b204e9800998ecf8427e  unchanged
    optimizer trainable_variables, 0, d41d8cd98f00b204e9800998ecf8427e  unchanged
    optimizer model_variables, 0, d41d8cd98f00b204e9800998ecf8427e  unchanged
    optimizer variables, 5580902.81437762, d2cb2d4b253a1c12452f560eea35ac42  **changed**

So, the question is: why have the base model's trainable variables not changed? They include Conv2d_1a_3x3/weights and BatchNorm/beta, which definitely should be updated during training. Moreover, BatchNorm/moving_mean and BatchNorm/moving_variance should also change, because the UPDATE_OPS are included as a dependency of the train step inside the tf.contrib.training.create_train_op call. I've checked the UPDATE_OPS list and it contains valid values, for example:

    Update ops: <tf.Operation 'module_apply_image_feature_vector_with_bn_hparams/InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/AssignMovingAvg/AssignSubVariableOp' type=AssignSubVariableOp>, ...

Okay, after deep debugging of the issue I found that the problem is the following: just taking the variable from the global variable list and getting its value with eval() is not enough: it will return some value, but not the current one (at least this is what happens for the imported model's variables with dtype=resource).

To get the current value, we first have to obtain a value tensor via variable.value() or variable.read_value(), and then call eval() on that returned value tensor.
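
A minimal illustration of the difference, assuming var is one of the imported module's resource variables and a default session is active:

    stale_value = var.eval()                  # may return a non-current value for these resource variables
    current_value = var.read_value().eval()   # reads the actual current value
    # equivalently: current_value = var.value().eval()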

This resolves the question.
