
Convolutional Neural Network: Issue with training, why do the bias arrays evolve through training but not the weight matrices?

I wanted to test an implementation of a convolutional NN with two conv layers and two fully connected layers. The simple neural network model works fine for me, but I have an issue when I add the conv layers. Initially I wanted to tune different hyperparameters to optimize the performance of the model. To try to understand why the training was not working (the validation accuracy staying around 0.1), I also added visualization through TensorBoard.

When I run the following code with just one set of hyperparameters, the model isn't really training, because the accuracy never increases. However, I was able to see with TensorBoard that all my variables were initialized and that the biases were updated, but not the weight matrices of the different layers.

This is what I see in TensorBoard:

[TensorBoard screenshot: training results]

I really don't understand why the model struggles to update the weights. I know this can sometimes come from the initialization, but I think I used the right options, right?

If you have any idea where the bug might be, I'd be really interested!

PS: the code isn't the most elegant, but when I saw it wasn't working I wanted it to be as simple as possible.

from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

LOGDIR = 'tensorboard_claire/tuning2'

patch_size = 5
kernel_size = 2
depth = 16
num_hidden = 64

def generate_hyperparameters():
    # Randomly choose values for the hyperparameters.
    return {"learning_rate": 10 ** np.random.uniform(-3, -1),
            "batch_size": np.random.randint(1, 100),
            "dropout": np.random.uniform(0, 1),
            "stddev": 10 ** np.random.uniform(-4, 2)}

pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

image_size = 28
num_labels = 10
num_channels = 1 # grayscale

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size, image_size, 
    num_channels)).astype(np.float32)
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels

train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

def conv_layer(data, weights, biases):    
  conv = tf.nn.conv2d(data, weights, [1, 2, 2, 1], padding='SAME')
  hidden = tf.nn.relu(conv + biases)
  pool = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')

  return pool

def reshape_drop(data):
  shape = data.get_shape().as_list()
  reshape = tf.reshape(data, [shape[0], shape[1] * shape[2] * shape[3]])
  return reshape

def train_cnn_and_compute_accuracy(hyperparameters, name='train'):
  # Construct a deep network, train it, and return the accuracy on the
  # validation data.
  batch_size = hyperparameters["batch_size"]
  std = hyperparameters["stddev"]

  graph = tf.Graph()
  with graph.as_default():   
    # Input data.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)

    # Variables

    weights = {
       'conv1' : tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth], stddev=std), name='convw1'),
       'conv2' : tf.Variable(tf.random_normal([patch_size, patch_size, depth, depth], stddev=std), name='convw2'),
       'fc1' : tf.Variable(tf.random_normal([2 * 2 * depth, num_hidden], stddev=std), name='fcw1'),
       'fc2' : tf.Variable(tf.random_normal([num_hidden, num_labels], stddev=std), name='fcw2')
       }

    biases = {
       'conv1' : tf.Variable(tf.zeros([depth]), name='convb1'),
       'conv2' : tf.Variable(tf.constant(1.0, shape=[depth]), name='convb2'),
       'fc1' : tf.Variable(tf.constant(1.0, shape=[num_hidden]), name='fcb1'),
       'fc2' : tf.Variable(tf.constant(1.0, shape=[num_labels]), name='fcb2')
       }

    # Neural network model with 2 convolutional layers and 2 fully connected layers
    # with max pooling and dropout

    with tf.name_scope("1st_conv_layer"):
        conv_1_train = conv_layer(tf_train_dataset, weights['conv1'], biases['conv1'])
        conv_1_valid = conv_layer(tf_valid_dataset, weights['conv1'], biases['conv1'])

        tf.summary.histogram("convw1", weights['conv1'])
        tf.summary.histogram("convb1", biases['conv1'])

    with tf.name_scope("2nd_conv_layer"):
        conv_2_train = conv_layer(conv_1_train, weights['conv2'], biases['conv2'])
        conv_2_valid = conv_layer(conv_1_valid, weights['conv2'], biases['conv2'])

        tf.summary.histogram("convw2", weights['conv2'])
        tf.summary.histogram("convb2", biases['conv2'])

    with tf.name_scope('dropout'):
        dropped_train = tf.nn.dropout(conv_2_train, hyperparameters["dropout"])
        dropped_valid = tf.nn.dropout(conv_2_valid, hyperparameters["dropout"])
        reshape_train = reshape_drop(dropped_train)
        reshape_valid = reshape_drop(dropped_valid)

    with tf.name_scope("1st_fc_layer"):
        fc1_train = tf.nn.relu(tf.matmul(reshape_train, weights['fc1']) + biases['fc1'])
        fc1_valid = tf.nn.relu(tf.matmul(reshape_valid, weights['fc1']) + biases['fc1'])

        tf.summary.histogram("fcw1", weights['fc1'])
        tf.summary.histogram("fcb1", biases['fc1'])

    with tf.name_scope("2nd_fc_layer"):
        fc2_train = tf.nn.relu(tf.matmul(fc1_train, weights['fc2']) + biases['fc2'])
        fc2_valid = tf.nn.relu(tf.matmul(fc1_valid, weights['fc2']) + biases['fc2'])

        tf.summary.histogram("fcw2", weights['fc2'])
        tf.summary.histogram("fcb2", biases['fc2'])

    # Predictions

    logits = fc2_train
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(fc2_valid)

    # Loss with or without regularization
    with tf.name_scope('xentropy'):
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
        tf.summary.scalar("xent", loss)

    # Decaying learning rate and GradientDescent optimizer

    with tf.name_scope('train'):
        global_step = tf.Variable(0, trainable=False)
        learning_rate = tf.train.exponential_decay(hyperparameters["learning_rate"], global_step, 100, 0.96, staircase=True)
        tf.summary.scalar("learning_rate", learning_rate)
        optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_step)

    with tf.name_scope("valid_accuracy"):
        correct_prediction = tf.equal(tf.argmax(valid_prediction, 1), tf.argmax(valid_labels, 1))
        # Cast the boolean tensor to float32 before averaging
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        tf.summary.scalar("valid_accuracy", accuracy)

    num_steps = 1001
    val_acc = 0

    with tf.Session(graph=graph) as session:
        summ = tf.summary.merge_all()
        tf.global_variables_initializer().run()
        writer = tf.summary.FileWriter(LOGDIR+"/"+make_hparam_string(hyperparameters))
        writer.add_graph(session.graph)

        for step in range(num_steps):
            offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
            batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
            feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
            _, l, predictions, summary = session.run([optimizer, loss, train_prediction, summ], feed_dict=feed_dict)

            if step % 70 == 0:
                print("Current step: " + str(step))
                val_acc = accuracy.eval()
                print("Validation accuracy : " + str(val_acc))

            if step % 5 == 0:
                writer.add_summary(summary, step)

        writer.close()

    return val_acc

def make_hparam_string(h):
    learning_rate = h["learning_rate"]
    batch_size = h["batch_size"]
    dropout = h["dropout"]
    stddev = h["stddev"]
    return ("lr_" + str(learning_rate) + ",dp_" + str(dropout) + ",batch_size_" + str(batch_size) + ",stddev_" + str(stddev))

# Generate a bunch of hyperparameter configurations.
hyperparameter_configurations = [generate_hyperparameters() for _ in range(5)]

# Launch some experiments.
results = []
for hyperparameters in hyperparameter_configurations:
    print("Hyperparameters : ", hyperparameters.values())
    acc = train_cnn_and_compute_accuracy(hyperparameters)
    results.append(acc)

The code is a bit messy, but in any case a stddev of 100 is enormous; it should be around 0.1 or less. The next thing is that you should not use relu (or any other activation function) on the last layer, before the softmax. The dropout limits are quite wide as well; if you want to keep dropout, at least try removing it first and make sure the network can train without it (if you randomly draw a keep probability near 0.1, your weights will hardly get updated), then put it back afterwards.

Try to fix this first, and if it doesn't help, we can look closer.
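
A minimal sketch of those changes, assuming the rest of the code stays as in the question (the ranges below are illustrative, not tuned): narrow the stddev and keep-probability ranges, and drop the relu on the last fully connected layer so the raw logits feed straight into softmax_cross_entropy_with_logits.

def generate_hyperparameters():
    # Narrower, saner ranges (illustrative values)
    return {"learning_rate": 10 ** np.random.uniform(-4, -2),
            "batch_size": np.random.randint(16, 128),
            "dropout": np.random.uniform(0.5, 1.0),     # keep probability; 1.0 disables dropout
            "stddev": 10 ** np.random.uniform(-2, -1)}  # roughly 0.01 to 0.1

# Last fully connected layer: no activation, the loss applies the softmax itself
with tf.name_scope("2nd_fc_layer"):
    fc2_train = tf.matmul(fc1_train, weights['fc2']) + biases['fc2']
    fc2_valid = tf.matmul(fc1_valid, weights['fc2']) + biases['fc2']

To check that the network trains without dropout at all, you can set "dropout" to 1.0, which makes tf.nn.dropout an identity op.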
