Retraining a CNN without a high-level API

Question

Summary: I am trying to retrain a simple CNN for MNIST without using a high-level API. I have already succeeded in doing so by retraining the entire network, but my current goal is to retrain only the last one or two fully connected layers.

Work so far: Say I have a CNN with the following structure:

Convolutional Layer
ReLU
Pooling Layer
Convolutional Layer
ReLU
Pooling Layer
Fully Connected Layer
ReLU
Dropout Layer
Fully Connected Layer to 10 output classes
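As a quick sanity check on this structure, the sketch below traces the spatial size of an MNIST image through the two convolution and pooling stages. It assumes 2x2 pooling with stride 2, which the question does not state; the `padding='SAME'` convolutions and the filter counts 32 and 64 come from the code in the question. The helper names `same_conv` and `pool` are made up for illustration.

```python
def same_conv(size):
    # With padding='SAME' and stride 1, a convolution keeps the spatial size.
    return size


def pool(size, window=2):
    # Assumption: 2x2 pooling with stride 2, which halves the spatial size.
    return size // window


side = 28                      # MNIST images are 28x28 with 1 channel
side = pool(same_conv(side))   # conv 1 (32 filters) + pooling -> 14x14x32
side = pool(same_conv(side))   # conv 2 (64 filters) + pooling -> 7x7x64
flat_size = side * side * 64   # input size of the first fully connected layer

print(side, flat_size)
```

Under these assumptions the flattened size is 7 * 7 * 64, which agrees with the `input_size` used for the first fully connected layer.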

My goal is to retrain either the last fully connected layer or the last two fully connected layers.

An example of a convolutional layer:

W_conv1 = tf.get_variable("W", [5, 5, 1, 32],
      initializer=tf.truncated_normal_initializer(stddev=np.sqrt(2.0 / 784)))
b_conv1 = tf.get_variable("b", initializer=tf.constant(0.1, shape=[32]))
# Convolve, then add the bias exactly once before the ReLU.
z = tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='SAME')
h_conv1 = tf.nn.relu(z + b_conv1)

An example of a fully connected layer:

input_size = 7 * 7 * 64
W_fc1 = tf.get_variable("W", [input_size, 1024], initializer=tf.truncated_normal_initializer(stddev=np.sqrt(2.0/input_size)))
b_fc1 = tf.get_variable("b", initializer=tf.constant(0.1, shape=[1024]))
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

My assumption: When performing backpropagation on the new dataset, I simply need to make sure that the weights W and biases b (from W*x+b) stay fixed in the layers that are not fully connected.

A first thought on how to do this: Save W and b, perform a backpropagation step, and then overwrite the new W and b with the saved ones in the layers I don't want changed.

My thoughts on this first approach:

This is computationally intensive and wastes memory. The whole advantage of retraining only the last layer is not having to process the others.
Might backpropagation behave differently if it is not applied to all layers?
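On the second point, a small hand-written example may help. The sketch below uses a made-up scalar network (one hidden ReLU unit, one output unit, squared error), not the CNN from the question, and all names in it are hypothetical. It computes the last layer's gradients twice: once with backpropagation continued down to the first layer, and once stopping right after the last layer.

```python
def forward(params, x):
    z = params["w1"] * x + params["b1"]   # first layer (to be frozen)
    h = max(z, 0.0)                       # ReLU
    y = params["w2"] * h + params["b2"]   # last layer (to be retrained)
    return z, h, y


def last_layer_grads(params, x, t):
    # Backpropagation that stops after the last layer.
    _, h, y = forward(params, x)
    dy = y - t                            # dL/dy for L = 0.5 * (y - t)^2
    return {"w2": dy * h, "b2": dy}


def all_grads(params, x, t):
    # Full backpropagation through both layers.
    z, h, y = forward(params, x)
    dy = y - t
    grads = {"w2": dy * h, "b2": dy}
    dh = dy * params["w2"]                # gradient flowing into the hidden unit
    dz = dh if z > 0 else 0.0             # ReLU derivative
    grads["w1"] = dz * x
    grads["b1"] = dz
    return grads


params = {"w1": 0.5, "b1": 0.1, "w2": -1.5, "b2": 0.2}
full = all_grads(params, x=2.0, t=1.0)
partial = last_layer_grads(params, x=2.0, t=1.0)
print(partial["w2"] == full["w2"], partial["b2"] == full["b2"])
```

Because the chain rule runs from the output backwards, the last layer's gradients never depend on the gradients of earlier layers. Stopping early only skips work; it does not change the values that are computed.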

My question:

How do I properly retrain particular layers of a neural network when not using high-level APIs? Both conceptual and coding answers are welcome.

PS: I am fully aware of how to do this using high-level APIs, for example: https://towardsdatascience.com/how-to-train-your-model-dramatically-faster-9ad063f0f718 . I just don't want neural networks to be magic; I want to know what actually happens.

Answer 1

The minimize function of optimizers has an optional argument, var_list, for choosing which variables to train, e.g.:

optimizer_step = tf.train.MomentumOptimizer(learning_rate, momentum, name='MomentumOptimizer').minimize(loss, var_list=training_variables)
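To make the effect of `var_list` less magical, here is a plain-Python sketch of a momentum update restricted to a list of variable names. It is not TensorFlow's implementation: `numeric_grad`, `minimize_step`, the toy `loss_fn` and the variable names `w_conv` and `w_fc` are all hypothetical, and gradients are approximated by finite differences to keep the example short. The update rule (accumulate `momentum * velocity + gradient`, then subtract `learning_rate * velocity`) is the usual momentum formulation.

```python
def numeric_grad(loss_fn, variables, name, eps=1e-6):
    # Central-difference gradient of loss_fn with respect to one variable.
    plus = dict(variables)
    plus[name] += eps
    minus = dict(variables)
    minus[name] -= eps
    return (loss_fn(plus) - loss_fn(minus)) / (2 * eps)


def minimize_step(loss_fn, variables, velocity, var_list,
                  learning_rate=0.1, momentum=0.9):
    # Compute gradients only for the variables in var_list ...
    grads = {name: numeric_grad(loss_fn, variables, name) for name in var_list}
    # ... and update only those variables; everything else is left untouched.
    for name, grad in grads.items():
        velocity[name] = momentum * velocity.get(name, 0.0) + grad
        variables[name] -= learning_rate * velocity[name]


def loss_fn(v):
    hidden = max(v["w_conv"] * 2.0, 0.0)   # stands in for the frozen layers
    output = v["w_fc"] * hidden            # stands in for the retrained layer
    return 0.5 * (output - 1.0) ** 2


variables = {"w_conv": 0.5, "w_fc": -1.0}
velocity = {}
loss_before = loss_fn(variables)
for _ in range(200):
    minimize_step(loss_fn, variables, velocity, var_list=["w_fc"])
loss_after = loss_fn(variables)
print(variables["w_conv"], loss_before, loss_after)
```

The frozen variable keeps its original value and never gets a momentum accumulator, while the loss still decreases through the variable that is listed. No save-and-restore of the frozen weights is needed.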

You can get the variables for the layers you want to train by calling tf.trainable_variables() before and after creating those layers and taking the difference:

vars1 = tf.trainable_variables()

# FC Layer
input_size = 7 * 7 * 64
W_fc1 = tf.get_variable("W", [input_size, 1024], initializer=tf.truncated_normal_initializer(stddev=np.sqrt(2.0/input_size)))
b_fc1 = tf.get_variable("b", initializer=tf.constant(0.1, shape=[1024]))
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

vars2 = tf.trainable_variables()

training_variables = list(set(vars2) - set(vars1))

Edit: actually, using tf.trainable_variables is probably overkill in this case, since you have W_fc1 and b_fc1 directly and can pass them as var_list=[W_fc1, b_fc1]. It would be useful if, for example, you had used tf.layers.dense to create a dense layer, where you would not have the variables explicitly.

Answer 1 was accepted on 2019-01-22 08:58:10.