Retraining a CNN without a high-level API

Summary: I am trying to retrain a simple CNN for MNIST without using a high-level API. I already succeeded in doing so by retraining the entire network, but my current goal is to retrain only the last one or two fully connected layers.

Work so far: Say I have a CNN with the following structure:

  • Convolutional Layer
  • ReLU
  • Pooling Layer
  • Convolutional Layer
  • ReLU
  • Pooling Layer
  • Fully Connected Layer
  • ReLU
  • Dropout Layer
  • Fully Connected Layer to 10 output classes

My goal is to retrain either the last Fully Connected Layer or the last two Fully Connected Layers.

An example of a Convolutional layer:

W_conv1 = tf.get_variable("W", [5, 5, 1, 32],
      initializer=tf.truncated_normal_initializer(stddev=np.sqrt(2.0 / 784)))
b_conv1 = tf.get_variable("b", initializer=tf.constant(0.1, shape=[32]))
z = tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='SAME')
h_conv1 = tf.nn.relu(z + b_conv1)  # add the bias once, then apply the non-linearity

An example of a Fully Connected Layer:

input_size = 7 * 7 * 64
W_fc1 = tf.get_variable("W", [input_size, 1024], initializer=tf.truncated_normal_initializer(stddev=np.sqrt(2.0/input_size)))
b_fc1 = tf.get_variable("b", initializer=tf.constant(0.1, shape=[1024]))
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

My assumption: When performing backpropagation on the new dataset, I simply make sure that my weights W and b (from W*x + b) are fixed in the non-fully-connected layers.

A first thought on how to do this: Save W and b, perform a backpropagation step, and then replace the new W and b with the saved ones in the layers I don't want changed.
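For concreteness, a minimal sketch of that idea might look like the following (assuming a train_step op that updates all variables, a live tf.Session called sess, and placeholder/batch names x, y_, batch_x, batch_y, none of which are defined above):

frozen_vars = [W_conv1, b_conv1]      # layers that should stay fixed
saved_values = sess.run(frozen_vars)  # copy the current weights out of the graph
sess.run(train_step, feed_dict={x: batch_x, y_: batch_y})  # backprop updates every variable
for var, value in zip(frozen_vars, saved_values):
    # overwrite the frozen layers with the saved weights
    # (each tf.assign call also adds a new op to the graph, another reason this is wasteful)
    sess.run(tf.assign(var, value))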

My thoughts on this first approach:

  • This is computationally intensive and wastes memory. The whole advantage of only training the last layer is not having to touch the others.
  • Backpropagation might behave differently if it is not applied to all layers?

My question:

  • How do I properly retrain particular layers of a neural network when not using high-level APIs? Both conceptual and coding answers are welcome.

PS: I am fully aware of how one can do it using high-level APIs (example: https://towardsdatascience.com/how-to-train-your-model-dramatically-faster-9ad063f0f718). I just don't want neural networks to be magic; I want to know what actually happens.

The minimize function of optimizers has an optional argument for choosing which variables to train, e.g.:

optimizer_step = tf.train.MomentumOptimizer(learning_rate, momentum, name='MomentumOptimizer').minimize(loss, var_list=training_variables)

You can get the variables for the layers you want to train by using tf.trainable_variables():

vars1 = tf.trainable_variables()

# FC Layer
input_size = 7 * 7 * 64
W_fc1 = tf.get_variable("W", [input_size, 1024], initializer=tf.truncated_normal_initializer(stddev=np.sqrt(2.0/input_size)))
b_fc1 = tf.get_variable("b", initializer=tf.constant(0.1, shape=[1024]))
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

vars2 = tf.trainable_variables()

training_variables = list(set(vars2) - set(vars1))
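
For completeness, here is a minimal sketch of how that list would be used, assuming loss, learning_rate, momentum, the placeholders x and y_, and a batch iterator exist (these names are placeholders, not taken from the question):

optimizer_step = tf.train.MomentumOptimizer(learning_rate, momentum,
                                            name='MomentumOptimizer').minimize(loss, var_list=training_variables)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # In a real retraining run you would restore the pretrained weights here
    # (e.g. with a tf.train.Saver) instead of starting from fresh initializations.
    for batch_x, batch_y in batches:  # hypothetical batch iterator
        sess.run(optimizer_step, feed_dict={x: batch_x, y_: batch_y})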

Edit: actually, using tf.trainable_variables is probably overkill in this case, since you have W_fc1 and b_fc1 directly. This would be useful, for example, if you had used tf.layers.dense to create a dense layer, where you would not have the variables explicitly.
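
For example, if the last layer had been built with tf.layers.dense under a (hypothetical) name 'fc1', you could collect its variables by scope instead:

# Dense layer whose weights and bias are created internally by tf.layers.dense
h_fc1 = tf.layers.dense(h_pool2_flat, 1024, activation=tf.nn.relu, name='fc1')

# Grab only the trainable variables that live under the 'fc1' scope
training_variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='fc1')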
