How to get bias and neuron weights in optimizer?

In a TensorFlow optimizer (Python), the method _apply_dense gets called separately for the neuron weights (layer connections) and for the bias weights, but I would like to use both of them inside this method.

def _apply_dense(self, grad, weight):
    ...

For example: a fully connected neural network with two hidden layers, each with two neurons and a bias.

[Image: example neural network]

If we look at layer 2, _apply_dense gets one call for the neuron weights:

[Image: neuron weights]

and one call for the bias weights:

[Image: bias weights]

But I would need either both matrices in a single call of _apply_dense, or a combined weight matrix like this:

[Image: all weights of one layer]

X_2X_4, B_1X_4, ... is just notation for the weight of the connection between the two neurons; B_1X_4 is simply a placeholder for the weight between B_1 and X_4.

How can this be done?

MWE

As a minimal working example, here is a stochastic gradient descent optimizer implementation with momentum. For every layer, the momentum of all incoming connections from other neurons is reduced to its mean (see ndims == 2). What I need instead is the mean not only of the momentum values from the incoming neuron connections, but also of those from the incoming bias connections (as described above).

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
from tensorflow.python.training import optimizer


class SGDmomentum(optimizer.Optimizer):
    def __init__(self, learning_rate=0.001, mu=0.9, use_locking=False, name="SGDmomentum"):
        super(SGDmomentum, self).__init__(use_locking, name)
        self._lr = learning_rate
        self._mu = mu

        self._lr_t = None
        self._mu_t = None

    def _create_slots(self, var_list):
        # One momentum accumulator slot "a" per optimized variable.
        for v in var_list:
            self._zeros_slot(v, "a", self._name)

    def _apply_dense(self, grad, weight):
        learning_rate_t = tf.cast(self._lr_t, weight.dtype.base_dtype)
        mu_t = tf.cast(self._mu_t, weight.dtype.base_dtype)
        momentum = self.get_slot(weight, "a")

        if momentum.get_shape().ndims == 2:  # neuron weights
            momentum_mean = tf.reduce_mean(momentum, axis=1, keep_dims=True)
        elif momentum.get_shape().ndims == 1:  # bias weights
            momentum_mean = momentum
        else:
            momentum_mean = momentum

        momentum_update = grad + (mu_t * momentum_mean)
        momentum_t = tf.assign(momentum, momentum_update, use_locking=self._use_locking)

        weight_update = learning_rate_t * momentum_t
        weight_t = tf.assign_sub(weight, weight_update, use_locking=self._use_locking)

        return tf.group(*[weight_t, momentum_t])

    def _prepare(self):
        # Convert the Python hyperparameters to tensors before the update ops are built.
        self._lr_t = tf.convert_to_tensor(self._lr, name="learning_rate")
        self._mu_t = tf.convert_to_tensor(self._mu, name="momentum_term")

For a simple neural network to test this with: https://raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/examples/3_NeuralNetworks/multilayer_perceptron.py (only change the optimizer to the custom SGDmomentum optimizer).
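
A minimal sketch of how the custom optimizer above could be dropped into that script (a sketch only, assuming TF 1.x; the loss_op name and the feed_dict follow that example's conventions as I recall them, so adjust them if they differ):

# Only the optimizer line of the linked MLP script changes:
train_op = SGDmomentum(learning_rate=0.001, mu=0.9).minimize(loss_op)
# ...the existing training loop then runs it as before, e.g.:
# sess.run([train_op, loss_op], feed_dict={X: batch_x, Y: batch_y})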

Update: I'll try to give a better answer (or at least some ideas) now that I have some understanding of your goal, but, as you suggest in the comments, there is probably no infallible way of doing this in TensorFlow.

Since TF is a general computation framework, there is no good way of determining which pairs of weights and biases exist in a model (or whether it is a neural network at all). Here are some possible approaches to the problem that I can think of:

  • Annotating the tensors. This is probably not practical since you already said you have no control over the model, but an easy option would be to add extra attributes to the tensors to signify the weight/bias relationships. For example, you could do something like W.bias = B and B.weight = W, and then in _apply_dense check hasattr(weight, "bias") and hasattr(weight, "weight") (there may be better designs in this sense); a rough sketch follows this list.
  • You could look into some framework built on top of TensorFlow where you may have better information about the model structure. For example, Keras is a layer-based framework that implements its own optimizer classes (based on TensorFlow or Theano). I'm not too familiar with that code or its extensibility, but you probably have more tools to work with there.
  • Detect the structure of the network yourself from the optimizer. This is quite complicated, but theoretically possible: from the loss tensor passed to the optimizer, it should be possible to "climb up" the model graph to reach all of its nodes (taking the .op of the tensors and the .inputs of the ops). You could detect tensor multiplications and additions with variables and skip everything else (activations, loss computation, etc.) to determine the structure of the network; if the model does not match your expectations (e.g. there are no multiplications, or there is a multiplication without a later addition), you can raise an exception indicating that your optimizer cannot be used for that model.
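
To make the first option a bit more concrete, here is a rough, untested sketch. It assumes TF 1.x graph mode, where _apply_dense receives the actual tf.Variable objects so plain Python attributes attached at construction time are still visible; the names (SGDmomentumPaired, W2, b2) and shapes are purely illustrative, it reuses the SGDmomentum class from the MWE above, and it keeps the MWE's axis=1 reduction, which assumes weight matrices stored as (outgoing x incoming):

import tensorflow as tf

# At model-construction time, pair each weight matrix with its bias
# (toy shapes; plain Python attributes on the Variable objects).
W2 = tf.Variable(tf.random_normal([2, 2]), name="W2")
b2 = tf.Variable(tf.zeros([2]), name="b2")
W2.bias = b2
b2.weight = W2


class SGDmomentumPaired(SGDmomentum):
    """Variant of the MWE optimizer that averages the paired bias momentum as well."""

    def _apply_dense(self, grad, weight):
        learning_rate_t = tf.cast(self._lr_t, weight.dtype.base_dtype)
        mu_t = tf.cast(self._mu_t, weight.dtype.base_dtype)
        momentum = self.get_slot(weight, "a")

        if momentum.get_shape().ndims == 2 and hasattr(weight, "bias"):
            # Weight matrix with an annotated bias: treat the bias momentum as one
            # extra incoming connection per neuron and include it in the mean.
            bias_momentum = tf.reshape(self.get_slot(weight.bias, "a"), [-1, 1])
            combined = tf.concat([momentum, bias_momentum], axis=1)
            momentum_mean = tf.reduce_mean(combined, axis=1, keep_dims=True)
        else:
            # Bias vectors (and unpaired variables) keep their own momentum;
            # they could be treated symmetrically via weight.weight if desired.
            momentum_mean = momentum

        momentum_update = grad + (mu_t * momentum_mean)
        momentum_t = tf.assign(momentum, momentum_update, use_locking=self._use_locking)
        weight_t = tf.assign_sub(weight, learning_rate_t * momentum_t,
                                 use_locking=self._use_locking)
        return tf.group(momentum_t, weight_t)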

Old answer, kept for reference.

I'm not 100% clear on what you are trying to do, so I'm not sure whether this really answers your question.

Let's say you have a dense layer transforming an input of size M into an output of size N. According to the convention you show, you'd have an N × M weight matrix W and an N-sized bias vector B. Then, an input vector X of size M (or a batch of inputs of size M × K) would be processed by the layer as W · X + B, followed by the activation function (in the case of a batch, the addition would be a "broadcasted" operation). In TensorFlow:

X = ...  # Input batch of size M x K
W = ...  # Weights of size N x M
B = ...  # Biases of size N

Y = tf.matmul(W, X) + B[:, tf.newaxis]  # Output of size N x K
# Activation...

If you want, you can always put W and B together in a single extended weights matrix W*, basically adding B as a new column of W, so W* would be N × (M + 1). Then you just need to append a new element with the constant 1 to the input vector X (or a new row of ones if it's a batch), so you get X* with size M + 1 (or (M + 1) × K for a batch). The product W* · X* then gives you the same result as before. In TensorFlow:

X = ...  # Input batch of size M x K
W_star = ...  # Extended weights of size N x (M + 1)
# You can still have a "view" of the original W and B if you need it
W = W_star[:, :-1]  # N x M
B = W_star[:, -1]   # N

X_star = tf.concat([X, tf.ones_like(X[:1])], axis=0)  # Size (M + 1) x K
Y = tf.matmul(W_star, X_star)  # Output of size N x K
# Activation...
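
As a brief illustration of why this layout helps with the MWE above (toy shapes only, and still assuming the outgoing x incoming convention): the momentum slot created for the single W_star variable is also N × (M + 1), so the existing ndims == 2 branch averages the incoming-connection momenta and the bias momentum in one reduction.

import tensorflow as tf

N, M = 2, 3
momentum_slot = tf.random_normal([N, M + 1])  # slot for W_star: weight columns + bias column
# Same reduction as the MWE's ndims == 2 branch: per output neuron, the mean over
# all incoming neuron connections plus the bias connection.
momentum_mean = tf.reduce_mean(momentum_slot, axis=1, keep_dims=True)  # shape N x 1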

Now you can compute the gradients and updates for the weights and biases together. A drawback of this approach is that if you want to apply regularization, you should be careful to apply it only to the weight part of the matrix, not to the bias column.
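
For example, a small sketch of that caveat under the layout above, where the bias lives in the last column of W_star (data_loss and the 0.01 coefficient are placeholders):

# Apply L2 regularization only to the weight columns, excluding the bias column.
l2_penalty = tf.nn.l2_loss(W_star[:, :-1])
loss = data_loss + 0.01 * l2_penalty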
