Using tf.contrib.opt.ScipyOptimizerInterface with tf.keras.layers, loss not changing

I want to use the external optimizer interface within TensorFlow to use Newton-style optimizers, as tf.train only has first-order gradient descent optimizers. At the same time, I want to build my network using tf.keras.layers, as it is way easier than using tf.Variable directly when building large, complex networks. I will show my issue with the following simple 1D linear regression example:

import tensorflow as tf
from tensorflow.keras import backend as K
import numpy as np

# Generate noisy 1D linear data
no = 100
data_x = np.linspace(0,1,no)
data_y = 2 * data_x + 2 + np.random.uniform(-0.5,0.5,no)
data_y = data_y.reshape(no,1)
data_x = data_x.reshape(no,1)

# Make model using keras layers and train
x = tf.placeholder(dtype=tf.float32, shape=[None,1])
y = tf.placeholder(dtype=tf.float32, shape=[None,1])

output = tf.keras.layers.Dense(1, activation=None)(x)

loss = tf.losses.mean_squared_error(y, output)
optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, method="L-BFGS-B")

sess = K.get_session()
sess.run(tf.global_variables_initializer())

tf_dict = {x : data_x, y : data_y}
optimizer.minimize(sess, feed_dict=tf_dict, fetches=[loss], loss_callback=lambda l: print("Loss:", l))

When running this, the loss does not change at all. When using any other optimizer from tf.train, it works fine. Also, when using tf.layers.Dense() instead of tf.keras.layers.Dense(), it does work with the ScipyOptimizerInterface. So really the question is: what is the difference between tf.keras.layers.Dense() and tf.layers.Dense()? I saw that the variables created by tf.layers.Dense() are of type tf.float32_ref, while the variables created by tf.keras.layers.Dense() are of type tf.float32. As far as I know, _ref indicates that the tensor is mutable. So maybe that's the issue? But then again, any other optimizer from tf.train works fine with Keras layers.
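For reference, here is a minimal sketch of how the dtype difference can be checked (TF 1.x; the printed dtypes reflect what I observed as described above):

import tensorflow as tf

x = tf.placeholder(dtype=tf.float32, shape=[None, 1])
tf.keras.layers.Dense(1, activation=None)(x)  # variables showed up as float32
tf.layers.Dense(1, activation=None)(x)        # variables showed up as float32_ref

for v in tf.trainable_variables():
    print(v.name, v.dtype)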

Thanks

I think the problem is with the line

output = tf.keras.layers.Dense(1, activation=None)(x)

In this format, output is not a layer but rather the output of a layer, which might be preventing the wrapper from collecting the weights and biases of the layer and feeding them to the optimizer. Try writing it in two lines, e.g.

output = tf.keras.layers.Dense(1, activation=None)  # keep a handle to the layer itself
res = output(x)                                     # res is the layer's output tensor
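The loss would then be built from res, and (assuming the usual Keras layer API) the layer handle also lets you inspect its variables directly:

loss = tf.losses.mean_squared_error(y, res)
print(output.trainable_weights)  # [kernel, bias] of the Dense layer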

If you want to keep the original format, then you might have to manually collect all trainables and feed them to the optimizer via the var_list option:

# e.g. collect every trainable variable in the default graph
optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, var_list=tf.trainable_variables(), method="L-BFGS-B")

Hope this helps.

After a lot of digging I was able to find a possible explanation.

ScipyOptimizerInterface uses feed_dicts to simulate the updates of your variables during the optimization process. It only does an assign operation at the very end. In contrast, tf.train optimizers always do assign operations. The code of ScipyOptimizerInterface is not that complex, so you can verify this easily.
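To make that mechanism concrete, here is a minimal, self-contained sketch (not the library's actual code) of the same strategy: candidate parameter values are fed through feed_dict during the search, and the variable is assigned only once at the end.

import numpy as np
import scipy.optimize
import tensorflow as tf

w = tf.Variable(0.0)            # a plain ref variable (TF 1.x graph mode)
loss = (w - 3.0) ** 2
grad = tf.gradients(loss, [w])[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    def eval_fn(packed):
        # the candidate value is fed, never assigned, during the search
        l, g = sess.run([loss, grad], feed_dict={w: packed[0]})
        return np.float64(l), np.array([g], dtype=np.float64)

    res = scipy.optimize.minimize(eval_fn, x0=[0.0], jac=True, method="L-BFGS-B")
    sess.run(w.assign(res.x[0]))  # a single assign at the very end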

Now the problem is that assigning variables with feed_dict works mostly by accident. Here is a link where I learnt about this. In other words, assigning variables via feed dict, which is what ScipyOptimizerInterface does, is a hacky way of doing updates.
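The trick itself is easy to demonstrate with a plain (ref) variable; a small TF 1.x sketch:

import tensorflow as tf

v = tf.Variable(1.0)   # plain ref variable
out = v * 2.0

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(out))                      # 2.0: uses the stored value
    print(sess.run(out, feed_dict={v: 5.0}))  # 10.0: value overridden for this run
    print(sess.run(v))                        # still 1.0: feeding never assigned anything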

Now this hack mostly works, except when it does not. tf.keras.layers.Dense uses ResourceVariables to model the weights of the model. These are an improved version of plain Variables with cleaner read/write semantics. The problem is that under the new semantics, the feed dict update happens after the loss calculation. The link above gives some explanations.

Now, tf.layers is currently a thin wrapper around tf.keras.layers, so I am not sure why it would work. Maybe there is some compatibility check somewhere in the code.

The solutions to address this are somewhat simple.

  • Either avoid using components that use ResourceVariables. This can be kind of difficult (one workaround, consistent with the question's own observation, is sketched after this list).
  • Patch ScipyOptimizerInterface to always do assignments for variables. This is relatively easy, since all the required code is in one file.
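For the first option, a minimal sketch (based on the question's observation that tf.layers.Dense worked) is to build the layer with the tf.layers API, which here created ref variables:

output = tf.layers.Dense(1, activation=None)(x)  # variables show up as float32_ref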

There was some effort to make the interface work with eager execution (which by default uses ResourceVariables). Check out this link.
