Use Scipy Optimizer with Tensorflow 2.0 for Neural Network training
After the introduction of Tensorflow 2.0, the scipy interface (tf.contrib.opt.ScipyOptimizerInterface) was removed. However, I would still like to use the scipy optimizer scipy.optimize.minimize(method='L-BFGS-B') to train a neural network (a keras Sequential model). For the optimizer to work, it requires as input a function fun(x0), with x0 being an array of shape (n,). Therefore, the first step is to "flatten" the weight matrices to obtain a vector with the required shape. To this end, I modified the code provided by https://pychao.com/2019/11/02/optimize-tensorflow-keras-models-with-l-bfgs-from-tensorflow-probability/ . It provides a function factory meant to create such a function fun(x0). However, the code does not seem to work and the loss does not decrease. I would be really grateful if someone could help me work this out.
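As an illustration of the flattening step only (this is not the code from the question), here is a minimal NumPy sketch of turning a list of weight arrays into one vector of shape (n,) and back. The helper names flatten and unflatten and the layer shapes are assumptions made for this example.

```python
import numpy as np

def flatten(weights):
    """Concatenate a list of arrays into one 1D float64 vector."""
    return np.concatenate([np.asarray(w, dtype=np.float64).ravel() for w in weights])

def unflatten(vector, shapes):
    """Split a 1D vector back into arrays with the given shapes."""
    sizes = [int(np.prod(s)) for s in shapes]
    offsets = np.cumsum(sizes)[:-1]  # split points between consecutive arrays
    return [chunk.reshape(s) for chunk, s in zip(np.split(vector, offsets), shapes)]

# Hypothetical layer shapes: a 2x3 kernel and a bias of length 3.
weights = [np.arange(6.0).reshape(2, 3), np.array([10.0, 11.0, 12.0])]
shapes = [w.shape for w in weights]

x0 = flatten(weights)            # vector that could be passed as x0 to the optimizer
restored = unflatten(x0, shapes)
print(x0.shape)  # (9,)
```

The round trip is lossless, so the optimizer can work on the flat vector while the model keeps its per-layer shapes.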
Here is the piece of code I am using:
func = function_factory(model, loss_function, x_u_train, u_train)
# convert initial model parameters to a 1D tf.Tensor
init_params = tf.dynamic_stitch(func.idx, model.trainable_variables)
init_params = tf.cast(init_params, dtype=tf.float32)
# train the model with L-BFGS solver
results = scipy.optimize.minimize(fun=func, x0=init_params, method='L-BFGS-B')
def loss_function(x_u_train, u_train, network):
    u_pred = tf.cast(network(x_u_train), dtype=tf.float32)
    loss_value = tf.reduce_mean(tf.square(u_train - u_pred))
    return tf.cast(loss_value, dtype=tf.float32)
def function_factory(model, loss_f, x_u_train, u_train):
    """A factory to create a function required by tfp.optimizer.lbfgs_minimize.

    Args:
        model [in]: an instance of `tf.keras.Model` or its subclasses.
        loss [in]: a function with signature loss_value = loss(pred_y, true_y).
        train_x [in]: the input part of training data.
        train_y [in]: the output part of training data.

    Returns:
        A function that has a signature of:
            loss_value, gradients = f(model_parameters).
    """
    # obtain the shapes of all trainable parameters in the model
    shapes = tf.shape_n(model.trainable_variables)
    n_tensors = len(shapes)

    # we'll use tf.dynamic_stitch and tf.dynamic_partition later, so we need to
    # prepare required information first
    count = 0
    idx = []  # stitch indices
    part = []  # partition indices

    for i, shape in enumerate(shapes):
        n = np.product(shape)
        idx.append(tf.reshape(tf.range(count, count+n, dtype=tf.int32), shape))
        part.extend([i]*n)
        count += n

    part = tf.constant(part)

    def assign_new_model_parameters(params_1d):
        """A function updating the model's parameters with a 1D tf.Tensor.

        Args:
            params_1d [in]: a 1D tf.Tensor representing the model's trainable parameters.
        """
        params = tf.dynamic_partition(params_1d, part, n_tensors)
        for i, (shape, param) in enumerate(zip(shapes, params)):
            model.trainable_variables[i].assign(tf.cast(tf.reshape(param, shape), dtype=tf.float32))

    # now create a function that will be returned by this factory
    def f(params_1d):
        """
        This function is created by function_factory.

        Args:
            params_1d [in]: a 1D tf.Tensor.

        Returns:
            A scalar loss.
        """
        # update the parameters in the model
        assign_new_model_parameters(params_1d)
        # calculate the loss
        loss_value = loss_f(x_u_train, u_train, model)

        # print out iteration & loss
        f.iter.assign_add(1)
        tf.print("Iter:", f.iter, "loss:", loss_value)

        return loss_value

    # store these information as members so we can use them outside the scope
    f.iter = tf.Variable(0)
    f.idx = idx
    f.part = part
    f.shapes = shapes
    f.assign_new_model_parameters = assign_new_model_parameters

    return f
Here model is a tf.keras.Sequential object.
Thank you in advance for any help!
Changing from tf1 to tf2, I ran into the same question, and after a little bit of experimenting I found the solution below, which shows how to establish the interface between a function decorated with tf.function and a scipy optimizer. The important change compared to the question is that the function handed to scipy returns both the loss value and its gradient, and scipy.optimize.minimize is called with jac=True. The example also converts the float32 TensorFlow results to float64 NumPy arrays before returning them to scipy.
I provide an example of how this can be done for a toy problem below.
import tensorflow as tf
import numpy as np
import scipy.optimize as sopt

def model(x):
    return tf.reduce_sum(tf.square(x-tf.constant(2, dtype=tf.float32)))

@tf.function
def val_and_grad(x):
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = model(x)
    grad = tape.gradient(loss, x)
    return loss, grad

def func(x):
    return [vv.numpy().astype(np.float64) for vv in val_and_grad(tf.constant(x, dtype=tf.float32))]

resdd= sopt.minimize(fun=func, x0=np.ones(5),
                     jac=True, method='L-BFGS-B')

print("info:\n",resdd)
displays
info:
fun: 7.105427357601002e-14
hess_inv: <5x5 LbfgsInvHessProduct with dtype=float64>
jac: array([-2.38418579e-07, -2.38418579e-07, -2.38418579e-07, -2.38418579e-07,
-2.38418579e-07])
message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
nfev: 3
nit: 2
status: 0
success: True
x: array([1.99999988, 1.99999988, 1.99999988, 1.99999988, 1.99999988])
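The same pattern can be carried over to a Keras model like the one in the question. What follows is a minimal sketch under stated assumptions, not the original answer's code: make_scipy_objective is a hypothetical helper name, the toy data (y = 2x - 1) and the one-layer model are made up for the example, and tf.split/tf.concat are used in place of the dynamic_stitch/dynamic_partition bookkeeping from the question.

```python
import numpy as np
import scipy.optimize as sopt
import tensorflow as tf

def make_scipy_objective(model, x_train, y_train):
    """Build fun(params_1d) -> (loss, grad) as float64 values for SciPy.

    make_scipy_objective is a hypothetical helper name, not a TensorFlow API.
    """
    variables = model.trainable_variables
    shapes = [tuple(v.shape) for v in variables]
    sizes = [int(np.prod(s)) for s in shapes]

    @tf.function
    def value_and_grad(params_1d):
        # Write the flat float32 vector back into the model variables.
        for var, chunk, shape in zip(variables, tf.split(params_1d, sizes), shapes):
            var.assign(tf.reshape(chunk, shape))
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(y_train - model(x_train)))
        grads = tape.gradient(loss, variables)
        # Flatten the per-variable gradients in the same order as the parameters.
        return loss, tf.concat([tf.reshape(g, [-1]) for g in grads], axis=0)

    def fun(params_1d):
        # SciPy passes float64 arrays; the model works in float32.
        loss, grad = value_and_grad(tf.constant(params_1d, dtype=tf.float32))
        return float(loss.numpy()), grad.numpy().astype(np.float64)

    return fun

# Toy data (an assumption for this sketch): y = 2x - 1.
x_train = tf.constant(np.linspace(0.0, 1.0, 50).reshape(-1, 1), dtype=tf.float32)
y_train = 2.0 * x_train - 1.0

model = tf.keras.Sequential([tf.keras.Input(shape=(1,)), tf.keras.layers.Dense(1)])
fun = make_scipy_objective(model, x_train, y_train)

# Initial parameters as one flat float64 vector.
x0 = np.concatenate([v.numpy().ravel() for v in model.trainable_variables]).astype(np.float64)
initial_loss, _ = fun(x0)
res = sopt.minimize(fun, x0, jac=True, method="L-BFGS-B")
print("initial loss:", initial_loss, "final loss:", res.fun)
```

After minimize returns, the model variables hold the parameters from the last function evaluation, which need not be res.x; calling fun(res.x) once more writes the optimizer's result into the model.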
For comparing speed I use the lbfgs optimizer for a style transfer problem (see here for the network). Note that for this problem the network parameters are fixed and the input signal is adapted. As the optimized parameters (the input signal) are 1D, the function factory is not needed.
I compare four implementations (the rows of the table below). For this comparison the optimization is stopped after 300 iterations (for convergence the problem generally requires 3000 iterations).
Method       runtime (300 it)   final loss
TF1.12       240s               0.045 (baseline)
TF2.0 (E)    299s               0.045
TF2.0 (G)    233s               0.045
TF2.0/TFP    226s               0.053
The TF2.0 eager mode (TF2.0(E)) works correctly but is about 20% slower than the TF1.12 baseline version. TF2.0(G) with tf.function works fine and is marginally faster than TF1.12, which is a good thing to know.
The optimizer from tensorflow_probability (TF2.0/TFP) is slightly faster than TF2.0(G) using scipy's lbfgs but does not achieve the same error reduction. In fact, the decrease of the loss over time is not monotonic, which seems a bad sign. Comparing the two implementations of lbfgs (scipy and tensorflow_probability = TFP), it is clear that the Fortran code in scipy is significantly more complex. So either the simplification of the algorithm in TFP hurts here, or the fact that TFP performs all calculations in float32 is the problem.
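To give a feeling for how coarse float32 is, here is a small NumPy sketch based on the toy example's output above. It only reproduces the numbers printed there (x, jac and fun) from a single float32 rounding step. It does not show that float32 is the cause of the TFP result.

```python
import numpy as np

# Largest float32 value strictly below 2.0 (one rounding step away from the optimum).
x = np.nextafter(np.float32(2.0), np.float32(0.0))
step = np.float32(2.0) - x                       # spacing of float32 just below 2.0

# Gradient and loss of sum((x - 2)^2) over 5 identical entries, as in the toy problem.
grad = np.float32(2.0) * (x - np.float32(2.0))   # per-entry gradient 2 * (x - 2)
loss = 5 * float(step) ** 2

print(float(x))     # matches x = 1.99999988 in the output above
print(float(grad))  # matches jac = -2.38418579e-07
print(loss)         # matches fun = 7.105427357601002e-14
```

In other words, the optimizer stopped exactly one float32 step away from the true minimum at 2, which is the best it can distinguish when the loss and gradient are evaluated in float32.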
Here is a simple solution using a library (autograd_minimize) that I wrote, building on the answer of Roebel:
import numpy as np
import tensorflow as tf
from autograd_minimize import minimize

def rosen_tf(x):
    return tf.reduce_sum(100.0*(x[1:] - x[:-1]**2.0)**2.0 + (1 - x[:-1])**2.0)

res = minimize(rosen_tf, np.array([0.,0.]))
print(res.x)
>>> array([0.99999912, 0.99999824])
It also works with keras models, as shown with this naive example of linear regression:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from autograd_minimize.tf_wrapper import tf_function_factory
from autograd_minimize import minimize
import tensorflow as tf

#### Prepares data
X = np.random.random((200, 2))
y = X[:,:1]*2+X[:,1:]*0.4-1

#### Creates model
model = keras.Sequential([keras.Input(shape=(2,)),
                          layers.Dense(1)])

# Transforms model into a function of its parameter
func, params = tf_function_factory(model, tf.keras.losses.MSE, X, y)

# Minimization
res = minimize(func, params, method='L-BFGS-B')
print(res.x)
>>> [array([[2.0000016 ],
       [0.40000062]]), array([-1.00000164])]
I guess SciPy does not know how to calculate gradients of TensorFlow objects. Try to use the original function factory (i.e., the one that also returns the gradients together with the loss), and set jac=True in scipy.optimize.minimize.
I tested the python code from the original Gist and replaced tfp.optimizer.lbfgs_minimize with the SciPy optimizer. It worked with the BFGS method:
results = scipy.optimize.minimize(fun=func, x0=init_params, jac=True, method='BFGS')
jac=True means SciPy knows that func also returns gradients.
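To make the meaning of jac=True concrete without any TensorFlow involved, here is a minimal SciPy-only sketch. It uses SciPy's built-in Rosenbrock function (rosen) and its gradient (rosen_der); value_and_grad is a hypothetical name chosen for this example.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

def value_and_grad(x):
    """Return (loss, gradient) in one call, as jac=True requires."""
    return rosen(x), rosen_der(x)

x0 = np.zeros(2)

# jac=True: fun returns (value, gradient), so SciPy uses the supplied gradient.
res_exact = minimize(value_and_grad, x0, jac=True, method="L-BFGS-B")

# Without jac, SciPy estimates the gradient numerically by calling fun repeatedly.
res_numeric = minimize(rosen, x0, method="L-BFGS-B")

print(res_exact.x, res_exact.nfev)
print(res_numeric.x, res_numeric.nfev)
```

A function that returns only the loss, as f in the question does, leaves SciPy to estimate the gradient by finite differences over every parameter, which becomes expensive for a network with many weights.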
For L-BFGS-B, however, it's tricky. After some effort, I finally made it work. I had to comment out the @tf.function lines and let func return grads.numpy() instead of the raw TF Tensor. I guess that's because the underlying implementation of L-BFGS-B is a Fortran function, so there might be some issue converting data from tf.Tensor -> numpy array -> Fortran array. Forcing func to return the ndarray version of the gradients resolves the problem. But then it's not possible to use @tf.function.
(Similar question: Is there a tf.keras.optimizers implementation for L-BFGS?)
While this is not from anywhere as legit as tf.contrib, it's an implementation of L-BFGS (and any other scipy.optimize.minimize solver) for your consideration in case it fits your use case.
The package has models that extend keras.Model and keras.Sequential, and they can be compiled with .compile(..., optimizer="L-BFGS") to use L-BFGS in TF2, or compiled with any of the other standard optimizers (because flipping between stochastic and deterministic should be easy).