Why does a GPflow model not seem to learn anything with TensorFlow optimizers such as tf.optimizers.Adam?
My inducing points are set to trainable but do not change when I call opt.minimize(). Why is that, and what does it mean? Does it mean the model is not learning? What is the difference between tf.optimizers.Adam(lr) and gpflow.optimizers.Scipy?
The following is the simple classification example adapted from the documentation. When I run this code with GPflow's Scipy optimizer, I get trained results and the values of the inducing variables keep changing. But when I use the Adam optimizer, I get only a straight-line prediction and the values of the inducing points remain the same. This suggests that the model is not learning with the Adam optimizer.
[Figure: plot of data before training]
[Figure: plot of data after training with Adam]
[Figure: plot of data after training with the GPflow Scipy optimizer]
The link for the example is https://gpflow.readthedocs.io/en/develop/notebooks/advanced/multiclass_classification.html
import numpy as np
import tensorflow as tf
import warnings
warnings.filterwarnings('ignore') # ignore DeprecationWarnings from tensorflow
import matplotlib.pyplot as plt
import gpflow
from gpflow.utilities import print_summary, set_trainable
from gpflow.ci_utils import ci_niter
from tensorflow2_work.multiclass_classification import plot_posterior_predictions, colors
np.random.seed(0) # reproducibility
# Number of functions and number of data points
C = 3
N = 100
# RBF kernel lengthscale
lengthscale = 0.1
# Jitter
jitter_eye = np.eye(N) * 1e-6
# Input
X = np.random.rand(N, 1)
kernel_se = gpflow.kernels.SquaredExponential(lengthscale=lengthscale)
K = kernel_se(X) + jitter_eye
# Latents prior sample
f = np.random.multivariate_normal(mean=np.zeros(N), cov=K, size=(C)).T
# Hard max observation
Y = np.argmax(f, 1).reshape(-1,).astype(int)
print(Y.shape)
# One-hot encoding
Y_hot = np.zeros((N, C), dtype=bool)
Y_hot[np.arange(N), Y] = 1
data = (X, Y)
plt.figure(figsize=(12, 6))
order = np.argsort(X.reshape(-1,))
print(order.shape)
for c in range(C):
    plt.plot(X[order], f[order, c], '.', color=colors[c], label=str(c))
    plt.plot(X[order], Y_hot[order, c], '-', color=colors[c])
plt.legend()
plt.xlabel('$X$')
plt.ylabel('Latent (dots) and one-hot labels (lines)')
plt.title(r'Sample from the joint $p(Y, \mathbf{f})$')  # raw string so \m is not treated as an escape
plt.grid()
plt.show()
# sum kernel: Matern32 + White
kernel = gpflow.kernels.Matern32() + gpflow.kernels.White(variance=0.01)
# Robustmax Multiclass Likelihood
invlink = gpflow.likelihoods.RobustMax(C) # Robustmax inverse link function
likelihood = gpflow.likelihoods.MultiClass(C, invlink=invlink) # Multiclass likelihood
Z = X[::5].copy() # inducing inputs
#print(Z)
m = gpflow.models.SVGP(kernel=kernel, likelihood=likelihood,
                       inducing_variable=Z, num_latent_gps=C, whiten=True, q_diag=True)
# Only train the variational parameters
set_trainable(m.kernel.kernels[1].variance, True)
set_trainable(m.inducing_variable, True)
print(m.inducing_variable.Z)
print_summary(m)
training_loss = m.training_loss_closure(data)
opt.minimize(training_loss, m.trainable_variables)
print(m.inducing_variable.Z)
print_summary(m.inducing_variable.Z)
print(m.inducing_variable.Z)
# %%
plot_posterior_predictions(m, X, Y)
The example given in the question isn't copy-and-pasteable, but it looks like you simply swapped opt = gpflow.optimizers.Scipy() for opt = tf.optimizers.Adam().
The minimize() method of GPflow's Scipy optimizer makes a single call to scipy.optimize.minimize, which by default runs to convergence (you can also cap the number of iterations by passing, e.g., options=dict(maxiter=100) to the minimize() call).
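To see what "one call runs to convergence" means in practice, here is a minimal sketch that calls scipy.optimize.minimize directly on SciPy's built-in Rosenbrock test function rather than on a GPflow model. The choice of test function, the starting point, and method="L-BFGS-B" are assumptions made for illustration (to my knowledge L-BFGS-B is also what gpflow.optimizers.Scipy uses by default).

```python
import numpy as np
from scipy.optimize import minimize, rosen

# Illustrative starting point for the Rosenbrock function (minimum at [1, 1]).
x0 = np.array([-1.2, 1.0])

# One call with default settings: iterates until SciPy's convergence
# criteria (or its default iteration limit) are reached.
full = minimize(rosen, x0, method="L-BFGS-B")

# The same call, capped at a single iteration through the options dict.
capped = minimize(rosen, x0, method="L-BFGS-B", options=dict(maxiter=1))

print("full:  ", full.nit, "iterations, f =", full.fun)
print("capped:", capped.nit, "iterations, f =", capped.fun)
```

The uncapped call performs many iterations inside a single minimize() call, while the capped one stops early with a much larger objective value.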
In contrast, the minimize() method of TensorFlow optimizers runs only a single optimization step. To run more steps, say iter = 100, you need to write the loop yourself:
for _ in range(iter):
    opt.minimize(model.training_loss, model.trainable_variables)
For this to actually run fast, you also need to wrap the optimization step in tf.function:
@tf.function
def optimization_step():
    opt.minimize(model.training_loss, model.trainable_variables)

for _ in range(iter):
    optimization_step()
This runs exactly iter steps. In TensorFlow you have to handle convergence checks yourself; your model may or may not have converged after that many steps.
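One way to handle the convergence check yourself is sketched below. It is framework-agnostic: run_until_converged and step_fn are hypothetical names, and step_fn stands for any callable that performs one optimization step and returns the current loss (with GPflow, that would be a function wrapping the optimizer step and returning the training loss). The tolerance, the check interval, and the toy quadratic problem are all assumptions chosen for illustration.

```python
def run_until_converged(step_fn, max_steps=10_000, tol=1e-8, check_every=10):
    """Call step_fn repeatedly; stop once the loss changes by less than tol.

    step_fn is a hypothetical callable that performs ONE optimization step
    and returns the current loss.
    """
    previous = None
    loss = float("inf")
    for step in range(1, max_steps + 1):
        loss = float(step_fn())
        if step % check_every == 0:
            # Compare against the loss recorded at the previous check.
            if previous is not None and abs(previous - loss) < tol:
                return step, loss  # considered converged
            previous = loss
    return max_steps, loss  # step budget exhausted without meeting tol


# Toy stand-in for a model: plain gradient descent on (x - 3)^2.
state = {"x": 0.0}

def toy_step(lr=0.1):
    grad = 2.0 * (state["x"] - 3.0)
    state["x"] -= lr * grad
    return (state["x"] - 3.0) ** 2

steps, final_loss = run_until_converged(toy_step)
print(steps, final_loss, state["x"])
```

A change-in-loss threshold is only one possible stopping rule; for noisy minibatch losses a moving average or a held-out metric would likely be more robust.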
So in your usage, you ran only one step. This did change the parameters, but presumably by too little to notice. (You could see a larger effect from a single step by making the learning rate much higher, though that would not be a good idea when actually optimizing the model over many steps.)
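To get a feel for how small a single step is, the sketch below evaluates the textbook Adam update for the very first step. It is not TensorFlow's exact implementation, and the hyperparameter values are assumptions that I believe match the defaults of tf.optimizers.Adam (learning rate 0.001, beta1 0.9, beta2 0.999, epsilon 1e-7). The function name adam_first_step is made up for this example.

```python
import math

def adam_first_step(grad, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Return the parameter change produced by the first textbook Adam step."""
    m = (1 - beta1) * grad        # first-moment estimate after one step
    v = (1 - beta2) * grad ** 2   # second-moment estimate after one step
    m_hat = m / (1 - beta1)       # bias correction for t = 1
    v_hat = v / (1 - beta2)
    return -lr * m_hat / (math.sqrt(v_hat) + eps)

# The size of the first update is roughly lr, whatever the gradient magnitude.
for g in (1e-3, 1.0, 1e3):
    print(f"grad={g:g}  update={adam_first_step(g):.6g}")
```

Because the first update has magnitude of about the learning rate regardless of the gradient, one step moves each inducing point by roughly 0.001. With inputs drawn from [0, 1] as in the question, that is easy to miss in printed output.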
Usage of the Adam optimizer with GPflow models is demonstrated in the notebook on stochastic variational inference, though it also works for non-stochastic optimization.
Note that, in any case, all parameters, including the inducing point locations, are trainable by default, so your call to set_trainable(..., True) doesn't affect what's going on here.