
Why does a GPflow model not seem to learn anything with TensorFlow optimizers such as tf.optimizers.Adam?

My inducing points are set to trainable but do not change when I call opt.minimize(). Why is that, and what does it mean? Does it mean the model is not learning? What is the difference between tf.optimizers.Adam(lr) and gpflow.optimizers.Scipy?

The following is the simple classification example adapted from the documentation. When I run this code example with GPflow's Scipy optimizer, I get the trained results and the values of the inducing variables keep changing. But when I use the Adam optimizer, I get only a straight-line prediction and the values of the inducing points remain the same, which suggests the model is not learning with the Adam optimizer.

plot of data before training

plot of data after training with Adam

plot of data after training with GPflow's Scipy optimizer

The link for the example is https://gpflow.readthedocs.io/en/develop/notebooks/advanced/multiclass_classification.html

import numpy as np
import tensorflow as tf


import warnings
warnings.filterwarnings('ignore')  # ignore DeprecationWarnings from tensorflow

import matplotlib.pyplot as plt

import gpflow

from gpflow.utilities import print_summary, set_trainable
from gpflow.ci_utils import ci_niter

from tensorflow2_work.multiclass_classification import plot_posterior_predictions, colors

np.random.seed(0)  # reproducibility

# Number of functions and number of data points
C = 3
N = 100

# RBF kernel lengthscale
lengthscale = 0.1

# Jitter
jitter_eye = np.eye(N) * 1e-6

# Input
X = np.random.rand(N, 1)

kernel_se = gpflow.kernels.SquaredExponential(lengthscale=lengthscale)
K = kernel_se(X) + jitter_eye

# Latents prior sample
f = np.random.multivariate_normal(mean=np.zeros(N), cov=K, size=(C)).T

# Hard max observation
Y = np.argmax(f, 1).reshape(-1,).astype(int)
print(Y.shape)

# One-hot encoding
Y_hot = np.zeros((N, C), dtype=bool)
Y_hot[np.arange(N), Y] = 1

data = (X, Y)

plt.figure(figsize=(12, 6))
order = np.argsort(X.reshape(-1,))
print(order.shape)

for c in range(C):
    plt.plot(X[order], f[order, c], '.', color=colors[c], label=str(c))
    plt.plot(X[order], Y_hot[order, c], '-', color=colors[c])


plt.legend()
plt.xlabel('$X$')
plt.ylabel('Latent (dots) and one-hot labels (lines)')
plt.title(r'Sample from the joint $p(Y, \mathbf{f})$')
plt.grid()
plt.show()


# sum kernel: Matern32 + White
kernel = gpflow.kernels.Matern32() + gpflow.kernels.White(variance=0.01)

# Robustmax Multiclass Likelihood
invlink = gpflow.likelihoods.RobustMax(C)  # Robustmax inverse link function
likelihood = gpflow.likelihoods.MultiClass(C, invlink=invlink)  # Multiclass likelihood
Z = X[::5].copy()  # inducing inputs
#print(Z)

m = gpflow.models.SVGP(kernel=kernel, likelihood=likelihood,
    inducing_variable=Z, num_latent_gps=C, whiten=True, q_diag=True)

# Make the white-noise variance and the inducing inputs trainable
set_trainable(m.kernel.kernels[1].variance, True)
set_trainable(m.inducing_variable, True)
print(m.inducing_variable.Z)
print_summary(m)


training_loss = m.training_loss_closure(data)

opt = tf.optimizers.Adam(lr)  # the optimizer in question; lr is the chosen learning rate
opt.minimize(training_loss, m.trainable_variables)
print(m.inducing_variable.Z)
print_summary(m.inducing_variable.Z)


print(m.inducing_variable.Z)

# %%
plot_posterior_predictions(m, X, Y)

The example given in the question isn't copy&pastable, but it seems you simply exchange opt = gpflow.optimizers.Scipy() with opt = tf.optimizers.Adam(). The minimize() method of GPflow's Scipy optimizer runs one call of scipy.optimize.minimize, which by default runs to convergence (you can also specify a maximum number of iterations by passing, e.g., options=dict(maxiter=100) to the minimize() call).
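For reference, a minimal sketch of the Scipy path applied to the model m and training_loss closure defined in the question (the maxiter value is only an illustration):

opt = gpflow.optimizers.Scipy()
opt.minimize(training_loss, m.trainable_variables, options=dict(maxiter=100))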

In contrast, the minimize() method of TensorFlow optimizers runs only a single optimization step. To run more steps, say iter = 100, you need to manually write a loop:

for _ in range(iter):
    opt.minimize(model.training_loss, model.trainable_variables)

For this to actually run fast, you also need to wrap the optimization step in tf.function:

@tf.function
def optimization_step():
    opt.minimize(model.training_loss, model.trainable_variables)

for _ in range(iter):
    optimization_step()

This runs exactly iter steps; in TensorFlow you have to handle convergence checks yourself, and your model may or may not have converged after that many steps.
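A minimal sketch of such a manual check, reusing the optimization_step defined above (the tolerance and step limit are arbitrary choices, not recommendations):

loss_fn = model.training_loss  # for the SVGP in the question, use m.training_loss_closure(data)
previous = float("inf")
for step in range(1000):
    optimization_step()
    current = loss_fn().numpy()
    if abs(previous - current) < 1e-6:  # crude absolute-change stopping criterion
        break
    previous = current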

So in your usage, you only ran one step. This did change the parameters, but presumably too little to notice the difference. (You could see a larger effect in one step by making the learning rate much higher, though that would not be a good idea for actually optimizing the model over many steps.)

Usage of the Adam optimizer with GPflow models is demonstrated in the notebook on stochastic variational inference, though it also works for non-stochastic optimization.
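Applied to the model m from the question, a training loop along the lines of that notebook might look like this (the learning rate and iteration count are assumptions, not tuned values):

adam = tf.optimizers.Adam(learning_rate=0.01)  # assumed learning rate
training_loss = m.training_loss_closure(data)

@tf.function
def adam_step():
    adam.minimize(training_loss, m.trainable_variables)

for i in range(ci_niter(1000)):
    adam_step()
    if i % 100 == 0:
        print(f"iteration {i}: ELBO = {-training_loss().numpy():.2f}")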

Note that, in any case, all parameters such as inducing point locations are set trainable by default, so your call to set_trainable(..., True) doesn't affect what's going on here.
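As a quick check (sketch): print_summary(m) shows a trainable column for every parameter, and passing False to set_trainable is what would actually freeze the inducing point locations:

print_summary(m)                           # inducing_variable.Z is trainable by default
set_trainable(m.inducing_variable, False)  # this, not True, would freeze the inducing inputs
print_summary(m)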
