tf.where causes optimiser to fail in tensorflow
I want to check whether I can solve this problem with tensorflow instead of pymc3. The experimental idea is to define a probabilistic system that contains a switchpoint. I can use sampling as a method of inference, but I started wondering why I couldn't just do this with gradient descent instead.
I decided to do the gradient search in tensorflow, but tensorflow seems to have a hard time performing a gradient search when tf.where is involved. You can find the code below.
import tensorflow as tf
import numpy as np
x1 = np.random.randn(50)+1
x2 = np.random.randn(50)*2 + 5
x_all = np.hstack([x1, x2])
len_x = len(x_all)
time_all = np.arange(1, len_x + 1)
mu1 = tf.Variable(0, name="mu1", dtype=tf.float32)
mu2 = tf.Variable(5, name = "mu2", dtype=tf.float32)
sigma1 = tf.Variable(2, name = "sigma1", dtype=tf.float32)
sigma2 = tf.Variable(2, name = "sigma2", dtype=tf.float32)
tau = tf.Variable(10, name = "tau", dtype=tf.float32)
mu = tf.where(time_all < tau,
              tf.ones(shape=(len_x,), dtype=tf.float32) * mu1,
              tf.ones(shape=(len_x,), dtype=tf.float32) * mu2)
sigma = tf.where(time_all < tau,
                 tf.ones(shape=(len_x,), dtype=tf.float32) * sigma1,
                 tf.ones(shape=(len_x,), dtype=tf.float32) * sigma2)
likelihood_arr = tf.log(tf.sqrt(1/(2*np.pi*tf.pow(sigma, 2)))) -tf.pow(x_all - mu, 2)/(2*tf.pow(sigma, 2))
total_likelihood = tf.reduce_sum(likelihood_arr, name="total_likelihood")
optimizer = tf.train.RMSPropOptimizer(0.01)
opt_task = optimizer.minimize(-total_likelihood)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    print("these variables should be trainable: {}".format([_.name for _ in tf.trainable_variables()]))
    for step in range(10000):
        _lik, _ = sess.run([total_likelihood, opt_task])
        if step % 1000 == 0:
            variables = {_.name:_.eval() for _ in [mu1, mu2, sigma1, sigma2, tau]}
            print("step: {}, values: {}".format(str(step).zfill(4), variables))
You'll notice that the tau parameter does not change, even though tensorflow seems to be aware of the variable and its gradient. Any clue about what is going wrong? Is this something that can be calculated in tensorflow, or do I need a different pattern?
tau is only used in the condition argument to where (tf.where(time_all < tau, ...)), which is a boolean tensor. Since calculating gradients only makes sense for continuous values, the gradient of the output with respect to tau will be zero.
Even ignoring tf.where, you used tau in the expression time_all < tau, which is constant almost everywhere, so it has a gradient of zero.
Because the gradient is zero, there is no way to learn tau with gradient descent methods.
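The zero gradient can be checked numerically without tensorflow. The following is a minimal NumPy sketch, not the original code: the helper names hard_switch_loglik and grad_tau, the random seed, and the finite-difference step eps are assumptions made for illustration. It rebuilds the same Gaussian log-likelihood with a hard switch and estimates the derivative with respect to tau by a central difference.

```python
import numpy as np

def hard_switch_loglik(x, t, tau, mu1, mu2, sigma1, sigma2):
    """Gaussian log-likelihood with a hard switch at tau (mirrors the tf.where model)."""
    before = t < tau                      # boolean mask, piecewise constant in tau
    mu = np.where(before, mu1, mu2)
    sigma = np.where(before, sigma1, sigma2)
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu)**2 / (2 * sigma**2))

# Synthetic data shaped like the question's: 50 points per regime (seed is arbitrary).
rng = np.random.default_rng(0)
x = np.hstack([rng.normal(1, 1, 50), rng.normal(5, 2, 50)])
t = np.arange(1, len(x) + 1)

def grad_tau(tau, eps=1e-3):
    """Central finite-difference estimate of d(loglik)/d(tau)."""
    hi = hard_switch_loglik(x, t, tau + eps, 0.0, 5.0, 2.0, 2.0)
    lo = hard_switch_loglik(x, t, tau - eps, 0.0, 5.0, 2.0, 2.0)
    return (hi - lo) / (2 * eps)

print(grad_tau(10.5))  # 0.0: both perturbed values of tau produce the same mask
```

The likelihood only changes when tau crosses one of the integer time points, so between those points an optimiser that follows the gradient receives no signal about which direction to move tau.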
Depending on your problem, instead of a hard switch between two values you may be able to use a weighted sum, p*val1 + (1-p)*val2, where p depends on tau in a continuous manner.
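One possible choice for p is a logistic function of the distance to tau. The sketch below is an illustration in plain NumPy under that assumption; the helper names soft_switch_loglik and soft_grad_tau and the sharpness parameter are hypothetical and not part of any library. It shows that with a smooth p the finite-difference derivative of the log-likelihood with respect to tau is no longer zero.

```python
import numpy as np

def soft_switch_loglik(x, t, tau, mu1, mu2, sigma1, sigma2, sharpness=1.0):
    """Gaussian log-likelihood where the regime weight p varies smoothly with tau."""
    # p is close to 1 well before tau and close to 0 well after it.
    # sharpness is an illustrative knob: larger values approach the hard switch.
    p = 1.0 / (1.0 + np.exp(sharpness * (t - tau)))
    mu = p * mu1 + (1 - p) * mu2
    sigma = p * sigma1 + (1 - p) * sigma2
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu)**2 / (2 * sigma**2))

# Synthetic data with two regimes of 50 points each (seed is arbitrary).
rng = np.random.default_rng(0)
x = np.hstack([rng.normal(1, 1, 50), rng.normal(5, 2, 50)])
t = np.arange(1, len(x) + 1)

def soft_grad_tau(tau, eps=1e-3):
    """Central finite-difference estimate of d(loglik)/d(tau)."""
    hi = soft_switch_loglik(x, t, tau + eps, 0.0, 5.0, 2.0, 2.0)
    lo = soft_switch_loglik(x, t, tau - eps, 0.0, 5.0, 2.0, 2.0)
    return (hi - lo) / (2 * eps)

print(soft_grad_tau(10.5))  # nonzero, so a gradient-based optimiser can move tau
```

The trade-off is that the fitted model blends the two regimes around tau rather than switching abruptly, so tau should be read as the centre of a transition, not an exact change point.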
The accepted answer is correct, but it doesn't contain the code that solves my problem. The following snippet does:
import tensorflow as tf
import numpy as np
import os
import uuid
TENSORBOARD_PATH = "/tmp/tensorboard-switchpoint"
# tensorboard --logdir=/tmp/tensorboard-switchpoint
x1 = np.random.randn(35)-1
x2 = np.random.randn(35)*2 + 5
x_all = np.hstack([x1, x2])
len_x = len(x_all)
time_all = np.arange(1, len_x + 1)
mu1 = tf.Variable(0, name="mu1", dtype=tf.float32)
mu2 = tf.Variable(0, name = "mu2", dtype=tf.float32)
sigma1 = tf.Variable(2, name = "sigma1", dtype=tf.float32)
sigma2 = tf.Variable(2, name = "sigma2", dtype=tf.float32)
tau = tf.Variable(15, name = "tau", dtype=tf.float32)
switch = 1./(1+tf.exp(tf.pow(time_all - tau, 1)))
mu = switch*mu1 + (1-switch)*mu2
sigma = switch*sigma1 + (1-switch)*sigma2
likelihood_arr = tf.log(tf.sqrt(1/(2*np.pi*tf.pow(sigma, 2)))) - tf.pow(x_all - mu, 2)/(2*tf.pow(sigma, 2))
total_likelihood = tf.reduce_sum(likelihood_arr, name="total_likelihood")
optimizer = tf.train.AdamOptimizer()
opt_task = optimizer.minimize(-total_likelihood)
init = tf.global_variables_initializer()
tf.summary.scalar("mu1", mu1)
tf.summary.scalar("mu2", mu2)
tf.summary.scalar("sigma1", sigma1)
tf.summary.scalar("sigma2", sigma2)
tf.summary.scalar("tau", tau)
tf.summary.scalar("likelihood", total_likelihood)
merged_summary_op = tf.summary.merge_all()
with tf.Session() as sess:
    sess.run(init)
    print("these variables should be trainable: {}".format([_.name for _ in tf.trainable_variables()]))
    uniq_id = os.path.join(TENSORBOARD_PATH, "switchpoint-" + uuid.uuid1().__str__()[:4])
    summary_writer = tf.summary.FileWriter(uniq_id, graph=tf.get_default_graph())
    for step in range(40000):
        lik, opt, summary = sess.run([total_likelihood, opt_task, merged_summary_op])
        if step % 100 == 0:
            variables = {_.name:_.eval() for _ in [total_likelihood]}
            summary_writer.add_summary(summary, step)
            print("i{}: {}".format(str(step).zfill(5), variables))