
tf.where causes optimiser to fail in tensorflow

I want to check if I can solve this problem with tensorflow instead of pymc3. The experimental idea is that I am going to define a probabilistic system that contains a switchpoint. I can use sampling as a method of inference, but I started wondering why I couldn't just do this with gradient descent instead.

I decided to do the gradient search in tensorflow, but it seems like tensorflow has a hard time performing a gradient search when tf.where is involved.

You can find the code below.

import tensorflow as tf
import numpy as np

# synthetic data: 50 points around mean 1, then 50 points around mean 5
x1 = np.random.randn(50)+1
x2 = np.random.randn(50)*2 + 5
x_all = np.hstack([x1, x2])
len_x = len(x_all)
time_all = np.arange(1, len_x + 1)

# model parameters; tau is the location of the switchpoint
mu1 = tf.Variable(0, name="mu1", dtype=tf.float32)
mu2 = tf.Variable(5, name="mu2", dtype=tf.float32)
sigma1 = tf.Variable(2, name="sigma1", dtype=tf.float32)
sigma2 = tf.Variable(2, name="sigma2", dtype=tf.float32)
tau = tf.Variable(10, name="tau", dtype=tf.float32)

# hard switch: observations before tau get (mu1, sigma1), after tau (mu2, sigma2)
mu = tf.where(time_all < tau,
              tf.ones(shape=(len_x,), dtype=tf.float32) * mu1,
              tf.ones(shape=(len_x,), dtype=tf.float32) * mu2)
sigma = tf.where(time_all < tau,
                 tf.ones(shape=(len_x,), dtype=tf.float32) * sigma1,
                 tf.ones(shape=(len_x,), dtype=tf.float32) * sigma2)

# per-observation Gaussian log-likelihood, summed over the dataset
likelihood_arr = tf.log(tf.sqrt(1/(2*np.pi*tf.pow(sigma, 2)))) - tf.pow(x_all - mu, 2)/(2*tf.pow(sigma, 2))
total_likelihood = tf.reduce_sum(likelihood_arr, name="total_likelihood")

optimizer = tf.train.RMSPropOptimizer(0.01)
opt_task = optimizer.minimize(-total_likelihood)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    print("these variables should be trainable: {}".format([_.name for _ in tf.trainable_variables()]))
    for step in range(10000):
        _lik, _ = sess.run([total_likelihood, opt_task])
        if step % 1000 == 0:
            variables = {_.name:_.eval() for _ in [mu1, mu2, sigma1, sigma2, tau]}
            print("step: {}, values: {}".format(str(step).zfill(4), variables))

You'll notice that the tau parameter does not change even though tensorflow seems to be aware of the variable and its gradient. Any clue as to what is going wrong? Is this something that can be calculated in tensorflow, or do I need a different pattern?

tau is only used in the condition argument to where (tf.where(time_all < tau, ...)), which is a boolean tensor. Since calculating gradients only makes sense for continuous values, the gradient of the output with respect to tau will be zero.

Even ignoring tf.where, you used tau in the expression time_all < tau, which is constant almost everywhere, so it has a gradient of zero.

Due to the gradient of zero, there is no way to learn tau with gradient descent methods.
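
A quick way to verify this (my own check, assuming a TF1 graph like the one above): ask tf.gradients for the derivative with respect to tau. The only path from tau into the graph runs through the boolean condition of tf.where, and the comparison op has no registered gradient, so tf.gradients reports None for tau — which is why the optimizer silently skips it.

import tensorflow as tf
import numpy as np

time_all = np.arange(1, 101).astype(np.float32)  # float so the comparison converts cleanly
tau = tf.Variable(10, name="tau", dtype=tf.float32)
mu1 = tf.Variable(0, name="mu1", dtype=tf.float32)
mu2 = tf.Variable(5, name="mu2", dtype=tf.float32)

mu = tf.where(time_all < tau,
              tf.ones(shape=(100,), dtype=tf.float32) * mu1,
              tf.ones(shape=(100,), dtype=tf.float32) * mu2)

# no gradient path reaches tau, so its entry is None;
# mu1 and mu2 get real gradient tensors
print(tf.gradients(tf.reduce_sum(mu), [tau, mu1, mu2]))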

Depending on your problem, instead of a hard switch between two values you could use a weighted sum p*val1 + (1-p)*val2, where p depends on tau in a continuous manner, as sketched below.
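
A minimal sketch of that idea (my own illustration, reusing the variables from the check above): make p a sigmoid of tau - t, so it moves smoothly from 1 before the switchpoint to 0 after it, and the gradient with respect to tau is no longer None.

p = tf.sigmoid(tau - time_all)                     # ~1 for t << tau, ~0 for t >> tau
mu_soft = p * mu1 + (1 - p) * mu2                  # differentiable in tau
print(tf.gradients(tf.reduce_sum(mu_soft), tau))   # a real tensor now, not None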

The accepted answer is correct, but doesn't contain the code solution to my problem. The following snippet does:

import tensorflow as tf
import numpy as np
import os
import uuid

TENSORBOARD_PATH = "/tmp/tensorboard-switchpoint"
# tensorboard --logdir=/tmp/tensorboard-switchpoint

# synthetic data: 35 points around mean -1, then 35 points around mean 5
x1 = np.random.randn(35)-1
x2 = np.random.randn(35)*2 + 5
x_all = np.hstack([x1, x2])
len_x = len(x_all)
time_all = np.arange(1, len_x + 1)

mu1 = tf.Variable(0, name="mu1", dtype=tf.float32)
mu2 = tf.Variable(0, name="mu2", dtype=tf.float32)
sigma1 = tf.Variable(2, name="sigma1", dtype=tf.float32)
sigma2 = tf.Variable(2, name="sigma2", dtype=tf.float32)
tau = tf.Variable(15, name="tau", dtype=tf.float32)
# smooth switch: sigmoid(tau - t), ~1 before the switchpoint and ~0 after it
switch = 1./(1 + tf.exp(time_all - tau))

# smoothly interpolated mean and standard deviation
mu = switch*mu1 + (1-switch)*mu2
sigma = switch*sigma1 + (1-switch)*sigma2

# per-observation Gaussian log-likelihood, summed over the dataset
likelihood_arr = tf.log(tf.sqrt(1/(2*np.pi*tf.pow(sigma, 2)))) - tf.pow(x_all - mu, 2)/(2*tf.pow(sigma, 2))
total_likelihood = tf.reduce_sum(likelihood_arr, name="total_likelihood")

optimizer = tf.train.AdamOptimizer()
opt_task = optimizer.minimize(-total_likelihood)
init = tf.global_variables_initializer()

tf.summary.scalar("mu1", mu1)
tf.summary.scalar("mu2", mu2)
tf.summary.scalar("sigma1", sigma1)
tf.summary.scalar("sigma2", sigma2)
tf.summary.scalar("tau", tau)
tf.summary.scalar("likelihood", total_likelihood)
merged_summary_op = tf.summary.merge_all()

with tf.Session() as sess:
    sess.run(init)
    print("these variables should be trainable: {}".format([_.name for _ in tf.trainable_variables()]))
    uniq_id = os.path.join(TENSORBOARD_PATH, "switchpoint-" + uuid.uuid1().__str__()[:4])
    summary_writer = tf.summary.FileWriter(uniq_id, graph=tf.get_default_graph())
    for step in range(40000):
        lik, opt, summary = sess.run([total_likelihood, opt_task, merged_summary_op])
        if step % 100 == 0:
            summary_writer.add_summary(summary, step)
            print("i{}: total_likelihood: {}".format(str(step).zfill(5), lik))
