Implementing a simple probabilistic model with a negative log-likelihood loss

Question

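A quick disclaimer first: I originally posted this question on Reddit (in the deep learning and machine learning communities), but I figured I could also ask for your expertise here. Without further ado: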

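I'm currently challenging myself with this year's Deep Unsupervised Learning course from Berkeley, and although I've only just started the Week 1 warm-up exercise, I've already run into "technical" difficulties.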

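The exercise in question is "1. Warmup" in the following document: Week 1 exercises. (Apologies: I'm not familiar enough with Reddit formatting to include images seamlessly.)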

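As I understand it, we have a variable x that can take values in 1..100, each with a specific sampling probability (defined in the sample_data() function). The task is therefore to fit a parameter vector theta that is passed to a softmax function, whose output should give the likelihood of each particular element x_i being sampled. That is, theta_1 should be the parameter that "raises" the softmax value corresponding to x = 1, and so on.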
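Under that reading, the model is a categorical distribution with p_theta(x = i) = softmax(theta)_i over the 100 possible values, fitted by minimizing the average negative log-likelihood of the samples. A minimal NumPy sketch of that loss, assuming samples take values 1..100 (hence the `- 1` when indexing into theta); `log_softmax` and `neg_log_likelihood` are illustrative helper names, not part of the exercise:

```python
import numpy as np

def log_softmax(theta):
    """Numerically stable log of softmax(theta) for a 1-D logit vector."""
    shifted = theta - np.max(theta)
    return shifted - np.log(np.sum(np.exp(shifted)))

def neg_log_likelihood(theta, x):
    """Average NLL of samples x (values 1..K) under p(x = i) = softmax(theta)[i - 1]."""
    log_p = log_softmax(theta)
    return -np.mean(log_p[np.asarray(x) - 1])

K = 100
theta = np.zeros(K)                 # all-zero logits: uniform distribution over 1..K
samples = np.array([1, 50, 100])    # hypothetical samples
print(neg_log_likelihood(theta, samples))  # ≈ 4.6052, i.e. log(100) for the uniform model
```

Raising theta[i - 1] raises the probability of value i and lowers the NLL of samples equal to i, which is exactly the "raising" role described above.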

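Using TensorFlow, I think I managed to build such a model, but when it comes to training I believe I'm missing a key point, because the program cannot compute the gradients with respect to the theta parameters.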

I'm wondering whether I've misunderstood the task, and whether there is a better way to achieve what the exercise asks for.

Here is the code; the part that fails starts at # Computing gradients.

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

if __name__ == "__main__":
    # Sampling function of the x variable provided in the exercise
    def sample_data():
        count = 10000
        rand = np.random.RandomState(0)
        a = 0.3 + 0.1 * rand.randn(count)
        b = 0.8 + 0.05 * rand.randn(count)
        mask = rand.rand(count) < 0.5
        samples = np.clip(a * mask + b * (1 - mask), 0.0, 1.0)
        return np.digitize(samples, np.linspace(0.0, 1.0, 100))

    full_data = sample_data()
    train_ds = full_data[:int(.8*len( full_data))]
    val_ds = full_data[int(.8*len( full_data)):]

    # Declaring parameters theta
    w_init = tf.zeros_initializer()
    params = tf.Variable(
        initial_value=w_init(shape=(1, 100),
        dtype='float32'), trainable=True, name='params')


    softmax = tf.squeeze( tf.nn.softmax( params, axis=1))

    #Should materialize the loss of the model
    def get_neg_log_likelihood( inputs):
        return - tf.math.log( softmax)

    neg_log_likelihoods = get_neg_log_likelihood( softmax)

    dist = tfp.distributions.Categorical( probs=softmax, dtype=tf.int32)

    optimizer = tf.keras.optimizers.Adam()

    for epoch in range( 100):
        minibatch_size = 200
        n_minibatches = len( train_ds) // minibatch_size

        # Running over minibatches of the data
        for minibatch in range( n_minibatches):
            # Minibatching
            start_index = (minibatch*minibatch_size)
            end_index = (minibatch_size*minibatch + minibatch_size)

            x = train_ds[start_index:end_index]

            with tf.GradientTape() as tape:
                tape.watch( params)
                loss = tf.reduce_mean( - dist.log_prob( x))

            # Computing gradients
            grads = tape.gradient( loss, params)
            print( grads) # Result: None
            # input()
            optimizer.apply_gradients( zip( grads, params))

Thanks in advance for your time.

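PS: My background is mainly in deep reinforcement learning, so I understand the various models used there (policies, value functions...), but I'm trying to deepen my understanding of the models themselves, namely generative probabilistic models (GANs, VAEs) and other unsupervised learning models (RealNVP, normalizing flows...).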
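Separately from the missing gradients, note that in the question's code the samples come from `np.digitize` with the 100 edges of `np.linspace(0.0, 1.0, 100)`, so they take values 1..100, whereas a categorical distribution over a length-100 probability vector (such as `tfp.distributions.Categorical`) has support 0..99. A quick NumPy check of that range, reusing the exercise's `sample_data()`; the `- 1` shift is a suggested fix, not something from the original post:

```python
import numpy as np

def sample_data():
    # Sampling function copied from the exercise
    count = 10000
    rand = np.random.RandomState(0)
    a = 0.3 + 0.1 * rand.randn(count)
    b = 0.8 + 0.05 * rand.randn(count)
    mask = rand.rand(count) < 0.5
    samples = np.clip(a * mask + b * (1 - mask), 0.0, 1.0)
    return np.digitize(samples, np.linspace(0.0, 1.0, 100))

edges = np.linspace(0.0, 1.0, 100)
# The smallest and largest possible inputs map to bin indices 1 and 100, never to 0
print(np.digitize([0.0, 1.0], edges))

data = sample_data()
indices = data - 1   # shift to 0..99 before using the values as indices or categorical labels
assert indices.min() >= 0 and indices.max() <= 99
```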

Answer 1

I'm pretty sure nobody will see this, but I might as well close it out.

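First, I computed the gradients by deriving their expression by hand from the negative log-likelihood of the softmax values, dropping the TensorFlow framework altogether for this case.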
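For a single sample with index i, -log softmax(theta)_i = -theta_i + log(sum_j exp(theta_j)), so its derivative with respect to theta_j is softmax(theta)_j - 1[i = j], i.e. the gradient is softmax(theta) - one_hot(i). That is the row structure of the `jacobian` built in the updated code below. A short NumPy sketch that checks this closed form against central finite differences (the helper names are illustrative):

```python
import numpy as np

def nll_single(theta, i):
    """-log softmax(theta)[i] for a 0-based index i, computed stably."""
    shifted = theta - np.max(theta)
    return -(shifted[i] - np.log(np.sum(np.exp(shifted))))

def nll_grad(theta, i):
    """Analytic gradient of nll_single: softmax(theta) - one_hot(i)."""
    p = np.exp(theta - np.max(theta))
    p /= p.sum()
    p[i] -= 1.0
    return p

rng = np.random.default_rng(0)
theta = rng.normal(size=100)
i = 42
eps = 1e-6
# Central finite differences, one coordinate direction at a time
numeric = np.array([
    (nll_single(theta + eps * e, i) - nll_single(theta - eps * e, i)) / (2 * eps)
    for e in np.eye(100)
])
print(np.max(np.abs(numeric - nll_grad(theta, i))))  # small (well below 1e-6)
```

The gradient components always sum to zero, since softmax(theta) sums to 1 and the one-hot vector subtracts exactly 1.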

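Although the result was somewhat unexpected, the program is able to fit the model to a distribution that somewhat resembles the empirical distribution of the sampled data. I suspect this is because a single one-dimensional theta parameter vector is not enough to fully model the true data distribution, combined with the limited amount of sampled data.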
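One way to test that guess: with 100 free logits, the softmax model can represent any distribution over 1..100 that gives every value nonzero probability, and its maximum-likelihood solution is the empirical frequency of each value (theta_i = log of the count of value i, up to an additive constant). Values that never occur would need theta_i to go to minus infinity, which gradient descent can only approach. The sketch below uses hypothetical toy data rather than the exercise's samples; `avg_nll` is an illustrative helper:

```python
import numpy as np

def avg_nll(theta, x):
    """Average -log softmax(theta)[x - 1] for samples x taking values 1..K."""
    shifted = theta - np.max(theta)
    log_p = shifted - np.log(np.sum(np.exp(shifted)))
    return -np.mean(log_p[x - 1])

K = 5
x = np.array([1, 1, 2, 3, 3, 3, 4, 5, 5, 5])   # hypothetical toy samples, every value present
counts = np.bincount(x - 1, minlength=K)
empirical = counts / counts.sum()

theta_mle = np.log(counts)                      # closed-form maximum-likelihood logits
p_model = np.exp(theta_mle) / np.exp(theta_mle).sum()
print(np.allclose(p_model, empirical))          # True

rng = np.random.default_rng(1)
best = avg_nll(theta_mle, x)
# Random perturbations of the logits should not achieve a lower average NLL
print(all(avg_nll(theta_mle + 0.1 * rng.normal(size=K), x) >= best - 1e-12 for _ in range(1000)))  # True
```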

Updated version of the code:

import numpy as np
from matplotlib import pyplot as plt

np.random.seed( 42)

def softmax(X, theta = 1.0, axis = None):
    # Shameful copy-paste from SO (numerically stable softmax)
    y = np.atleast_2d(X)
    if axis is None:
        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)
    y = y * float(theta)
    y = y - np.expand_dims(np.max(y, axis = axis), axis)
    y = np.exp(y)
    ax_sum = np.expand_dims(np.sum(y, axis = axis), axis)
    p = y / ax_sum
    if len(X.shape) == 1: p = p.flatten()

    return p

if __name__ == "__main__":
    def sample_data():
        count = 10000
        rand = np.random.RandomState(0)
        a = 0.3 + 0.1 * rand.randn(count)
        b = 0.8 + 0.05 * rand.randn(count)
        mask = rand.rand(count) < 0.5
        samples = np.clip(a * mask + b * (1 - mask), 0.0, 1.0)
        return np.digitize(samples, np.linspace(0.0, 1.0, 100))

    full_data = sample_data()
    train_ds = full_data[:int(.8*len( full_data))]
    val_ds = full_data[int(.8*len( full_data)):]

    # Declaring parameters
    params = np.zeros(100)

    # Used for loss computation
    def get_neg_log_likelihood( softmax_values):
        return - np.log( softmax_values)

    def get_loss( params, x):
        # Compute the softmax once; x takes values 1..100, so shift by one to index it
        nll = get_neg_log_likelihood( softmax( params))
        return np.mean( nll[x - 1])

    lr = .0005

    for epoch in range( 1000):
        # Shuffling training data
        np.random.shuffle( train_ds)

        minibatch_size = 100
        n_minibatches = len( train_ds) // minibatch_size

        # Running over minibatches of the data
        for minibatch in range( n_minibatches):
            smax = softmax( params)

            # Jacobian of the neg log-likelihood: row k is d(-log smax[k])/d(params) = smax - one_hot(k)
            jacobian = smax[np.newaxis, :] - np.eye(100)

            # Minibatching
            start_index = minibatch*minibatch_size
            end_index = start_index + minibatch_size

            x = train_ds[start_index:end_index]

            # x takes values 1..100 while jacobian rows are indexed 0..99, hence x - 1;
            # gather one gradient row per sample and sum over the minibatch
            grad_matrix = jacobian[x - 1]
            grads = np.sum( grad_matrix, axis=0)

            params -= lr * grads

        print( "Epoch %d -- Train loss: %.4f , Val loss: %.4f" %(epoch, get_loss( params, train_ds), get_loss( params, val_ds)))

        # Plotting each ~100 epochs
        if epoch % 100 == 0:
            counters = { i+1: 0 for i in range(100)}
            for x in full_data:
                counters[x]+= 1

            histogram = np.array( [ counters[i+1] / len( full_data) for i in range( 100)])
            fsmax = softmax( params)

            fig, ax = plt.subplots()
            ax.set_title('Dist. Comp. after %d epochs of training (from scratch)' % epoch)
            x = np.arange( 1,101)
            width = 0.35
            ax.bar(x - width/2, fsmax, width, label='Model')
            ax.bar(x + width/2, histogram, width, label='Empirical')
            ax.set_ylabel('Probability')
            ax.set_xlabel('Values of variable x')
            ax.legend()

            fig.tight_layout()
            # Assumes a 'plots/' directory already exists
            plt.savefig( 'plots/results_after_%d_epochs.png' % epoch)
            plt.close( fig)
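Since each sample contributes the gradient row softmax(theta) - one_hot(x - 1), the gradient of the average loss over N samples is simply softmax(theta) - counts / N, where counts[k] is how many samples equal k + 1; no 100 x 100 Jacobian is needed. The sketch below uses that observation with full-batch gradient descent on the averaged loss, a swapped-in simplification of the minibatch loop above; `fit`, the learning rate of 1.0 and the 2000 steps are illustrative choices, not values from the original:

```python
import numpy as np

def sample_data():
    # Sampling function copied from the exercise
    count = 10000
    rand = np.random.RandomState(0)
    a = 0.3 + 0.1 * rand.randn(count)
    b = 0.8 + 0.05 * rand.randn(count)
    mask = rand.rand(count) < 0.5
    samples = np.clip(a * mask + b * (1 - mask), 0.0, 1.0)
    return np.digitize(samples, np.linspace(0.0, 1.0, 100))

def softmax(theta):
    z = np.exp(theta - np.max(theta))
    return z / z.sum()

def fit(train, n_values=100, lr=1.0, steps=2000):
    """Full-batch gradient descent on the average NLL of a softmax-parameterized categorical."""
    counts = np.bincount(train - 1, minlength=n_values)   # values 1..n_values -> indices 0..n_values-1
    empirical = counts / counts.sum()
    theta = np.zeros(n_values)
    for _ in range(steps):
        # Gradient of the average NLL: mean of the (softmax - one_hot) rows = softmax - empirical
        theta -= lr * (softmax(theta) - empirical)
    return theta, empirical

data = sample_data()
train = data[:int(0.8 * len(data))]
theta, empirical = fit(train)
print(np.max(np.abs(softmax(theta) - empirical)))   # small; rare and empty bins converge slowest
```

Empty or very rare bins converge slowest: their logits have to keep drifting toward large negative values, and the probability mass they still hold shrinks only slowly. That is one plausible reason a finite training run can look only "somewhat similar" to the empirical histogram.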

For completeness, here is a picture of the final model distribution. (Image: modeled vs. empirical distribution)

1 solution. Solution 1: score 0, accepted 2019-11-26 08:09:18
