
Numpy vectorization of multivariate normal

I have two 2D numpy arrays, A and B. I want to use scipy.stats.multivariate_normal to calculate the joint logpdf of each row in A, using each row in B as the covariance matrix. Is there some way of doing this without explicitly looping over rows? Applying scipy.stats.multivariate_normal directly to A and B does calculate the logpdf of each row in A (which is what I want), but it uses the entire 2D array B as the covariance matrix, which is not what I want (I need each row of B to define a different covariance matrix). I am looking for a solution that uses numpy vectorization and avoids explicitly looping over both arrays.

I was also trying to accomplish something similar. Here's my code, which takes in three NxD matrices. Each row of X is a data point, each row of means is a mean vector, and each row of covariances is the diagonal of a diagonal covariance matrix. The result is a length-N vector of log probabilities.

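import numpy as np


def vectorized_gaussian_logpdf(X, means, covariances):
    """
    Compute log N(x_i; mu_i, sigma_i) for each x_i, mu_i, sigma_i
    Args:
        X : shape (n, d)
            Data points
        means : shape (n, d)
            Mean vectors
        covariances : shape (n, d)
            Diagonals of the diagonal covariance matrices (variances)
    Returns:
        logpdfs : shape (n,)
            Log probabilities
    """
    _, d = X.shape
    # Normalization term d * log(2 * pi), shared by every row
    constant = d * np.log(2 * np.pi)
    # Log-determinant of a diagonal matrix: log of the product of its diagonal,
    # computed as a sum of logs so the product cannot underflow or overflow
    log_determinants = np.sum(np.log(covariances), axis=1)
    deviations = X - means
    # Inverse of a diagonal matrix: element-wise reciprocal of its diagonal
    inverses = 1 / covariances
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), reduced to a row-wise sum
    return -0.5 * (constant + log_determinants +
        np.sum(deviations * inverses * deviations, axis=1))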

Note that this code only works for diagonal covariance matrices. In this special case, the mathematical definition below simplifies: the determinant becomes the product of the diagonal elements, the inverse becomes the element-wise reciprocal, and the matrix multiplication becomes element-wise multiplication.
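For covariance matrices that are not diagonal, the same idea can still be vectorized with NumPy's batched linear algebra. The following is only a sketch under assumptions that are not part of the answer above: the helper name batched_full_gaussian_logpdf is hypothetical, the covariances are passed as an array of shape (n, d, d), and every matrix is assumed to be symmetric positive definite so that the Cholesky factorization exists.

```python
import numpy as np


def batched_full_gaussian_logpdf(X, means, covariances):
    """Hypothetical helper: log N(x_i; mu_i, Sigma_i) for full covariances.

    X, means    : shape (n, d)
    covariances : shape (n, d, d), each assumed symmetric positive definite
    """
    _, d = X.shape
    # Batched Cholesky factorization: covariances[i] == L[i] @ L[i].T
    L = np.linalg.cholesky(covariances)
    # log det(Sigma_i) = 2 * sum(log(diag(L_i)))
    diagonals = np.diagonal(L, axis1=1, axis2=2)
    log_determinants = 2.0 * np.sum(np.log(diagonals), axis=1)
    deviations = X - means
    # Solve L_i z_i = deviations_i for every i at once; z_i . z_i is then
    # the quadratic form (x_i - mu_i)^T Sigma_i^{-1} (x_i - mu_i)
    z = np.linalg.solve(L, deviations[..., None])[..., 0]
    quadratic_forms = np.sum(z * z, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + log_determinants + quadratic_forms)
```

Going through the Cholesky factor avoids forming explicit inverses and determinants, at the cost of requiring positive-definite inputs. np.linalg.solve is a general solver and does not exploit the triangular structure of L, so this is not the fastest possible variant.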

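Multivariate normal pdf:

$$
p(x) = (2\pi)^{-d/2} \, \det(\Sigma)^{-1/2} \exp\left(-\tfrac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu)\right)
$$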
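To make the simplification concrete, the log-density can be written out term by term. The index notation is introduced here for illustration only: $\sigma_{ij}^2$ stands for entry $j$ of row $i$ of covariances, and $x_{ij}$, $\mu_{ij}$ for the matching entries of X and means.

$$
\log \mathcal{N}(x_i; \mu_i, \Sigma_i) = -\tfrac{1}{2} \left[ d \log(2\pi) + \log\det\Sigma_i + (x_i - \mu_i)^\top \Sigma_i^{-1} (x_i - \mu_i) \right]
$$

With $\Sigma_i = \operatorname{diag}(\sigma_{i1}^2, \dots, \sigma_{id}^2)$, the last two terms reduce to

$$
\log\det\Sigma_i = \sum_{j=1}^{d} \log \sigma_{ij}^2,
\qquad
(x_i - \mu_i)^\top \Sigma_i^{-1} (x_i - \mu_i) = \sum_{j=1}^{d} \frac{(x_{ij} - \mu_{ij})^2}{\sigma_{ij}^2}
$$

The three bracketed terms correspond, in order, to constant, log_determinants, and the final np.sum in vectorized_gaussian_logpdf.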

A quick test for correctness and running time:

import time

import numpy as np
import scipy.stats


def test_vectorized_gaussian_logpdf():
    n = 128**2
    d = 64

    means = np.random.uniform(-1, 1, (n, d))
    covariances = np.random.uniform(0, 2, (n, d))
    X = np.random.uniform(-1, 1, (n, d))

    refs = []

    ref_start = time.time()
    for x, mean, covariance in zip(X, means, covariances):
        refs.append(scipy.stats.multivariate_normal.logpdf(x, mean, covariance))
    ref_time = time.time() - ref_start

    fast_start = time.time()
    results = vectorized_gaussian_logpdf(X, means, covariances)
    fast_time = time.time() - fast_start

    print("Reference time:", ref_time)
    print("Vectorized time:", fast_time)
    print("Speedup:", ref_time / fast_time)

    assert np.allclose(results, refs)

I get about a 250x speedup. (And yes, my application requires me to calculate 16384 different Gaussians.)
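A lighter-weight cross-check is also possible without any per-row loop. A diagonal covariance makes the components independent, so the joint log-density is the sum of univariate normal log-densities. The sketch below assumes SciPy is installed; the helper name diagonal_logpdf_via_norm and the small random inputs are made up for illustration. Note that scipy.stats.norm takes a standard deviation as scale, hence the square root of the variances.

```python
import numpy as np
import scipy.stats


def diagonal_logpdf_via_norm(X, means, covariances):
    """Hypothetical helper: row-wise sum of univariate normal log-densities."""
    # scale is a standard deviation, so take the square root of the variances
    per_component = scipy.stats.norm.logpdf(X, loc=means, scale=np.sqrt(covariances))
    return per_component.sum(axis=1)


# Small illustrative inputs (not the sizes used in the timing test)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (4, 3))
means = rng.uniform(-1, 1, (4, 3))
variances = rng.uniform(0.5, 2, (4, 3))

fast = diagonal_logpdf_via_norm(X, means, variances)
# Reference: one multivariate_normal call per row with an explicit diagonal matrix
slow = np.array([
    scipy.stats.multivariate_normal.logpdf(x, m, np.diag(v))
    for x, m, v in zip(X, means, variances)
])
assert np.allclose(fast, slow)
```

This version is shorter than the hand-written formula, though it is not necessarily faster, since scipy.stats.norm carries some argument-checking overhead.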
