
Posterior from joint normal prior distribution

I have some basic questions about Gaussian inference.

I have the following data:

(Log) dose, Number of animals, Number of deaths
-0.86, 5, 0
-0.30, 5, 1
-0.05, 5, 3
0.73, 5, 5

EDIT: I'm assuming a simple regression model for the dose response, logit(θ) = α + βx, where logit(θ) = log(θ / (1 − θ)) and θ is the probability of death given dose x.

I want to create a joint normal prior distribution on (α, β), with α ∼ N(0, 2²), β ∼ N(10, 10²), and corr(α, β) = 0.5, and then calculate the posterior density on a grid of points around the prior (α: 0 ± 4, β: 10 ± 20).

First, I have created the joint normal prior distribution as follows:

import numpy as np
from scipy import stats
x = np.array([-0.86, -0.30, -0.05, 0.73])
n = np.array([5, 5, 5, 5])
y = np.array([0, 1, 3, 5])
prior = stats.multivariate_normal([0, 10], [[0.5, 0], [0, 0.5]])

Is this right?

Second, how do I calculate the posterior density on a grid?

Parameterizing the Gaussian

To answer the first question: you are parameterizing the normal distribution incorrectly. In particular, your covariance matrix is not specified according to your description.

Given the standard deviations, s_1 = 2 and s_2 = 10, and the desired correlation of 0.5, the correct covariance matrix is:

 ---                    ---
| s_1*s_1      0.5*s_1*s_2 |
|                          |
| 0.5*s_1*s_2      s_2*s_2 |
 ---                    ---

That is, variances on the diagonal and covariances off the diagonal. In NumPy/SciPy, that would be:

mu = np.array([0, 10])
s = np.array([2, 10])                 # standard deviations
Rho = np.array([[1, 0.5], [0.5, 1]])  # correlation matrix
Sigma = Rho * np.outer(s, s)          # covariance matrix

prior = stats.multivariate_normal(mean=mu, cov=Sigma)
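
As a quick sanity check (my addition, not part of the original answer), draws from this prior should reproduce the intended moments:

# the prior should have sd(alpha) ≈ 2, sd(beta) ≈ 10, corr ≈ 0.5
draws = prior.rvs(size=100_000, random_state=42)
print(draws.std(axis=0))     # approximately [2, 10]
print(np.corrcoef(draws.T))  # off-diagonals approximately 0.5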

Computing Grid Values, or Not

Getting an appropriately normalized posterior density requires marginalizing (integrating) over continuous variables (e.g., θ), and this is only analytically solvable in special cases, which I don't think yours is. So, you could either work out the integrals and compute numerical approximations, or use some approximate inference method, such as MCMC or variational inference. There are great tools for this, like PyMC3 and PyStan.

Getting posterior values for only discrete points on a grid requires imposing conditional values on your model variables. However, most probabilistic programming tools are expressive enough these days that it is easier to just infer the full posterior and, if you have particular grid values of interest, to inspect them afterward.
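
For completeness, here is a minimal sketch of the numerical-approximation route, assuming the model above: evaluate prior × likelihood on a grid in log space, then normalize by the grid sum times the cell area, which approximates the normalizing integral when the grid covers most of the posterior mass. (The self-answer below computes the unnormalized version of the same quantity.)

import numpy as np
from scipy import stats

x = np.array([-0.86, -0.30, -0.05, 0.73])
n = np.array([5, 5, 5, 5])
y = np.array([0, 1, 3, 5])

# grid over (alpha, beta) covering the prior +/- 2 sd
A = np.linspace(-4, 4, 200)
B = np.linspace(-10, 30, 200)
aa, bb = np.meshgrid(A, B)

# log prior: the joint normal constructed above
s = np.array([2, 10])
Sigma = np.array([[1, 0.5], [0.5, 1]]) * np.outer(s, s)
prior = stats.multivariate_normal([0, 10], Sigma)
log_post = prior.logpdf(np.dstack([aa, bb]))

# add the binomial log-likelihood of each observation
for xi, ni, yi in zip(x, n, y):
    p = 1 / (1 + np.exp(-(aa + bb * xi)))  # inverse logit
    log_post += yi * np.log(p) + (ni - yi) * np.log1p(-p)

# exponentiate stably and normalize on the grid
post = np.exp(log_post - log_post.max())
post /= post.sum() * (A[1] - A[0]) * (B[1] - B[0])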

PyMC3 Example

Here is a full posterior inference in PyMC3 with your strong prior:

import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt
import matplotlib.pyplot as plt
import arviz as az

# Data
X = np.array([-0.86, -0.30, -0.05, 0.73])
N = np.array([5, 5, 5, 5])
Y = np.array([0, 1, 3, 5])

# augment X for simpler regression expression
X_aug = tt.stack(np.ones_like(X), X).T

# Prior params
mu = np.array([0, 10])
sd = np.array([2, 10])
Rho = np.array([[1, 0.5],[0.5, 1]])
Sigma = np.outer(sd, sd) * Rho

with pm.Model() as binomial_regression:
    # regression coefficients (strong prior)
    beta = pm.MvNormal('beta', mu=mu, cov=Sigma, shape=2)

    # death probability
    theta_i = pm.Deterministic('theta_i', pm.invlogit(X_aug.dot(beta)))

    # outcomes
    y_i = pm.Binomial('y_i', n=N, p=theta_i, observed=Y)

    trace = pm.sample(10000, tune=100000, target_accept=0.8, random_seed=2018)

This samples okay, but requires a large number of tuning steps to reduce divergences:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [beta]
Sampling 2 chains: 100%|██████████| 220000/220000 [03:52<00:00, 947.57draws/s]
There were 1 divergences after tuning. Increase `target_accept` or reparameterize.
The number of effective samples is smaller than 25% for some parameters.
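
To read numbers off the trace (a suggested follow-up; `pm.summary` is available in PyMC3 and wraps the ArviZ summary in recent versions):

# posterior summary statistics for the coefficient vector
print(pm.summary(trace, var_names=['beta']))

# posterior means of (alpha, beta)
print(trace['beta'].mean(axis=0))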

Trace Plots

[trace plot of the posterior samples]

Joint Plot

ax, _, _ = az.jointplot(trace, var_names=['beta'], kind='hexbin')
ax.set_xlabel("Intercept Coefficient ($\\beta_0$)")
ax.set_ylabel("Slope Coefficient ($\\beta_1$)")
plt.show()

[hexbin joint plot of the posterior coefficients]

Based on merv's good answer, and to answer my own question: I think the closed-form (unnormalized) likelihood is:

p(y_i | α, β, n_i, x_i) ∝ [logit⁻¹(α + βx_i)]^(y_i) · [1 − logit⁻¹(α + βx_i)]^(n_i − y_i)

The posterior is then proportional to the prior density times the product of these likelihood terms over the four data points, and can be calculated as follows:

import numpy as np
from scipy import optimize, stats
import matplotlib.pyplot as plt
x = np.array([-0.86, -0.30, -0.05, 0.73])
n = np.array([5, 5, 5, 5])
y = np.array([0, 1, 3, 5])

ngrid = 100
mu_1, mu_2, sd_1, sd_2 = 0, 10, 2, 10  # prior means and standard deviations
A = np.linspace(-4, 4, ngrid)
B = np.linspace(-10, 30, ngrid)

s = np.array([sd_1, sd_2])
Rho = np.array([[1, 0.5], [0.5, 1]])
Sigma = Rho * np.outer(s, s)
prior = stats.multivariate_normal([mu_1, mu_2], Sigma)

def prop_likelihood(input_values):
    # inverse logit of alpha + beta * x, broadcast over the grid and the doses
    a = input_values[..., 0][..., None] * np.ones(x.shape)
    b = input_values[..., 1][..., None]
    ilogit_abx = 1 / (np.exp(-(a + b * x)) + 1)
    # product of the binomial kernels over the four data points
    return np.prod(ilogit_abx**y * (1 - ilogit_abx)**(n - y), axis=-1)

grid_a , grid_b = np.meshgrid(A,B)
grid = np.empty(grid_a.shape + (2,)) 
grid[:, :, 0] = grid_a
grid[:, :, 1] = grid_b

posterior_density = prior.pdf(grid)*prop_likelihood(grid)
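
As a quick check on the grid computation (my addition), the grid point with the highest density approximates the posterior mode; note that rows of `posterior_density` index β and columns index α:

# approximate posterior mode from the grid
i, j = np.unravel_index(posterior_density.argmax(), posterior_density.shape)
print('approximate mode: alpha =', A[j], ', beta =', B[i])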

This can then be plotted:

fig, ax = plt.subplots(figsize=(10, 5))
ax.imshow(
    posterior_density,
    origin='lower',
    aspect='auto',
    extent=(A[0], A[-1], B[0], B[-1])
)
ax.set_xlim([-4, 4])
ax.set_ylim([-10, 30])
ax.set_xlabel(r'$\alpha$')
ax.set_ylabel(r'$\beta$')
ax.set_title('Posterior heatmap')
ax.grid(False)

[posterior density heatmap]

Analytical solution (a normal approximation at the optimum):

def opt(params):
    # negative log-likelihood of the binomial logit model (prior not included)
    a, b = params[0], params[1]
    z = np.exp(a + b * x) / (1 + np.exp(a + b * x))
    return -np.sum(y * np.log(z) + (n - y) * np.log(1 - z))

optim_res = optimize.minimize(opt, np.array([0.0, 0.0]))
mu_opt = optim_res['x']            # maximum likelihood estimate of (alpha, beta)
sigma_opt = optim_res['hess_inv']  # inverse Hessian as a covariance estimate
posterior_optimized = stats.multivariate_normal(mean=mu_opt, cov=sigma_opt)
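
To quantify how close this approximation is to the grid posterior (my addition, reusing the `grid` and `posterior_density` arrays built above), normalize both over the grid and compare:

# normalize both densities to sum to 1 over the grid, then compare
p_grid = posterior_density / posterior_density.sum()
p_approx = posterior_optimized.pdf(grid)
p_approx = p_approx / p_approx.sum()
print('max abs difference on grid:', np.abs(p_grid - p_approx).max())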

This can then be plotted:

fig, ax = plt.subplots(figsize=(10, 5))
ax.imshow(
    posterior_optimized.pdf(grid),
    origin='lower',
    aspect='auto',
    extent=(A[0], A[-1], B[0], B[-1])
)
ax.set_xlim([-4, 4])
ax.set_ylim([-10, 30])
ax.set_xlabel(r'$\alpha$')
ax.set_ylabel(r'$\beta$')
ax.set_title('Posterior heatmap from analytical solution')
ax.grid(False)

[posterior heatmap from the normal approximation]

There are some differences, and I am not sure the analytical optimization is correct. One likely source of the discrepancy is that `opt` maximizes only the likelihood and ignores the prior, so the normal approximation is centered on the MLE rather than on the posterior mode.

Hopefully this helps others.
