Scipy.stats gaussian_kde: resampling from a conditional distribution

Question

I am using gaussian_kde from scipy.stats to fit a joint PDF to multivariate data on, let's say, X and Y.

Now I want to resample from this PDF conditionally on a value of X. For example, once X=x is given, generate Y from its conditional distribution.

Let's use the example from the gaussian_kde documentation. kernel.resample(1) generates one (X, Y) pair from the full joint distribution. How could I generate Y once X is fixed, for example at 0?

Answer 1

One approach is to create a custom continuous distribution from a pdf. The pdf can be built from the kernel function. Since a pdf must integrate to 1, the kernel restricted to a given x0 should be divided by its area along y.
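Written out, the scaling described above is just the usual definition of a conditional density. This is a sketch in notation that is not part of the original answer: \hat{f} stands for the fitted gaussian_kde, and the denominator is the area that the code approximates numerically on a finite grid of y values.

```latex
p(y \mid X = x_0) \;=\; \frac{\hat{f}(x_0, y)}{\int_{-\infty}^{\infty} \hat{f}(x_0, y')\, dy'}
```

The denominator depends only on x0, so it has to be computed once per conditioning value.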

The custom distribution seems to be quite slow, though. A faster solution could be to create a histogram from ys = np.linspace(-10, 10, 1000); kernel(np.vstack([np.full_like(ys, x0), ys])) and use rv_histogram. Still faster (but much less random, since only the grid values can be drawn) would be to use np.random.choice(..., p=...) with p calculated from the constrained kernel.

The following code starts from an adaptation of the linked 2D KDE example code.

import matplotlib.pyplot as plt
from scipy import stats
import numpy as np

def measure(n):
    m1 = np.random.normal(size=n)
    m2 = np.random.normal(scale=0.5, size=n)
    return m1 + m2, m1 - m2 ** 2

m1, m2 = measure(2000)
xmin = m1.min()
xmax = m1.max()
ymin = m2.min()
ymax = m2.max()

X, Y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
positions = np.vstack([X.ravel(), Y.ravel()])
values = np.vstack([m1, m2])
kernel = stats.gaussian_kde(values)
Z = np.reshape(kernel(positions).T, X.shape)

x0 = 0.678

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(10, 4))
ax1.imshow(np.rot90(Z), cmap=plt.cm.magma_r, alpha=0.4, extent=[xmin, xmax, ymin, ymax])
ax1.plot(m1, m2, 'k.', markersize=2)
ax1.axvline(x0, color='dodgerblue', ls=':')
ax1.set_xlim([xmin, xmax])
ax1.set_ylim([ymin, ymax])

# create a distribution given the kernel function limited to x=x0
class Special_distrib(stats.rv_continuous):
    def _pdf(self, y, x0, area_x0):
        return kernel(np.vstack([np.full_like(y, x0), y])) / area_x0

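# normalization constant: integrate the kernel along the line x=x0 (trapezoid rule)
# note: np.trapz is deprecated in favor of np.trapezoid from NumPy 2.0 on
ys = np.linspace(-10, 10, 1000)
area_x0 = np.trapz(kernel(np.vstack([np.full_like(ys, x0), ys])), ys)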

special_distr = Special_distrib(name="special")

vals = special_distr.rvs(x0, area_x0, size=500)
ax2.hist(vals, bins=20, color='dodgerblue')

plt.show()
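The two faster alternatives mentioned in the answer (rv_histogram and a weighted random choice) could look roughly like the sketch below. It is not part of the original answer: the seeded generator, the bin layout on [-10, 10] and names such as edges, centers and dens are assumptions made for illustration, and the data are regenerated with the same recipe as measure() so the block runs on its own.

```python
import numpy as np
from scipy import stats

# Synthetic data, same recipe as measure() in the example (seed is an assumption)
rng = np.random.default_rng(0)
m1 = rng.normal(size=2000)
m2 = rng.normal(scale=0.5, size=2000)
values = np.vstack([m1 + m2, m1 - m2 ** 2])
kernel = stats.gaussian_kde(values)

x0 = 0.678

# Evaluate the joint density along the line x = x0 at the centers of equal-width bins
edges = np.linspace(-10, 10, 1001)  # 1000 bins
centers = 0.5 * (edges[:-1] + edges[1:])
dens = kernel(np.vstack([np.full_like(centers, x0), centers]))

# Option 1: piecewise-constant distribution over the bins.
# rv_histogram rescales the heights so that the total area is 1.
cond = stats.rv_histogram((dens, edges))
samples_hist = cond.rvs(size=500, random_state=rng)

# Option 2: pick bin centers with probability proportional to the density.
# Only the 1000 grid values can ever be returned.
p = dens / dens.sum()
samples_choice = rng.choice(centers, size=500, p=p)
```

Option 1 returns continuous values inside the bins, while option 2 only ever returns grid points, which is the "less random" trade-off the answer mentions. A finer grid narrows the difference at the cost of more kernel evaluations.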
