
Multidimensional confidence intervals

I have numerous tuples (par1, par2), i.e. points in a two-dimensional parameter space, obtained by repeating an experiment multiple times.

I'm looking for a way to calculate and visualize confidence ellipses (not sure if that's the correct term for this). Here is an example plot that I found on the web to show what I mean:


source: blogspot.ch/2011/07/classification-and-discrimination-with.html

So in principle, I guess one has to fit a multivariate normal distribution to a 2D histogram of the data points. Can somebody help me with this?

It sounds like you just want the 2-sigma ellipse of the scatter of points?

If so, consider something like this (from some code for a paper here: https://github.com/joferkington/oost_paper_code/blob/master/error_ellipse.py):

import numpy as np

import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

def plot_point_cov(points, nstd=2, ax=None, **kwargs):
    """
    Plots an `nstd` sigma ellipse based on the mean and covariance of a point
    "cloud" (points, an Nx2 array).

    Parameters
    ----------
        points : An Nx2 array of the data points.
        nstd : The radius of the ellipse in numbers of standard deviations.
            Defaults to 2 standard deviations.
        ax : The axis that the ellipse will be plotted on. Defaults to the 
            current axis.
        Additional keyword arguments are passed on to the ellipse patch.

    Returns
    -------
        A matplotlib ellipse artist
    """
    pos = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    return plot_cov_ellipse(cov, pos, nstd, ax, **kwargs)

def plot_cov_ellipse(cov, pos, nstd=2, ax=None, **kwargs):
    """
    Plots an `nstd` sigma error ellipse based on the specified covariance
    matrix (`cov`). Additional keyword arguments are passed on to the 
    ellipse patch artist.

    Parameters
    ----------
        cov : The 2x2 covariance matrix to base the ellipse on
        pos : The location of the center of the ellipse. Expects a 2-element
            sequence of [x0, y0].
        nstd : The radius of the ellipse in numbers of standard deviations.
            Defaults to 2 standard deviations.
        ax : The axis that the ellipse will be plotted on. Defaults to the 
            current axis.
        Additional keyword arguments are passed on to the ellipse patch.

    Returns
    -------
        A matplotlib ellipse artist
    """
    def eigsorted(cov):
        vals, vecs = np.linalg.eigh(cov)
        order = vals.argsort()[::-1]
        return vals[order], vecs[:,order]

    if ax is None:
        ax = plt.gca()

    vals, vecs = eigsorted(cov)
    theta = np.degrees(np.arctan2(*vecs[:,0][::-1]))

    # Width and height are "full" widths, not radius
    width, height = 2 * nstd * np.sqrt(vals)
    ellip = Ellipse(xy=pos, width=width, height=height, angle=theta, **kwargs)

    ax.add_artist(ellip)
    return ellip

if __name__ == '__main__':
    #-- Example usage -----------------------
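    # Generate some random, correlated data. The covariance matrix must be
    # positive semi-definite, so the off-diagonal term is kept below
    # sqrt(0.4 * 10) = 2.
    points = np.random.multivariate_normal(
            mean=(1,1), cov=[[0.4, 1.8],[1.8, 10]], size=1000
            )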
    # Plot the raw points...
    x, y = points.T
    plt.plot(x, y, 'ro')

    # Plot a transparent 3 standard deviation covariance ellipse
    plot_point_cov(points, nstd=3, alpha=0.5, color='green')

    plt.show()

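As a rough check of how the eigen-decomposition above maps to the ellipse, here is a minimal sketch using NumPy only. The helper names `ellipse_params` and `ellipse_boundary` are made up for this illustration and are not part of the linked code. It rebuilds the boundary from the width, height and angle and confirms that every boundary point sits at a Mahalanobis distance of `nstd` from the centre.

```python
import numpy as np

def ellipse_params(cov, nstd=2):
    """Full width, full height and rotation (radians) of the nstd-sigma ellipse."""
    vals, vecs = np.linalg.eigh(cov)
    order = vals.argsort()[::-1]            # largest eigenvalue first
    vals, vecs = vals[order], vecs[:, order]
    theta = np.arctan2(vecs[1, 0], vecs[0, 0])  # direction of the major axis
    width, height = 2 * nstd * np.sqrt(vals)
    return width, height, theta

def ellipse_boundary(pos, width, height, theta, n=200):
    """Points on the ellipse boundary as an (n, 2) array."""
    t = np.linspace(0, 2 * np.pi, n)
    local = np.stack([0.5 * width * np.cos(t), 0.5 * height * np.sin(t)])
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return (rot @ local).T + np.asarray(pos)

# Illustrative values, chosen only for this sketch
cov = np.array([[2.0, 0.8], [0.8, 1.0]])
pos = np.array([1.0, 1.0])

w, h, th = ellipse_params(cov, nstd=2)
boundary = ellipse_boundary(pos, w, h, th)

# Mahalanobis distance of each boundary point from the centre
diff = boundary - pos
maha = np.sqrt(np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff))
print(maha.min(), maha.max())
```

Both printed values should equal `nstd` up to rounding, which is what "an `nstd` sigma ellipse" means here: a contour of constant Mahalanobis distance, not a region of a fixed probability.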

Refer to the post "How to draw a covariance error ellipse".

Here's a Python implementation:

import numpy as np
from scipy.stats import norm, chi2

def cov_ellipse(cov, q=None, nsig=None, **kwargs):
    """
    Parameters
    ----------
    cov : (2, 2) array
        Covariance matrix.
    q : float, optional
        Confidence level, should be in (0, 1)
    nsig : int, optional
        Confidence level in unit of standard deviations. 
        E.g. 1 stands for 68.3% and 2 stands for 95.4%.

    Returns
    -------
    width, height, rotation :
        The full lengths of the two axes and the rotation angle in degrees
        for the ellipse.
    """

    if q is not None:
        q = np.asarray(q)
    elif nsig is not None:
        q = 2 * norm.cdf(nsig) - 1
    else:
        raise ValueError('One of `q` and `nsig` should be specified.')
    r2 = chi2.ppf(q, 2)

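    # eigh returns eigenvalues in ascending order, so `width` belongs to the
    # first (smallest) eigenvalue and `rotation` is the angle of its eigenvector
    val, vec = np.linalg.eigh(cov)
    # one entry per requested confidence level (shape (1,) for a scalar q)
    width, height = 2 * np.sqrt(val[:, None] * r2)
    rotation = np.degrees(np.arctan2(*vec[::-1, 0]))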

    return width, height, rotation

The meaning of standard deviation is wrong in Joe Kington's answer. Usually we use 1 and 2 sigma for the 68% and 95% confidence levels, but that correspondence only holds in one dimension. In two dimensions the 2-sigma ellipse in his answer encloses only about 86.5% of the probability, not 95%. The correct way is to use a chi-square distribution to estimate the ellipse size, as shown in the post.
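To make the difference concrete, here is a small sketch (the function names are hypothetical, chosen for this example) that uses `scipy.stats.chi2` to convert between "number of sigmas" and enclosed probability for a Gaussian, and checks the 2D case against simulated data. It assumes the points really are normally distributed.

```python
import numpy as np
from scipy.stats import chi2

def coverage_of_nstd_ellipse(nstd, ndim=2):
    """Probability inside the nstd-sigma contour of an ndim-dimensional Gaussian."""
    return chi2.cdf(nstd**2, df=ndim)

def nstd_for_coverage(q, ndim=2):
    """Mahalanobis radius whose contour encloses probability q."""
    return np.sqrt(chi2.ppf(q, df=ndim))

def empirical_coverage(points, nstd):
    """Fraction of points within Mahalanobis distance nstd of the sample mean."""
    diff = points - points.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(points, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)
    return np.mean(d2 <= nstd**2)

# Illustrative data, not taken from the question
rng = np.random.default_rng(0)
points = rng.multivariate_normal([1, 1], [[2.0, 0.8], [0.8, 1.0]], size=20000)

print(coverage_of_nstd_ellipse(2, ndim=1))  # about 0.954 in one dimension
print(coverage_of_nstd_ellipse(2, ndim=2))  # about 0.865 in two dimensions
print(nstd_for_coverage(0.954, ndim=2))     # about 2.48
print(empirical_coverage(points, 2))        # close to the 2D value above
```

So to draw a 95.4% ellipse with a function that takes a radius in standard deviations, you would pass roughly 2.48 rather than 2.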

I slightly modified one of the examples above that plots the error or confidence region contours. Now I think it gives the right contours.

It was giving the wrong contours because it applied the scoreatpercentile method to the joint dataset (blue + red points), when it should be applied separately to each dataset.

The modified code can be found below:

import numpy
import scipy
import scipy.stats
import matplotlib.pyplot as plt

# generate two normally distributed 2d arrays
x1=numpy.random.multivariate_normal((100,420),[[120,80],[80,80]],400)
x2=numpy.random.multivariate_normal((140,340),[[90,-70],[-70,80]],400)

# fit a KDE to each dataset separately
pdf1=scipy.stats.gaussian_kde(x1.T)
pdf2=scipy.stats.gaussian_kde(x2.T)

# create a grid over which we can evaluate pdf
q,w=numpy.meshgrid(range(50,200,10), range(300,500,10))
r1=pdf1([q.flatten(),w.flatten()])
r2=pdf2([q.flatten(),w.flatten()])

# sample each pdf and take the 5th percentile of the sampled density values;
# about 95% of each distribution lies where the density exceeds that value
s1=scipy.stats.scoreatpercentile(pdf1(pdf1.resample(1000)), 5)
s2=scipy.stats.scoreatpercentile(pdf2(pdf2.resample(1000)), 5)

# reshape back to 2d
r1.shape=(20,15)
r2.shape=(20,15)

# plot the contour that encloses about 95% of each distribution
plt.contour(range(50,200,10), range(300,500,10), r1, [s1],colors='b')
plt.contour(range(50,200,10), range(300,500,10), r2, [s2],colors='r')

# scatter plot the two normal distributions
plt.scatter(x1[:,0],x1[:,1],alpha=0.3)
plt.scatter(x2[:,0],x2[:,1],c='r',alpha=0.3)

plt.show()
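The code above finds the contour level by resampling the KDE. If the density is already evaluated on a regular grid, another option is to sort the grid values and accumulate probability mass until the target fraction is reached. The sketch below is only an illustration of that idea: `hdr_level` is a made-up helper name, and an analytic standard bivariate normal stands in for the KDE so the result can be checked against the known answer.

```python
import numpy as np

def hdr_level(density, cell_area, q=0.95):
    """Density threshold whose super-level set holds about a fraction q of the mass.

    `density` is a 2D array of pdf values on a regular grid and `cell_area`
    is the area of one grid cell.
    """
    flat = np.sort(density.ravel())[::-1]   # highest densities first
    mass = np.cumsum(flat) * cell_area      # accumulated probability
    idx = np.searchsorted(mass, q)
    return flat[min(idx, flat.size - 1)]

# Stand-in density: standard bivariate normal on a fine grid
xs = np.linspace(-6, 6, 401)
ys = np.linspace(-6, 6, 401)
X, Y = np.meshgrid(xs, ys)
density = np.exp(-0.5 * (X**2 + Y**2)) / (2 * np.pi)
cell_area = (xs[1] - xs[0]) * (ys[1] - ys[0])

level = hdr_level(density, cell_area, q=0.95)
expected = 0.05 / (2 * np.pi)   # analytic 95% level for this density
print(level, expected)
```

The returned level can be passed to `plt.contour` in place of `s1` or `s2`. The grid needs to be fine enough and wide enough to hold essentially all of the probability mass, otherwise the threshold will be biased.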

I guess what you are looking for is to compute confidence regions.

I don't know much about it, but as a starting point I would check out the Sherpa application for Python. At least, in their SciPy 2011 talk, the authors mention that you can determine and obtain confidence regions with it (you may need to have a model for your data, though).

See the video and corresponding slides of the Sherpa talk.

HTH
