
Multivariate normal CDF in Python

I am looking for a function to compute the CDF of a multivariate normal distribution. I have found that scipy.stats.multivariate_normal has only a method to compute the PDF (for a sample x), but not the CDF: multivariate_normal.pdf(x, mean=mean, cov=cov)

I am looking for the same thing but for the CDF, something like multivariate_normal.cdf(x, mean=mean, cov=cov), but unfortunately multivariate_normal doesn't have a cdf method.

The only thing I found is this: Multivariate Normal CDF in Python using scipy. However, the method presented there, scipy.stats.mvn.mvnun(lower, upper, means, covar), doesn't take a sample x as a parameter, so I don't see how to use it to get something similar to what I described above.

This is just a clarification of the points that @sascha made above in the comments for the answer. The relevant function is mvnormcdf from statsmodels.sandbox.distributions.extras.

As an example, for a bivariate normal distribution with diagonal covariance, the CDF evaluated at the mean should give (1/4) * total area = 0.25, because a quarter of the probability mass lies in the quadrant below and to the left of the mean (look at the scatterplot below if you don't see why). The following example lets you play with it:

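import numpy as np
from statsmodels.sandbox.distributions.extras import mvnormcdf

# Assumed: the original snippet did not define `upper`; the mean gives the 0.25 quoted below
upper = np.array((0, 0))
mean_example = np.array((0, 0))

for i in range(1, 20, 2):
    cov_example = np.array(((i, 0), (0, i)))  # diagonal covariance with variance i
    print(mvnormcdf(upper=upper, mu=mean_example, cov=cov_example))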

The output of this is 0.25, 0.25, 0.25, 0.25...


[Image: scatterplot]
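For readers on a SciPy release where multivariate_normal does expose a cdf method (I believe this arrived in SciPy 1.0, but treat the version as an assumption and check your installation), the same sanity check can be sketched without statsmodels. The variable names mirror the example above; `results` is only there to collect the values.

```python
import numpy as np
from scipy.stats import multivariate_normal

upper = np.array([0.0, 0.0])         # evaluate the CDF at the mean
mean_example = np.array([0.0, 0.0])

results = []
for i in range(1, 20, 2):
    cov_example = np.array([[i, 0], [0, i]])  # diagonal covariance with variance i
    # P(X1 <= 0, X2 <= 0) for a zero-mean Gaussian with diagonal covariance
    results.append(multivariate_normal.cdf(upper, mean=mean_example, cov=cov_example))

print(results)
```

The values are computed by numerical integration, so they should be close to 0.25 rather than exactly equal to it.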

The CDF of some distribution is actually an integral over the PDF of that distribution. That being so, you need to provide the function with the boundaries of the integral.
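To make the "integral over the PDF" statement concrete, here is a rough sketch that integrates a bivariate normal PDF over the region below a point with a midpoint rule and compares it with the library CDF. The mean, covariance, and point are made-up illustration values, the lower limit is truncated at -10 on the assumption that the density is negligible beyond it, and multivariate_normal.cdf is assumed to be available in your SciPy version.

```python
import numpy as np
from scipy.stats import multivariate_normal

mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.3], [0.3, 2.0]])
point = np.array([0.5, -0.2])

# Midpoint rule over [-10, point[0]] x [-10, point[1]]
n = 800
xs = np.linspace(-10.0, point[0], n + 1)
ys = np.linspace(-10.0, point[1], n + 1)
x_mid = 0.5 * (xs[:-1] + xs[1:])
y_mid = 0.5 * (ys[:-1] + ys[1:])
dx = xs[1] - xs[0]
dy = ys[1] - ys[0]

# Grid of cell midpoints with shape (n, n, 2); the last axis holds the coordinates
grid = np.stack(np.meshgrid(x_mid, y_mid, indexing="ij"), axis=-1)
integral = multivariate_normal.pdf(grid, mean=mean, cov=cov).sum() * dx * dy

cdf_value = multivariate_normal.cdf(point, mean=mean, cov=cov)
print(integral, cdf_value)
```

The two numbers should agree to a few decimal places; the remaining gap comes from the grid resolution and the tolerance of the CDF routine.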

What most people mean when they ask for a p_value of some point in relation to some distribution is:

what is the chance of getting these values or higher given this distribution?

Note the area marked in red - it is not a point, but rather an integral from some point onwards:

[Image: distribution with the tail area marked in red]

Accordingly, you need to set your point as the lower boundary, +inf (or some arbitrarily high enough value) as the upper boundary and provide the means and covariance matrix you already have:

from sys import maxsize

import numpy as np
import scipy.stats

def mvn_p_value(x, mu, cov_matrix):
    upper_bounds = np.array([maxsize] * x.size)  # make an upper bound the size of your vector
    # mvnun returns (value, inform); the probability is the first element
    p_value = scipy.stats.mvn.mvnun(x, upper_bounds, mu, cov_matrix)[0]
    if 0.5 < p_value:  # this inversion is used for two-sided statistical testing
        p_value = 1 - p_value
    return p_value
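As an alternative sketch of the same upper-tail probability, the symmetry of the normal distribution can be used: if X ~ N(mu, cov) then -X ~ N(-mu, cov), so P(X >= x) = P(-X <= -x). This is a different technique from the mvnun call above. The helper name mvn_upper_tail is hypothetical, it assumes a SciPy version with multivariate_normal.cdf, and it deliberately leaves out the two-sided inversion step.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_upper_tail(x, mu, cov_matrix):
    """P(X_1 >= x_1, ..., X_d >= x_d) for X ~ N(mu, cov_matrix)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Reflect the problem: -X ~ N(-mu, cov), so the upper tail of X at x
    # equals the ordinary CDF of -X evaluated at -x.
    return multivariate_normal.cdf(-x, mean=-mu, cov=cov_matrix)

# At the mean of a standard bivariate normal, a quarter of the mass lies above
tail_at_mean = mvn_upper_tail([0.0, 0.0], [0.0, 0.0], np.eye(2))
# With independent components the tail factorises into a product of 1-D tails
tail_at_ones = mvn_upper_tail([1.0, 1.0], [0.0, 0.0], np.eye(2))
print(tail_at_mean, tail_at_ones)
```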
