
How to sample from a distribution given the CDF in Python

I would like to draw samples from a probability distribution with CDF 1 - e^(-x^2) for x >= 0.

Is there a method in Python/SciPy/etc. that lets you sample from a probability distribution given only its CDF?

To create a custom random variable class from a CDF, you can subclass scipy.stats.rv_continuous and override its _cdf method. SciPy then derives the corresponding PDF (by numerically differentiating the CDF), the inverse CDF, and other statistical information about your distribution automatically, e.g.:

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

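class MyRandomVariableClass(stats.rv_continuous):
    def __init__(self, xtol=1e-14, seed=None):
        # a=0 sets the lower bound of the support, i.e. x >= 0
        super().__init__(a=0, xtol=xtol, seed=seed)

    def _cdf(self, x):
        # CDF of the target distribution: F(x) = 1 - exp(-x^2)
        return 1-np.exp(-x**2)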


if __name__ == "__main__":
    my_rv = MyRandomVariableClass()

    # sample distribution
    samples = my_rv.rvs(size = 1000)

    # plot histogram of samples
    fig, ax1 = plt.subplots()
    ax1.hist(list(samples), bins=50)

    # plot PDF and CDF of distribution
    pts = np.linspace(0, 5)
    ax2 = ax1.twinx()
    ax2.set_ylim(0,1.1)
    ax2.plot(pts, my_rv.pdf(pts), color='red')
    ax2.plot(pts, my_rv.cdf(pts), color='orange')

    fig.tight_layout()
    plt.show()

Inverse Transform Sampling

To add on to the solution by Heike, you could use Inverse Transform Sampling to sample via the CDF:

import math, random
import matplotlib.pyplot as plt

def inverse_cdf(y):
    # Computed analytically: solving y = 1 - exp(-x^2) for x
    # gives x = sqrt(-log(1 - y)), valid for 0 <= y < 1.
    return math.sqrt(-math.log(1 - y))

def sample_distribution():
    uniform_random_sample = random.random()
    return inverse_cdf(uniform_random_sample)

x = [sample_distribution() for i in range(10000)]
plt.hist(x, bins=50)
plt.show()
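If NumPy is available, the same idea can be vectorized so that all samples are drawn in one call. The following is a minimal sketch, not part of the original answer: the function name sample_via_inverse_cdf and the sanity check are illustrative assumptions. It uses the inverse x = sqrt(-log(1 - u)) obtained by solving u = 1 - exp(-x^2) for x.

```python
import numpy as np

def sample_via_inverse_cdf(size, seed=None):
    """Draw `size` samples from the CDF F(x) = 1 - exp(-x**2), x >= 0.

    Hypothetical helper for illustration: applies the analytic
    inverse CDF to uniform random numbers (inverse transform sampling).
    """
    rng = np.random.default_rng(seed)
    u = rng.random(size)  # uniform on [0, 1)
    # log1p(-u) computes log(1 - u) accurately for small u
    return np.sqrt(-np.log1p(-u))

samples = sample_via_inverse_cdf(100_000, seed=0)

# Sanity check: the empirical CDF should be close to the target CDF
for x in (0.5, 1.0, 1.5):
    empirical = np.mean(samples <= x)
    exact = 1 - np.exp(-x**2)
    print(f"x={x}: empirical={empirical:.4f}, exact={exact:.4f}")
```

Comparing the empirical and exact CDF values at a few points is a quick way to confirm that the inverse was derived correctly.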

How SciPy Does It

I was very curious to see how this works in SciPy, too. It looks like it does something very similar to the above. According to the SciPy docs:

The default method _rvs relies on the inverse of the cdf, _ppf, applied to a uniform random variate. In order to generate random variates efficiently, either the default _ppf needs to be overwritten (e.g. if the inverse cdf can be expressed in an explicit form) or a sampling method needs to be implemented in a custom _rvs method.
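For the distribution in this question the inverse CDF has an explicit form, so the advice in the docs can be applied directly. The sketch below is an illustration rather than code from the original answers: the class name FastRV is hypothetical, and the _pdf and _ppf bodies are derived by hand from F(x) = 1 - exp(-x^2).

```python
import numpy as np
from scipy import stats

class FastRV(stats.rv_continuous):
    """Hypothetical variant with closed-form _pdf and _ppf overrides."""

    def __init__(self, **kwargs):
        # a=0 restricts the support to x >= 0
        super().__init__(a=0, **kwargs)

    def _cdf(self, x):
        return 1 - np.exp(-x**2)

    def _pdf(self, x):
        # Derivative of the CDF, written out analytically
        return 2 * x * np.exp(-x**2)

    def _ppf(self, q):
        # Analytic inverse of the CDF: solve q = 1 - exp(-x**2) for x
        return np.sqrt(-np.log1p(-q))

fast_rv = FastRV(name="fast_rv")
samples_fast = fast_rv.rvs(size=10_000, random_state=0)

# Round trip: ppf(cdf(x)) should give back x
print(fast_rv.ppf(fast_rv.cdf(1.2)))
```

With _ppf supplied, rvs only has to evaluate a formula on uniform variates instead of inverting the CDF numerically for every sample, which should be noticeably faster for large sample sizes.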

Based on the SciPy source code, _ppf (i.e., the inverse of the CDF) does in fact appear to be approximated numerically when it is not specified explicitly. Very cool!
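To illustrate what numerical inversion of a CDF can look like, here is a rough sketch using a bracketing root finder. This is not SciPy's internal code: the helper numeric_ppf and its bracket-doubling strategy are assumptions made for illustration only.

```python
import numpy as np
from scipy import optimize

def cdf(x):
    return 1 - np.exp(-x**2)

def numeric_ppf(q, lower=0.0, upper=1.0):
    """Invert the CDF at probability q by root finding (hypothetical helper)."""
    # Double the upper bound until the bracket [lower, upper] contains the root
    while cdf(upper) < q:
        upper *= 2
    # Find x such that cdf(x) - q = 0
    return optimize.brentq(lambda x: cdf(x) - q, lower, upper)

# Inverse transform sampling with the numerically inverted CDF
rng = np.random.default_rng(0)
samples_numeric = np.array([numeric_ppf(u) for u in rng.random(1000)])

# Compare against the analytic inverse sqrt(-log(1 - q)) at q = 0.5
print(numeric_ppf(0.5), np.sqrt(np.log(2)))
```

Each sample costs one root-finding run here, which is why supplying an explicit inverse, when one exists, is the more efficient route.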


 