
numpy.random.normal different distribution: selecting values from distribution

I have a power-law distribution of energies and I want to pick n random energies from that distribution. I tried doing this manually using random numbers, but it is too inefficient for what I want to do. Is there a method in numpy (or elsewhere) that works like numpy.random.normal, except that the distribution can be specified instead of being fixed to a normal distribution? In my mind, an example might look like this (similar to numpy.random.normal):

import numpy as np

# Energies from within which I want values drawn
eMin = 50.
eMax = 2500.

# Amount of energies to be drawn
n = 10000

photons = []

for i in range(n):

    # Method that I just made up which would work like random.normal,
    # i.e. return an energy on the distribution based on its probability,
    # but take a distribution other than a normal distribution
    photons.append(np.random.distro(eMin, eMax, lambda e: e**(-1.)))


Printing photons should give me a list of length 10000 populated by energies in this distribution. If I were to histogram this it would have much greater bin values at lower energies.

I am not sure if such a method exists but it seems like it should. I hope it is clear what I want to do.


I have seen numpy.random.power but my exponent is -1 so I don't think this will work.

Sampling from arbitrary PDFs well is actually quite hard. There are large and dense books just about how to efficiently and accurately sample from the standard families of distributions.

It looks like you could probably get by with a custom inversion method for the example that you gave.
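For the 1/E density in the question, the inversion can be done in closed form. The following is a minimal sketch, not a built-in numpy method: the function name `sample_inverse_energy` and its `rng` parameter are made up for illustration, and the bounds are the question's eMin and eMax. On [e_min, e_max] the CDF is F(E) = ln(E / e_min) / ln(e_max / e_min), so solving F(E) = u for a uniform u gives E = e_min * (e_max / e_min)**u.

```python
import numpy as np

def sample_inverse_energy(e_min, e_max, n, rng=None):
    """Draw n energies with density proportional to 1/E on [e_min, e_max].

    Hypothetical helper for illustration; assumes 0 < e_min < e_max.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Uniform probabilities in [0, 1).
    u = rng.uniform(0.0, 1.0, size=n)
    # Inverse of F(E) = ln(E / e_min) / ln(e_max / e_min).
    return e_min * (e_max / e_min) ** u

photons = sample_inverse_energy(50.0, 2500.0, 10000)
```

Because the whole draw is one vectorized expression, no Python loop over the n samples is needed.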

If you want to sample from an arbitrary distribution, you need the inverse of the cumulative distribution function (the CDF, not the PDF).

You then sample a probability uniformly from the range [0, 1] and feed it into the inverse CDF to get the corresponding value.
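As an illustration of these two steps, here is a sketch for a general power law with density proportional to x**(-alpha) on [x_min, x_max], assuming alpha != 1 (the exponent -1 case needs the logarithmic form of the CDF instead). The function name `sample_power_law` and its parameters are hypothetical. With k = 1 - alpha, the CDF is F(x) = (x**k - x_min**k) / (x_max**k - x_min**k), which can be inverted directly.

```python
import numpy as np

def sample_power_law(alpha, x_min, x_max, n, rng=None):
    """Draw n values with density proportional to x**(-alpha) on [x_min, x_max].

    Hypothetical helper; assumes alpha != 1 and 0 < x_min < x_max.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: sample probabilities uniformly from [0, 1).
    u = rng.uniform(0.0, 1.0, size=n)
    # Step 2: feed them into the inverse CDF.
    k = 1.0 - alpha
    lo, hi = x_min ** k, x_max ** k
    return (lo + u * (hi - lo)) ** (1.0 / k)

samples = sample_power_law(2.0, 50.0, 2500.0, 10000)
```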

It is often not possible to obtain the CDF from the PDF analytically. However, if you're happy to approximate the distribution, you can calculate f(x) on a grid of points covering its domain, take a cumulative sum of f(x) times the grid spacing to approximate the CDF, and then approximate the inverse from that by interpolation.

Rough code snippet:

import numpy as np
import scipy.interpolate

def f(x):
    """Unnormalized density; substitute your own distribution here.

    Must be positive over the domain.
    """
    return 1.0 / x

# Vary inputVals to cover the domain of f (for better accuracy you can be
# clever about the spacing as well). Here the values are spaced logarithmically
# up to 1 and then at regular intervals, but you could definitely do better.
# The lower bound of 1e-3 is an arbitrary choice.
inputVals = np.hstack([np.logspace(-3, 0, 300, endpoint=False), np.arange(1, 10000)])

# Everything else should just work.
funcVals = f(inputVals)

# Left Riemann sum: each step adds f(x[i-1]) * (x[i] - x[i-1]).
cdf = np.zeros(len(funcVals))
cdf[1:] = np.cumsum(funcVals[:-1] * np.diff(inputVals))
cdf /= cdf[-1]

# You could also improve the approximation by choosing an appropriate interpolator.
inverseCdf = scipy.interpolate.interp1d(cdf, inputVals)

# Grab 10k samples from the distribution.
samples = inverseCdf(np.random.uniform(0, 1, size=10000))
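The same idea can also be sketched with numpy alone, restricted to the energy range from the question. This is one possible variant rather than a definitive implementation: the helper name `sample_from_pdf` and the `grid_size` default are assumptions, the running integral uses the trapezoidal rule, and `np.interp` stands in for the interpolator.

```python
import numpy as np

def sample_from_pdf(pdf, lo, hi, n, grid_size=2000, rng=None):
    """Approximate inverse-CDF sampling for an unnormalized pdf on [lo, hi].

    Hypothetical helper; pdf must accept an array and be positive on [lo, hi].
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.linspace(lo, hi, grid_size)
    y = pdf(x)
    # Trapezoidal running integral of the pdf, starting at 0.
    steps = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]
    # Swapping the roles of cdf and x evaluates the piecewise-linear inverse CDF.
    return np.interp(rng.uniform(0.0, 1.0, size=n), cdf, x)

# Energies between eMin = 50 and eMax = 2500 with density proportional to 1/E.
photons = sample_from_pdf(lambda e: 1.0 / e, 50.0, 2500.0, 10000)
```

A finer grid (or one spaced more densely where the pdf changes quickly) should tighten the approximation at the cost of more pdf evaluations.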


Why don't you use eval and put the distribution in a string?

>>> import numpy
>>> cmd = "numpy.random.normal(500)"
>>> eval(cmd)

You can manipulate the string as you wish to select the distribution.

