Using the scipy.stats library or another method to generate data that follows a distribution within specific bounds
I want to sample with the scipy.stats library, using a lower and an upper bound for the sampled data. I am interested in using scipy.stats.lognorm and scipy.stats.expon, setting a constraint (low<=x<=up) on the generated data points, and also evaluating logp with these limits taken into account. For instance, this does not work:
LogNormal=scipy.stats.lognorm(q=[0,5],scale=[0.25],loc=0.0) #q:upper and lower limits, scale=sigma, loc=mean
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/vol/anaconda/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 739, in __call__
    return self.freeze(*args, **kwds)
  File "/vol/anaconda/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 736, in freeze
    return rv_frozen(self, *args, **kwds)
  File "/vol/anaconda/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 434, in __init__
    shapes, _, _ = self.dist._parse_args(*args, **kwds)
TypeError: _parse_args() got an unexpected keyword argument 'q'
The documentation is a bit confusing: which input parameter is sigma and which one is mean? Could anybody give an example of how they should be set, together with the boundaries?
There are several problems in your implementation:
1. Your pdf cannot be evaluated at x=0.
2. -log(1./sqrt(2*pi)/self.sigma*exp(-0.5*((log(value)-self.mu)/self.sigma)**2))
should be: -log(1./sqrt(2*pi)/self.sigma/value*exp(-0.5*((log(value)-self.mu)/self.sigma)**2))
(And there may be more.)
Another consideration is that you may want to keep the parameterization the same as scipy's to avoid future confusion.
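To make the scipy parameterization concrete, here is a small sketch. It assumes that mu and sigma denote the mean and standard deviation of log(X) and that lam is the exponential rate; the numeric values are made up for illustration. In scipy.stats.lognorm the shape argument s plays the role of sigma and scale is exp(mu), while scipy.stats.expon takes scale = 1/lam.

```python
import numpy as np
from scipy import stats

# Illustrative values only: mean and standard deviation of log(X).
mu, sigma = 0.5, 0.25

# scipy's lognorm: shape s = sigma, scale = exp(mu), loc left at 0.
dist = stats.lognorm(s=sigma, scale=np.exp(mu))

# Compare scipy's log-density with the textbook lognormal formula.
x = np.array([0.5, 1.0, 2.0, 4.0])
manual_logpdf = (-np.log(x * sigma * np.sqrt(2.0 * np.pi))
                 - 0.5 * ((np.log(x) - mu) / sigma) ** 2)
print(np.allclose(dist.logpdf(x), manual_logpdf))

# The median of a lognormal is exp(mu), i.e. the scale argument.
print(np.isclose(dist.median(), np.exp(mu)))

# scipy's expon: scale = 1 / lam (lam is the rate).
lam = 2.0
expo = stats.expon(scale=1.0 / lam)
print(np.isclose(expo.mean(), 1.0 / lam))
```

Note that loc in scipy is a shift of the whole distribution, not the mean, which is why it is left at 0 here.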
A minimal implementation along those lines:
In [112]:
import scipy.stats as ss
import scipy.optimize as so
import numpy as np

class bounded_distr(object):
    def __init__(self, parent_dist):
        self.parent = parent_dist

    def bnd_lpdf(self, x, limits=None, *args, **kwargs):
        if limits and np.diff(limits)<=0:
            return -np.inf #nan may be better idea
        else:
            # negative log of the parent pdf (not renormalised over the limits)
            _v = -np.log(self.parent.pdf(x, *args, **kwargs))
            if limits:
                # mask points on or outside the bounds
                _v[x<=limits[0]] = -np.inf
                _v[x>=limits[1]] = -np.inf
            return _v

    def bnd_cdf(self, x, limits=None, *args, **kwargs):
        if limits and np.diff(limits)<=0:
            return 0 #nan may be better idea
        elif limits:
            # rescale the parent cdf so that it runs from 0 to 1 inside the limits
            _v1 = self.parent.cdf(x, *args, **kwargs)
            _v2 = self.parent.cdf(limits[0], *args, **kwargs)
            _v3 = self.parent.cdf(limits[1], *args, **kwargs)
            _v4 = (_v1-_v2)/(_v3-_v2)
            _v4[_v4<0] = np.nan
            _v4[_v4>1] = np.nan
            return _v4
        else:
            return self.parent.cdf(x, *args, **kwargs)

    def bnd_rvs(self, size, limits=None, *args, **kwargs):
        if limits and np.diff(limits)<=0:
            return np.repeat(np.nan, size) #nan may be better idea
        elif limits:
            # inverse-cdf sampling: uniform draws between cdf(low) and cdf(high)
            low, high = limits
            rnd_cdf = np.random.uniform(self.parent.cdf(x=low, *args, **kwargs),
                                        self.parent.cdf(x=high, *args, **kwargs),
                                        size=size)
            return self.parent.ppf(q=rnd_cdf, *args, **kwargs)
        else:
            return self.parent.rvs(size=size, *args, **kwargs)
In [113]:
bnd_logn = bounded_distr(ss.lognorm)
In [114]:
bnd_logn.bnd_rvs(10, limits=(0.1, 0.9), s=1, loc=0)
Out[114]:
array([ 0.23167598, 0.43185726, 0.34763109, 0.71020467, 0.5216074 ,
0.60883528, 0.34353607, 0.84530444, 0.64145739, 0.82082447])
In [115]:
bnd_logn.bnd_lpdf(np.linspace(0,1,10), limits=(0.1, 0.9), s=1, loc=0)
Out[115]:
array([ inf, 1.13561188, 0.54598554, 0.42380072, 0.43681222,
0.50389845, 0.5956744 , 0.69920358, 0.80809192, 0.91893853])
In [116]:
bnd_logn.bnd_cdf(np.linspace(0,1,10), limits=(0.1, 0.9), s=1, loc=0)
Out[116]:
array([ nan, 0.00749028, 0.12434152, 0.28010562, 0.44267888,
0.59832448, 0.74188947, 0.87201574, 0.98899161, nan])
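As a complement to the session above, here is a hedged sketch of a bounded log-density that is renormalised over the limits, so that it integrates to 1 on [low, high]. The helper names truncated_logpdf and truncated_rvs are hypothetical, not part of scipy. The sketch assumes a frozen scipy.stats distribution is passed in, and it returns the log-density itself rather than its negative.

```python
import numpy as np
from scipy import stats

def truncated_logpdf(dist, x, low, high):
    """Log-density of a frozen `dist` restricted to [low, high] and renormalised."""
    x = np.asarray(x, dtype=float)
    mass = dist.cdf(high) - dist.cdf(low)      # probability inside the bounds
    inside = (x >= low) & (x <= high)
    out = np.full(x.shape, -np.inf)            # zero density outside the bounds
    out[inside] = dist.logpdf(x[inside]) - np.log(mass)
    return out

def truncated_rvs(dist, size, low, high, rng=None):
    """Inverse-cdf sampling restricted to [low, high]."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(dist.cdf(low), dist.cdf(high), size=size)
    return dist.ppf(u)

# Same parent and limits as in the session above.
dist = stats.lognorm(s=1.0, loc=0.0)
low, high = 0.1, 0.9
samples = truncated_rvs(dist, 1000, low, high, rng=np.random.default_rng(0))
print(samples.min(), samples.max())
print(truncated_logpdf(dist, np.array([0.05, 0.5, 0.95]), low, high))
```

Dividing by the probability mass between the limits is what distinguishes a truncated density from the parent density with its tails simply masked out.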
I could finally write two prior classes, which can also sample data from the given distribution within the given limits. I used the inverse sampling method to sample data. My classes are as follows:
import os, sys
import logging
import numpy as np
import scipy.stats
from numpy import exp, sqrt, log, isfinite, inf, pi
import scipy.special
import scipy.optimize

class LogPrior(object):
    def eval(self, value):
        return 0.

    def __call__(self, value):
        return self.eval(value)

    def sample(self, n=None):
        """ Sample from this prior. The returned array axis=0 is the
        sample axis.

        Parameters
        ----------
        n : int (optional)
            Number of samples to draw
        """
        raise ValueError("Cannot sample from a LogPrior object.")

    def __str__(self):
        return "<LogPrior>"

    def __repr__(self):
        return self.__str__()
Update: the class for the lognormal distribution:
class LognormalPrior(LogPrior):
    r"""
    Log-normal log-likelihood.

    Distribution of any random variable whose logarithm is normally
    distributed. A variable might be modeled as log-normal if it can
    be thought of as the multiplicative product of many small
    independent factors.

    .. math::
        f(x \mid \mu, \tau) = \sqrt{\frac{\tau}{2\pi}}\frac{
        \exp\left\{ -\frac{\tau}{2} (\ln(x)-\mu)^2 \right\}}{x}

    :Parameters:
      - `x` : x > 0
      - `mu` : Location parameter.
      - `tau` : Scale parameter (tau > 0).

    .. note::
        :math:`E(X)=e^{\mu+\frac{1}{2\tau}}`
        :math:`Var(X)=(e^{1/\tau}-1)e^{2\mu+\frac{1}{\tau}}`
    """
    def __init__(self, mu, tau, *args, **kwargs):
        super(LognormalPrior, self).__init__(*args, **kwargs)
        self.mu = mu
        self.tau = tau
        self.mean = exp(mu + 1./(2*tau))
        self.median = exp(mu)
        self.mode = exp(mu - 1./tau)
        self.variance = (exp(1./tau) - 1) * exp(2*mu + 1./tau)
        self.sigma=1./sqrt(tau)

    def logp(self, value, limits=None):
        # Note: this returns the negative log of the lognormal density
        # (the sign convention used in the answer above), not renormalised
        # over the limits.
        if limits:
            # lognormal prior with hard limits
            lower,upper=limits
            if value >= lower and value <= upper:
                return -log(1./sqrt(2*pi)/value/self.sigma*exp(-0.5*((log(value)-self.mu)/self.sigma)**2))
            else:
                return -inf
        else:
            # lognormal prior without limits
            return -log(1./sqrt(2*pi)/value/self.sigma*exp(-0.5*((log(value)-self.mu)/self.sigma)**2))

    #Cumulative distribution function of lognormal distribution
    def cdf(self, value):
        if not isinstance(value, float):
            res=np.empty_like(value)
            for i in range(res.shape[0]):
                if value[i]==0.0:
                    res[i]=0.0
                else:
                    res[i]=0.5+0.5*scipy.special.erf((log(value[i])-self.mu)/(sqrt(2)*self.sigma))
            return res
        else:
            if value==0.0:
                return 0.0
            else:
                return 0.5+0.5*scipy.special.erf((log(value)-self.mu)/(sqrt(2)*self.sigma))

    #sampling data with the given distribution
    def sample(self, n, limits=None):
        # The bracket [0, 20] assumes that cdf(20) is essentially 1.
        res=np.empty(n)
        if limits:
            lower,upper=limits
            j=0
            while (j<n):
                # draw the uniform once, so that f is a fixed function
                # while the root finder runs
                u=np.random.uniform(low=0,high=1)
                def f(x):
                    return self.cdf(x)-u
                s=scipy.optimize.brenth(f,0,20)
                # keep only draws that fall inside the limits
                if s >= lower and s <= upper:
                    res[j]=s
                    j+=1
        else:
            r=np.random.uniform(low=0,high=1,size=n)
            for j in range(n):
                def f(x):
                    return self.cdf(x)-r[j]
                s=scipy.optimize.brenth(f,0,20)
                res[j]=s
        return res
The class for the exponential distribution:
class ExponentialPrior(LogPrior):
    """
    Exponential distribution

    Parameters
    ----------
    lam : float
        lam > 0
        rate or inverse scale
    """
    def __init__(self, lam, *args, **kwargs):
        super(ExponentialPrior, self).__init__(*args, **kwargs)
        self.lam = lam
        self.mean = 1. / lam
        self.median = self.mean * log(2)
        self.mode = 0
        self.variance = lam ** -2

    def logp(self, value, limits=None):
        # Note: this returns the negative log of the exponential density,
        # -log(lam*exp(-lam*value)), not renormalised over the limits.
        if limits:
            # exponential prior with hard limits
            lower,upper=limits
            if value >= lower and value <= upper:
                return -log(self.lam)+self.lam*value
            else:
                return -inf
        else:
            # exponential prior without limits
            return -log(self.lam)+self.lam*value

    def cdf(self, value):
        """Cumulative distribution function of the exponential distribution"""
        return (1-exp(-self.lam*value))

    #sampling data with the given distribution
    def sample(self, n, limits=None):
        # The bracket [0, 100] assumes that cdf(100) is essentially 1.
        res=np.empty(n)
        if limits:
            lower,upper=limits
            j=0
            while (j<n):
                # draw the uniform once, so that f is a fixed function
                # while the root finder runs
                u=np.random.uniform(low=0,high=1)
                def f(x):
                    return self.cdf(x)-u
                s=scipy.optimize.brenth(f,0,100)
                # keep only draws that fall inside the limits
                if s >= lower and s <= upper:
                    res[j]=s
                    j+=1
        else:
            r=np.random.uniform(low=0,high=1,size=n)
            for j in range(n):
                def f(x):
                    return self.cdf(x)-r[j]
                s=scipy.optimize.brenth(f,0,100)
                res[j]=s
        return res
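For the exponential case the inverse sampling step has a closed form, so the root finder and the rejection loop can be avoided. The following is only a sketch under that assumption; sample_truncated_expon is a hypothetical helper, not part of the classes above. It draws u uniformly between cdf(low) and cdf(high) and inverts F(x) = 1 - exp(-lam*x) analytically.

```python
import numpy as np

def sample_truncated_expon(lam, size, low, high, rng=None):
    """Inverse-cdf sampling from an exponential(rate=lam) restricted to [low, high]."""
    rng = np.random.default_rng() if rng is None else rng
    cdf_low = 1.0 - np.exp(-lam * low)
    cdf_high = 1.0 - np.exp(-lam * high)
    # Uniform draws on [cdf(low), cdf(high)) map back into [low, high].
    u = rng.uniform(cdf_low, cdf_high, size=size)
    # Inverse of F(x) = 1 - exp(-lam * x) is x = -log(1 - u) / lam.
    return -np.log1p(-u) / lam

# Illustrative values only.
draws = sample_truncated_expon(lam=2.0, size=100000, low=0.5, high=3.0,
                               rng=np.random.default_rng(0))
print(draws.min(), draws.max(), draws.mean())
```

Every draw is accepted here, whereas the loop in the class discards all values that land outside the limits, which becomes slow when the limits cover little probability mass.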
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import lognorm
mean = 4.0 # Geometric mean == median
standard_deviation = 2.0 # Geometric standard deviation
sigma = np.log(standard_deviation) # Standard deviation of log(X)
x = np.linspace(0.1, 25, num=400) # values for x-axis
pdf = lognorm.pdf(x, sigma, loc=0, scale=mean) # probability distribution
plt.plot(x,pdf)
plt.show()
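As another possible route to the bounded lognormal, one can work on the log scale: if X is lognormal and restricted to [low, high], then log(X) is a normal restricted to [log(low), log(high)], which scipy.stats.truncnorm can represent. The sketch below reuses the median and geometric standard deviation from the snippet above; the limits low and high and the helper logp are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

median, gsd = 4.0, 2.0                    # geometric mean (median) and geometric std
mu, sigma = np.log(median), np.log(gsd)   # mean and std of log(X)
low, high = 1.0, 10.0                     # illustrative limits, must be > 0

# truncnorm expects its bounds in standardised units: (bound - loc) / scale.
a = (np.log(low) - mu) / sigma
b = (np.log(high) - mu) / sigma
log_x = stats.truncnorm(a, b, loc=mu, scale=sigma)

# Bounded lognormal samples: exponentiate truncated-normal draws.
samples = np.exp(log_x.rvs(size=1000, random_state=0))

def logp(x):
    """Log-density of the bounded lognormal; assumes x > 0."""
    x = np.asarray(x, dtype=float)
    # Change of variables y = log(x) contributes the Jacobian term -log(x).
    return log_x.logpdf(np.log(x)) - np.log(x)

print(samples.min(), samples.max())
print(logp([0.5, 4.0, 12.0]))
```

Points outside [low, high] get a log-density of -inf, because truncnorm assigns zero density outside its support.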