
Issues creating a skew normal distribution by subclassing scipy.stats.rv_continuous

EDIT: I figured out the distribution and got it mostly working, except when the shape parameter is negative. The PDF is defined for negative shape values, but the subclassed distribution does not handle them.


I am trying to create a skewed normal distribution with scipy stats. I only need the PDF for now.

I subclassed rv_continuous, but when I call skew_norm.pdf(x, shape) I get an array of NaN.

Here is my class:

from scipy.stats import rv_continuous, norm

class skew_norm_gen(rv_continuous):
    def _pdf(self, x, s):
        return 2 * norm.pdf(x) * norm.cdf(x * s)

skew_norm = skew_norm_gen(name='skew_norm', shapes='s')

I've tried calculating the PDF directly (outside of the class) and that works.

Further, if I add *args, can I pass the location and scale as I do for the normal distribution PDF, norm.pdf(x, loc=mu, scale=std)?

class skew_norm_gen(rv_continuous):
    def _pdf(self, x, s, *args):
        return 2 * norm.pdf(x, *args) * norm.cdf(x * s, *args)

skew_norm = skew_norm_gen(name='skew_norm', shapes='s')

Thanks.


EDIT:

I also tried a simple example, thanks to a suggestion by CT Zhu. The code below sometimes returns an array of NaN and other times an array of values.

In [26]:
import scipy.stats as ss

class skew_norm_gen(ss.rv_continuous):
    def _pdf(self, x, s):
        return 2 * ss.norm.pdf(x) * ss.norm.cdf(x * s)
skew_norm = skew_norm_gen(name='skew_norm', shapes='s')

In [27]:
data = ss.norm.rvs(0, size=100)
s = ss.skew(data)
skew_norm.pdf(data, s)

Out[28]:
array([ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan])

EDIT 2:

The PDF spits out NaN if the shape parameter is < 0.

I can calculate the skewnorm PDF directly and it is fine. If I try to use the subclassed PDF it returns NaNs.
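A minimal sketch of the comparison described above, assuming NumPy and SciPy are installed. The helper name `direct_skew_pdf` is hypothetical and introduced only for illustration. It evaluates the same formula once directly and once through the subclass, for a negative and a positive shape value:

```python
import numpy as np
import scipy.stats as ss


class skew_norm_gen(ss.rv_continuous):
    # Same subclass as in the question, with nothing but _pdf defined.
    def _pdf(self, x, s):
        return 2 * ss.norm.pdf(x) * ss.norm.cdf(x * s)


skew_norm = skew_norm_gen(name='skew_norm', shapes='s')


def direct_skew_pdf(x, s):
    # Hypothetical helper: the same formula, evaluated outside the class.
    x = np.asarray(x)
    return 2 * ss.norm.pdf(x) * ss.norm.cdf(x * s)


x = np.linspace(-3, 3, 7)
direct_vals = direct_skew_pdf(x, -2.0)    # direct formula, negative shape
subclass_vals = skew_norm.pdf(x, -2.0)    # subclass, negative shape
positive_vals = skew_norm.pdf(x, 2.0)     # subclass, positive shape

print(direct_vals)
print(subclass_vals)
print(positive_vals)
```

Only the subclass call with the negative shape should come back as NaN, which matches the behaviour reported in the question.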

I cannot reproduce the error; see:

In [15]:
import scipy.stats as ss
class skew_norm_gen(ss.rv_continuous):
    def _pdf(self, x, s):
        return 2 * ss.norm.pdf(x) * ss.norm.cdf(x * s)
skew_norm = skew_norm_gen(name='skew_norm', shapes='s')

In [17]:
skew_norm.pdf(3, 4)
Out[17]:
0.0088636968238760151

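Yes, you can pass loc and scale. Note that rv_continuous.pdf handles them itself (it evaluates _pdf at (x - loc) / scale and divides the result by scale), so the extra *args in _pdf stays empty and is optional: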

In [18]:

class skew_norm_gen(ss.rv_continuous):
    def _pdf(self, x, s, *args):
        return 2 * ss.norm.pdf(x, *args) * ss.norm.cdf(x * s, *args)
skew_norm = skew_norm_gen(name='skew_norm', shapes='s')

In [20]:
skew_norm.pdf(3, 4, loc=0.5, scale=3)
Out[20]:
0.18786061213807126

In [21]:
skew_norm.pdf(3, s=4, loc=0.5, scale=3)
Out[21]:
0.18786061213807126
In [22]:

skew_norm.pdf(3, s=4, loc=0, scale=1)
Out[22]:
0.0088636968238760151
In [28]:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-5, 5)
plt.plot(x, skew_norm.pdf(x, 4), label='Skewed')
plt.plot(x, ss.norm.pdf(x), label='Normal')
plt.legend()
Out[28]:
[<matplotlib.lines.Line2D at 0x1092667d0>]

[Image: plot of the skewed PDF (shape 4) and the standard normal PDF on the interval -5 to 5]
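As a rough check of how loc and scale reach the distribution, the sketch below (assuming NumPy and SciPy are installed) defines `_pdf` without `*args` and compares `pdf()` called with `loc` and `scale` against the standardized density computed by hand. The variable names `shifted` and `by_hand` are illustrative only:

```python
import numpy as np
import scipy.stats as ss


class skew_norm_gen(ss.rv_continuous):
    def _pdf(self, x, s):
        # _pdf only sees the standardized variable; loc and scale never reach it.
        return 2 * ss.norm.pdf(x) * ss.norm.cdf(x * s)


skew_norm = skew_norm_gen(name='skew_norm', shapes='s')

x, s, loc, scale = 3.0, 4.0, 0.5, 3.0

# pdf() called with loc and scale ...
shifted = skew_norm.pdf(x, s, loc=loc, scale=scale)
# ... compared with the standardized density at (x - loc) / scale, divided by scale.
by_hand = skew_norm.pdf((x - loc) / scale, s) / scale

print(shifted, by_hand)
```

Both values should agree with the 0.18786... shown in Out[20], which suggests the shift and rescaling are applied by the base class and not by anything passed into `_pdf`.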

Edit:

In your example data, s is negative, which causes the resulting PDF to contain only nan, the default badvalue (I think that is what it is called) defined by rv_continuous.

The root of the problem is that there is a default _argcheck() method, which verifies whether the shape parameters are valid. By default it checks that all of them are > 0, which is not the case here.
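The default check can be inspected directly. This is a small sketch, assuming SciPy is installed. `_argcheck` is a private hook and is called here only to show what it returns for a positive and a negative shape value:

```python
import scipy.stats as ss


class skew_norm_gen(ss.rv_continuous):
    # No _argcheck override, so the inherited default is used.
    def _pdf(self, x, s):
        return 2 * ss.norm.pdf(x) * ss.norm.cdf(x * s)


skew_norm = skew_norm_gen(name='skew_norm', shapes='s')

# Called directly for inspection only; pdf() normally calls this internally.
ok_positive = bool(skew_norm._argcheck(4.0))    # positive shape
ok_negative = bool(skew_norm._argcheck(-4.0))   # negative shape

print(ok_positive, ok_negative)
```

A negative shape fails the inherited check, so `pdf()` fills the output with the bad value instead of calling `_pdf`.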

So the solution is to override the default _argcheck() method:

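import numpy as np
import scipy.stats as ss

class skew_norm_gen(ss.rv_continuous):
    def _argcheck(self, skew):
        return np.isfinite(skew)  # accept any finite value, including negative ones
    def _pdf(self, x, skew):
        return 2 * ss.norm.pdf(x) * ss.norm.cdf(x * skew)

skew_norm = skew_norm_gen(name='skew_norm', shapes='skew')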

And then it should work fine.
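One way to check the fix, sketched under the assumption that NumPy and a SciPy version that ships `scipy.stats.skewnorm` are installed: evaluate the overridden subclass at a negative shape, compare it with the formula computed directly and with the built-in `skewnorm`, and integrate the density numerically. The variable names are illustrative only:

```python
import numpy as np
import scipy.stats as ss
from scipy.integrate import quad


class skew_norm_gen(ss.rv_continuous):
    def _argcheck(self, skew):
        # Accept any finite shape value, negative ones included.
        return np.isfinite(skew)

    def _pdf(self, x, skew):
        return 2 * ss.norm.pdf(x) * ss.norm.cdf(x * skew)


skew_norm = skew_norm_gen(name='skew_norm', shapes='skew')

x = np.linspace(-3, 3, 7)
skew = -2.0

from_subclass = skew_norm.pdf(x, skew)
from_formula = 2 * ss.norm.pdf(x) * ss.norm.cdf(x * skew)
from_builtin = ss.skewnorm.pdf(x, skew)   # SciPy's own skew normal, for reference

# A density should integrate to 1 over the real line.
area, _ = quad(skew_norm.pdf, -np.inf, np.inf, args=(skew,))

print(from_subclass)
print(area)
```

If only the PDF is needed and no custom behaviour is required, using `scipy.stats.skewnorm` directly may be simpler than maintaining a subclass.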

(Also, I would suggest calling the additional parameter skew, just for readability; 's' could mean, say, standard deviation.)
