
Fitting data with a custom distribution using scipy.stats

So I noticed that there is no implementation of the skewed generalized t distribution in scipy. It would be useful for me to fit this distribution to some data I have. Unfortunately, fit doesn't seem to work for me in this case. To explain further, here is how I have implemented it:

import numpy as np
import pandas as pd
import scipy.stats as st
from scipy.special import beta

class sgt(st.rv_continuous):

    def _pdf(self, x, mu, sigma, lam, p, q):

        # Variance-adjustment constant v
        v = q ** (-1 / p) * \
            ((3 * lam ** 2 + 1) * (
                    beta(3 / p, q - 2 / p) / beta(1 / p, q)) - 4 * lam ** 2 *
             (beta(2 / p, q - 1 / p) / beta(1 / p, q)) ** 2) ** (-1 / 2)

        # Mean-centering shift m
        m = 2 * v * sigma * lam * q ** (1 / p) * beta(2 / p, q - 1 / p) / beta(
            1 / p, q)

        # The skew factor (lam * sign(x - mu + m) + 1) ** p divides the
        # kernel term |x - mu + m| ** p / (q * (v * sigma) ** p)
        fx = p / (2 * v * sigma * q ** (1 / p) * beta(1 / p, q) * (
                np.abs(x - mu + m) ** p / (q * (v * sigma) ** p * (
                    lam * np.sign(x - mu + m) + 1) ** p) + 1) ** (
                          1 / p + q))

        return fx

    def _argcheck(self, mu, sigma, lam, p, q):

        # Element-wise checks; a chained comparison such as -1 < lam < 1
        # fails on arrays with more than one element
        s = sigma > 0
        l = (lam > -1) & (lam < 1)
        p_bool = p > 0
        q_bool = q > 0

        all_bool = s & l & p_bool & q_bool

        return all_bool
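A quick way to sanity-check a hand-written pdf is to integrate it numerically over the real line. For an SGT density the result hinges on where the skew factor (lam * sign(z) + 1)**p sits, with z = x - mu + m: it has to divide the kernel term |z|**p / (q * (v*sigma)**p). If it multiplies that term instead, the right and left halves integrate to 1/(2(1 + lam)) and 1/(2(1 - lam)) rather than (1 + lam)/2 and (1 - lam)/2, so the total mass comes out as 1/(1 - lam**2) instead of 1. The sketch below uses scipy.integrate.quad; the helper total_mass and the parameter sets are illustrative choices, not part of the original code.

```python
import numpy as np
import scipy.stats as st
from scipy.integrate import quad
from scipy.special import beta


class sgt(st.rv_continuous):

    def _pdf(self, x, mu, sigma, lam, p, q):
        B = beta(1 / p, q)
        # Variance-adjustment constant v and mean-centering shift m
        v = q ** (-1 / p) * ((3 * lam ** 2 + 1) * beta(3 / p, q - 2 / p) / B
                             - 4 * lam ** 2 * (beta(2 / p, q - 1 / p) / B) ** 2) ** (-1 / 2)
        m = 2 * v * sigma * lam * q ** (1 / p) * beta(2 / p, q - 1 / p) / B
        z = x - mu + m
        # The skew factor divides the kernel term
        kernel = np.abs(z) ** p / (q * (v * sigma) ** p * (lam * np.sign(z) + 1) ** p)
        return p / (2 * v * sigma * q ** (1 / p) * B * (kernel + 1) ** (1 / p + q))

    def _argcheck(self, mu, sigma, lam, p, q):
        return (sigma > 0) & (lam > -1) & (lam < 1) & (p > 0) & (q > 0)


dist = sgt(name='sgt')


def total_mass(mu, sigma, lam, p, q):
    # Hypothetical helper: integrate the pdf over the whole real line
    f = lambda x: dist.pdf(x, mu, sigma, lam, p, q)
    # Integrate each side of mu separately over the infinite range
    left, _ = quad(f, -np.inf, mu)
    right, _ = quad(f, mu, np.inf)
    return left + right


for params in [(1, 3, -0.1, 2, 50), (0, 1, 0.5, 2, 5), (0, 2, -0.3, 1.5, 3)]:
    print(params, total_mass(*params))
```

A total close to 1 for several skewness values (lam of both signs) is a cheap check that the normalizing constant and the skew factor agree.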

This all works fine, and I can generate random variates with given parameters without any problem. The custom _argcheck is required because the default check, which only accepts positive shape parameters, is not suitable here: mu and lam may be negative, and lam must also lie in (-1, 1).
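To see why the override matters: the inherited rv_continuous._argcheck accepts only strictly positive shape parameters, and for arguments that fail the check, pdf returns nan instead of calling _pdf. The sketch below compares the same density with and without the override, using the question's parameters. The class names sgt_default_check and sgt are illustrative, and the pdf is the SGT formula with the skew factor dividing the kernel term.

```python
import numpy as np
import scipy.stats as st
from scipy.special import beta


class sgt_default_check(st.rv_continuous):
    # SGT density; inherits the default _argcheck (every shape parameter > 0)

    def _pdf(self, x, mu, sigma, lam, p, q):
        B = beta(1 / p, q)
        v = q ** (-1 / p) * ((3 * lam ** 2 + 1) * beta(3 / p, q - 2 / p) / B
                             - 4 * lam ** 2 * (beta(2 / p, q - 1 / p) / B) ** 2) ** (-1 / 2)
        m = 2 * v * sigma * lam * q ** (1 / p) * beta(2 / p, q - 1 / p) / B
        z = x - mu + m
        kernel = np.abs(z) ** p / (q * (v * sigma) ** p * (lam * np.sign(z) + 1) ** p)
        return p / (2 * v * sigma * q ** (1 / p) * B * (kernel + 1) ** (1 / p + q))


class sgt(sgt_default_check):
    # Same density with the range checks the SGT actually needs

    def _argcheck(self, mu, sigma, lam, p, q):
        return (sigma > 0) & (lam > -1) & (lam < 1) & (p > 0) & (q > 0)


default = sgt_default_check(name='sgt_default')
custom = sgt(name='sgt')

args = (1, 3, -0.1, 2, 50)  # mu, sigma, lam, p, q from the question
print(default.pdf(1.0, *args))  # nan: lam = -0.1 fails the positivity check
print(custom.pdf(1.0, *args))   # a finite, positive density value
```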

sgt_inst = sgt(name='sgt')
data = sgt_inst.rvs(mu=1, sigma=3, lam=-0.1, p=2, q=50, size=100)

However, when I try to fit these parameters, I get a warning:

sgt_inst.fit(data)

RuntimeWarning: invalid value encountered in subtract
  numpy.max(numpy.abs(fsim[0] - fsim[1:])) <= fatol):

and it just returns

What I find strange is that when I implement the example custom Gaussian distribution shown in the docs, fit runs without any problem.

Any ideas?

As the fit docstring says:

Starting estimates for the fit are given by input arguments; for any arguments not provided with starting estimates, self._fitstart(data) is called to generate such.

Calling sgt_inst._fitstart(data) returns (1.0, 1.0, 1.0, 1.0, 1.0, 0, 1): the first five values are the shape parameters (mu, sigma, lam, p, q) and the last two are loc and scale. Evidently _fitstart is not a sophisticated process; it simply uses 1.0 for every shape parameter, and lam = 1.0 does not satisfy your _argcheck requirement -1 < lam < 1. The negative log-likelihood that fit minimizes is therefore infinite at the starting point and at the neighbouring points of the optimizer's initial simplex, and subtracting those infinities (inf - inf) is what produces the "invalid value encountered in subtract" warning.
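The diagnosis can be checked directly. The sketch below calls the private helper _fitstart (private, so its exact loc/scale output may differ between SciPy versions) and the public nnlf method, which returns inf whenever _argcheck rejects the shape parameters. The stand-in data drawn with NumPy's standard_t is an assumption made to keep the example fast; the shape guesses from _fitstart do not depend on the data.

```python
import numpy as np
import scipy.stats as st
from scipy.special import beta


class sgt(st.rv_continuous):

    def _pdf(self, x, mu, sigma, lam, p, q):
        B = beta(1 / p, q)
        v = q ** (-1 / p) * ((3 * lam ** 2 + 1) * beta(3 / p, q - 2 / p) / B
                             - 4 * lam ** 2 * (beta(2 / p, q - 1 / p) / B) ** 2) ** (-1 / 2)
        m = 2 * v * sigma * lam * q ** (1 / p) * beta(2 / p, q - 1 / p) / B
        z = x - mu + m
        kernel = np.abs(z) ** p / (q * (v * sigma) ** p * (lam * np.sign(z) + 1) ** p)
        return p / (2 * v * sigma * q ** (1 / p) * B * (kernel + 1) ** (1 / p + q))

    def _argcheck(self, mu, sigma, lam, p, q):
        return (sigma > 0) & (lam > -1) & (lam < 1) & (p > 0) & (q > 0)


dist = sgt(name='sgt')
# Stand-in data (assumption): shifted and scaled Student t draws
data = np.random.default_rng(0).standard_t(df=100, size=100) * 3 + 1

start = dist._fitstart(data)  # (mu, sigma, lam, p, q, loc, scale)
print(start)

shapes = start[:5]
print(dist._argcheck(*shapes))  # False: lam = 1.0 is outside (-1, 1)

# Negative log-likelihood at the default starting point
theta = np.asarray(start, dtype=float)
print(dist.nnlf(theta, data))
```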

Conclusion: provide your own starting parameters to fit, e.g.,

sgt_inst.fit(data, 0.5, 0.5, -0.5, 2, 10)

returns (1.4587093459289049, 5.471769032259468, -0.02391466905874927, 7.072893261471524, 0.741434497805832, -0.07012808188413872, 0.5308181287869771) for my random data.
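The fitted tuple contains both mu/sigma and loc/scale. Because rv_continuous applies loc and scale on top of the shape parameters, and mu and sigma already act as a location and a scale inside _pdf, the likelihood depends only on the combinations loc + scale*mu and scale*sigma (roughly 0.70 and 2.90 for the tuple above); the individual values are free to drift. One option, sketched below under the assumption that mu and sigma should carry location and scale, is to hold loc and scale fixed through fit's floc and fscale keywords. The stand-in data and starting values are illustrative only.

```python
import numpy as np
import scipy.stats as st
from scipy.special import beta


class sgt(st.rv_continuous):

    def _pdf(self, x, mu, sigma, lam, p, q):
        B = beta(1 / p, q)
        v = q ** (-1 / p) * ((3 * lam ** 2 + 1) * beta(3 / p, q - 2 / p) / B
                             - 4 * lam ** 2 * (beta(2 / p, q - 1 / p) / B) ** 2) ** (-1 / 2)
        m = 2 * v * sigma * lam * q ** (1 / p) * beta(2 / p, q - 1 / p) / B
        z = x - mu + m
        kernel = np.abs(z) ** p / (q * (v * sigma) ** p * (lam * np.sign(z) + 1) ** p)
        return p / (2 * v * sigma * q ** (1 / p) * B * (kernel + 1) ** (1 / p + q))

    def _argcheck(self, mu, sigma, lam, p, q):
        return (sigma > 0) & (lam > -1) & (lam < 1) & (p > 0) & (q > 0)


dist = sgt(name='sgt')
# Stand-in data (assumption): shifted and scaled Student t draws
data = np.random.default_rng(1).standard_t(df=100, size=200) * 3 + 1

# Starting guesses for mu, sigma, lam, p, q; loc and scale are held fixed
params = dist.fit(data, 0.5, 0.5, -0.5, 2, 10, floc=0, fscale=1)
mu, sigma, lam, p, q, loc, scale = params
print(params)
```

With loc and scale fixed, fit still returns all seven values, but only the five shape parameters are optimized.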
