Is it possible to use a distribution from statsmodels with scipy.stats?

I'm using a certain StatsModels distribution (Azzalini's Skew Student-t) and I'd like to perform a one-sample Kolmogorov-Smirnov test with it.

Is it possible to use Scipy's kstest with a StatsModels distribution? Scipy's documentation (rather vaguely) suggests that the cdf argument may be a string or a callable, with no further details or examples about the latter.

On the other hand, the StatsModels distribution I'm using has many of the methods that Scipy distributions have; thus, I suppose there is some way of passing it to kstest as a callable argument. Am I wrong?

Here is what I have so far. What I'd like to achieve is commented out in the last line:

import statsmodels.sandbox.distributions.extras as azt
import scipy.stats as stats

x = ([-0.2833379 , -3.05224565,  0.13236267, -0.24549146, -1.75106484,
       0.95375723,  0.28628686,  0.        , -3.82529261, -0.26714159,
       1.07142857,  2.56183746, -1.89491817, -0.3414301 ,  1.11589663,
       -0.74540174, -0.60470106, -1.93307821,  1.56093656,  1.28078818])

# This is how kstest works.
print(stats.kstest(x, stats.norm.cdf))  # (0.21003262911224113, 0.29814145956367311)

# This is Statsmodels' distribution I'm using. It has a cdf function as well.
ast = azt.ACSkewT_gen()

# This is what I'd want. Executing this will throw a TypeError because ast.cdf
# needs some shape parameters etc.
# print(stats.kstest(x, ast.cdf))

Note: I'll happily use a two-sample KS test if what I'm expecting is not possible. I just wanted to know whether it is possible.

Those functions were written a long time ago with scipy compatibility in mind, but scipy has changed in several ways in the meantime.

kstest has an args keyword for the distribution parameters.
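As a minimal sketch of how the args keyword behaves, the example below uses scipy's own Student-t distribution as a stand-in, so that it runs without statsmodels. The simulated sample and the parameter values (df, loc, scale) are illustrative assumptions, not values from the question. It also shows that a callable of one argument gives the same result, which is what the documentation means by a callable cdf.

```python
import numpy as np
from scipy import stats

# Illustrative data only: 200 draws from a Student-t with 5 degrees of freedom.
rng = np.random.default_rng(0)
sample = rng.standard_t(df=5, size=200)

# Assumed (not estimated) parameters: shape df, then loc and scale.
params = (5, 0.0, 1.0)

# Form 1: pass the cdf method and let kstest forward the parameters via args.
res_args = stats.kstest(sample, stats.t.cdf, args=params)

# Form 2: bind the parameters yourself in a callable of one argument.
res_callable = stats.kstest(sample, lambda v: stats.t.cdf(v, *params))

print(res_args.statistic, res_args.pvalue)
print(res_callable.statistic, res_callable.pvalue)
```

Both forms evaluate the same cdf at the sample points, so they should report the same statistic. Any object whose cdf method accepts the data followed by the parameters can be used in the same way.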

To get the distribution parameters, we can try to estimate them with the fit method that scipy.stats distributions provide. However, estimating all parameters prints some warnings, and the estimated df parameter is very large. If we fix df at specific values (through the f0 keyword of fit), we get estimates without warnings that we can use in the call to kstest.

>>> ast.fit(x)
C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\scipy\integrate\quadpack.py:352: IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
  If increasing the limit yields no improvement it is advised to analyze 
  the integrand in order to determine the difficulties.  If the position of a 
  local difficulty can be determined (singularity, discontinuity) one will 
  probably gain from splitting up the interval and calling the integrator 
  on the subranges.  Perhaps a special-purpose integrator should be used.
  warnings.warn(msg, IntegrationWarning)
C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\scipy\integrate\quadpack.py:352: IntegrationWarning: The integral is probably divergent, or slowly convergent.
  warnings.warn(msg, IntegrationWarning)
(31834.800527154337, -2.3475921468088172, 1.3720725621594987, 2.2766515091760722)

>>> p = ast.fit(x, f0=100)
>>> print(stats.kstest(x, ast.cdf, args=p)) 
(0.13897385693057401, 0.83458552699682509)

>>> p = ast.fit(x, f0=5)
>>> print(stats.kstest(x, ast.cdf, args=p)) 
(0.097960232618178544, 0.990756154198281)

However, the p-value of the Kolmogorov-Smirnov test is derived under the assumption that the distribution parameters are fixed in advance and not estimated from the data. If we estimate the parameters as above, the p-value will not be correct, since it is not based on the correct distribution of the test statistic.
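A small simulation can make this effect visible. The sketch below is only an illustration under assumed settings (normal data, 20 observations, 1000 replications, a 5% level, an arbitrary seed); it does not use the skew-t distribution. It compares how often the test rejects a true null hypothesis when the parameters are known with how often it rejects when they are fitted to the same sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
n_obs, n_sim, alpha = 20, 1000, 0.05

reject_fixed = 0      # parameters known in advance (valid KS test)
reject_estimated = 0  # parameters estimated from the same sample

for _ in range(n_sim):
    sample = rng.normal(loc=0.0, scale=1.0, size=n_obs)

    # Null distribution fully specified: true mean 0, standard deviation 1.
    p_fixed = stats.kstest(sample, stats.norm.cdf, args=(0.0, 1.0)).pvalue

    # Parameters fitted to the sample, then reused in the test.
    loc_hat, scale_hat = stats.norm.fit(sample)
    p_est = stats.kstest(sample, stats.norm.cdf,
                         args=(loc_hat, scale_hat)).pvalue

    reject_fixed += p_fixed < alpha
    reject_estimated += p_est < alpha

rate_fixed = reject_fixed / n_sim
rate_estimated = reject_estimated / n_sim
print("rejection rate, fixed parameters:    ", rate_fixed)
print("rejection rate, estimated parameters:", rate_estimated)
```

With known parameters the rejection rate should be close to the nominal level. With fitted parameters the fitted cdf follows the sample too closely, so the test rejects far less often than it should and the reported p-values are too large.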

For some distributions we can use tables for the KS test with estimated mean and scale parameters, e.g. the Lilliefors test kstest_normal in statsmodels. If we have estimated shape parameters, then the distribution of the KS test statistic will depend on the parameters of the model, and we could get the p-value from bootstrapping.
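One way to get such a p-value is a parametric bootstrap: simulate samples from the fitted model, refit each one, and compare the resulting KS statistics with the observed one. The sketch below is an assumption-laden illustration rather than a tested recipe for the skew-t case. The helper name bootstrap_ks_pvalue is hypothetical, and the demonstration fits a normal distribution to the data from the question because its fit is fast and stable. In principle any scipy.stats-style distribution with fit, rvs and cdf methods could be passed, but fitting shape parameters in every replication can be slow and may emit warnings.

```python
import numpy as np
from scipy import stats


def bootstrap_ks_pvalue(data, dist, n_boot=500, seed=0, **fit_kwargs):
    """Parametric-bootstrap p-value for a KS test with estimated parameters.

    `dist` is any scipy.stats-style distribution with fit, rvs and cdf.
    `fit_kwargs` are forwarded to `dist.fit` (for example f0=5 to fix a shape).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)

    # Observed statistic, using parameters estimated from the data.
    params = dist.fit(data, **fit_kwargs)
    observed = stats.kstest(data, dist.cdf, args=params).statistic

    # Simulate from the fitted model, refit, and recompute the statistic.
    boot_stats = np.empty(n_boot)
    for i in range(n_boot):
        sim = dist.rvs(*params, size=data.size, random_state=rng)
        sim_params = dist.fit(sim, **fit_kwargs)
        boot_stats[i] = stats.kstest(sim, dist.cdf, args=sim_params).statistic

    # Share of simulated statistics at least as large as the observed one.
    p_value = (1 + np.sum(boot_stats >= observed)) / (n_boot + 1)
    return observed, p_value


# Data from the question.
x = np.array([-0.2833379, -3.05224565, 0.13236267, -0.24549146, -1.75106484,
              0.95375723, 0.28628686, 0.0, -3.82529261, -0.26714159,
              1.07142857, 2.56183746, -1.89491817, -0.3414301, 1.11589663,
              -0.74540174, -0.60470106, -1.93307821, 1.56093656, 1.28078818])

stat, p_boot = bootstrap_ks_pvalue(x, stats.norm, n_boot=500)
print(stat, p_boot)
```

Because every simulated sample is refitted, the bootstrap distribution of the statistic accounts for the estimation step. The number of replications and the seed are arbitrary choices here; more replications give a more stable p-value at a higher cost.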

(I don't remember anything about estimating the parameters of the SkewT distribution or whether maximum likelihood estimation has any specific problems.)
