jarque_bera calculation from scipy

I am trying to run the Jarque-Bera test (a normality test) on my data, which looks like this (after a chain of operations):

ranking Q1  Q2  Q3  Q4
Date                
2009-12-29  nan nan nan nan
2009-12-30  0.12    -0.21   -0.36   -0.39
2009-12-31  0.05    0.09    0.06    -0.02
2010-01-01  nan nan nan nan
2010-01-04  1.45    1.90    1.81    1.77
... ... ... ... ...
2020-10-13  -0.67   -0.59   -0.63   -0.61
2020-10-14  -0.05   -0.12   -0.05   -0.13
2020-10-15  -1.91   -1.62   -1.78   -1.91
2020-10-16  1.21    1.13    1.09    1.37
2020-10-19  -0.03   0.01    0.06    -0.02

I use a function like this:

from scipy import stats

def stat(x):
    return pd.Series([x.mean(),
                      np.sqrt(x.var()),
                      stats.jarque_bera(x),
                      ],
                     index=['Return',
                            'Volatility',
                            'JB P-Value'
                            ])

data.apply(stat)

While the mean and variance calculations work fine, the stats.jarque_bera call raises this error:

ValueError: Length of passed values is 10, index implies 9.

Any idea?

I tried to reproduce this by copying the 10 rows of data you provided above, and the function works fine for me. This looks like a data input issue, where some column seems to have a different number of values than the index of that pd.Series (effectively, somehow len(data[col].values) != len(data[col].index)). You can try to figure out which column it is by running a naive "debugging" loop such as:

for col in data.columns: 
    if len(data[col].values) != len(data[col].index):
        print(f"Column {col} has more/less values than the index")
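Since the sample in the question contains rows that are entirely NaN, a complementary check is to count the missing values per column. This is only a sketch: the small frame below is a stand-in for `data` that mirrors the first rows of the sample, not the real dataset.

```python
import numpy as np
import pandas as pd

# Stand-in for `data`, mirroring the first rows of the question's sample.
data = pd.DataFrame(
    {
        "Q1": [np.nan, 0.12, 0.05, np.nan, 1.45],
        "Q2": [np.nan, -0.21, 0.09, np.nan, 1.90],
    }
)

# Number of missing values in each column.
nan_counts = data.isna().sum()
print(nan_counts)
```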

However, the SciPy documentation for the Jarque-Bera test says that x can be any "array-like" structure, so you don't need to pass a pd.Series, which might cause issues with missing values, etc. Essentially, you can just pass a list of values and calculate their JB test statistic and p-value.
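As a minimal sketch of that point, the call below passes a plain Python list to stats.jarque_bera and unpacks the two values it returns. The sample values are made up for illustration and are not the asker's data. The last lines show what happens, with default settings, when a NaN is left in the input.

```python
import math
from scipy import stats

# Hypothetical sample values, for illustration only.
values = [0.12, 0.05, 1.45, -0.67, -0.05, -1.91, 1.21, -0.03]

# jarque_bera returns two values: the test statistic and the p-value.
jb_stat, p_value = stats.jarque_bera(values)
print(f"JB statistic: {jb_stat:.4f}, p-value: {p_value:.4f}")

# A NaN left in the input propagates into both results by default.
nan_stat, nan_p = stats.jarque_bera(values + [float("nan")])
print(math.isnan(nan_stat), math.isnan(nan_p))
```

Because two values come back, storing the whole result under a single label such as 'JB P-Value' puts both of them in one cell; picking out the p-value explicitly avoids that.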

So with that, I would modify your function to:

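import numpy as np
import pandas as pd
from scipy import stats

def stat(x):
    # Drop NaN and pass a numpy array instead of a pd.Series
    clean = x.dropna().values
    return pd.Series([x.mean(),
                      np.sqrt(x.var()),
                      # jarque_bera returns (statistic, p-value); keep the p-value to match the label
                      stats.jarque_bera(clean)[1],
                      ],
                     index=['Return',
                            'Volatility',
                            'JB P-Value'
                            ])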
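For reference, here is a self-contained sketch of this approach applied column by column. The frame is randomly generated as a stand-in for `data` (the real dataset is not available), and element [1] of the jarque_bera result is taken on the assumption that only the p-value is wanted under the 'JB P-Value' label.

```python
import numpy as np
import pandas as pd
from scipy import stats

def stat(x):
    # Drop NaN and hand SciPy a plain numpy array.
    clean = x.dropna().values
    return pd.Series(
        [x.mean(), np.sqrt(x.var()), stats.jarque_bera(clean)[1]],
        index=["Return", "Volatility", "JB P-Value"],
    )

# Made-up data standing in for the question's frame.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(50, 4)), columns=["Q1", "Q2", "Q3", "Q4"])
data.iloc[[0, 3]] = np.nan  # two all-NaN rows, as in the question's sample

# One column of results per input column.
summary = data.apply(stat)
print(summary)
```

The result should be a frame with one row per statistic and one column per quantile (Q1 to Q4).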
