jarque_bera calculation from scipy

Question

I am trying to calculate the Jarque-Bera test (a normality test) on my data, which looks like this (after some chained operations):

```
ranking        Q1     Q2     Q3     Q4
Date
2009-12-29    nan    nan    nan    nan
2009-12-30   0.12  -0.21  -0.36  -0.39
2009-12-31   0.05   0.09   0.06  -0.02
2010-01-01    nan    nan    nan    nan
2010-01-04   1.45   1.90   1.81   1.77
...           ...    ...    ...    ...
2020-10-13  -0.67  -0.59  -0.63  -0.61
2020-10-14  -0.05  -0.12  -0.05  -0.13
2020-10-15  -1.91  -1.62  -1.78  -1.91
2020-10-16   1.21   1.13   1.09   1.37
2020-10-19  -0.03   0.01   0.06  -0.02
```

I use a function like this:

```python
import numpy as np
import pandas as pd
from scipy import stats

def stat(x):
    return pd.Series([x.mean(),
                      np.sqrt(x.var()),
                      stats.jarque_bera(x),
                      ],
                     index=['Return',
                            'Volatility',
                            'JB P-Value'
                            ])

data.apply(stat)
```

While the mean and variance calculations work fine, the stats.jarque_bera call fails with this error:

ValueError: Length of passed values is 10, index implies 9.

Any ideas?

Answer 1

I tried to reproduce this by copying the 10 rows of data you provided above, and the function works fine for me. This looks like a data input issue, where some column seems to have a different number of values than its index (effectively, somehow len(data[col].values) != len(data[col].index)). You can try to find out which column it is by running a naive "debugging" loop such as:

```python
# Report any column whose number of values differs from the length of its index
for col in data.columns:
    if len(data[col].values) != len(data[col].index):
        print(f"Column {col} has more/less values than the index")
```
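Since the printed data contains whole rows of NaN, it may also help to count the missing values per column with DataFrame.isna().sum(), or the non-missing ones with DataFrame.count(), to see how many rows a dropna() would discard. A minimal sketch, assuming a small frame built from only the first five rows and two columns shown in the question:

```python
import numpy as np
import pandas as pd

# Small frame mirroring the first rows shown in the question (NaN rows included)
data = pd.DataFrame(
    {"Q1": [np.nan, 0.12, 0.05, np.nan, 1.45],
     "Q2": [np.nan, -0.21, 0.09, np.nan, 1.90]},
    index=pd.to_datetime(["2009-12-29", "2009-12-30", "2009-12-31",
                          "2010-01-01", "2010-01-04"]),
)

missing = data.isna().sum()  # NaN count per column
counts = data.count()        # non-NaN count per column
print(missing)
print(counts)
```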

However, the SciPy documentation for the Jarque-Bera test says that x can be any "array-like" structure, so you don't need to pass a pd.Series, which can cause problems with missing values (any NaN in the input makes both the statistic and the p-value NaN). Essentially, you can just pass a list of values and compute their JB test statistic and p-value.
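A short sketch of passing a plain list, using the Q1 values printed in the question with the NaN rows left out. Note that stats.jarque_bera returns a pair (JB statistic, p-value) rather than a single number, and that appending one NaN is enough to turn both results into NaN:

```python
from scipy import stats

# Q1 values shown in the question, NaN rows left out
values = [0.12, 0.05, 1.45, -0.67, -0.05, -1.91, 1.21, -0.03]

# jarque_bera returns two values: the JB statistic and its p-value
statistic, pvalue = stats.jarque_bera(values)
print(statistic, pvalue)

# A single NaN in the input propagates to both results
nan_statistic, nan_pvalue = stats.jarque_bera(values + [float("nan")])
print(nan_statistic, nan_pvalue)
```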

So I would modify your function to:

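```python
import numpy as np
import pandas as pd
from scipy import stats

def stat(x):
    return pd.Series([x.mean(),
                      np.sqrt(x.var()),
                      # drop NaN and pass a NumPy array instead of a pd.Series;
                      # [1] keeps the p-value (element [0] is the JB statistic)
                      stats.jarque_bera(x.dropna().values)[1],
                      ],
                     index=['Return',
                            'Volatility',
                            'JB P-Value'
                            ])
```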
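As a usage check, here is a self-contained sketch that repeats the modified stat function (taking element [1] of the jarque_bera result so the "JB P-Value" row holds just the p-value) and applies it to the 10 rows printed in the question, skipping the "..." gap. The result should be one row per statistic and one column per quartile, with no NaN left in it:

```python
import numpy as np
import pandas as pd
from scipy import stats

def stat(x):
    return pd.Series([x.mean(),                       # pandas skips NaN by default
                      np.sqrt(x.var()),
                      stats.jarque_bera(x.dropna().values)[1],  # p-value only
                      ],
                     index=['Return', 'Volatility', 'JB P-Value'])

nan = np.nan
# The 10 rows shown in the question (the "..." gap is not reconstructed)
data = pd.DataFrame(
    [[nan, nan, nan, nan],
     [0.12, -0.21, -0.36, -0.39],
     [0.05, 0.09, 0.06, -0.02],
     [nan, nan, nan, nan],
     [1.45, 1.90, 1.81, 1.77],
     [-0.67, -0.59, -0.63, -0.61],
     [-0.05, -0.12, -0.05, -0.13],
     [-1.91, -1.62, -1.78, -1.91],
     [1.21, 1.13, 1.09, 1.37],
     [-0.03, 0.01, 0.06, -0.02]],
    index=pd.to_datetime(["2009-12-29", "2009-12-30", "2009-12-31",
                          "2010-01-01", "2010-01-04", "2020-10-13",
                          "2020-10-14", "2020-10-15", "2020-10-16",
                          "2020-10-19"]),
    columns=["Q1", "Q2", "Q3", "Q4"],
)

result = data.apply(stat)  # applies stat to each column
print(result)
```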


Answer 1: score 0, posted 2020-10-23 18:04:38
