jarque_bera calculation from scipy
I am trying to calculate the Jarque-Bera test (a normality test) on my data, which looks like this (after a chain of operations):
```
ranking        Q1     Q2     Q3     Q4
Date
2009-12-29    nan    nan    nan    nan
2009-12-30   0.12  -0.21  -0.36  -0.39
2009-12-31   0.05   0.09   0.06  -0.02
2010-01-01    nan    nan    nan    nan
2010-01-04   1.45   1.90   1.81   1.77
...           ...    ...    ...    ...
2020-10-13  -0.67  -0.59  -0.63  -0.61
2020-10-14  -0.05  -0.12  -0.05  -0.13
2020-10-15  -1.91  -1.62  -1.78  -1.91
2020-10-16   1.21   1.13   1.09   1.37
2020-10-19  -0.03   0.01   0.06  -0.02
```
I use a function like this:
```python
import numpy as np
import pandas as pd
from scipy import stats

def stat(x):
    return pd.Series([x.mean(),
                      np.sqrt(x.var()),
                      stats.jarque_bera(x),
                      ],
                     index=['Return',
                            'Volatility',
                            'JB P-Value'
                            ])

data.apply(stat)
```
Whereas the mean and variance calculations work fine, the stats.jarque_bera call fails with this error message:
ValueError: Length of passed values is 10, index implies 9.
Any ideas?
I tried to reproduce this by copying the 10 rows of data you provided above, and the function works fine for me. This looks like a data input issue, where some column seems to have a different number of values than the index of that pd.Series (effectively, somehow len(data[col]) != len(data[col].index)). You can try to figure out which column it is by running a naive "debugging" loop such as:
```python
for col in data.columns:
    if len(data[col].values) != len(data[col].index):
        print(f"Column {col} has more/less values than the index")
```
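Because a pandas Series always holds exactly as many values as it has index labels, that comparison will not normally fire. A more direct way to locate the problem column is to call the summary function on each column separately and record which ones raise. The helper name `find_failing_columns` and the toy function below are hypothetical, purely for illustration; on your data you would call `find_failing_columns(data, stat)`.

```python
import pandas as pd


def find_failing_columns(data, func):
    """Hypothetical helper: call func on each column and collect the errors it raises."""
    failures = {}
    for col in data.columns:
        try:
            func(data[col])
        except Exception as exc:  # keep going so every failing column is reported
            failures[col] = f"{type(exc).__name__}: {exc}"
    return failures


# Toy summary function (hypothetical) that fails on an all-NaN column
def toy_stat(x):
    if x.isna().all():
        raise ValueError("column has no usable values")
    return x.mean()


frame = pd.DataFrame({'Q1': [0.12, 0.05], 'Q2': [float('nan'), float('nan')]})
print(find_failing_columns(frame, toy_stat))
```

Catching the broad `Exception` is deliberate here: the goal is only to report which columns fail and why, not to handle the error.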
However, the SciPy documentation for the Jarque-Bera test says that x can be any "array-like" structure, so you don't need to pass a pd.Series, which might run you into issues with missing values, etc. Essentially, you can just pass a list of values and calculate their JB test statistic and p-value.
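A short illustration of that point, using the non-missing Q1 values from the rows shown in the question. It assumes only that the result unpacks into two values: older SciPy versions return a plain tuple and newer ones a tuple-like result object, and both unpack as (statistic, p-value).

```python
import math

from scipy import stats

# Non-missing Q1 values from the rows shown in the question
values = [0.12, 0.05, 1.45, -0.67, -0.05, -1.91, 1.21, -0.03]

# A plain Python list is accepted, because jarque_bera only needs an array-like input
jb_stat, p_value = stats.jarque_bera(values)  # unpacks into (statistic, p-value)
print(f"JB statistic: {jb_stat:.4f}, p-value: {p_value:.4f}")

# A NaN in the input propagates to a NaN statistic and p-value ...
with_nan = [0.12, float("nan"), 0.05, 1.45, -0.67, -0.05]
nan_stat, nan_p = stats.jarque_bera(with_nan)
print(nan_stat, nan_p)

# ... so filter the missing values out first (what .dropna() does for a Series)
clean = [v for v in with_nan if not math.isnan(v)]
print(stats.jarque_bera(clean))
```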
So, with that, I would modify your function to:
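```python
import numpy as np
import pandas as pd
from scipy import stats

def stat(x):
    return pd.Series([x.mean(),
                      np.sqrt(x.var()),
                      # drop NaN and pass a NumPy array instead of a pd.Series;
                      # jarque_bera returns (statistic, p-value), so [1] keeps the p-value
                      stats.jarque_bera(x.dropna().values)[1],
                      ],
                     index=['Return',
                            'Volatility',
                            'JB P-Value'
                            ])
```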
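If you also want the test statistic next to the p-value, a variant along these lines may help. The function name `stat_with_jb` and the extra 'JB Stat' row are hypothetical additions; the DataFrame is built only from the rows actually shown in the question (the elided middle rows are left out).

```python
import numpy as np
import pandas as pd
from scipy import stats


def stat_with_jb(x):
    """Hypothetical variant of stat(): reports the JB statistic as well as its p-value."""
    clean = x.dropna().values                # JB test on non-missing values only
    jb_stat, jb_p = stats.jarque_bera(clean)  # (statistic, p-value)
    return pd.Series([x.mean(), np.sqrt(x.var()), jb_stat, jb_p],
                     index=['Return', 'Volatility', 'JB Stat', 'JB P-Value'])


# Rows shown in the question (the "..." rows are not included)
data = pd.DataFrame(
    {'Q1': [np.nan, 0.12, 0.05, np.nan, 1.45, -0.67, -0.05, -1.91, 1.21, -0.03],
     'Q2': [np.nan, -0.21, 0.09, np.nan, 1.90, -0.59, -0.12, -1.62, 1.13, 0.01],
     'Q3': [np.nan, -0.36, 0.06, np.nan, 1.81, -0.63, -0.05, -1.78, 1.09, 0.06],
     'Q4': [np.nan, -0.39, -0.02, np.nan, 1.77, -0.61, -0.13, -1.91, 1.37, -0.02]},
    index=pd.to_datetime(['2009-12-29', '2009-12-30', '2009-12-31', '2010-01-01',
                          '2010-01-04', '2020-10-13', '2020-10-14', '2020-10-15',
                          '2020-10-16', '2020-10-19']))

# One column of results per quartile, one row per statistic
print(data.apply(stat_with_jb))
```

Unpacking the result into two names keeps the code readable and works whether SciPy hands back a plain tuple or a result object.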
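Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.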