Chi-square test for groups of unequal size

Question

I'd like to apply the chi-square test scipy.stats.chisquare, but the total number of observations differs between my groups.

import pandas as pd

data={'expected':[20,13,18,21,21,29,45,37,35,32,53,38,25,21,50,62],
      'observed':[19,10,15,14,15,25,25,20,26,38,50,36,30,28,59,49]}

data=pd.DataFrame(data)
print(data.expected.sum())
print(data.observed.sum())

Ignoring this is incorrect, right?

Does the default behavior of scipy.stats.chisquare take this into account? I checked with pen and paper and it looks like it doesn't. Is there a parameter for this?

from scipy.stats import chisquare
# incorrect since the number of observations is unequal 
chisquare(f_obs=data.observed, f_exp=data.expected)

When I adjust the counts manually, I get a slightly different result.

# rescale the observed counts so their total matches the expected total
data['obs_prop']=data['observed']/data['observed'].sum()
data['observed_new']=data['obs_prop']*data['expected'].sum()

# proper way
chisquare(f_obs=data.observed_new, f_exp=data.expected)

Please correct me if I am wrong at some point. Thanks.

P.S. I tagged R to attract additional statistical expertise.

Answer 1

Basically this was a different statistical problem: a chi-square test of independence of variables in a contingency table.

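from scipy.stats import contingency as cont
# pass only the two count columns, in case helper columns were added to `data`
chi2, p, dof, exp=cont.chi2_contingency(data[['expected','observed']])
p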
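As a self-contained sketch of why unequal group totals are not a problem here: chi2_contingency derives the expected counts from the row and column totals of the table, so each group is compared against its own total. The variable names below (group_a, group_b, table, manual) are illustrative and not part of the original post; the counts are the ones from the question.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts from the question, treated as two groups over the same 16 categories.
group_a = np.array([20, 13, 18, 21, 21, 29, 45, 37, 35, 32, 53, 38, 25, 21, 50, 62])
group_b = np.array([19, 10, 15, 14, 15, 25, 25, 20, 26, 38, 50, 36, 30, 28, 59, 49])

# 2 x 16 contingency table: one row per group, one column per category.
table = np.vstack([group_a, group_b])

chi2, p, dof, expected = chi2_contingency(table)

# Expected counts under independence: (row total * column total) / grand total.
manual = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()

print(dof, p)
```

The row totals differ (520 and 459), and the expected counts returned by the function match the manual computation from the margins.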

Answer 2

I didn't quite understand the question. However, the way I see it, you can use scipy.stats.chi2_contingency if you want to run an independence test between two categorical variables. scipy.stats.chisquare can also be used to compare observed against expected frequencies. In that case the number of categories should be the same. Logically, a category gets a frequency of 0 if it has an observed frequency but no expected frequency, and vice versa.

Hope this helps.
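For the observed-versus-expected comparison, here is a minimal sketch under one assumption: that the "expected" column is a reference distribution rather than a second sample. A common convention in that case is to rescale the expected frequencies to the observed total (rather than the other way round), so the statistic reflects the actual sample size. As far as I know, recent SciPy releases raise a ValueError when the two totals disagree. The names reference and expected_scaled are illustrative.

```python
import numpy as np
from scipy.stats import chisquare

# Counts from the question.
observed = np.array([19, 10, 15, 14, 15, 25, 25, 20, 26, 38, 50, 36, 30, 28, 59, 49], dtype=float)
reference = np.array([20, 13, 18, 21, 21, 29, 45, 37, 35, 32, 53, 38, 25, 21, 50, 62], dtype=float)

# Assumption: treat `reference` as a distribution and scale it to the observed total.
expected_scaled = reference / reference.sum() * observed.sum()

stat, p = chisquare(f_obs=observed, f_exp=expected_scaled)

# The same statistic by hand: sum of (O - E)^2 / E over all categories.
manual_stat = ((observed - expected_scaled) ** 2 / expected_scaled).sum()

print(stat, p)
```

If both columns are really samples from two groups, the contingency-table test is the more natural choice, since it does not treat either group as a fixed reference.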
