
Python: chi-squared contingency test (how to interpret)

I've been practicing a chi-squared contingency test, shown below, but I'm having trouble interpreting the result. The test reports a p-value of 0. Does that mean the two variables are not independent? Since it's a small data set, I was fairly sure the variables would be independent, and a p-value of exactly 0 seems weird. Did I do something wrong?

import pandas as pd
df = pd.DataFrame({
    "~60m2" : [54, 577, 143, 782],
    "60~85m2" : [2, 735, 1437, 1],
    "85m2~" : [0, 142, 44, 0],
    })
df.index = ["A", "B", "C", "D"]
df.columns.names = ["size"]
df.index.names = ["city"]

from scipy import stats
stats.chi2_contingency(df)

The output:

(2064.576731417199,
 0.0,
 6,
 array([[ 22.24559612,  31.09522594,   2.65917794],
        [577.59101353, 807.36533061,  69.04365586],
        [645.12228746, 901.76155221,  77.11616033],
        [311.04110288, 434.77789124,  37.18100587]]))
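For reading that output: the four values are the test statistic, the p-value, the degrees of freedom, and the expected counts under independence. The sketch below rebuilds the same table, unpacks the result, and estimates how small the p-value really is. The closed-form tail expression is something I've added for illustration (it only holds for an even number of degrees of freedom), and the variable names are my own. It suggests the printed 0.0 is a floating-point underflow rather than a mistake in the test.

```python
import math

import pandas as pd
from scipy import stats

df = pd.DataFrame(
    {
        "~60m2": [54, 577, 143, 782],
        "60~85m2": [2, 735, 1437, 1],
        "85m2~": [0, 142, 44, 0],
    },
    index=["A", "B", "C", "D"],
)

# The result unpacks as (statistic, p-value, degrees of freedom, expected counts).
chi2, p, dof, expected = stats.chi2_contingency(df)

n = int(df.to_numpy().sum())  # total number of observations in the table
print(f"n = {n}, chi2 = {chi2:.1f}, dof = {dof}, p = {p}")

# For an even number of degrees of freedom (dof = 2m) the chi-squared tail has
# the closed form  P(X > x) = exp(-x/2) * sum_{k<m} (x/2)**k / k!,
# so its logarithm can be evaluated without underflowing to zero.
half = chi2 / 2
m = dof // 2
log_p = -half + math.log(sum(half**k / math.factorial(k) for k in range(m)))
log10_p = log_p / math.log(10)
print(f"log10(p) is roughly {log10_p:.1f}")
```

Note that the table holds a few thousand observations, so it is not really a small sample, and with counts this lopsided a very large statistic is what you would expect.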

I think it is correct. Your cities are very different. Try normalizing by row:

# divide each row by its row total to get per-city proportions
df.div(df.sum(axis=1), axis=0)

size     ~60m2   60~85m2     85m2~
city                              
A     0.964286  0.035714  0.000000
B     0.396836  0.505502  0.097662
C     0.088054  0.884852  0.027094
D     0.998723  0.001277  0.000000

Each row is very different from the others, so yes, the cities do seem to differ, i.e. they look like samples from different populations.
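If you want a single number for how strong the association is, one common option is Cramér's V, which rescales the chi-squared statistic to the range 0 (no association) to 1 (perfect association). This is not part of the answer above, just a minimal sketch computed by hand from the statistic, with `cramers_v` being a name I made up:

```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame(
    {
        "~60m2": [54, 577, 143, 782],
        "60~85m2": [2, 735, 1437, 1],
        "85m2~": [0, 142, 44, 0],
    },
    index=["A", "B", "C", "D"],
)

chi2, p, dof, expected = stats.chi2_contingency(df)

n = df.to_numpy().sum()   # total number of observations
k = min(df.shape) - 1     # min(rows, columns) - 1

# Cramér's V = sqrt(chi2 / (n * k))
cramers_v = float(np.sqrt(chi2 / (n * k)))
print(f"Cramér's V = {cramers_v:.2f}")
```

Unlike the p-value, this does not shrink toward zero just because the sample is large, so it is a more direct measure of how different the rows are.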
