Python: chi-squared contingency test (how to interpret)
I've been practicing a chi-squared contingency test, shown below, but I'm having trouble interpreting the result. The test reports a p-value of 0. Does that mean the two variables are not independent? Since it's a small data set, I was fairly sure the variables would be independent, and a p-value of exactly 0 seems weird. Did I do something wrong?
import pandas as pd
df = pd.DataFrame({
"~60m2" : [54, 577, 143, 782],
"60~85m2" : [2, 735, 1437, 1],
"85m2~" : [0, 142, 44, 0],
})
df.index = ["A", "B", "C", "D"]
df.columns.names = ["size"]
df.index.names = ["city"]
from scipy import stats
stats.chi2_contingency(df)
The output:
(2064.576731417199,
0.0,
6,
array([[ 22.24559612, 31.09522594, 2.65917794],
[577.59101353, 807.36533061, 69.04365586],
[645.12228746, 901.76155221, 77.11616033],
[311.04110288, 434.77789124, 37.18100587]]))
I think it is correct. Your cities are very different. Just try normalizing by row:
(df.T / df.sum(axis=1)).T
size ~60m2 60~85m2 85m2~
city
A 0.964286 0.035714 0.000000
B 0.396836 0.505502 0.097662
C 0.088054 0.884852 0.027094
D 0.998723 0.001277 0.000000
Each row is very different from the others, so yes, the cities do seem to differ, i.e. they look like samples from different populations. That is what the tiny p-value is telling you: the hypothesis that size is independent of city is rejected.
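As for the p-value of exactly 0: it is most likely floating-point underflow, not a true zero. Here is a rough sketch of how you could check that and also get a sense of how strong the association is. It assumes the same table as in the question. The names `log10_p`, `cramers_v` and `n_small` are my own, and the closed-form tail formula only holds when the degrees of freedom are even (here they are 6), so treat it as an illustration and not a general recipe.

```python
import math

import numpy as np
import pandas as pd
from scipy import stats

# Same table as in the question.
df = pd.DataFrame(
    {
        "~60m2": [54, 577, 143, 782],
        "60~85m2": [2, 735, 1437, 1],
        "85m2~": [0, 142, 44, 0],
    },
    index=["A", "B", "C", "D"],
)

chi2, p, dof, expected = stats.chi2_contingency(df)

# For an even number of degrees of freedom 2m, the chi-squared upper tail is
#   P(X > x) = exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
# Evaluating it in log space avoids the underflow that turns p into 0.0.
assert dof % 2 == 0, "this closed form needs even degrees of freedom"
h = chi2 / 2
m = dof // 2
tail_sum = sum(h**j / math.factorial(j) for j in range(m))
log10_p = (-h + math.log(tail_sum)) / math.log(10)

# Cramér's V: an effect size between 0 (no association) and 1 (perfect).
n = df.to_numpy().sum()
k = min(df.shape) - 1
cramers_v = np.sqrt(chi2 / (n * k))

# Number of cells whose expected count is below 5 (a common rule of thumb).
n_small = int((expected < 5).sum())

print("p as a float:", p)
print("log10(p):", log10_p)
print("Cramér's V:", cramers_v)
print("cells with expected count < 5:", n_small)
```

If I have the arithmetic right, the p-value is on the order of 1e-443, far below the smallest positive double (about 5e-324), which is why it prints as 0.0. Cramér's V comes out at about 0.5, which is usually read as a strong association. Note also that the table holds 3917 observations in total, so this is not really a small sample.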