Represent a p-value obtained from chi square test between multiple columns in the form of a crosstab in Python
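I have 10 features in my data frame. I applied a chi-square test and generated p-values for every pair of columns in the data frame. I want to represent the p-values as a cross grid over the features.
Example: A, B, C are my features, and the p-values between them are (A,B) = 0.0001, (A,C) = 0.5, (B,C) = 0.0
So I would like to see this as: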
     A       B       C
A    1       0.0001  0.5
B    0.0001  1       0.0
C    0.5     0.0     1
Please let me know if any other details are needed.
Assuming you have the list of features as features = ['A','B','C',...]
and the p-values as p_values = {('A','B'):0.0001,('A','C'):0.5,...}
import pandas as pd

p_values = {('A','B'):0.0001,('A','C'):0.5}
features = ['A','B','C']
df = pd.DataFrame(columns=features)
for row in features:
    rowdf = []  # prepare a row for df
    for col in features:
        if row == col:
            rowdf.append(1)  # (A,A) taken as 1
            continue
        try:
            rowdf.append(p_values[(row,col)])  # add the value from dictionary
        except KeyError:
            try:
                rowdf.append(p_values[(col, row)])  # look for pair like (B,A) if (A,B) not found
            except KeyError:  # still not found, append None
                rowdf.append(None)
    df.loc[len(df)] = rowdf  # write row in df
df.index = features  # to make row names as A,B,C ...
print(df)
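The same grid can also be built by starting from an empty square frame and mirroring each pair into both cells. The following is only a minimal sketch under that assumption, not the method above: `p_value_grid` is a hypothetical helper name, and the `('B','C')` entry is taken from the example in the question rather than from the answer's dictionary. Pairs that are missing from the dictionary stay as NaN.

```python
import numpy as np
import pandas as pd

def p_value_grid(features, p_values):
    """Return a symmetric feature-by-feature DataFrame of pairwise p-values.

    Hypothetical helper for illustration: `p_values` maps (feature, feature)
    tuples to p-values, with each unordered pair given once in either order.
    """
    # Start with an all-NaN square grid labelled by the features.
    grid = pd.DataFrame(np.nan, index=features, columns=features)
    for (a, b), p in p_values.items():
        grid.loc[a, b] = p
        grid.loc[b, a] = p  # mirror so that (B,A) equals (A,B)
    for f in features:
        grid.loc[f, f] = 1.0  # a feature paired with itself is shown as 1
    return grid

features = ['A', 'B', 'C']
p_values = {('A', 'B'): 0.0001, ('A', 'C'): 0.5, ('B', 'C'): 0.0}
grid = p_value_grid(features, p_values)
print(grid)
```

Filling both cells at once avoids the nested try/except lookups, at the cost of silently overwriting a value if the dictionary happens to contain both (A,B) and (B,A).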