
Represent p-values obtained from chi-square tests between multiple columns as a crosstab in Python

I have 10 features in my DataFrame. I applied the chi-square test and computed the p-values for all column pairs. I want to display these p-values as a grid (crosstab) of the features.

Example: A, B and C are my features, and the p-values are (A,B) = 0.0001, (A,C) = 0.5 and (B,C) = 0.0.

I want to display them like this:

      A       B       C
  A   1       0.0001  0.5
  B   0.0001  1       0.0
  C   0.5     0.0     1

If any other details are needed, please let me know.
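For context, here is a sketch of how the pairwise p-values could be computed directly into such a grid. It assumes the features are categorical columns of a DataFrame and uses `scipy.stats.chi2_contingency` on a `pd.crosstab` of each pair. The function name `chi2_pvalue_matrix` is hypothetical. The diagonal is set to 1 by convention, as in the table above, rather than computed.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import chi2_contingency


def chi2_pvalue_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: chi-square p-value for every pair of (categorical) columns."""
    cols = list(df.columns)
    # Start from 1.0 everywhere so the diagonal follows the question's convention
    mat = pd.DataFrame(1.0, index=cols, columns=cols)
    for a, b in combinations(cols, 2):
        table = pd.crosstab(df[a], df[b])  # contingency table of the two columns
        _, p, _, _ = chi2_contingency(table)  # unpacks as (statistic, pvalue, dof, expected)
        mat.loc[a, b] = p  # fill both halves so the grid is symmetric
        mat.loc[b, a] = p
    return mat
```

Calling it as `chi2_pvalue_matrix(data[['A', 'B', 'C']])`, where `data` is your own DataFrame, returns the crosstab of p-values directly. Note that `chi2_contingency` applies Yates' continuity correction by default when a table has one degree of freedom (for example, 2×2).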

Assuming you have a list of features such as features = ['A','B','C',...] and the p-values in a dictionary keyed by column pair such as
p_values = {('A','B'):0.0001,('A','C'):0.5,...}, you can build the table like this:

import pandas as pd

p_values = {('A','B'):0.0001,('A','C'):0.5}
features = ['A','B','C']
df = pd.DataFrame(columns=features)

for row in features:
    rowdf = [] # prepare a row for df
    for col in features:
        if row == col:
            rowdf.append(1) # (A,A) taken as 1
            continue
        try:
            rowdf.append(p_values[(row,col)]) # add the value from dictionary
        except KeyError:
            try:
                rowdf.append(p_values[(col, row)]) # look for pair like (B,A) if (A,B) not found
            except KeyError: # still not found, append None
                rowdf.append(None)

    df.loc[len(df)] = rowdf # write row in df


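df.index = features # label the rows A, B, C, ...
df = df.astype(float) # rows were appended one by one, so columns may be object dtype; make them numeric (None becomes NaN)
print(df)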
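A more compact variant, sketched under the same assumptions (a `features` list and a `p_values` dictionary in which each pair appears once, in either order), fills a NumPy array symmetrically and wraps it in a DataFrame once instead of appending rows one at a time. The helper name `pvalue_matrix` is hypothetical. Pairs missing from the dictionary are left as NaN, which matches the `None` entries in the loop above.

```python
import numpy as np
import pandas as pd


def pvalue_matrix(p_values, features):
    """Hypothetical helper: symmetric p-value grid from a {(col1, col2): p} dict."""
    pos = {name: i for i, name in enumerate(features)}  # feature -> row/column position
    arr = np.full((len(features), len(features)), np.nan)  # unknown pairs stay NaN
    np.fill_diagonal(arr, 1.0)  # (A,A), (B,B), ... taken as 1, as in the question
    for (a, b), p in p_values.items():
        i, j = pos[a], pos[b]
        arr[i, j] = p  # (A,B)
        arr[j, i] = p  # mirror to (B,A)
    return pd.DataFrame(arr, index=features, columns=features)


p_values = {('A', 'B'): 0.0001, ('A', 'C'): 0.5, ('B', 'C'): 0.0}
print(pvalue_matrix(p_values, ['A', 'B', 'C']))
```

Because every pair is written in both orders, there is no need for the nested `try`/`except` lookup of `(row, col)` and `(col, row)`, and the result is a float DataFrame from the start.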

