Summary Matrix from multiple Yes or No Dataframe columns

Question

I have a dataframe with multiple columns holding yes or no (1 or 0) values. I need to prepare a matrix of each of those columns against every other column (similar to seaborn's pairplot, but I need a matrix of counts instead of plots).

     A    B    C    D    E    F    G
0    1    1    0    0    1    0    1
1    1    0    1    0    1    1    0
2    0    1    0    0    1    1    1
and so on...

I did it manually by getting the counts from a filtered dataframe, using the following for each combination: AvsB = df[(df['A'] == 1) & (df['B'] == 1)].count()

I then put all these variables into a list and built the matrix dataframe from it by hand.

However, I am looking for a shortcut to do this.

Any function or method that I can use for this?

Thanks in advance for any help.

Edit: Expected output:

      A   B   C   D...
A    ...  ...
B    ...
C
D
...

Answer 1

You can create either a list of DataFrames or a single DataFrame. One approach is to combine itertools.combinations with DataFrame.groupby. In the first case, you can call .to_numpy() on each element to get a plain matrix.

import pandas as pd
from itertools import combinations

l = [df[[*comb]].groupby([*comb]).size().unstack(fill_value=0)
     for comb in combinations(df, 2)]

print(l[0])

B  0  1
A      
0  0  1
1  1  1

new_df = pd.DataFrame({comb : df[[*comb]].groupby([*comb]).size() 
                       for comb in combinations(df, 2)}).fillna(0)
print(new_df)

       A                             B                 ...    C            \
       B    C    D    E    F    G    C    D    E    F  ...    D    E    F   
0 0  0.0  1.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  ...  2.0  0.0  1.0   
  1  1.0  0.0  0.0  1.0  1.0  1.0  1.0  0.0  1.0  1.0  ...  0.0  2.0  1.0   
1 0  1.0  1.0  2.0  0.0  1.0  1.0  2.0  2.0  0.0  1.0  ...  1.0  0.0  0.0   
  1  1.0  1.0  0.0  2.0  1.0  1.0  0.0  0.0  2.0  1.0  ...  0.0  1.0  1.0   

            D              E         F  
       G    E    F    G    F    G    G  
0 0  0.0  0.0  1.0  1.0  0.0  0.0  0.0  
  1  2.0  3.0  2.0  2.0  0.0  0.0  1.0  
1 0  1.0  0.0  0.0  0.0  1.0  1.0  1.0  
  1  0.0  0.0  0.0  0.0  2.0  2.0  1.0

For each column pair, such as (A, B), the rows give the number of records for every value combination: (0, 0), (0, 1), (1, 0) and (1, 1).
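To make the reading of new_df concrete, here is a minimal sketch that pulls out one cell and compares it with the manual filter from the question. The sample df is an assumption: only the three rows shown in the question are used, and both_ab and manual are illustrative names.

```python
import pandas as pd
from itertools import combinations

# Assumed sample: the three rows shown in the question.
df = pd.DataFrame({
    "A": [1, 1, 0],
    "B": [1, 0, 1],
    "C": [0, 1, 0],
    "D": [0, 0, 0],
    "E": [1, 1, 1],
    "F": [0, 1, 1],
    "G": [1, 0, 1],
})

# Same construction as in the answer: one column per pair of columns,
# one row per (first value, second value) combination.
new_df = pd.DataFrame({comb: df[[*comb]].groupby([*comb]).size()
                       for comb in combinations(df, 2)}).fillna(0)

# Row (1, 1) of column ('A', 'B') is the number of rows where A == 1 and B == 1.
both_ab = new_df.loc[(1, 1), ("A", "B")]

# The manual count from the question, reduced to a single number.
manual = len(df[(df["A"] == 1) & (df["B"] == 1)])

print(both_ab, manual)
```

The two values should agree, which is a quick way to check that the grouped table is being read correctly.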

Detail

list(combinations(df, 2))

[('A', 'B'),
 ('A', 'C'),
 ('A', 'D'),
 ('A', 'E'),
 ('A', 'F'),
 ('A', 'G'),
 ('B', 'C'),
 ('B', 'D'),
 ('B', 'E'),
 ('B', 'F'),
 ('B', 'G'),
 ('C', 'D'),
 ('C', 'E'),
 ('C', 'F'),
 ('C', 'G'),
 ('D', 'E'),
 ('D', 'F'),
 ('D', 'G'),
 ('E', 'F'),
 ('E', 'G'),
 ('F', 'G')]
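As a small sketch of why this works: iterating over a DataFrame yields its column labels, so combinations(df, 2) produces unordered pairs of column names, while permutations(df, 2) also includes the reversed pairs. The sample df below is an assumption based on the three rows shown in the question.

```python
import pandas as pd
from itertools import combinations, permutations

# Assumed sample: the three rows shown in the question.
df = pd.DataFrame({
    "A": [1, 1, 0],
    "B": [1, 0, 1],
    "C": [0, 1, 0],
    "D": [0, 0, 0],
    "E": [1, 1, 1],
    "F": [0, 1, 1],
    "G": [1, 0, 1],
})

# Iterating a DataFrame gives column labels, so these are pairs of names.
pairs = list(combinations(df, 2))           # ('A', 'B') only
ordered_pairs = list(permutations(df, 2))   # ('A', 'B') and ('B', 'A')

print(len(pairs), len(ordered_pairs))
```

With seven columns this gives 21 unordered pairs and 42 ordered ones; neither includes a column paired with itself.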

A more elegant layout

from itertools import permutations


new_df = pd.DataFrame({comb : df[[*comb]].groupby([*comb]).size() 
                       for comb in permutations(df, 2)})\
    .stack(dropna=False).unstack(level=0).fillna(0).swaplevel().sort_index()
print(new_df)

       A         B         C         D         E         F         G     
       0    1    0    1    0    1    0    1    0    1    0    1    0    1
A 0  0.0  0.0  0.0  1.0  1.0  0.0  1.0  0.0  0.0  1.0  0.0  1.0  0.0  1.0
  1  0.0  0.0  1.0  1.0  1.0  1.0  2.0  0.0  0.0  2.0  1.0  1.0  1.0  1.0
B 0  0.0  1.0  0.0  0.0  0.0  1.0  1.0  0.0  0.0  1.0  0.0  1.0  1.0  0.0
  1  1.0  1.0  0.0  0.0  2.0  0.0  2.0  0.0  0.0  2.0  1.0  1.0  0.0  2.0
C 0  1.0  1.0  0.0  2.0  0.0  0.0  2.0  0.0  0.0  2.0  1.0  1.0  0.0  2.0
  1  0.0  1.0  1.0  0.0  0.0  0.0  1.0  0.0  0.0  1.0  0.0  1.0  1.0  0.0
D 0  1.0  2.0  1.0  2.0  2.0  1.0  0.0  0.0  0.0  3.0  1.0  2.0  1.0  2.0
  1  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
E 0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
  1  1.0  2.0  1.0  2.0  2.0  1.0  3.0  0.0  0.0  0.0  1.0  2.0  1.0  2.0
F 0  0.0  1.0  0.0  1.0  1.0  0.0  1.0  0.0  0.0  1.0  0.0  0.0  0.0  1.0
  1  1.0  1.0  1.0  1.0  1.0  1.0  2.0  0.0  0.0  2.0  0.0  0.0  1.0  1.0
G 0  0.0  1.0  1.0  0.0  0.0  1.0  1.0  0.0  0.0  1.0  0.0  1.0  0.0  0.0
  1  1.0  1.0  0.0  2.0  2.0  0.0  2.0  0.0  0.0  2.0  1.0  1.0  0.0  0.0
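If only the "both are 1" counts are needed, in the square column-by-column layout from the question's expected output, one possible shortcut is a matrix product of the transposed frame with itself. This is a different technique from the groupby approach above, not part of the original answer, and it assumes every value is strictly 0 or 1. The sample df and the name co are illustrative.

```python
import pandas as pd

# Assumed sample: the three rows shown in the question.
df = pd.DataFrame({
    "A": [1, 1, 0],
    "B": [1, 0, 1],
    "C": [0, 1, 0],
    "D": [0, 0, 0],
    "E": [1, 1, 1],
    "F": [0, 1, 1],
    "G": [1, 0, 1],
})

# For 0/1 data, the product of column X with column Y is 1 only where both
# are 1, so summing those products counts the rows where X == 1 and Y == 1.
co = df.T.dot(df)

print(co)
```

Here co.loc['A', 'B'] plays the role of the manual AvsB count, and the diagonal holds the number of ones in each column rather than a pairwise count.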
