I have a DataFrame whose columns hold yes/no values encoded as 1 or 0. I need a matrix that compares each of those columns against every other column (similar to a pairplot in seaborn, but as a matrix instead of plots).
   A  B  C  D  E  F  G
0  1  1  0  0  1  0  1
1  1  0  1  0  1  1  0
2  0  1  0  0  1  1  1
and so on...
So far I have done it manually, getting the counts from a filtered DataFrame for each combination like this: AvsB = df[(df['A'] == 1) & (df['B'] == 1)].count()
I then build the matrix DataFrame by hand from a list of these variables.
However, I am looking for a shortcut.
Is there a function or method I can use for this?
Thanks for the help.
Edit: Expected output:
A B C D...
A ... ...
B ...
C
D
...
You can create a list of DataFrames or a single DataFrame. One approach is itertools.combinations + DataFrame.groupby. In the first case you can call .to_numpy() on each DataFrame to get a matrix.
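from itertools import combinations
import pandas as pd
# One count table per pair of columns (rows: first column's values, columns: second column's values)
l = [df[[*comb]].groupby([*comb]).size().unstack(fill_value=0)
     for comb in combinations(df, 2)]
print(l[0])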
B 0 1
A
0 0 1
1 1 1
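As a rough illustration of the .to_numpy() step, here is a self-contained sketch. It assumes the DataFrame holds only the three sample rows shown in the question (the real data may have more), and the names tables, matrix and fixed are made up for this example.

```python
from itertools import combinations
import pandas as pd

# Assumption: only the three sample rows from the question.
df = pd.DataFrame({'A': [1, 1, 0], 'B': [1, 0, 1], 'C': [0, 1, 0], 'D': [0, 0, 0],
                   'E': [1, 1, 1], 'F': [0, 1, 1], 'G': [1, 0, 1]})

tables = [df[[*comb]].groupby([*comb]).size().unstack(fill_value=0)
          for comb in combinations(df, 2)]

# Count table for the (A, B) pair as a plain NumPy array.
matrix = tables[0].to_numpy()
print(matrix)

# D is constant in this sample, so the (A, D) table has a single column.
# Reindexing forces the full 2x2 shape if you need it.
fixed = tables[2].reindex(index=[0, 1], columns=[0, 1], fill_value=0)
print(fixed.to_numpy())
```

A pair in which one column never takes one of its two values produces a smaller table, which is why the reindex step may be worth adding before stacking the arrays.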
# Single DataFrame: one column per pair of columns, one row per (first value, second value)
new_df = pd.DataFrame({comb: df[[*comb]].groupby([*comb]).size()
                       for comb in combinations(df, 2)}).fillna(0)
print(new_df)
       A                             B                 ...    C            \
       B    C    D    E    F    G    C    D    E    F  ...    D    E    F
0 0  0.0  1.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  ...  2.0  0.0  1.0
  1  1.0  0.0  0.0  1.0  1.0  1.0  1.0  0.0  1.0  1.0  ...  0.0  2.0  1.0
1 0  1.0  1.0  2.0  0.0  1.0  1.0  2.0  2.0  0.0  1.0  ...  1.0  0.0  0.0
  1  1.0  1.0  0.0  2.0  1.0  1.0  0.0  0.0  2.0  1.0  ...  0.0  1.0  1.0

            D              E         F
       G    E    F    G    F    G    G
0 0  0.0  0.0  1.0  1.0  0.0  0.0  0.0
  1  2.0  3.0  2.0  2.0  0.0  0.0  1.0
1 0  1.0  0.0  0.0  0.0  1.0  1.0  1.0
  1  0.0  0.0  0.0  0.0  2.0  2.0  1.0
Each column holds the group sizes for one pair of columns, for example A and B, with one row per value combination: (0, 0), (0, 1), (1, 0) and (1, 1).
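To make the layout concrete, the following sketch looks up individual counts in that combined DataFrame. It assumes the data is just the three sample rows from the question, and both_yes and a_and_b are names invented for this example.

```python
from itertools import combinations
import pandas as pd

# Assumption: only the three sample rows from the question.
df = pd.DataFrame({'A': [1, 1, 0], 'B': [1, 0, 1], 'C': [0, 1, 0], 'D': [0, 0, 0],
                   'E': [1, 1, 1], 'F': [0, 1, 1], 'G': [1, 0, 1]})

new_df = pd.DataFrame({comb: df[[*comb]].groupby([*comb]).size()
                       for comb in combinations(df, 2)}).fillna(0)

# Row labels are (value of first column, value of second column);
# column labels are (first column, second column).
both_yes = new_df.loc[(1, 1), :]          # counts where both columns are 1, for every pair
a_and_b = new_df.loc[(1, 1), ('A', 'B')]  # rows where A == 1 and B == 1

print(a_and_b)
print(both_yes[('B', 'G')])
```

The row labelled (1, 1) corresponds to the count the question computed by hand with the filtered DataFrame, here available for all pairs at once.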
Detail
list(combinations(df, 2))
[('A', 'B'),
('A', 'C'),
('A', 'D'),
('A', 'E'),
('A', 'F'),
('A', 'G'),
('B', 'C'),
('B', 'D'),
('B', 'E'),
('B', 'F'),
('B', 'G'),
('C', 'D'),
('C', 'E'),
('C', 'F'),
('C', 'G'),
('D', 'E'),
('D', 'F'),
('D', 'G'),
('E', 'F'),
('E', 'G'),
('F', 'G')]
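The listing works because iterating over a DataFrame yields its column labels, so combinations(df, 2) pairs up the column names. A small sketch, assuming a placeholder DataFrame that only carries the seven column labels from the question:

```python
from itertools import combinations
import pandas as pd

# Placeholder frame: only the column labels matter for this illustration.
df = pd.DataFrame(columns=list('ABCDEFG'))

pairs = list(combinations(df, 2))
n = len(df.columns)

print(len(pairs))                                    # number of unordered pairs
print(pairs == list(combinations(df.columns, 2)))    # same as using df.columns explicitly
```

With n columns there are n * (n - 1) / 2 unordered pairs, which gives the 21 tuples listed for the seven columns here.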
A more elegant layout
from itertools import permutations
# permutations yields both (A, B) and (B, A), so every column appears on both axes
new_df = (pd.DataFrame({comb: df[[*comb]].groupby([*comb]).size()
                        for comb in permutations(df, 2)})
          .stack(dropna=False).unstack(level=0).fillna(0).swaplevel().sort_index())
print(new_df)
       A         B         C         D         E         F         G
       0    1    0    1    0    1    0    1    0    1    0    1    0    1
A 0  0.0  0.0  0.0  1.0  1.0  0.0  1.0  0.0  0.0  1.0  0.0  1.0  0.0  1.0
  1  0.0  0.0  1.0  1.0  1.0  1.0  2.0  0.0  0.0  2.0  1.0  1.0  1.0  1.0
B 0  0.0  1.0  0.0  0.0  0.0  1.0  1.0  0.0  0.0  1.0  0.0  1.0  1.0  0.0
  1  1.0  1.0  0.0  0.0  2.0  0.0  2.0  0.0  0.0  2.0  1.0  1.0  0.0  2.0
C 0  1.0  1.0  0.0  2.0  0.0  0.0  2.0  0.0  0.0  2.0  1.0  1.0  0.0  2.0
  1  0.0  1.0  1.0  0.0  0.0  0.0  1.0  0.0  0.0  1.0  0.0  1.0  1.0  0.0
D 0  1.0  2.0  1.0  2.0  2.0  1.0  0.0  0.0  0.0  3.0  1.0  2.0  1.0  2.0
  1  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
E 0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
  1  1.0  2.0  1.0  2.0  2.0  1.0  3.0  0.0  0.0  0.0  1.0  2.0  1.0  2.0
F 0  0.0  1.0  0.0  1.0  1.0  0.0  1.0  0.0  0.0  1.0  0.0  0.0  0.0  1.0
  1  1.0  1.0  1.0  1.0  1.0  1.0  2.0  0.0  0.0  2.0  0.0  0.0  1.0  1.0
G 0  0.0  1.0  1.0  0.0  0.0  1.0  1.0  0.0  0.0  1.0  0.0  1.0  0.0  0.0
  1  1.0  1.0  0.0  2.0  2.0  0.0  2.0  0.0  0.0  2.0  1.0  1.0  0.0  0.0
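If all you need is the single square matrix from the question's expected output, that is, the number of rows where both columns are 1, a matrix product may be a shorter route. This is an alternative sketch, not part of the approach above, and it assumes every column contains only the integers 0 and 1. The sample rows and the name cooc are illustrative.

```python
import pandas as pd

# Assumption: only the three sample rows from the question, all values 0 or 1.
df = pd.DataFrame({'A': [1, 1, 0], 'B': [1, 0, 1], 'C': [0, 1, 0], 'D': [0, 0, 0],
                   'E': [1, 1, 1], 'F': [0, 1, 1], 'G': [1, 0, 1]})

# For 0/1 data, entry (X, Y) of the product sums x * y over the rows,
# which counts the rows where X == 1 and Y == 1.
cooc = df.T.dot(df)
print(cooc)
```

The off-diagonal entry for A and B matches what df[(df['A'] == 1) & (df['B'] == 1)] would count, and each diagonal entry is simply the number of 1s in that column.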