I have two DataFrames, let's call them A and B, which are basically matrices. Both have the same shape, 100 x 350.
DataFrame A holds numerical values and DataFrame B contains only boolean values.
What I want to do is sum each column of A, but only over the cells where the corresponding cell in B is True. See the example below.
DataFrame A:
'ad'  'bc'  'de'
2     3     6
1     1     3
4     7     2

DataFrame B:
'ad'   'bc'   'de'
True   False  True
True   True   True
False  True   True
Desired output:
'ad'  'bc'  'de'
3     8     11
I am currently looping over each column, indexing into it with the mask, and then summing. Is there a better way to do this?
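For comparison, a column-by-column loop of the kind described might look like the sketch below. It is a guess at the approach, not the asker's actual code, and it assumes the column labels are the plain strings ad, bc and de and that A and B share the same index.

```python
import pandas as pd

# The 3 x 3 example from the question; A and B share index and column labels.
A = pd.DataFrame({"ad": [2, 1, 4], "bc": [3, 1, 7], "de": [6, 3, 2]})
B = pd.DataFrame({"ad": [True, True, False],
                  "bc": [False, True, True],
                  "de": [True, True, True]})

# Hypothetical loop: for each column, keep only the rows where B is True
# and sum the matching values of A.
totals = {}
for col in A.columns:
    totals[col] = A.loc[B[col], col].sum()

out = pd.Series(totals)
print(out)
```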
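Since you are aggregating, it makes more sense to output a Series.
You just need to mask the False cells with where(), which replaces them with NaN, and then aggregate each column with sum(), which skips NaN by default. Because the NaNs turn the affected columns into floats, the result is cast back with astype(int). Here df1 is your DataFrame A and df2 is DataFrame B: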
out = df1.where(df2).sum().astype(int)
output:
'ad' 3
'bc' 8
'de' 11
dtype: int64
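A self-contained version of the one-liner above, rebuilding the example from the question, makes each step visible. Here A and B stand in for df1 and df2, and the column labels are assumed to be the plain strings ad, bc and de.

```python
import pandas as pd

# The example from the question: A holds the numbers, B the boolean mask.
A = pd.DataFrame({"ad": [2, 1, 4], "bc": [3, 1, 7], "de": [6, 3, 2]})
B = pd.DataFrame({"ad": [True, True, False],
                  "bc": [False, True, True],
                  "de": [True, True, True]})

# where() keeps A's value where B is True and puts NaN elsewhere;
# an integer column that receives a NaN is upcast to float64.
masked = A.where(B)

# sum() skips NaN by default (skipna=True), so only the kept cells are added.
totals = masked.sum()

# With the NaNs gone, cast the per-column totals back to integers.
out = totals.astype(int)
print(out)
```

One caveat: with the default min_count=0, a column whose mask is entirely False sums to 0 rather than NaN. Passing sum(min_count=1) gives NaN for such columns instead, but then astype(int) must be dropped, because an integer dtype cannot hold NaN.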
If you really need a DataFrame:
df1.where(df2).sum().astype(int).to_frame().T
output:
'ad' 'bc' 'de'
0 3 8 11
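A different technique, not the one used in the answer above: since True and False behave as 1 and 0 in arithmetic, multiplying A by B zeroes out the masked cells, and the integer dtype survives without a cast. A minimal sketch on the same example data, again assuming the column labels ad, bc and de:

```python
import pandas as pd

A = pd.DataFrame({"ad": [2, 1, 4], "bc": [3, 1, 7], "de": [6, 3, 2]})
B = pd.DataFrame({"ad": [True, True, False],
                  "bc": [False, True, True],
                  "de": [True, True, True]})

# A * B aligns on labels, keeps A's value where B is True (x * 1)
# and turns it into 0 where B is False (x * 0); no NaN is introduced.
out = (A * B).sum()

# The same one-row DataFrame shape as the to_frame().T variant.
out_df = out.to_frame().T
print(out_df)
```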