简体   繁体   中英

sum dataframe based on values from another dataframe

I have two dataframes, lets called them A & B, which are basically matrices. Both are the same shape, 100 x 350.

Dataframe A has numerical values & dataframe B contains only boolean values.

What I want to do is sum the columns in dataframe A but only where the corresponding element cell in dataframe B is True. Please see the example below.

 Dataframe A              Dataframe B
 'ad'  'bc'  'de'         'ad'  'bc'  'de'
  2     3     6            True  False True
  1     1     3            True  True  True
  4     7     2            False True  True

desired output

 'ad'  'bc'  'de'
  3     8     11

I am currently looping through each column and indexing in & then summing. I imagine there are better ways though to do this?

As you are aggregating, it would make more sense to output a Series.

You just need to mask the False with where and aggregate the rows per column with sum

out = df1.where(df2).sum().astype(int)

output:

'ad'     3
'bc'     8
'de'    11
dtype: int64

If you really need a DataFrame:

df1.where(df2).sum().astype(int).to_frame().T

output:

   'ad'  'bc'  'de'
0     3     8    11

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM