sum dataframe based on values from another dataframe

Question

I have two dataframes, lets called them A & B, which are basically matrices. Both are the same shape, 100 x 350.

Dataframe A has numerical values & dataframe B contains only boolean values.

What I want to do is sum the columns in dataframe A but only where the corresponding element cell in dataframe B is True. Please see the example below.

 Dataframe A              Dataframe B
 'ad'  'bc'  'de'         'ad'  'bc'  'de'
  2     3     6            True  False True
  1     1     3            True  True  True
  4     7     2            False True  True

desired output

 'ad'  'bc'  'de'
  3     8     11

I am currently looping through each column and indexing in & then summing. I imagine there are better ways though to do this?

Answer 1

As you are aggregating, it would make more sense to output a Series.

You just need to mask the False with where and aggregate the rows per column with sum

out = df1.where(df2).sum().astype(int)

output:

'ad'     3
'bc'     8
'de'    11
dtype: int64

If you really need a DataFrame:

df1.where(df2).sum().astype(int).to_frame().T

output:

   'ad'  'bc'  'de'
0     3     8    11

sum dataframe based on values from another dataframe

Question

1 answers

solution1
-1 ACCPTED 2022-01-28 11:11:21

sum dataframe based on values from another dataframe

Question

1 answers

solution1 -1 ACCPTED 2022-01-28 11:11:21

solution1
-1 ACCPTED 2022-01-28 11:11:21