Pandas filter values in two columns and sum?

I have a dataframe as follows:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({'key1' : ['a','a','b','b','a'], 'key2' : ['b', 'b', 'b', 'a', 'b'], 'val' : np.random.randint(10, size=5)})
>>> df
  key1 key2  val
0    a    b    9
1    a    b    8
2    b    b    2
3    b    a    2
4    a    b    1

I am trying to get the sum of the val column over the rows where either key1=='a' or key2=='a'. Here is what I have:

>>> total = (df[(df['key1']=='a') | (df['key2']=='a')]).sum()
>>> total
key1    aaba
key2    bbab
val       20
dtype: object

I have two questions:

  1. How do I get only the final value of the sum (i.e., 20 here)?
  2. When several columns need to be checked, is there a more efficient way to do this operation?

  1. Select only the column you want to sum, so the string columns are never aggregated:
df.loc[(df['key1']=='a') | (df['key2']=='a'), 'val'].sum()
# out
# 20
  2. For several columns, compare them all at once and keep the rows where any of them matches:
cols = ['key1','key2']

df.loc[df[cols].eq('a').any(axis=1), 'val'].sum()
# same out
# 20
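
For anyone who wants to reproduce this, here is a minimal self-contained sketch of both approaches. It assumes the val column is hard-coded to the values printed in the question (the original frame used np.random.randint, so a fresh run would give different numbers). The names mask_two, mask_any, total_two and total_any are illustrative only.

```python
import pandas as pd

# Values copied from the frame printed in the question (assumption: fixed
# here for reproducibility instead of generating random integers).
df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['b', 'b', 'b', 'a', 'b'],
    'val': [9, 8, 2, 2, 1],
})

# Approach 1: build the boolean mask explicitly from the two columns,
# then select only 'val' before summing.
mask_two = (df['key1'] == 'a') | (df['key2'] == 'a')
total_two = df.loc[mask_two, 'val'].sum()

# Approach 2: compare every column in `cols` against 'a' at once and keep
# rows where at least one column matches.
cols = ['key1', 'key2']
mask_any = df[cols].eq('a').any(axis=1)
total_any = df.loc[mask_any, 'val'].sum()

print(total_two, total_any)  # 20 20
```

Both masks select rows 0, 1, 3 and 4, so the two totals agree. The second form mainly saves typing as the column list grows, since adding a column only means extending cols.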
