简体   繁体   中英

pandas df filter on rows with any n rows satisfying condition

I have a df with many columns and I want to select all rows that have any 2 column-values nonzero. filtering on rows where specific columns are nonzero is trivial, I mean this is probably trivial as well but it's eluding me. have / want dataframes for reproduceable example below:

import pandas as pd

have = pd.DataFrame({'ID': [1,2,3,4,5],
        'grp1': [1,0,0,0,0],
        'grp2': [0,2,0,0,0],
        'grp3': [1,1,0,0,5]})

want = have.iloc[0:2]
>>> want
   ID  grp1  grp2  grp3
0   1     1     0     1
1   2     0     2     1

last note - I pass back and forth between pandas and sql(ite) a lot, so if there's a more elegant way to do this in sql than pandas that I'm missing, I'm open to that too.

In pandas:

have[have.iloc[:,1:].gt(0).sum(1).gt(1)]
 
   ID  grp1  grp2  grp3
0   1     1     0     1
1   2     0     2     1

IIUC, you need:

have.loc[(have.iloc[:,1:].=0).sum(1)>1] as you said non-zero values.

You can convert the column type to bool then zero will be Fasle and non zero will be True and sum the True to get non zero count.

out = have[have.filter(like='grp').astype(bool).sum(axis=1).ge(2)]
print(out)

   ID  grp1  grp2  grp3
0   1     1     0     1
1   2     0     2     1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM