drop rows using pandas groupby and filter

I'm trying to drop rows from a df where certain conditions are met. In the code below, I'm grouping values by column C. For each unique group, I want to drop ALL of its rows if any single row has A less than 1 AND B greater than 100. Both conditions have to occur on the same row, though. If I use .any() or .all(), it doesn't return what I want.

import pandas as pd

df = pd.DataFrame({
    'A' : [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
    'B' : [101, 2, 3, 1, 5, 101, 2, 3, 4, 5],
    'C' : ['d', 'd', 'd', 'd', 'e', 'e', 'e', 'f', 'f', 'f'],
    })

df.groupby(['C']).filter(lambda g: g['A'].lt(1) & g['B'].gt(100))

initial df:

   A    B  C
0  1  101  d # A is not lt 1 so keep all d's
1  0    2  d
2  1    3  d
3  0    1  d
4  1    5  e
5  0  101  e # A is lt 1 and B is gt 100 so drop all e's
6  0    2  e
7  1    3  f
8  0    4  f
9  1    5  f

intended out:

   A    B  C
0  1  101  d
1  0    2  d
2  1    3  d
3  0    1  d
7  1    3  f
8  0    4  f
9  1    5  f

For better performance, get all C values whose rows match the condition, then filter the original column C with Series.isin in boolean indexing, using an inverted mask:

df1 = df[~df['C'].isin(df.loc[df['A'].lt(1) & df['B'].gt(100), 'C'])]
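
To make the selection concrete on the sample df above (bad_c is just an illustrative name), the inner loc picks the C values of rows matching both conditions, here only row 5:

bad_c = df.loc[df['A'].lt(1) & df['B'].gt(100), 'C']
print(bad_c.tolist())
['e']

df1 = df[~df['C'].isin(bad_c)]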

Another idea is to use GroupBy.transform with GroupBy.any to test whether at least one row in each group matches:

df1 = df[~(df['A'].lt(1) & df['B'].gt(100)).groupby(df['C']).transform('any')]
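
Split into steps for clarity (mask is an illustrative name): the per-row condition is computed first, then transform('any') broadcasts each group's result back to every row of that group:

mask = (df['A'].lt(1) & df['B'].gt(100)).groupby(df['C']).transform('any')
print(mask.tolist())
[False, False, False, False, True, True, True, False, False, False]

df1 = df[~mask]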

Your solution is possible if you reduce the mask to a single boolean with any and not (filter expects the lambda to return a scalar bool per group, not a Series); on a large DataFrame it will be slow:

df1 = df.groupby(['C']).filter(lambda g: not (g['A'].lt(1) & g['B'].gt(100)).any())

Or equivalently, by De Morgan's laws, keep groups where every row fails the condition:

df1 = df.groupby(['C']).filter(lambda g: (g['A'].ge(1) | g['B'].le(100)).all())

print(df1)
   A    B  C
0  1  101  d
1  0    2  d
2  1    3  d
3  0    1  d
7  1    3  f
8  0    4  f
9  1    5  f
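
As a quick sanity check on this sample, all of the approaches above return the same frame; pd.testing.assert_frame_equal raises if they differ:

out_isin = df[~df['C'].isin(df.loc[df['A'].lt(1) & df['B'].gt(100), 'C'])]
out_transform = df[~(df['A'].lt(1) & df['B'].gt(100)).groupby(df['C']).transform('any')]
out_filter = df.groupby(['C']).filter(lambda g: not (g['A'].lt(1) & g['B'].gt(100)).any())

pd.testing.assert_frame_equal(out_isin, out_transform)
pd.testing.assert_frame_equal(out_isin, out_filter)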
