
How to drop an entire group when a certain condition is met in pandas

I'm trying to drop an entire group of rows when a certain condition is met.

import pandas as pd


raw_data = {'regiment': ['51st', '51st', '51st', '51st', '51st', '51st', '51st', '51st', '51st', '51st', '51st', '51st'],
            'trucks': ['MAZ-7310', 'MAZ-7310', 'MAZ-7310', 'MAZ-7310', 'Tatra 810', 'Tatra 810', 'Tatra 810', 'Tatra 810', 'ZIS-150', 'ZIS-150', 'ZIS-150', 'ZIS-150'],
            'drivers': ['MAZ', 'MAZ', 'IVE', 'IVE', 'MAN', 'MAN', 'MERC', 'TATA', 'TATA', 'MAN', 'REN', 'TATA'],
            'counts': [0, 0, 1, 1, 0, 0, 1, 0, 1, 2, 3, 4]}


df = pd.DataFrame(raw_data, columns = ['regiment', 'trucks','drivers','counts']) 

   regiment     trucks drivers  counts
0      51st   MAZ-7310     MAZ       0
1      51st   MAZ-7310     MAZ       0
2      51st   MAZ-7310     IVE       1
3      51st   MAZ-7310     IVE       1
4      51st  Tatra 810     MAN       0
5      51st  Tatra 810     MAN       0
6      51st  Tatra 810    MERC       1
7      51st  Tatra 810    TATA       0
8      51st    ZIS-150    TATA       1
9      51st    ZIS-150     MAN       2
10     51st    ZIS-150     REN       3
11     51st    ZIS-150    TATA       4

I'm trying to drop the whole MAZ-7310 group because it contains rows where drivers is MAZ and counts == 0.

So I followed the post "Pandas groupby and filter":

df = df.groupby(['regiment','trucks']).filter(lambda x: ~((x['counts'] == 0) & (x['drivers'] == 'MAZ')).all())

but it does not give me the output I need.

The expected output:

    regiment     trucks drivers  counts
4      51st  Tatra 810     MAN       0
5      51st  Tatra 810     MAN       0
6      51st  Tatra 810    MERC       1
7      51st  Tatra 810    TATA       0
8      51st    ZIS-150    TATA       1
9      51st    ZIS-150     MAN       2
10     51st    ZIS-150     REN       3
11     51st    ZIS-150    TATA       4

How can I get this output?

thx

First we assign a new column called m, a boolean that is True for the rows where drivers is MAZ and counts is 0.

Then we use GroupBy with transform('any') to flag every row that belongs to a group where any m is True.

Finally we use boolean indexing with ~ to keep only the rows of the remaining groups.

Methods used: DataFrame.assign, Series.eq, GroupBy.transform

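# Use .eq(0) for the counts test: ~ on an integer column is a bitwise NOT,
# so ~df['counts'] would also flag some non-zero counts (for example 2).
mask = (df.assign(m=(df['drivers'].eq('MAZ') & df['counts'].eq(0)))
          .groupby(['regiment','trucks'])['m'].transform('any')
       )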

df[~mask]

   regiment     trucks drivers  counts
4      51st  Tatra 810     MAN       0
5      51st  Tatra 810     MAN       0
6      51st  Tatra 810    MERC       1
7      51st  Tatra 810    TATA       0
8      51st    ZIS-150    TATA       1
9      51st    ZIS-150     MAN       2
10     51st    ZIS-150     REN       3
11     51st    ZIS-150    TATA       4
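
To make the three steps easier to follow, here is a minimal sketch that keeps each intermediate result in its own variable. It rebuilds the question's DataFrame so it runs on its own; the names m, mask and result are only illustrative, and the counts test is written as .eq(0).

```python
import pandas as pd

# Same data as in the question, built compactly.
df = pd.DataFrame({
    'regiment': ['51st'] * 12,
    'trucks': ['MAZ-7310'] * 4 + ['Tatra 810'] * 4 + ['ZIS-150'] * 4,
    'drivers': ['MAZ', 'MAZ', 'IVE', 'IVE', 'MAN', 'MAN', 'MERC', 'TATA',
                'TATA', 'MAN', 'REN', 'TATA'],
    'counts': [0, 0, 1, 1, 0, 0, 1, 0, 1, 2, 3, 4],
})

# Step 1: row-level flag, True only where both conditions hold.
m = df['drivers'].eq('MAZ') & df['counts'].eq(0)

# Step 2: broadcast "does this group contain any flagged row?" to every row.
mask = m.groupby([df['regiment'], df['trucks']]).transform('any')

# Step 3: keep only the rows whose group has no flagged row.
result = df[~mask]
print(result)
```

Only rows 0 and 1 are flagged in m, but mask is True for all four MAZ-7310 rows, which is why the whole group disappears.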

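To get your desired output, you need to use any instead of all. With all, a group is dropped only when every one of its rows matches the condition, which is never the case in your data. Therefore, just change all to any in your code: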

# Keep a group only if none of its rows has counts == 0 and drivers == 'MAZ'.
df_final = df.groupby(['regiment','trucks']).filter(
    lambda x: not ((x['counts'] == 0) & (x['drivers'] == 'MAZ')).any())

Out[234]:
   regiment     trucks drivers  counts
4      51st  Tatra 810     MAN       0
5      51st  Tatra 810     MAN       0
6      51st  Tatra 810    MERC       1
7      51st  Tatra 810    TATA       0
8      51st    ZIS-150    TATA       1
9      51st    ZIS-150     MAN       2
10     51st    ZIS-150     REN       3
11     51st    ZIS-150    TATA       4
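
As a rough illustration of why all and any behave differently here, the sketch below computes both reductions of the condition for each trucks group. It rebuilds the question's data so it is self-contained; the names cond and per_group are assumptions made for this example, and grouping by trucks alone is enough because regiment is constant.

```python
import pandas as pd

# Same data as in the question, built compactly.
df = pd.DataFrame({
    'regiment': ['51st'] * 12,
    'trucks': ['MAZ-7310'] * 4 + ['Tatra 810'] * 4 + ['ZIS-150'] * 4,
    'drivers': ['MAZ', 'MAZ', 'IVE', 'IVE', 'MAN', 'MAN', 'MERC', 'TATA',
                'TATA', 'MAN', 'REN', 'TATA'],
    'counts': [0, 0, 1, 1, 0, 0, 1, 0, 1, 2, 3, 4],
})

# Row-level condition: driver is MAZ and counts is 0.
cond = df['drivers'].eq('MAZ') & df['counts'].eq(0)

# One row per group, with the 'all' and 'any' reductions side by side.
per_group = cond.groupby(df['trucks']).agg(['all', 'any'])
print(per_group)
```

For MAZ-7310 the all column is False (only two of its four rows match) while the any column is True, so a filter built on all keeps the group and one built on any drops it.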

