
Drop a group of rows if one column has missing data in a pandas dataframe

I have the following dataframe:

df

          Group       Dist
    0     A             5
    1     B             2
    2     A             3
    3     B             1
    4     B             0
    5     A             5

I am trying to drop every row of a Group if any row in that Group has Dist equal to zero. This works to delete row 4:

df = df[df.Dist != 0]

However, I also want to delete rows 1 and 3 (the rest of group B), so that I am left with:

df
          Group       Dist
    0     A             5
    2     A             3
    5     A             5

Any ideas on how to drop the whole group based on this condition?

Thanks!

First get all Group values where Dist == 0, then filter them out by testing the Group column with isin and inverting the mask with ~:

df1 = df[~df['Group'].isin(df.loc[df.Dist == 0, 'Group'])]
print (df1)
  Group   Dist
0     A      5
2     A      3
5     A      5
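A self-contained version of the isin approach, rebuilding the question's frame so the intermediate step is visible. The name bad_groups is introduced here for illustration only; calling .unique() is optional, since isin also accepts the Series directly as in the answer above.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({'Group': ['A', 'B', 'A', 'B', 'B', 'A'],
                   'Dist': [5, 2, 3, 1, 0, 5]})

# Step 1: Group labels that contain at least one Dist == 0
bad_groups = df.loc[df['Dist'] == 0, 'Group'].unique()
print(bad_groups)  # ['B']

# Step 2: keep only rows whose Group is NOT one of those labels
df1 = df[~df['Group'].isin(bad_groups)]
print(df1)
```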

Or you can use GroupBy.transform with GroupBy.all to test whether each group contains no 0 values:

df1 = df[(df.Dist != 0).groupby(df['Group']).transform('all')]
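To see why the transform version works, it can help to split it into a row-level test and a group-level test. The intermediate names row_ok and group_ok are hypothetical, added only for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'Group': ['A', 'B', 'A', 'B', 'B', 'A'],
                   'Dist': [5, 2, 3, 1, 0, 5]})

# Row-level test: True where Dist is non-zero
row_ok = df['Dist'] != 0

# Group-level test broadcast back to every row: True only if
# all rows of that row's Group passed the row-level test
group_ok = row_ok.groupby(df['Group']).transform('all')
print(group_ok.tolist())  # [True, False, True, False, False, True]

# The mask has the same index as df, so it can be used directly
df1 = df[group_ok]
```

Because transform returns a result aligned to the original rows, no merge or isin lookup is needed.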

EDIT: To remove all groups with missing values:

df2 = df[df['Dist'].notna().groupby(df['Group']).transform('all')]
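A runnable sketch of the missing-value variant. The data is hypothetical: it assumes group B has a NaN in Dist instead of a 0.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group B contains one missing Dist value
df = pd.DataFrame({'Group': ['A', 'B', 'A', 'B', 'B', 'A'],
                   'Dist': [5, 2, 3, 1, np.nan, 5]})

# notna() marks present values; transform('all') keeps a group
# only if every one of its Dist values is present
keep = df['Dist'].notna().groupby(df['Group']).transform('all')
df2 = df[keep]
print(df2)
```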

To test for missing values:

print (df[df['Dist'].isna()])

If this returns nothing (an empty DataFrame), there are no missing values, neither NaN nor None.

It is also possible to check a single scalar, e.g. if the suspicious value is in the row with index 10:

print (df.loc[10, 'Dist'])
print (type(df.loc[10, 'Dist']))
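A common reason isna() finds nothing is that the "missing" value is actually a string such as 'None'. The frame below is hypothetical, built to show that case, and converting with pd.to_numeric(..., errors='coerce') is one possible fix under the assumption that Dist should be numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: index 10 holds the string 'None', not a real missing value
df = pd.DataFrame({'Group': ['A', 'B', 'B'],
                   'Dist': [5, np.nan, 'None']},
                  index=[8, 9, 10])

print(df[df['Dist'].isna()])      # only index 9 is reported
print(repr(df.loc[10, 'Dist']))   # 'None'
print(type(df.loc[10, 'Dist']))   # <class 'str'>

# Assumed fix: coerce to numbers, turning unparseable strings into real NaN
df['Dist'] = pd.to_numeric(df['Dist'], errors='coerce')
print(df['Dist'].isna().tolist())  # [False, True, True]
```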

You can use groupby and the filter method:

df.groupby('Group').filter(lambda x: x['Dist'].ne(0).all())

Output:

  Group  Dist
0     A     5
2     A     3
5     A     5
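The same filter written with a named function, which may read more clearly than a lambda. The function name has_no_zero is hypothetical. filter calls the function once per group with that group's sub-DataFrame and keeps all rows of the groups for which it returns True, preserving the original row order.

```python
import pandas as pd

df = pd.DataFrame({'Group': ['A', 'B', 'A', 'B', 'B', 'A'],
                   'Dist': [5, 2, 3, 1, 0, 5]})

def has_no_zero(group: pd.DataFrame) -> bool:
    # group is the sub-DataFrame for one Group label;
    # returning True keeps all of its rows
    return bool(group['Dist'].ne(0).all())

out = df.groupby('Group').filter(has_no_zero)
print(out)
```

Because the Python function runs once per group, the transform('all') version is typically faster when there are many groups; filter is often easier to read when the condition is complex.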

If you want to filter out groups with missing values:

df.groupby('Group').filter(lambda x: x['Dist'].notna().all())
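A sketch contrasting this with dropna, on hypothetical data where group B has one NaN: dropna removes only the missing row, just as df[df.Dist != 0] removed only row 4 in the question, while filter removes the whole group.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group B has one missing Dist
df = pd.DataFrame({'Group': ['A', 'B', 'A', 'B', 'B', 'A'],
                   'Dist': [5, 2, 3, 1, np.nan, 5]})

# Drops every row of any group that has a missing Dist
whole_groups = df.groupby('Group').filter(lambda x: x['Dist'].notna().all())

# For contrast: dropna removes only the single missing row (index 4)
rows_only = df.dropna(subset=['Dist'])

print(whole_groups.index.tolist())  # [0, 2, 5]
print(rows_only.index.tolist())     # [0, 1, 2, 3, 5]
```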
