简体   繁体   中英

Pandas filter rows based on condition, but always retain the first row

I would like to drop some rows that meets certain conditions but I do not want to drop the first row even if the first row meets that criteria.

I tried dropping rows by using the df.drop function but it will erase the first row if the first row meets that condition. I do not want that.

Data looks something like this:

Column1 Column2 Column3
  1        3      A
  2        1      B
  3        3      C
  4        1      D
  5        1      E
  6        3      F

I want to do it in a way that if a row has a value of 3 in column2 then drop it.

And I want the new data to be like this (after dropping but keeping the first one even though the first row had a value of 3 in column 2):

Column1 Column2 Column3
  1        3      A
  2        1      B
  4        1      D
  5        1      E

You can make "retain first row" a part of your condition for dropping/keeping rows.

The condition for keeping the rows would be <ORIGINAL CONDITION> or <CONDITION TO KEEP FIRST ROW> . In code, this is

# (condition to drop 3) | (condition to keep 0th row)
df[(df['Column2'] != 3) | (df.index == 0)]

   Column1  Column2 Column3
0        1        3       A
1        2        1       B
3        4        1       D
4        5        1       E

Inversely, using DeMorgan's Laws, the condition for dropping rows would be (df['Column2'] == 3) & (df.index != 0) . We then invert the condition to get our expected output,

df[~((df['Column2'] == 3) & (df.index != 0))]

   Column1  Column2 Column3
0        1        3       A
1        2        1       B
3        4        1       D
4        5        1       E

These work assuming your index is a RangeIndex . If not, use pd.RangeIndex(len(df)) == 0 as the second condition instead.

I am using duplicated

df[(~df.Column2.duplicated())|df.Column2.ne(3)]
   Column1  Column2 Column3
0        1        3       A
1        2        1       B
3        4        1       D
4        5        1       E

Already great answers posted here, but just for your convenience. We can also use cumcount for this to rank Nth 3 we find:

df = df[~((df.groupby('Column2').Column2.cumcount() != 0) & (df.Column2 == 3))]

print(df)
   Column1  Column2 Column3
0        1        3       A
1        2        1       B
3        4        1       D
4        5        1       E

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM