
Trim the last rows of a pandas DataFrame based on a condition

Let's assume a DataFrame like this:

idx  x  y
0    a  3
1    b  2
2    c  0
3    d  2
4    e  5

How can I trim the bottom rows based on a condition, so that every row after the last one matching the condition is removed?

For example, with the condition y == 0, the output would be:

idx  x  y
0    a  3
1    b  2
2    c  0

The condition can match many times, but the last match is the one that triggers the cut.

You could do the following. With a single argument, np.where(df.y == 0) returns a tuple of index arrays, so [0] takes the array of matching positions. Its last element, [-1], is the position of the last occurrence, and adding 1 includes that row in the slice:

df_cond = df.iloc[:np.where(df.y == 0)[0][-1]+1, :]
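A self-contained run of that line on the sample data from the question, split into steps so each intermediate value is visible. The variable names positions and last_pos are only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": list("abcde"), "y": [3, 2, 0, 2, 5]})

# With one argument, np.where returns a tuple holding one array of positions per dimension
positions = np.where(df.y == 0)[0]     # array([2])
last_pos = positions[-1]               # position of the last match
df_cond = df.iloc[: last_pos + 1, :]   # +1 keeps the matching row itself
print(df_cond)
```

If no row matches, positions is empty and positions[-1] raises an IndexError, so guard for that case if it can occur in your data.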

Or you could do:

df_cond = df[:df.y.eq(0).cumsum().idxmax() + 1]
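Why the cumsum trick finds the last match: the running count of True values stops growing after the last match, and idxmax returns the first label where that count reaches its maximum. A sketch on a frame with two matching rows (data made up for illustration); it uses iloc to make the positional slice explicit, which coincides with the label only on a default RangeIndex:

```python
import pandas as pd

df = pd.DataFrame({"x": list("abcdef"), "y": [3, 0, 2, 0, 2, 5]})

hits = df.y.eq(0)           # True where the condition holds
running = hits.cumsum()     # 0, 1, 1, 2, 2, 2
cut = running.idxmax()      # first label where the count peaks = last match
df_cond = df.iloc[: cut + 1]  # label == position on a default RangeIndex
print(running.tolist(), cut)
print(df_cond)
```

One caveat: if nothing matches, the running count is all zeros, idxmax returns the first label, and the result keeps one row instead of none.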

Method 1:

Using index.max & iloc:

  • index.max to get the last row where y == 0
  • iloc to slice the DataFrame up to the index found with df['y'].eq(0)
idx = df.query('y.eq(0)').index.max()+1 
# idx = df.query('y==0').index.max()+1 -- if pandas < 0.25 

df.iloc[:idx]

Output

   x  y
0  a  3
1  b  2
2  c  0
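Method 1 takes a label from index.max() and passes it to the positional iloc, so it only lines up with a default 0..n-1 index. A sketch of the same idea that stays label-based, assuming a unique, sorted index, uses .loc instead; .loc slices include the end label, so no +1 is needed. The index values 10..50 are made up to show the difference:

```python
import pandas as pd

# Same data, but with a non-default index
df = pd.DataFrame({"x": list("abcde"), "y": [3, 2, 0, 2, 5]},
                  index=[10, 20, 30, 40, 50])

last_label = df.query("y == 0").index.max()  # 30
trimmed = df.loc[:last_label]                # label slice, end label included
# df.iloc[:last_label + 1] would be iloc[:31] here and keep all 5 rows
print(trimmed)
```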

Method 2:

Using np.where

idx = np.where(df['y'].eq(0), df.index, 0).max()+1
df.iloc[:idx]

Output

   x  y
0  a  3
1  b  2
2  c  0
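With three arguments, np.where chooses elementwise: matching rows keep their index label, the others become 0, so the maximum is the label of the last match. A run on a frame where both the first row and a later row match (data made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": list("abcdef"), "y": [0, 4, 0, 1, 2, 5]})

labels = np.where(df["y"].eq(0), df.index, 0)  # [0, 0, 2, 0, 0, 0]
idx = labels.max() + 1                          # 3
print(df.iloc[:idx])
```

Because the fill value is 0, "no match at all" and "only row 0 matches" both give idx == 1 and keep just the first row, which you may want to check for separately.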

I would do something like this:

df.iloc[:df['y'].eq(0).idxmax()+1]

Just look for the largest index where your condition is true.

EDIT

The code above won't work, because idxmax() returns the first index where the value is True. We can trick it like this:

df.iloc[:df['y'].eq(0).sort_index(ascending=False).idxmax() + 1]

Flipping the index order makes the last matching index the first one idxmax() sees.
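A side-by-side sketch of plain idxmax versus the flipped version on a frame with two matches (data made up for illustration). The mask[::-1] variant, reversing by position rather than sorting the index, is a swapped-in alternative that also works when the index is not sorted:

```python
import pandas as pd

df = pd.DataFrame({"x": list("abcdef"), "y": [3, 0, 2, 0, 2, 5]})
mask = df["y"].eq(0)

first_hit = mask.idxmax()                             # 1: first True
last_hit = mask.sort_index(ascending=False).idxmax()  # 3: first True after flipping
same = mask[::-1].idxmax()                            # 3: positional reversal
print(first_hit, last_hit, same)
print(df.iloc[: last_hit + 1])
```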

Set up your dataframe:

import pandas as pd

data = [
    ['a', 3],
    ['b', 2],
    ['c', 0],
    ['d', 2],
    ['e', 5],
]
df = (pd.DataFrame(data, columns=['x', 'y'])
      .reset_index()
      .rename(columns={'index': 'idx'})
      .sort_values('idx'))

Then find your cutoff (assuming the idx column is already sorted):

cutoff = df[df['y'] == 0].idx.max()

df['y'] == 0 is your condition. Take the largest idx that meets it (the last match) and save it as the cutoff.

Finally, create a new dataframe using your cutoff:

df_new = df[df.idx <= cutoff].copy()

Output:

df_new

   idx  x   y
0   0   a   3
1   1   b   2
2   2   c   0
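The sample data has only one row with y == 0, so the smallest and largest matching idx coincide there. With two matching rows (data changed for illustration) the difference shows up; the question asks for the last match, so this sketch uses max:

```python
import pandas as pd

data = [["a", 3], ["b", 0], ["c", 2], ["d", 0], ["e", 5]]
df = (pd.DataFrame(data, columns=["x", "y"])
      .reset_index()
      .rename(columns={"index": "idx"}))

matches = df[df["y"] == 0]
print(matches.idx.min(), matches.idx.max())  # 1 3
cutoff = matches.idx.max()                    # last matching idx
df_new = df[df.idx <= cutoff].copy()
print(df_new)
```

Using min here would cut at idx 1 and drop the row at idx 3, which still matches the condition.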
