
Pandas - Keep largest value comparing to next row

Consider a simple dataframe:

df = pd.DataFrame([2,3,4,3,4,4,2,3,4,5,3,4], columns=['len'])

I only want to keep a value if it is larger than or equal to the value in the next row.

The expected output would be:

    len
2   4
4   4
5   4
9   5
11  4

However, I tried with:

df[(df.len.ge(df.len.shift(-1)))]

But it removes the last row. How can I fix this?
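For reference, the last row is dropped because shift(-1) leaves NaN in the final position, and any comparison against NaN evaluates to False. A minimal sketch that makes this visible, using the same df (the names nxt and mask are only illustrative):

```python
import pandas as pd

df = pd.DataFrame([2,3,4,3,4,4,2,3,4,5,3,4], columns=['len'])

# Value of the next row for each position; the last row has no successor, so it gets NaN.
nxt = df['len'].shift(-1)

# Comparing anything with NaN yields False, so the last row can never pass the filter.
mask = df['len'].ge(nxt)

print(nxt.iloc[-1])              # nan
print(mask.iloc[-1])             # False
print(df[mask].index.tolist())   # [2, 4, 5, 9]
```

Any fix therefore has to give the last position something other than NaN to compare against.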

Try the fill_value parameter of the shift() method:

# Fill the last shifted position with the last value itself, so the last row compares equal
m = df['len'].ge(df['len'].shift(-1, fill_value=df['len'].iloc[-1]))
# you can also use fill_value=df.loc[len(df)-1, 'len']
# Finally:
df[m]
# OR
df.loc[m]

Output of the above code:

    len
2   4
4   4
5   4
9   5
11  4

Note: If you don't want to use the fill_value parameter of the shift() method, you can also use the fillna() method to achieve the same result.

I think this works (fill_value=0 keeps the last row as long as its value is not negative):

>>> df = pd.DataFrame([2,3,4,3,4,4,2,3,4,5,3,4], columns=['len'])
>>> df[df >= df.shift(-1, fill_value=0)].dropna()
    len
2   4.0
4   4.0
5   4.0
9   5.0
11  4.0
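The values come back as 4.0 and 5.0 because indexing with a boolean DataFrame replaces the non-matching cells with NaN, which forces the column to float before dropna() removes those rows. A small sketch of the difference, assuming the integer dtype should be preserved (as_float, as_int and mask are illustrative names):

```python
import pandas as pd

df = pd.DataFrame([2,3,4,3,4,4,2,3,4,5,3,4], columns=['len'])

# Boolean DataFrame mask: non-matching cells become NaN, so the column turns into float.
as_float = df[df >= df.shift(-1, fill_value=0)].dropna()

# Boolean Series mask: rows are selected directly and the integer dtype is kept.
mask = df['len'] >= df['len'].shift(-1, fill_value=0)
as_int = df[mask]

print(as_float['len'].dtype)   # float64
print(as_int['len'].dtype)
print(as_int.index.tolist())   # [2, 4, 5, 9, 11]
```

Both variants select the same rows; only the dtype of the result differs.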

You could try this as well:

import pandas as pd

df = pd.DataFrame([2,3,4,3,4,4,2,3,4,5,3,4], columns=['len'])

df[df.len.ge(df.len.shift(-1).fillna(df["len"].iat[-1]))]

Use diff to compare consecutive rows, then check where the difference is <= 0 to form the Boolean mask.

Use True as the fill_value, since you also want to include the last row (after the shift it is the only missing entry, provided all 'len' values are valid).

df[df['len'].diff().le(0).shift(-1, fill_value=True)]

    len
2     4
4     4
5     4
9     5
11    4
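The one-liner above can be broken into its intermediate steps to see how the mask is built. This is only an illustrative decomposition; the names d, not_larger and mask are not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame([2,3,4,3,4,4,2,3,4,5,3,4], columns=['len'])

# d[i] = len[i] - len[i-1]; the first row has no predecessor, so d[0] is NaN.
d = df['len'].diff()

# True at row i when len[i] <= len[i-1], i.e. the previous row should be kept.
not_larger = d.le(0)

# Move each flag up one row so it sits on the row to keep;
# the last row has no successor and is kept via fill_value=True.
mask = not_larger.shift(-1, fill_value=True)

print(df[mask].index.tolist())   # [2, 4, 5, 9, 11]
```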
