简体   繁体   中英

Find max since condition in pandas timeseries dataframe

I'm trying to find the max since condition was true in a pandas dataframe. I've searched for similar questions and read the documentation but haven't been able to find this problem discussed. To illustrate, I want a function that will return the maxsince column below.

In [84]: df
Out[84]: 
                     a      b  maxsince
2007-04-27 11:00:00  1   True         1
2007-04-27 11:30:00  5  False         5
2007-04-27 12:00:00  3  False         5
2007-04-27 12:30:00  2   True         2
2007-04-27 13:00:00  2  False         2
2007-04-27 13:30:00  7   True         7
2007-04-27 14:00:00  3  False         7
2007-04-27 14:30:00  4  False         7

I'm having trouble calculating this without resorting to looping. What would be the most efficient way? Thanks.

How about:

>>> df.groupby(df["b"].cumsum())["a"].cummax()
2007-04-27  11:00:00    1
            11:30:00    5
            12:00:00    5
            12:30:00    2
            13:00:00    2
            13:30:00    7
            14:00:00    7
            14:30:00    7
dtype: int64

This works because

>>> df["b"].cumsum()
2007-04-27  11:00:00    1
            11:30:00    1
            12:00:00    1
            12:30:00    2
            13:00:00    2
            13:30:00    3
            14:00:00    3
            14:30:00    3
Name: b, dtype: int32

gives us a new value whenever we see a True . You might have to patch it a bit depending on what you want to happen when the first value is False , but I'll leave that as an exercise for the reader. ;^)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM