
Most effective method to get the max value from a column based on a timedelta calculated from the current row

I would like to identify the maximum value in a column that occurs within the following X days from the current date.

This is a subset of the data frame showing the daily values for 2020.

           Date        Data
6780 2020-01-02  323.540009
6781 2020-01-03  321.160004
6782 2020-01-06  320.489990
6783 2020-01-07  323.019989
6784 2020-01-08  322.940002
...         ...         ...
7028 2020-12-24  368.079987
7029 2020-12-28  371.739990 
7030 2020-12-29  373.809998 
7031 2020-12-30  372.339996

I would like to find a way to identify the max value within the following 30 days, e.g.

           Date        Data         Max  
6780 2020-01-02  323.540009  323.019989  
6781 2020-01-03  321.160004  323.019989  
6782 2020-01-06  320.489990  323.730011  
6783 2020-01-07  323.019989  323.540009  
6784 2020-01-08  322.940002  325.779999  
...         ...         ...         ...  
7028 2020-12-24  368.079987  373.809998  
7029 2020-12-28  371.739990  373.809998  
7030 2020-12-29  373.809998  372.339996  
7031 2020-12-30  372.339996  373.100006  

I tried calculating the start and end dates and storing them in columns, e.g.

df['startDate'] = df['Date'] +  pd.to_timedelta(1, unit='d')
df['endDate'] = df['Date'] +  pd.to_timedelta(30, unit='d')

before trying to calculate the max, e.g.

df['Max'] = df.loc[(df['Date'] > df['startDate']) & (df['Date'] < df['endDate'])]['Data'].max()

But this results in:

        Date        Data      startDate     endDate  Max
6780 2020-01-02  323.540009  2020-01-03  2020-01-29  NaN
6781 2020-01-03  321.160004  2020-01-04  2020-01-30  NaN
6782 2020-01-06  320.489990  2020-01-07  2020-02-02  NaN
6783 2020-01-07  323.019989  2020-01-08  2020-02-03  NaN
6784 2020-01-08  322.940002  2020-01-09  2020-02-04  NaN
...         ...         ...         ...         ...  ...
7027 2020-12-23  368.279999  2020-12-24  2021-01-19  NaN
7028 2020-12-24  368.079987  2020-12-25  2021-01-20  NaN
7029 2020-12-28  371.739990  2020-12-29  2021-01-24  NaN
7030 2020-12-29  373.809998  2020-12-31  2021-01-26  NaN

If I hard-code dates in the loc[] statement, it partially works: it fills in the max for that static range, but this just gives me the same value in every row.

Any help on the correct pandas way to achieve this would be appreciated.

Kind Regards
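
On why the attempt in the question returns NaN everywhere: the comparison df['Date'] > df['startDate'] is evaluated row by row, so each date is compared with its own start date, which is always one day later. The mask is therefore False on every row, the selection is empty, and the max of an empty selection is NaN, which then gets broadcast to the whole column. A minimal reproduction on three rows from the question (the names mask and selected_max are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
    "Data": [323.540009, 321.160004, 320.489990],
})
df["startDate"] = df["Date"] + pd.to_timedelta(1, unit="d")
df["endDate"] = df["Date"] + pd.to_timedelta(30, unit="d")

# Each Date is compared with the startDate on the SAME row,
# and Date is never later than Date + 1 day, so no row passes.
mask = (df["Date"] > df["startDate"]) & (df["Date"] < df["endDate"])

# Max over an empty selection is NaN; assigning it fills the whole column.
selected_max = df.loc[mask, "Data"].max()
df["Max"] = selected_max
```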

df.rolling can do this if you convert the date column to datetime and set it as the index:

df["Date"] = pd.to_datetime(df.Date)
df.set_index("Date").rolling("2d").max()

output:

                  Data
Date
2020-01-02  323.540009
2020-01-03  323.540009
2020-01-06  320.489990
2020-01-07  323.019989
2020-01-08  323.019989
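
Note that the "2d" window above looks backward: each row's value is the max over the window ending at that row, including the row itself. The question asks for the following X days, so here is a rough sketch of a forward-looking variant, not necessarily the fastest one. It assumes the frame is sorted by Date in ascending order and that the window covers dates strictly after the current row's date up to and including `days` calendar days later; forward_window_max is a made-up helper name, not a pandas function.

```python
import numpy as np
import pandas as pd

def forward_window_max(df, days, date_col="Date", value_col="Data"):
    """Max of value_col over rows dated in (date, date + days]; NaN if none.

    Hypothetical helper; assumes df is sorted ascending by date_col.
    """
    dates = df[date_col].to_numpy()
    values = df[value_col].to_numpy(dtype=float)

    # Position of the first row dated at least one day after each row.
    starts = np.searchsorted(dates, dates + np.timedelta64(1, "D"), side="left")
    # Position just past the last row dated within `days` days of each row.
    ends = np.searchsorted(dates, dates + np.timedelta64(days, "D"), side="right")

    out = np.full(len(df), np.nan)
    for i, (lo, hi) in enumerate(zip(starts, ends)):
        if hi > lo:  # at least one future row falls inside the window
            out[i] = values[lo:hi].max()
    return pd.Series(out, index=df.index, name="Max")

# Usage on the first five rows from the question, with a 3-day window.
sample = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07", "2020-01-08"]
    ),
    "Data": [323.540009, 321.160004, 320.489990, 323.019989, 322.940002],
})
sample["Max"] = forward_window_max(sample, days=3)
```

The last row has no later dates in the sample, so its Max stays NaN. For the 30-day case in the question, call it with days=30 on the full frame.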

