
Taking first and last value in a rolling window

Initial problem statement

Using pandas, I would like to apply functions that are available for resample() but not for rolling().

This works:

df1 = df.resample(to_freq,
                  closed='left',
                  kind='period',
                  ).agg(OrderedDict([('Open', 'first'),
                                     ('Close', 'last')]))

This doesn't:

df2 = df.rolling(my_indexer).agg(
    OrderedDict([('Open', 'first'),
                 ('Close', 'last')]))
>>> AttributeError: 'first' is not a valid function for 'Rolling' object

df3 = df.rolling(my_indexer).agg(
    OrderedDict([('Close', 'last')]))
>>> AttributeError: 'last' is not a valid function for 'Rolling' object

How would you advise keeping the first and last values of each rolling window and putting them into two different columns?

EDIT 1 - with usable input data

import pandas as pd
from random import seed
from random import randint
from collections import OrderedDict

# DataFrame
ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')
seed(1)
values = [randint(0,10) for ts in ts_1h]
df = pd.DataFrame({'Values' : values}, index=ts_1h)

# First & last work with resample
resampled_first = df.resample('3H',
                              closed='left',
                              kind='period',
                             ).agg(OrderedDict([('Values', 'first')]))
resampled_last = df.resample('3H',
                             closed='left',
                             kind='period',
                            ).agg(OrderedDict([('Values', 'last')]))

# They don't with rolling
rolling_first = df.rolling(3).agg(OrderedDict([('Values', 'first')]))
rolling_last = df.rolling(3).agg(OrderedDict([('Values', 'last')]))

Thanks for your help! Bests,

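You can pass your own function to get the first or last element of each rolling window. The function receives each window as a Series, so index it by position with .iloc:

rolling_first = df.rolling(3).agg(lambda rows: rows.iloc[0])
rolling_last  = df.rolling(3).agg(lambda rows: rows.iloc[-1])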

Example

import pandas as pd
from random import seed, randint

# DataFrame
ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')

seed(1)
values = [randint(0, 10) for ts in ts_1h]

df = pd.DataFrame({'Values' : values}, index=ts_1h)

# Each window arrives as a Series, so use positional indexing
df['first'] = df['Values'].rolling(3).agg(lambda rows: rows.iloc[0])
df['last']  = df['Values'].rolling(3).agg(lambda rows: rows.iloc[-1])

print(df)

Result

                          Values  first  last
2020-01-01 00:00:00+00:00       2    NaN   NaN
2020-01-01 01:00:00+00:00       9    NaN   NaN
2020-01-01 02:00:00+00:00       1    2.0   1.0
2020-01-01 03:00:00+00:00       4    9.0   4.0
2020-01-01 04:00:00+00:00       1    1.0   1.0
2020-01-01 05:00:00+00:00       7    4.0   7.0
2020-01-01 06:00:00+00:00       7    1.0   7.0
2020-01-01 07:00:00+00:00       7    7.0   7.0
2020-01-01 08:00:00+00:00      10    7.0  10.0
2020-01-01 09:00:00+00:00       6    7.0   6.0
2020-01-01 10:00:00+00:00       3   10.0   3.0
2020-01-01 11:00:00+00:00       1    6.0   1.0
2020-01-01 12:00:00+00:00       7    3.0   7.0
2020-01-01 13:00:00+00:00       0    1.0   0.0
2020-01-01 14:00:00+00:00       6    7.0   6.0
2020-01-01 15:00:00+00:00       6    0.0   6.0
2020-01-01 16:00:00+00:00       9    6.0   9.0
2020-01-01 17:00:00+00:00       0    6.0   0.0
2020-01-01 18:00:00+00:00       7    9.0   7.0
2020-01-01 19:00:00+00:00       4    0.0   4.0
2020-01-01 20:00:00+00:00       3    7.0   3.0
2020-01-01 21:00:00+00:00       9    4.0   9.0
2020-01-01 22:00:00+00:00       1    3.0   1.0
2020-01-01 23:00:00+00:00       5    9.0   5.0
2020-01-02 00:00:00+00:00       0    1.0   0.0
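
A note on what the lambda receives: with rolling(...).apply(), each window is passed as a pandas Series by default (raw=False), or as a NumPy array when raw=True. The sketch below uses a small made-up Series (not the data above) to show both forms. Passing raw=True is usually the faster of the two, since no Series is built per window.

```python
import pandas as pd

s = pd.Series([2, 9, 1, 4, 1, 7], name='Values')  # illustrative data only

# raw=False (default): each window is a Series, so index by position with .iloc
first_series = s.rolling(3).apply(lambda w: w.iloc[0])

# raw=True: each window is a NumPy array, so plain [0] / [-1] is positional
first_raw = s.rolling(3).apply(lambda w: w[0], raw=True)
last_raw = s.rolling(3).apply(lambda w: w[-1], raw=True)

print(pd.DataFrame({'Values': s, 'first': first_raw, 'last': last_raw}))
```

Both forms leave the first two rows as NaN, because the default min_periods equals the window size.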

EDIT:

When using a dictionary, you have to pass the lambda itself as the value, not a string:

result = df['Values'].rolling(3).agg({'first': lambda rows: rows.iloc[0], 'last': lambda rows: rows.iloc[-1]})
print(result)

The same applies to your own named function: pass the function object, not a string containing its name.

def first(rows):
    return rows.iloc[0]

def last(rows):
    return rows.iloc[-1]

result = df['Values'].rolling(3).agg({'first': first, 'last': last})
print(result)

Example

import pandas as pd
from random import seed, randint

# DataFrame
ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')

seed(1)
values = [randint(0, 10) for ts in ts_1h]

df = pd.DataFrame({'Values' : values}, index=ts_1h)

result = df['Values'].rolling(3).agg({'first': lambda rows: rows.iloc[0], 'last': lambda rows: rows.iloc[-1]})
print(result)

def first(rows):
    return rows.iloc[0]

def last(rows):
    return rows.iloc[-1]

result = df['Values'].rolling(3).agg({'first': first, 'last': last})
print(result)
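
For the original question (first 'Open' and last 'Close' in two different columns), one possible approach is to roll over each column separately and assemble the results. This is only a sketch: the data and the window size of 3 are made up for illustration, and the column names follow the question.

```python
import pandas as pd

# Hypothetical OHLC-style data; only the column names come from the question
df = pd.DataFrame({'Open':  [10.0, 11.0, 12.0, 13.0, 14.0],
                   'Close': [10.5, 11.5, 12.5, 13.5, 14.5]})

window = 3  # assumed window size


def first(rows):
    # rows is a NumPy array because raw=True is passed below
    return rows[0]


def last(rows):
    return rows[-1]


result = pd.DataFrame({
    'Open':  df['Open'].rolling(window).apply(first, raw=True),
    'Close': df['Close'].rolling(window).apply(last, raw=True),
})
print(result)
```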

In case anyone else needs the difference between the first and last value of a rolling window: I used this on stock market data, where I wanted the price difference from the beginning to the end of the window. I created a new column that subtracts the 'open' value from 60 rows above (obtained with .shift()) from the current row's 'close' value.

df[windowColumn] = df["close"] - (df["open"].shift(60))    

I think it's a very quick method for large datasets, since .shift() is vectorized and no Python function is called per window.
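
To make the link with rolling() explicit: for a fixed window of n rows, the first value of each window sits n - 1 rows above the current row, and the last value is the current row itself. Note that shift(60) reaches back 60 rows, which matches a 61-row window. The sketch below uses made-up data and an assumed n = 3 to compare the shift-based result with the rolling-based one.

```python
import pandas as pd

# Illustrative data; column names follow the answer above
df = pd.DataFrame({'open':  [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                   'close': [1.5, 2.5, 3.5, 4.5, 5.5, 6.5]})

n = 3  # window length in rows (assumed)

# First 'open' of each n-row window: the value n - 1 rows above the current row
first_open = df['open'].shift(n - 1)
# Last 'close' of the window is the current row, so the difference is direct
diff_shift = df['close'] - first_open

# Same quantity computed with rolling(), for comparison
first_open_rolling = df['open'].rolling(n).apply(lambda w: w[0], raw=True)
diff_rolling = df['close'] - first_open_rolling

print(pd.DataFrame({'shift': diff_shift, 'rolling': diff_rolling}))
```

This equivalence only holds for fixed, row-count windows. For time-based or custom-indexer windows, the number of rows per window can vary, so a per-window function is still needed.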

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 