Pandas variance over rolling groups

I have a large df at hand that looks like the following example, but with many more PERMNOs per day. I would like to apply a rolling two-period variance of daily returns for each PERMNO.

I know how to do this for each individual period:

df['Monthly Variance'] = df.groupby(['PERMNO', 'Period'])['RET'].transform('var')

But how do I do this for rolling periods? E.g., every row with period 2019-05 should include the variance of all daily returns in 2019-05 and 2019-04.

Data:

    date        Period  PERMNO  RET         SPREAD
0   2019-03-19  2019-03 93436   -0.007496   0.037349
1   2019-03-29  2019-03 93436   0.004450    0.020619
2   2019-04-10  2019-04 93436   0.013771    0.020109
3   2019-04-23  2019-04 93436   0.004377    0.038514
4   2019-05-03  2019-05 93436   0.044777    0.053883
5   2019-05-15  2019-05 93436   -0.001550   0.031920
6   2019-05-28  2019-05 93436   -0.010124   0.038062
7   2019-06-07  2019-06 93436   -0.007041   0.036093
8   2019-06-19  2019-06 93436   0.007520    0.030354
9   2019-07-01  2019-07 93436   0.016602    0.030137
10  2019-07-12  2019-07 93436   0.027158    0.023654
11  2019-07-24  2019-07 93436   0.018104    0.030640
12  2019-08-05  2019-08 93436   -0.025689   0.024769
13  2019-08-15  2019-08 93436   -0.018122   0.047317
14  2019-08-27  2019-08 93436   -0.004279   0.031929
15  2019-09-09  2019-09 93436   0.019081    0.019762
16  2019-09-19  2019-09 93436   0.012773    0.012661
17  2019-10-01  2019-10 93436   0.015859    0.028520
18  2019-10-11  2019-10 93436   0.012871    0.017301
19  2019-10-23  2019-10 93436   -0.003521   0.019057
20  2019-11-04  2019-11 93436   0.013278    0.041001
21  2019-11-14  2019-11 93436   0.009361    0.031874
22  2019-11-26  2019-11 93436   -0.022061   0.025680
23  2019-12-09  2019-12 93436   0.010837    0.027964
24  2019-12-19  2019-12 93436   0.027699    0.026103

import pandas as pd
from pandas import Timestamp, Period

d = {'date': {0: Timestamp('2019-03-19 00:00:00'),
    1: Timestamp('2019-03-29 00:00:00'),
    2: Timestamp('2019-04-10 00:00:00'),
    3: Timestamp('2019-04-23 00:00:00'),
    4: Timestamp('2019-05-03 00:00:00'),
    5: Timestamp('2019-05-15 00:00:00'),
    6: Timestamp('2019-05-28 00:00:00'),
    7: Timestamp('2019-06-07 00:00:00'),
    8: Timestamp('2019-06-19 00:00:00'),
    9: Timestamp('2019-07-01 00:00:00'),
    10: Timestamp('2019-07-12 00:00:00'),
    11: Timestamp('2019-07-24 00:00:00'),
    12: Timestamp('2019-08-05 00:00:00'),
    13: Timestamp('2019-08-15 00:00:00'),
    14: Timestamp('2019-08-27 00:00:00'),
    15: Timestamp('2019-09-09 00:00:00'),
    16: Timestamp('2019-09-19 00:00:00'),
    17: Timestamp('2019-10-01 00:00:00'),
    18: Timestamp('2019-10-11 00:00:00'),
    19: Timestamp('2019-10-23 00:00:00'),
    20: Timestamp('2019-11-04 00:00:00'),
    21: Timestamp('2019-11-14 00:00:00'),
    22: Timestamp('2019-11-26 00:00:00'),
    23: Timestamp('2019-12-09 00:00:00'),
    24: Timestamp('2019-12-19 00:00:00')},
    'Period': {0: Period('2019-03', 'M'),
    1: Period('2019-03', 'M'),
    2: Period('2019-04', 'M'),
    3: Period('2019-04', 'M'),
    4: Period('2019-05', 'M'),
    5: Period('2019-05', 'M'),
    6: Period('2019-05', 'M'),
    7: Period('2019-06', 'M'),
    8: Period('2019-06', 'M'),
    9: Period('2019-07', 'M'),
    10: Period('2019-07', 'M'),
    11: Period('2019-07', 'M'),
    12: Period('2019-08', 'M'),
    13: Period('2019-08', 'M'),
    14: Period('2019-08', 'M'),
    15: Period('2019-09', 'M'),
    16: Period('2019-09', 'M'),
    17: Period('2019-10', 'M'),
    18: Period('2019-10', 'M'),
    19: Period('2019-10', 'M'),
    20: Period('2019-11', 'M'),
    21: Period('2019-11', 'M'),
    22: Period('2019-11', 'M'),
    23: Period('2019-12', 'M'),
    24: Period('2019-12', 'M')},
    'PERMNO': {0: 93436,
    1: 93436,
    2: 93436,
    3: 93436,
    4: 93436,
    5: 93436,
    6: 93436,
    7: 93436,
    8: 93436,
    9: 93436,
    10: 93436,
    11: 93436,
    12: 93436,
    13: 93436,
    14: 93436,
    15: 93436,
    16: 93436,
    17: 93436,
    18: 93436,
    19: 93436,
    20: 93436,
    21: 93436,
    22: 93436,
    23: 93436,
    24: 93436},
    'RET': {0: -0.007496,
    1: 0.00445,
    2: 0.013771,
    3: 0.004377,
    4: 0.044777,
    5: -0.00155,
    6: -0.010124,
    7: -0.007041,
    8: 0.00752,
    9: 0.016602,
    10: 0.027158,
    11: 0.018104,
    12: -0.025689,
    13: -0.018122,
    14: -0.004279,
    15: 0.019081,
    16: 0.012773,
    17: 0.015859,
    18: 0.012871,
    19: -0.003521,
    20: 0.013278,
    21: 0.009361,
    22: -0.022061,
    23: 0.010837,
    24: 0.027699},
    'SPREAD': {0: 0.03734912462419806,
    1: 0.02061930783242268,
    2: 0.02010868822370299,
    3: 0.03851421309872922,
    4: 0.053883031997904014,
    5: 0.031920088790233066,
    6: 0.038062228476857696,
    7: 0.03609261156529571,
    8: 0.030353750113091504,
    9: 0.030137440339402532,
    10: 0.02365353870704016,
    11: 0.030639552742658626,
    12: 0.024769351113690646,
    13: 0.04731741904986996,
    14: 0.031929443946611374,
    15: 0.019761767656938437,
    16: 0.012661329848064019,
    17: 0.028520051854639707,
    18: 0.017300757667841702,
    19: 0.01905709094660478,
    20: 0.04100106573753255,
    21: 0.03187425271937228,
    22: 0.025680188759395033,
    23: 0.027963531931584486,
    24: 0.026103430012610333}}
    
df = pd.DataFrame(d)

If you use pandas resampling it works. Note that you need a datetime column that meets the requirements for resampling, which effectively makes the Period column redundant. You can also look into rolling(); I've included an example of that below.

df["ts"] = pd.to_datetime(df.date, unit="ms", utc=True)
df["Monthly Variance"] = df.groupby(["PERMNO"]).resample("M", on="ts")["RET"].transform("var")
df["Bi-Monthly Variance"] = df.groupby(["PERMNO"]).resample("2M", on="ts")["RET"].transform("var")
df["Quarterly Variance"] = df.groupby(["PERMNO"]).resample("Q", on="ts")["RET"].transform("var")
df["Yearly Variance"] = df.groupby(["PERMNO"]).resample("Y", on="ts")["RET"].transform("var")
df["Rolling Variance"] = df.rolling(10,on="ts")["RET"].var()

To recalculate only the latest data rather than the whole data frame:

# Restrict the recalculation to rows from 2019-08-01 onward
dfsub = df[df["ts"] >= pd.Timestamp("2019-08-01", tz="UTC")].copy()
df.loc[dfsub.index, "Bi-Monthly Variance"] = (
    dfsub.groupby(["PERMNO"]).resample("2M", on="ts")["RET"].transform("var")
)
    date        Period  PERMNO  RET         SPREAD      ts                       Monthly Variance   Bi-Monthly Variance Quarterly Variance  Yearly Variance Rolling Variance
0   2019-03-19  2019-03 93436   -0.007496   0.037349    2019-03-19 00:00:00+00:00   0.000071    0.000071    0.000071    0.000268    NaN
1   2019-03-29  2019-03 93436   0.004450    0.020619    2019-03-29 00:00:00+00:00   0.000071    0.000071    0.000071    0.000268    NaN
2   2019-04-10  2019-04 93436   0.013771    0.020109    2019-04-10 00:00:00+00:00   0.000044    0.000448    0.000340    0.000268    NaN
3   2019-04-23  2019-04 93436   0.004377    0.038514    2019-04-23 00:00:00+00:00   0.000044    0.000448    0.000340    0.000268    NaN
4   2019-05-03  2019-05 93436   0.044777    0.053883    2019-05-03 00:00:00+00:00   0.000872    0.000448    0.000340    0.000268    NaN
5   2019-05-15  2019-05 93436   -0.001550   0.031920    2019-05-15 00:00:00+00:00   0.000872    0.000448    0.000340    0.000268    NaN
6   2019-05-28  2019-05 93436   -0.010124   0.038062    2019-05-28 00:00:00+00:00   0.000872    0.000448    0.000340    0.000268    NaN
7   2019-06-07  2019-06 93436   -0.007041   0.036093    2019-06-07 00:00:00+00:00   0.000106    0.000167    0.000340    0.000268    NaN
8   2019-06-19  2019-06 93436   0.007520    0.030354    2019-06-19 00:00:00+00:00   0.000106    0.000167    0.000340    0.000268    NaN
9   2019-07-01  2019-07 93436   0.016602    0.030137    2019-07-01 00:00:00+00:00   0.000033    0.000167    0.000374    0.000268    0.000261
10  2019-07-12  2019-07 93436   0.027158    0.023654    2019-07-12 00:00:00+00:00   0.000033    0.000167    0.000374    0.000268    0.000273
11  2019-07-24  2019-07 93436   0.018104    0.030640    2019-07-24 00:00:00+00:00   0.000033    0.000167    0.000374    0.000268    0.000275
12  2019-08-05  2019-08 93436   -0.025689   0.024769    2019-08-05 00:00:00+00:00   0.000118    0.000370    0.000374    0.000268    0.000410
13  2019-08-15  2019-08 93436   -0.018122   0.047317    2019-08-15 00:00:00+00:00   0.000118    0.000370    0.000374    0.000268    0.000475
14  2019-08-27  2019-08 93436   -0.004279   0.031929    2019-08-27 00:00:00+00:00   0.000118    0.000370    0.000374    0.000268    0.000284
15  2019-09-09  2019-09 93436   0.019081    0.019762    2019-09-09 00:00:00+00:00   0.000020    0.000370    0.000374    0.000268    0.000318
16  2019-09-19  2019-09 93436   0.012773    0.012661    2019-09-19 00:00:00+00:00   0.000020    0.000370    0.000374    0.000268    0.000308
17  2019-10-01  2019-10 93436   0.015859    0.028520    2019-10-01 00:00:00+00:00   0.000109    0.000214    0.000221    0.000268    0.000301
18  2019-10-11  2019-10 93436   0.012871    0.017301    2019-10-11 00:00:00+00:00   0.000109    0.000214    0.000221    0.000268    0.000304
19  2019-10-23  2019-10 93436   -0.003521   0.019057    2019-10-23 00:00:00+00:00   0.000109    0.000214    0.000221    0.000268    0.000304
20  2019-11-04  2019-11 93436   0.013278    0.041001    2019-11-04 00:00:00+00:00   0.000375    0.000214    0.000221    0.000268    0.000256
21  2019-11-14  2019-11 93436   0.009361    0.031874    2019-11-14 00:00:00+00:00   0.000375    0.000214    0.000221    0.000268    0.000236
22  2019-11-26  2019-11 93436   -0.022061   0.025680    2019-11-26 00:00:00+00:00   0.000375    0.000214    0.000221    0.000268    0.000214
23  2019-12-09  2019-12 93436   0.010837    0.027964    2019-12-09 00:00:00+00:00   0.000142    0.000142    0.000221    0.000268    0.000159
24  2019-12-19  2019-12 93436   0.027699    0.026103    2019-12-19 00:00:00+00:00   0.000142    0.000142    0.000221    0.000268    0.000185
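
Note that the 2M resample forms fixed two-month bins rather than a rolling window: 2019-05 is binned with 2019-04, but 2019-06 is binned with 2019-07 instead of 2019-05. If you want exactly what the question asks for (each month's rows carry the variance of that month plus the previous one), here is a minimal sketch using Period arithmetic, where p - 1 steps back one month:

# For each (PERMNO, Period), variance of all RET values whose Period is
# the current month or the month immediately before it.
def two_period_var(g):
    var_by_period = {
        p: g.loc[g["Period"].isin([p - 1, p]), "RET"].var()
        for p in g["Period"].unique()
    }
    return g["Period"].map(var_by_period)

df["Rolling 2-Period Variance"] = (
    df.groupby("PERMNO", group_keys=False).apply(two_period_var)
)

For the 2019-05 rows this pools the five daily returns from 2019-04 and 2019-05, matching the behaviour described in the question.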
