I have a pandas DataFrame with GROUP, DATE, VALUE and VARIANCE columns:
Index GROUP DATE VALUE VARIANCE
1 g1 2015-12-02 10 3.2
2 g1 2015-10-12 9 4.25
3 g1 2013-12-13 8 8
4 g1 2013-12-13 11 8
5 g1 2013-07-15 7 NaN
6 g1 2015-12-02 11 NaN
7 g2 ...
Basically, I want to calculate the shifted rolling variance of the VALUE column: the VARIANCE at Index 1 is the variance of the values at Index 2-6, and so on.
My first approach was to use an expanding window to calculate the variances and shift the result by 1, but I am not sure whether this is the right approach here. Any suggestions are welcome.
To use expanding() on the VALUE column, first reverse the row order, compute the expanding variance, shift it by one row, and reverse back:
variance = df['VALUE'].iloc[::-1].expanding(
).var().shift().iloc[::-1].rename('VARIANCE')
>> variance
Index
1 3.200000
2 4.250000
3 5.333333
4 8.000000
5 NaN
6 NaN
Name: VARIANCE, dtype: float64
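As a sanity check, the chain above can be compared with a direct computation of "variance of all rows below this one". This is only a sketch: the Series is rebuilt by hand from the sample rows, and the name direct is introduced here for illustration. Note that var() in pandas uses ddof=1 (sample variance) by default, so a single remaining value gives NaN.

```python
import pandas as pd

# Sample VALUE column from the question, rebuilt by hand (assumption: one group only)
values = pd.Series([10, 9, 8, 11, 7, 11],
                   index=pd.Index(range(1, 7), name="Index"),
                   name="VALUE")

# Reverse, expanding variance, shift by one row, reverse back
variance = values.iloc[::-1].expanding().var().shift().iloc[::-1].rename("VARIANCE")

# Direct (slower) computation: sample variance of all rows strictly below each row
direct = pd.Series([values.iloc[i + 1:].var() for i in range(len(values))],
                   index=values.index)

print(pd.concat([variance, direct.rename("DIRECT")], axis=1))
```

Both columns should agree row by row, including the NaN entries at the bottom where fewer than two values remain.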
Multiple Groups
Let us create a new df with values for each group:
>> df
GROUP DATE VALUE
Index
1 g1 2015-12-02 10
2 g1 2015-10-12 9
3 g1 2013-12-13 8
4 g1 2013-12-13 11
5 g1 2013-07-15 7
6 g1 2015-12-02 11
1 g2 2015-12-02 10
2 g2 2015-10-12 9
3 g2 2013-12-13 8
4 g2 2013-12-13 11
5 g2 2013-07-15 7
6 g2 2015-12-02 11
For multiple groups you can iterate over the groups and store the results.
variance = []
for name, group in df.groupby('GROUP'):
    # Reverse, take the expanding variance, shift by one row, reverse back
    variance.append(
        group['VALUE'].iloc[::-1].expanding().var().shift().iloc[::-1])
>> df.assign(VARIANCE=pd.concat(variance))
GROUP DATE VALUE VARIANCE
Index
1 g1 2015-12-02 10 3.200000
2 g1 2015-10-12 9 4.250000
3 g1 2013-12-13 8 5.333333
4 g1 2013-12-13 11 8.000000
5 g1 2013-07-15 7 NaN
6 g1 2015-12-02 11 NaN
1 g2 2015-12-02 10 3.200000
2 g2 2015-10-12 9 4.250000
3 g2 2013-12-13 8 5.333333
4 g2 2013-12-13 11 8.000000
5 g2 2013-07-15 7 NaN
6 g2 2015-12-02 11 NaN
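The explicit loop can probably also be written with groupby(...).transform, which returns a result aligned to the original rows. A minimal sketch, assuming a default integer index instead of the repeated Index labels shown above; the helper name shifted_reverse_expanding_var is made up for this example:

```python
import pandas as pd

# Two groups with the same sample values (DATE omitted, it is not used here)
df = pd.DataFrame({
    "GROUP": ["g1"] * 6 + ["g2"] * 6,
    "VALUE": [10, 9, 8, 11, 7, 11] * 2,
})

def shifted_reverse_expanding_var(values):
    # Same chain as in the single-group case, applied to one group's VALUE column
    return values.iloc[::-1].expanding().var().shift().iloc[::-1]

# transform calls the helper once per group and keeps the original row order
df["VARIANCE"] = df.groupby("GROUP")["VALUE"].transform(shifted_reverse_expanding_var)
print(df)
```

This avoids collecting the pieces in a list and concatenating them, at the cost of hiding the per-group step inside a helper function.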
For everyone looking into this question: Mabel Villalba's answer pointed me in the right direction. I adapted her approach as follows, sorting by DATE and shifting by two rows wherever a row shares its DATE with the previous one:
variance_r = df[['GROUP', 'DATE', 'VALUE']].sort_values(['GROUP', 'DATE'])
variance = []
for name, group in variance_r.groupby('GROUP'):
    expanding_var = group['VALUE'].expanding().var()
    variance.append(
        # Shift by 1 normally; by 2 where the previous row has the same DATE
        expanding_var.shift(1).where(
            group['DATE'].shift() != group['DATE'],
            expanding_var.shift(2)))
variance_r.assign(VARIANCE=pd.concat(variance))
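The shift(1)/shift(2) switch covers at most two rows per DATE. If a date can occur more often, one possible generalisation is to take the expanding variance at the last row of each date, shift that by one date, and map it back to the rows. The following is a sketch under that assumption, not the approach from the answers above; prior_date_variance is a hypothetical helper and the data is the single sample group with a default integer index.

```python
import pandas as pd

df = pd.DataFrame({
    "GROUP": ["g1"] * 6,
    "DATE": pd.to_datetime(["2015-12-02", "2015-10-12", "2013-12-13",
                            "2013-12-13", "2013-07-15", "2015-12-02"]),
    "VALUE": [10, 9, 8, 11, 7, 11],
})

def prior_date_variance(group):
    # Variance of all VALUEs on strictly earlier dates, for any number of ties
    group = group.sort_values("DATE")
    expanding_var = group["VALUE"].expanding().var()
    # The last row of each date holds the variance up to and including that date
    is_last = ~group["DATE"].duplicated(keep="last")
    per_date = pd.Series(expanding_var[is_last].to_numpy(),
                         index=group.loc[is_last, "DATE"])
    # Shift by one date so each date only sees earlier dates, then map to rows
    return group["DATE"].map(per_date.shift())

parts = [prior_date_variance(group) for _, group in df.groupby("GROUP")]
df["VARIANCE"] = pd.concat(parts)  # aligned on the (unique) row index
print(df)
```

Because only the last row per date is used, the result does not depend on how rows with the same DATE happen to be ordered after sorting.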