I have a DataFrame with a MultiIndex whose last level is a date. I am trying to apply a rolling operation to the columns with a specific frequency. As I understand it, the usual pandas approach, if I had a DatetimeIndex, would be to call rolling with a frequency string (for example '2D' if I wanted the window to be two days). Another suggested approach is to resample the DatetimeIndex and then apply rolling with the integer 2. Essentially, what I want to do is group by all the index levels except the last one and then tell rolling to use the last level for the timedelta-based window. Below is an example to demonstrate this:
from datetime import datetime
import pandas as pd
multi_index = pd.MultiIndex.from_tuples([
("A", datetime(2017, 1, 1)),
("A", datetime(2017, 1, 2)),
("A", datetime(2017, 1, 3)),
("A", datetime(2017, 1, 4)),
("B", datetime(2017, 1, 1)),
("B", datetime(2017, 1, 3)),
("B", datetime(2017, 1, 4))])
df = pd.DataFrame(index=multi_index, data={"colA": [1, 1, 1, 1, 1, 1, 1]})
display(df)
df.groupby([df.index.get_level_values(0), pd.Grouper(freq="1D", level=-1)]).sum().rolling(2).sum()
The above code does not create a row for (B, datetime(2017, 1, 2)), so every rolling sum comes out as 2 (apart from the first, which is NaN).
One ugly way to get around this, which really only works if at least one group contains all the days, is to unstack, fillna and stack before rolling:
df.groupby([df.index.get_level_values(0), pd.Grouper(freq="1D", level=-1)]
).sum().unstack().fillna(0).stack().rolling(2).sum()
Needless to say, this is an ugly hack: slow and error-prone. Is there a nice way to achieve what I need here without extensive manipulation? Ideally, some way to tell the grouper to use the timestamp level, or to fill in the missing values itself?
You can use groupby + resample + fillna (needs pandas 0.19.0):
multi_index = pd.MultiIndex.from_tuples([
("A", datetime(2017, 1, 1)),
("A", datetime(2017, 1, 2)),
("A", datetime(2017, 1, 3)),
("A", datetime(2017, 1, 4)),
("B", datetime(2017, 1, 1)),
("B", datetime(2017, 1, 3)),
("B", datetime(2017, 1, 4))])
df = pd.DataFrame(index=multi_index, data={"colA": [1, 2, 3, 4, 1, 2, 3]})
print(df)
              colA
A 2017-01-01     1
  2017-01-02     2
  2017-01-03     3
  2017-01-04     4
B 2017-01-01     1
  2017-01-03     2
  2017-01-04     3
b = df.groupby(level=0).resample('1D', level=1).sum().fillna(0).rolling(2).sum()
print(b)
              colA
A 2017-01-01   NaN
  2017-01-02   3.0
  2017-01-03   5.0
  2017-01-04   7.0
B 2017-01-01   5.0
  2017-01-02   1.0
  2017-01-03   2.0
  2017-01-04   5.0
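One caveat with the output above: rolling(2) runs over the whole resampled frame, so the window at (B, 2017-01-01) still includes the last row of A, which is where that 5.0 comes from. Below is a minimal sketch of keeping the windows inside each group. The level names key and date are my own additions for illustration (the original index is unnamed), and moving the first level into a column before resampling is just one way to do it. The second variant swaps in an offset-based '2D' window, which should not need the missing days filled in at all.

```python
from datetime import datetime

import pandas as pd

# Same data as above; the level names "key" and "date" are assumptions
# added here so the levels can be referred to by name.
multi_index = pd.MultiIndex.from_tuples(
    [
        ("A", datetime(2017, 1, 1)),
        ("A", datetime(2017, 1, 2)),
        ("A", datetime(2017, 1, 3)),
        ("A", datetime(2017, 1, 4)),
        ("B", datetime(2017, 1, 1)),
        ("B", datetime(2017, 1, 3)),
        ("B", datetime(2017, 1, 4)),
    ],
    names=["key", "date"],
)
df = pd.DataFrame({"colA": [1, 2, 3, 4, 1, 2, 3]}, index=multi_index)

# Move the grouping level into a column so that only a DatetimeIndex remains.
by_date = df.reset_index(level="key")

# Variant 1: resample each group to daily frequency (a missing day sums to 0),
# then roll over 2 rows inside each group so windows never span two groups.
daily = by_date.groupby("key")["colA"].resample("1D").sum()
rolled_fixed = daily.groupby(level="key").transform(lambda s: s.rolling(2).sum())

# Variant 2: offset-based window per group; no resampling is needed because
# the window is defined by the timestamps rather than by a row count.
rolled_offset = by_date.groupby("key")["colA"].rolling("2D").sum()

print(rolled_fixed)
print(rolled_offset)
```

With the first variant the first row of each group is NaN, because a two-row window is not yet full there. The offset-based variant defaults to min_periods=1, so it returns a value for every existing row, but it does not add a row for the missing (B, 2017-01-02).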