I would like to compute the 1-year rolling average for each row in this DataFrame test:
index id date variation
2313 7034 2018-03-14 4.139148e-06
2314 7034 2018-03-13 4.953194e-07
2315 7034 2018-03-12 2.854749e-06
2316 7034 2018-03-09 3.907458e-06
2317 7034 2018-03-08 1.662412e-06
2318 7034 2018-03-07 1.346433e-06
2319 7034 2018-03-06 8.731700e-06
2320 7034 2018-03-05 7.145597e-06
2321 7034 2018-03-02 4.893283e-06
...
For example, I would need to calculate:
the average for id 7034 between 2018-03-14 and 2017-08-14
the average for id 7034 between 2018-03-13 and 2017-08-13
I tried:
test.groupby(['id','date'])['variation'].rolling(window=1,freq='Y',on='date').mean()
but I got the error message:
ValueError: invalid on specified as date, must be a column (if DataFrame) or None
How can I use the pandas rolling() function in this case?
[EDIT 1] [thanks to Sacul]
I tested:
df['date'] = pd.to_datetime(df['date'])
df.set_index('date').groupby('id').rolling(window=1, freq='Y').mean()['variation']
But freq='Y' doesn't work (I got: ValueError: Invalid frequency: Y). Then I used window = 365, freq = 'D'.
But there is another issue: because there are never 365 consecutive dates for each id-date combination, the result is always empty. Even if there are missing dates, I would like to ignore them and consider all dates between the current date and (current date - 365 days) to compute the rolling mean. For instance, imagine I have:
index id date variation
2313 7034 2018-03-14 4.139148e-06
2314 7034 2018-03-13 4.953194e-07
2315 7034 2017-03-13 2.854749e-06
Then,
How can I do that?
[EDIT 2]
Finally, I used the calls below to compute the rolling median, mean and standard deviation over 1 year while ignoring missing values:
pd.rolling_median(df.set_index('date').groupby('id')['variation'],window=365, freq='D',min_periods=1)
pd.rolling_mean(df.set_index('date').groupby('id')['variation'],window=365, freq='D',min_periods=1)
pd.rolling_std(df.set_index('date').groupby('id')['variation'],window=365, freq='D',min_periods=1)
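The pd.rolling_* functions above were removed from later pandas releases, so here is a minimal sketch of what I understand to be the equivalent with the .rolling() method and a time-based window. The sample values are made up for illustration, and the '365D' offset window is my assumption of what "1 year" means here. By default the window is open on the left, so a row exactly 365 days older than the current one is left out; passing closed='both' to rolling() would include it.

```python
import pandas as pd

# Hypothetical sample data: one id, with large gaps between dates.
df = pd.DataFrame({
    "id": [7034, 7034, 7034, 7034],
    "date": ["2018-03-14", "2018-03-13", "2017-06-01", "2017-03-01"],
    "variation": [4.0, 2.0, 6.0, 100.0],
})
df["date"] = pd.to_datetime(df["date"])

# A time-based window needs a sorted DatetimeIndex, so sort before indexing.
rolled = (
    df.sort_values("date")
      .set_index("date")
      .groupby("id")["variation"]
      .rolling("365D", min_periods=1)  # trailing 365 days, gaps are fine
)

rolling_mean = rolled.mean()
rolling_median = rolled.median()
rolling_std = rolled.std()

# Result is indexed by (id, date); each row averages the rows of the same id
# that fall within the 365 days up to and including that row's date.
print(rolling_mean)
```

In this sketch the 2017-03-01 row is more than 365 days older than the two 2018 rows, so it does not enter their averages.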
I believe this should work for you:
# First make sure that `date` is a datetime object:
df['date'] = pd.to_datetime(df['date'])
df.set_index('date').groupby('id').rolling(window=1, freq='A').mean()['variation']
Using pd.DataFrame.rolling with datetimes works well when the date is the index, which is why I used df.set_index('date') (as can be seen in one of the documentation's examples).
I can't really test if it works on the year's average on your example dataframe, as there is only one year and only one ID, but it should work.
[EDIT] As pointed out by Mihai-Andrei Dinculescu, freq is now a deprecated argument. Here is an alternative (and probably more future-proof) way to do what you're looking for:
df.set_index('date').groupby('id')['variation'].resample('A').mean()
You can take a look at the resample documentation for more details on how this works, and this link regarding the frequency arguments.
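One thing worth keeping in mind: an annual resample gives one average per calendar year, not a trailing 12-month window for every row. A rough sketch of that per-year grouping is below. The data is made up, and I group on the year number via .dt.year instead of calling resample, only because the spelling of the annual frequency alias has, as far as I know, changed between pandas releases. The averages should match an annual resample, just labelled by year instead of by year-end date.

```python
import pandas as pd

# Hypothetical sample data spanning two calendar years for one id.
df = pd.DataFrame({
    "id": [7034, 7034, 7034],
    "date": pd.to_datetime(["2018-03-14", "2018-03-13", "2017-06-01"]),
    "variation": [4.0, 2.0, 6.0],
})

# One mean per (id, calendar year): fixed yearly buckets,
# not a window that slides with each row.
yearly_mean = df.groupby(["id", df["date"].dt.year])["variation"].mean()

print(yearly_mean)
```

If what you need is a value for every row based on the preceding 365 days, a time-based rolling window is the closer fit; the yearly grouping is only appropriate when one number per year is enough.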