
Forward fill pandas column not with last value, but with mean over non-null and null elements

I run into this a lot when modeling time series. Sometimes you have data reported at different frequencies, say one daily and one weekly. What I'd like is not to forward fill the weekly data point for every day of the week (since it is usually already the sum of all the values during the week), but to replace each reported value, and the nulls that follow it, with their mean. In essence, I'd like to spread the data out.

So if I have

import numpy as np
import pandas as pd
s = pd.Series(index=pd.date_range('2015/1/1', '2015/1/9'),
              data=[2, np.nan, 6, np.nan, np.nan, 2, np.nan, np.nan, np.nan])

then I'd like to return

2015-01-01     1
2015-01-02     1
2015-01-03     2
2015-01-04     2
2015-01-05     2
2015-01-06   0.5
2015-01-07   0.5
2015-01-08   0.5
2015-01-09   0.5
Freq: D, dtype: float64

Any thoughts on an easy way to do this? Is a for-loop inescapable?

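Here is one way: use .cumsum on the non-null mask to split the series into groups (each non-null value starts a new group), then use transform to divide each forward-filled value by the length of its group.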

s.ffill().groupby(s.notnull().cumsum()).transform(lambda g: g / len(g))

2015-01-01    1.0
2015-01-02    1.0
2015-01-03    2.0
2015-01-04    2.0
2015-01-05    2.0
2015-01-06    0.5
2015-01-07    0.5
2015-01-08    0.5
2015-01-09    0.5
Freq: D, dtype: float64
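For anyone who wants to see the intermediate steps, here is a minimal sketch of the same idea without the lambda. It is a variation on the answer above, not a different method, and the names groups, sizes and spread are only illustrative. It assumes a reasonably recent pandas where Series.ffill() and transform("size") are available.

```python
import numpy as np
import pandas as pd

s = pd.Series(index=pd.date_range('2015/1/1', '2015/1/9'),
              data=[2, np.nan, 6, np.nan, np.nan, 2, np.nan, np.nan, np.nan])

# Each non-null value bumps the running count, so a reported value and the
# nulls that follow it share one group label: 1, 1, 2, 2, 2, 3, 3, 3, 3
groups = s.notna().cumsum()

# Number of rows in each group (nulls included), broadcast back to every row
sizes = s.groupby(groups).transform("size")

# Forward fill the reported value, then divide by the group length
spread = s.ffill() / sizes
print(spread)
```

Since each value is only divided across its own group, the total is preserved: spread.sum() equals s.sum(). Any nulls before the first reported value stay null, because there is nothing to forward fill from.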


 