
Forward fill pandas column not with last value, but with mean over non-null and null elements

I run into this a lot when modeling time series. Sometimes you have data reported at different frequencies, say one series daily and one weekly. What I'd like is not to forward fill the weekly data point for every day of the week (since it is usually already a sum of all the values during the week), but to forward fill, or replace, the data with its mean, counting the null days in the average. In essence, I'd like to spread out the data.

So if I have

import numpy as np
import pandas as pd
s = pd.Series(index=pd.date_range('2015/1/1', '2015/1/9'),
              data=[2, np.nan, 6, np.nan, np.nan, 2, np.nan, np.nan, np.nan])

then I'd like to return

2015-01-01     1
2015-01-02     1
2015-01-03     2
2015-01-04     2
2015-01-05     2
2015-01-06   0.5
2015-01-07   0.5
2015-01-08   0.5
2015-01-09   0.5
Freq: D, dtype: float64

Any thoughts on an easy way to do this? Is a for-loop inescapable?

Here is one way: use .cumsum on the non-null mask to split the series into groups (each reported value plus the nulls that follow it), then use transform to divide each forward-filled value by the length of its group.

s.ffill().groupby(s.notnull().cumsum()).transform(lambda g: g / len(g))

2015-01-01    1.0
2015-01-02    1.0
2015-01-03    2.0
2015-01-04    2.0
2015-01-05    2.0
2015-01-06    0.5
2015-01-07    0.5
2015-01-08    0.5
2015-01-09    0.5
Freq: D, dtype: float64
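
The one-liner above can be unpacked into named steps, which may make the grouping trick easier to follow. This is only a sketch under the question's assumptions (each reported value is a total to be spread over itself and the nulls that follow it); the names spread_over_gaps, group_id and group_size are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def spread_over_gaps(s: pd.Series) -> pd.Series:
    """Spread each reported value evenly over itself and the NaNs that follow it.

    Hypothetical helper, written only to illustrate the approach above.
    """
    # Every non-null entry bumps the running count, so it starts a new group;
    # the NaNs that follow it keep the same label.
    group_id = s.notna().cumsum()
    # Carry each reported value forward across its group.
    filled = s.ffill()
    # Number of rows in each group, broadcast back onto the original index.
    group_size = filled.groupby(group_id).transform("size")
    return filled / group_size


s = pd.Series(
    index=pd.date_range("2015/1/1", "2015/1/9"),
    data=[2, np.nan, 6, np.nan, np.nan, 2, np.nan, np.nan, np.nan],
)
result = spread_over_gaps(s)
print(result)
```

Because each value is divided by the size of its own group, the total of the series is unchanged (10 before and after here), which is a quick sanity check. If the series starts with NaN, those leading rows have nothing to fill from and stay NaN.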
