Moving average with time offset in pandas

I am looking for a vectorized solution for calculating a moving average with a date offset. I have an irregularly spaced time series of costs for a product, and for each value I would like to calculate the mean of the previous three values, with a date offset of 45 days. For example, if this were my input dataframe:

    In [1]: df
    Out [1]:
        ActCost OrDate
   0    8       2015-01-01
   1    5       2015-02-04
   2    10      2015-02-11
   3    1       2015-02-11
   4    10      2015-03-11
   5    18      2015-03-15
   6    20      2015-05-18
   7    25      2015-05-23
   8    8       2015-06-11
   9    5       2015-10-09
  10    15      2015-11-02
  12    18      2015-12-20

The output would be:

    In[2]: df
    Out[2]:
        ActCost OrDate      EstCost
   0    8       2015-01-01  NaN
   1    5       2015-02-04  NaN
   2    10      2015-02-11  NaN
   3    1       2015-02-11  NaN
   4    10      2015-03-11  NaN
   5    18      2015-03-15  NaN
   6    20      2015-05-18  9.67  # mean(index 3:5)
   7    25      2015-05-23  9.67  # mean(index 3:5)
   8    8       2015-06-11  9.67  # mean(index 3:5) 
   9    5       2015-10-09  17.67 # mean(index 6:8)
  10    15      2015-11-02  17.67 # mean(index 6:8)
  12    18      2015-12-20  12.67 # mean(index 7:9)

My current solution is the following:

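    from datetime import timedelta

    import numpy as np

    for index, row in df.iterrows():
        orDate = row['OrDate']
        # Only costs dated at least 45 days before this order can be used
        costsLanded = orDate - timedelta(45)
        if costsLanded <= np.min(df.OrDate):
            df.loc[index, 'EstCost'] = np.nan
            continue
        if len(df[df.OrDate <= costsLanded]) < 3:
            df.loc[index, 'EstCost'] = np.nan
            continue
        # Mean of the three most recent usable costs
        df.loc[index, 'EstCost'] = np.mean(
            df['ActCost'][df.OrDate <= costsLanded].tail(3))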

My code works, but it is rather slow, and I have millions of these time series. I'm hoping that someone can give me some advice on how to speed this process up. I imagine that the best thing to do would be to vectorize the operation, but I'm not sure how to implement that. Thanks so much for the help!

Try something like this:

    # Set up DatetimeIndex (easier to just load in data with index as OrDate)
    df = df.set_index('OrDate', drop=True)
    df.index = pd.DatetimeIndex(df.index)
    df.index.name = 'OrDate'

    # Save original timestamps for later
    idx = df.index

    # Make timeseries with regular daily interval
    df = df.resample('d').first()

    # Take the moving mean with window size of 45 days
    df = df.rolling(window=45, min_periods=0).mean()

    # Grab the values for the original timestamps and put the index back
    # (.loc replaces .ix, which was removed in pandas 1.0)
    df = df.loc[idx].reset_index()
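For comparison, the rule as stated in the question (the mean of the three most recent costs dated at least 45 days before each row) can also be sketched without a Python-level loop, using `np.searchsorted` and a prefix sum. This is only a sketch under assumptions: `est_cost` is a hypothetical helper name, the frame is rebuilt here with a default integer index, and the cutoff is taken as inclusive (`OrDate <= date - 45 days`). Under a strict 45-day cutoff the last row averages the costs 8, 5 and 15 (dated up to 2015-11-05), which gives 9.33 rather than the 12.67 shown in the question's table.

```python
import numpy as np
import pandas as pd

# The question's data, rebuilt with a default integer index (assumption)
df = pd.DataFrame({
    "ActCost": [8, 5, 10, 1, 10, 18, 20, 25, 8, 5, 15, 18],
    "OrDate": pd.to_datetime([
        "2015-01-01", "2015-02-04", "2015-02-11", "2015-02-11",
        "2015-03-11", "2015-03-15", "2015-05-18", "2015-05-23",
        "2015-06-11", "2015-10-09", "2015-11-02", "2015-12-20",
    ]),
})


def est_cost(frame, n=3, offset_days=45):
    """Hypothetical helper: mean of the last n costs dated at least
    offset_days before each row, NaN where fewer than n exist."""
    frame = frame.sort_values("OrDate", kind="stable")
    dates = frame["OrDate"].to_numpy()
    costs = frame["ActCost"].to_numpy(dtype=float)

    cutoff = dates - np.timedelta64(offset_days, "D")
    # k[i] = number of rows whose OrDate is <= cutoff[i]
    k = np.searchsorted(dates, cutoff, side="right")

    # Prefix sums: sum(costs[a:b]) == csum[b] - csum[a]
    csum = np.concatenate(([0.0], np.cumsum(costs)))

    est = np.full(len(frame), np.nan)
    ok = k >= n
    est[ok] = (csum[k[ok]] - csum[k[ok] - n]) / n
    return pd.Series(est, index=frame.index, name="EstCost")


df["EstCost"] = est_cost(df)
print(df)
```

Because the work is done on whole arrays, the same helper could be applied per product with `groupby(...).apply(...)` if the millions of series live in one long frame; that usage is an assumption about the data layout, not something the question specifies.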

If I understand correctly, I think what you want is:

    df.resample('45D').agg('mean')
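Note that resampling at `'45D'` groups the rows into consecutive 45-day bins and returns one mean per bin, not one estimate per original row. A minimal sketch on the first six rows of the question's data, assuming the dates have already been set as a `DatetimeIndex` (the variable names here are illustrative):

```python
import pandas as pd

# First six rows of the question's data, indexed by date (assumption)
costs = pd.Series(
    [8, 5, 10, 1, 10, 18],
    index=pd.to_datetime([
        "2015-01-01", "2015-02-04", "2015-02-11",
        "2015-02-11", "2015-03-11", "2015-03-15",
    ]),
    name="ActCost",
)

# One mean per 45-day bin, with bins starting at the first day in the data
binned = costs.resample("45D").mean()
print(binned)
print(len(costs), "input rows ->", len(binned), "bins")
```

The six input rows collapse into two bins here, so the result would still need to be mapped back onto the original rows and shifted by the 45-day offset to match the layout the question asks for.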
