Moving average with a time offset in pandas

Question

I am looking for a vectorized way to calculate a moving average with a date offset. I have an irregularly spaced time series of costs for a product, and for each value I would like to calculate the mean of the previous three values, with a date offset of 45 days. For example, if this were my input dataframe:

    In [1]: df
    Out[1]:
         ActCost OrDate
    0    8       2015-01-01
    1    5       2015-02-04
    2    10      2015-02-11
    3    1       2015-02-11
    4    10      2015-03-11
    5    18      2015-03-15
    6    20      2015-05-18
    7    25      2015-05-23
    8    8       2015-06-11
    9    5       2015-10-09
    10   15      2015-11-02
    12   18      2015-12-20
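
To reproduce the example, the frame above can be built roughly as follows. This is a minimal sketch: the index labels are kept as printed (including the jump from 10 to 12), and `OrDate` is parsed to datetimes, which is an assumption the rest of the code in this thread relies on.

```python
import pandas as pd

# Sample data copied from the table above; index labels kept as printed.
df = pd.DataFrame(
    {
        "ActCost": [8, 5, 10, 1, 10, 18, 20, 25, 8, 5, 15, 18],
        "OrDate": pd.to_datetime(
            [
                "2015-01-01", "2015-02-04", "2015-02-11", "2015-02-11",
                "2015-03-11", "2015-03-15", "2015-05-18", "2015-05-23",
                "2015-06-11", "2015-10-09", "2015-11-02", "2015-12-20",
            ]
        ),
    },
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12],
)
print(df)
```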

The output would be:

    In [2]: df
    Out[2]:
         ActCost OrDate      EstCost
    0    8       2015-01-01  NaN
    1    5       2015-02-04  NaN
    2    10      2015-02-11  NaN
    3    1       2015-02-11  NaN
    4    10      2015-03-11  NaN
    5    18      2015-03-15  NaN
    6    20      2015-05-18  9.67   # mean(index 3:5)
    7    25      2015-05-23  9.67   # mean(index 3:5)
    8    8       2015-06-11  9.67   # mean(index 3:5)
    9    5       2015-10-09  17.67  # mean(index 6:8)
    10   15      2015-11-02  17.67  # mean(index 6:8)
    12   18      2015-12-20  12.67  # mean(index 7:9)

My current solution is the following:

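    from datetime import timedelta

    import numpy as np

    # Assumes df is sorted by OrDate and OrDate holds datetimes
    for index, row in df.iterrows():
        orDate = row['OrDate']
        # Only costs dated at least 45 days earlier are usable
        costsLanded = orDate - timedelta(45)
        if costsLanded <= np.min(df.OrDate):
            df.loc[index, 'EstCost'] = np.nan
            continue
        if len(df[df.OrDate <= costsLanded]) < 3:
            df.loc[index, 'EstCost'] = np.nan
            continue
        # tail(3): the three most recent qualifying rows, per the expected output
        df.loc[index, 'EstCost'] = np.mean(
            df['ActCost'][df.OrDate <= costsLanded].tail(3))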

My code works, but it is rather slow, and I have millions of these time series. I'm hoping someone can give me some advice on how to speed this process up. I imagine the best thing to do would be to vectorize the operation, but I'm not sure how to implement that. Thanks so much for the help!

Answer 1

Try something like this:

    import pandas as pd

    # Set up DatetimeIndex (easier to just load in data with index as OrDate)
    df = df.set_index('OrDate', drop=True)
    df.index = pd.DatetimeIndex(df.index)
    df.index.name = 'OrDate'

    # Save original timestamps for later
    idx = df.index

    # Make timeseries with regular daily interval
    df = df.resample('d').first()

    # Take the moving mean with window size of 45 days
    df = df.rolling(window=45, min_periods=0).mean()

    # Grab the values for the original timestamps and put the index back
    # (.loc replaces .ix, which has been removed from pandas)
    df = df.loc[idx].reset_index()
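
Note that a 45-day rolling window averages whatever falls inside the trailing 45 days, which is not quite the same as "the last three values dated at least 45 days earlier" from the question. If that second reading is what is wanted, one possible vectorized sketch combines a 3-row rolling mean with `np.searchsorted`. The helper name `offset_rolling_mean` and its parameters are made up for illustration, and the code assumes the frame is sorted by a datetime `OrDate` column.

```python
import numpy as np
import pandas as pd


def offset_rolling_mean(df, days=45, n=3):
    """Hypothetical helper: mean of the last `n` ActCost values dated at
    least `days` before each row. Assumes `df` is sorted by a datetime
    `OrDate` column."""
    dates = df["OrDate"].to_numpy()
    # roll[i] is the mean of rows i-n+1 .. i (NaN until n rows exist)
    roll = df["ActCost"].rolling(n).mean().to_numpy()
    cutoff = dates - np.timedelta64(days, "D")
    # Position of the last row dated on or before the cutoff (-1 if none)
    pos = np.searchsorted(dates, cutoff, side="right") - 1
    est = np.where(pos >= 0, roll[np.clip(pos, 0, None)], np.nan)
    return pd.Series(est, index=df.index, name="EstCost")


# Sample data from the question
df = pd.DataFrame({
    "ActCost": [8, 5, 10, 1, 10, 18, 20, 25, 8, 5, 15, 18],
    "OrDate": pd.to_datetime([
        "2015-01-01", "2015-02-04", "2015-02-11", "2015-02-11",
        "2015-03-11", "2015-03-15", "2015-05-18", "2015-05-23",
        "2015-06-11", "2015-10-09", "2015-11-02", "2015-12-20",
    ]),
})
df["EstCost"] = offset_rolling_mean(df)
print(df)
```

On the question's sample data this gives 9.67 for the rows at positions 6 to 8 and 17.67 for positions 9 and 10, matching the expected output. For the final row it returns 9.33 (the mean of the three most recent rows dated on or before 2015-11-05), not the 12.67 listed in the question, so it is worth checking which reading is intended.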

Answer 2

If I understand correctly, I think what you want is:

    df.resample('45D').agg('mean')
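
For reference, here is a minimal sketch of what that call does on the question's sample data. It assumes `OrDate` is a datetime column and passes it via `on=` rather than setting it as the index (a choice made here only for brevity). `resample('45D')` groups rows into consecutive 45-day bins, so the result has one row per bin rather than one estimate per original row.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ActCost": [8, 5, 10, 1, 10, 18, 20, 25, 8, 5, 15, 18],
    "OrDate": pd.to_datetime([
        "2015-01-01", "2015-02-04", "2015-02-11", "2015-02-11",
        "2015-03-11", "2015-03-15", "2015-05-18", "2015-05-23",
        "2015-06-11", "2015-10-09", "2015-11-02", "2015-12-20",
    ]),
})

# Mean cost per fixed 45-day bin; bins with no rows come back as NaN
binned = df.resample("45D", on="OrDate")["ActCost"].mean()
print(binned)
```

Because the bins are fixed calendar intervals, the values would still need to be mapped back onto the original rows (and shifted by the offset) to get a per-row `EstCost`.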
