How to resample a dataframe in pandas and apply a weighted average?

Question

I have a dataframe, indexed by time, with 2 columns: price and quantity.

I want to construct a new series that is the average price over 15-minute intervals, weighted by quantity.

Here is my dataframe's head:

                          price  quantity
ts                                        
2020-06-10 15:56:34+00:00  203.0       400
2020-06-10 15:57:10+00:00  203.0      1300
2020-06-10 15:57:11+00:00  203.0      1100
2020-06-10 15:57:13+00:00  203.0      3000
2020-06-10 15:57:14+00:00  203.0       700

Here is my best attempt:

def resample_method(x):
    return np.average(x.price, weights=x.quantity)

df.resample("15T").apply(resample_method)

While the above code expresses my intent (I believe), I get the following error:

Exception has occurred: AttributeError
'Series' object has no attribute 'price'

Answer 1

As pointed out by @Scott Boston in a comment, when using resample with apply, the function receives each column separately as a Series, so both columns are not accessible at the same time. One trick is to append the quantity column to the index, because the index is accessible from each column.

import numpy as np

# Note: this uses '1T' instead of your '15T'; changing the rule is a simple edit.
dfr = (df.set_index('quantity', append=True)
         .resample('1T', level=0)  # the datetime index is level 0
         .apply(lambda x: np.average(x, weights=x.index.get_level_values(1)))  # quantity is level 1
      )
print(dfr)  # the result is not really interesting with this sample, but it works
                           price
ts                              
2020-06-10 15:56:00+00:00  203.0
2020-06-10 15:57:00+00:00  203.0
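
As an alternative to the index trick above, the weighted average can also be computed without apply, since it is just sum(price * quantity) / sum(quantity) within each bin. The following is a minimal sketch, not part of the original answer; the sample values and the names notional, volume and weighted are made up for illustration, and it uses the '15min' alias rather than '15T'.

```python
import pandas as pd

# Sample data in the same layout as the question; the values are illustrative only.
idx = pd.to_datetime(
    [
        "2020-06-10 15:56:34",
        "2020-06-10 15:57:10",
        "2020-06-10 15:57:11",
        "2020-06-10 16:01:00",
        "2020-06-10 16:02:00",
    ],
    utc=True,
)
df = pd.DataFrame(
    {
        "price": [203.0, 204.0, 205.0, 210.0, 220.0],
        "quantity": [400, 1300, 1100, 100, 300],
    },
    index=idx,
)
df.index.name = "ts"

# Numerator: per-bin sum of price * quantity.
notional = (df["price"] * df["quantity"]).resample("15min").sum()
# Denominator: per-bin sum of quantity.
volume = df["quantity"].resample("15min").sum()

# Quantity-weighted average price per 15-minute bin.
weighted = (notional / volume).rename("weighted_price")
print(weighted)
```

With this approach, a bin that contains no rows (or whose quantities sum to zero) comes out as NaN from the 0 / 0 division rather than raising an error.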

Answer 1: 0 2020-06-11 20:40:58
