How to resample a dataframe in pandas and apply a weighted average?

I have a dataframe, indexed by time, with 2 columns: price and quantity.

I want to construct a new series that is the average price over 15-minute intervals, weighted by quantity.

Here is my dataframe's head:

                          price  quantity
ts                                        
2020-06-10 15:56:34+00:00  203.0       400
2020-06-10 15:57:10+00:00  203.0      1300
2020-06-10 15:57:11+00:00  203.0      1100
2020-06-10 15:57:13+00:00  203.0      3000
2020-06-10 15:57:14+00:00  203.0       700

Here is my best attempt:

def resample_method(x):
    return np.average(x.price, weights=x.quantity)

df.resample("15T").apply(resample_method)

While I believe the above code expresses my intent, I get the following error:

Exception has occurred: AttributeError
'Series' object has no attribute 'price'

As @Scott Boston pointed out in a comment, when using resample the function is applied to each column separately (hence the 'Series' in the error), so both columns are not accessible at the same time. One trick is to append the quantity column to the index, because the index is accessible from every column.

import numpy as np

# Note: this uses '1T' (1 minute) instead of '15T'; change the frequency as needed.
dfr = (df.set_index('quantity', append=True)  # move quantity into the index
         .resample('1T', level=0)  # the datetime index is level 0
         .apply(lambda x: np.average(x, weights=x.index.get_level_values(1)))  # quantity is level 1
      )
print(dfr)  # every sample price is 203.0, so the result is trivial, but it works
                           price
ts                              
2020-06-10 15:56:00+00:00  203.0
2020-06-10 15:57:00+00:00  203.0
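
As an alternative sketch (not part of the answer above), the weighted average can also be computed without a custom function: resample the sum of price * quantity and the sum of quantity separately, then divide. The helper name weighted_average_price and the "15min" frequency string are assumptions made for illustration; the sample rows are the ones from the question's head.

```python
import pandas as pd

def weighted_average_price(df, freq="15min"):
    """Quantity-weighted average price per time bin (hypothetical helper)."""
    # Total traded value (price * quantity) in each bin.
    notional = (df["price"] * df["quantity"]).resample(freq).sum()
    # Total quantity in each bin.
    volume = df["quantity"].resample(freq).sum()
    # Weighted average = sum(price * quantity) / sum(quantity).
    # Bins with no rows give 0 / 0, which pandas reports as NaN.
    return (notional / volume).rename("price")

# Sample data copied from the head shown in the question.
df = pd.DataFrame(
    {"price": [203.0, 203.0, 203.0, 203.0, 203.0],
     "quantity": [400, 1300, 1100, 3000, 700]},
    index=pd.DatetimeIndex(
        ["2020-06-10 15:56:34+00:00", "2020-06-10 15:57:10+00:00",
         "2020-06-10 15:57:11+00:00", "2020-06-10 15:57:13+00:00",
         "2020-06-10 15:57:14+00:00"],
        name="ts",
    ),
)

vwap = weighted_average_price(df)
print(vwap)
```

Because both sums are plain resample aggregations, this avoids the per-column limitation of apply entirely, and empty intervals show up as NaN instead of raising an error.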
