Weighted average, grouped by day, of 2 columns based on a condition in a 3rd column of a pandas DataFrame

Question

I have a pandas DataFrame:

import numpy as np
import pandas as pd
df = pd.DataFrame({'Col1' : 16 * ['A', 'B', 'C'], 
                   'Col2' : np.random.rand(48), 
                   'Col3' : np.random.randint(5, 20, 48)},
                   index = pd.date_range('2017-01-01', periods=48, freq='H'))

In [1]: df.tail()
Out [1]: 
                    Col1      Col2  Col3
2017-01-02 19:00:00    B  0.144572     7
2017-01-02 20:00:00    C  0.740500    11
2017-01-02 21:00:00    A  0.357077    19
2017-01-02 22:00:00    B  0.652536     9
2017-01-02 23:00:00    C  0.022437     8

I want to return a dataframe that displays the weighted average of Col3 by date, where Col2 is the weight and only rows whose Col1 is 'B' or 'C' are used (rows with 'A' are ignored). The result would look something like the following.

           WtdAvg
2017-01-01   XX.X
2017-01-02   YY.Y
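
Stated as a formula, the value requested for each day d can be read as below. The symbol R_d is notation introduced here (it is not in the original post) for the set of rows that fall on day d and have Col1 equal to 'B' or 'C'; the expression assumes the weights in R_d do not sum to zero.

```latex
\[
\mathrm{WtdAvg}_d
  = \frac{\sum_{i \in R_d} \mathrm{Col2}_i \,\mathrm{Col3}_i}
         {\sum_{i \in R_d} \mathrm{Col2}_i}
\]
```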

Answer 1

Filter the DataFrame to remove the rows where Col1 is 'A', then perform a groupby by day and apply np.average with Col2 as the weights:

# pd.TimeGrouper has been removed from pandas; pd.Grouper(freq='D') is its replacement.
df[df['Col1'] != 'A'].groupby(pd.Grouper(freq='D')) \
                     .apply(lambda grp: np.average(grp['Col3'], weights=grp['Col2']))

The resulting output (using np.random.seed([3,1415]) as the random state seed):

2017-01-01    11.975517
2017-01-02    12.411798
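
The filter-then-group approach can also be tried on a tiny hand-made frame, where the result is easy to verify by hand. This is only a sketch: the helper name weighted_avg_by_day, its keep parameter, and the sample values are illustrative assumptions that do not appear in the original answer, and explicit timestamps are used instead of pd.date_range to keep the example independent of frequency-alias spelling.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (values chosen so the averages are easy to check).
df = pd.DataFrame(
    {
        "Col1": ["A", "B", "C", "A", "B", "C"],
        "Col2": [0.5, 1.0, 3.0, 0.2, 2.0, 2.0],   # weights
        "Col3": [100, 10, 20, 100, 6, 8],         # values to average
    },
    index=pd.to_datetime(
        [
            "2017-01-01 00:00", "2017-01-01 08:00", "2017-01-01 16:00",
            "2017-01-02 00:00", "2017-01-02 08:00", "2017-01-02 16:00",
        ]
    ),
)


def weighted_avg_by_day(frame, keep=("B", "C")):
    """Hypothetical helper: daily weighted average of Col3, weighted by Col2."""
    # Keep only the rows whose Col1 is in `keep` (this drops the 'A' rows).
    filtered = frame[frame["Col1"].isin(keep)]
    # Group the DatetimeIndex into calendar days and average within each day.
    return filtered.groupby(pd.Grouper(freq="D")).apply(
        lambda grp: np.average(grp["Col3"], weights=grp["Col2"])
    )


result = weighted_avg_by_day(df)
print(result)
# 2017-01-01: (1*10 + 3*20) / (1 + 3) = 17.5
# 2017-01-02: (2*6 + 2*8) / (2 + 2)   = 7.0
```

Note that np.average raises ZeroDivisionError when the weights of a group sum to zero, so a day with no 'B' or 'C' rows in the middle of the range would need separate handling.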

Answer 2

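import numpy as np
import pandas as pd

np.random.seed([3,1415])
df = pd.DataFrame({'Col1' : 16 * ['A', 'B', 'C'], 
                   'Col2' : np.random.rand(48), 
                   'Col3' : np.random.randint(5, 20, 48)},
                   index = pd.date_range('2017-01-01', periods=48, freq='H'))


# Drop the 'A' rows, then drop Col1 so only the numeric columns remain.
d1 = df.query('Col1 != "A"').drop(columns='Col1')
# Prod = Col2 * Col3 per row; sum the weights and the products per day.
# pd.Grouper(freq='D') replaces pd.TimeGrouper('D'), which has been removed from pandas.
d2 = d1.assign(Prod=d1.prod(axis=1)).groupby(pd.Grouper(freq='D'))[['Col2', 'Prod']].sum()
# Weighted average = sum of products / sum of weights.
d2.Prod.div(d2.Col2)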

2017-01-01    11.975517
2017-01-02    12.411798
Freq: D, dtype: float64
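
Answer 2 is posted as code only. One way to read it is that it avoids a per-group lambda: it multiplies weight by value row by row, sums both the products and the weights per day, and divides the two sums. The sketch below shows that idea on a small hand-made frame; the sample values and the names sums and wtd_avg are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Small illustrative frame (values chosen so the averages are easy to check).
df = pd.DataFrame(
    {
        "Col1": ["A", "B", "C", "A", "B", "C"],
        "Col2": [0.5, 1.0, 3.0, 0.2, 2.0, 2.0],   # weights
        "Col3": [100, 10, 20, 100, 6, 8],         # values to average
    },
    index=pd.to_datetime(
        [
            "2017-01-01 00:00", "2017-01-01 08:00", "2017-01-01 16:00",
            "2017-01-02 00:00", "2017-01-02 08:00", "2017-01-02 16:00",
        ]
    ),
)

# Keep the non-'A' rows and only the two numeric columns.
d1 = df.loc[df["Col1"] != "A", ["Col2", "Col3"]]

# Row-wise product of weight and value, then per-day sums of weights and products.
sums = (
    d1.assign(Prod=d1["Col2"] * d1["Col3"])
      .groupby(pd.Grouper(freq="D"))[["Col2", "Prod"]]
      .sum()
)

# Weighted average per day = sum(weight * value) / sum(weight).
wtd_avg = sums["Prod"] / sums["Col2"]
print(wtd_avg)
# 2017-01-01: 70 / 4 = 17.5
# 2017-01-02: 28 / 4 = 7.0
```

Because every step here is a column-wise operation, this form tends to scale better than groupby.apply with a Python lambda on large frames, although no timings are given in the thread.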

2 answers

Answer 1: score 4, accepted, 2017-01-11 22:46:19
Answer 2: score 1, 2017-01-11 22:59:07
