Weighted average, grouped by day, of two columns based on criteria in a third column of a pandas DataFrame
I have a pandas DataFrame:
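import numpy as np
import pandas as pd

# 48 hourly rows spanning two days; 'h' is the hourly alias ('H' is deprecated in recent pandas)
df = pd.DataFrame({'Col1' : 16 * ['A', 'B', 'C'],
                   'Col2' : np.random.rand(48),
                   'Col3' : np.random.randint(5, 20, 48)},
                  index = pd.date_range('2017-01-01', periods=48, freq='h'))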
In [1]: df.tail()
Out[1]:
                     Col1      Col2  Col3
2017-01-02 19:00:00     B  0.144572     7
2017-01-02 20:00:00     C  0.740500    11
2017-01-02 21:00:00     A  0.357077    19
2017-01-02 22:00:00     B  0.652536     9
2017-01-02 23:00:00     C  0.022437     8
I want to return a DataFrame that shows, for each date, the weighted average of Col3 using Col2 as the weights, including only rows where Col1 is 'B' or 'C' and ignoring 'A'. The result would look something like this:
            WtdAvg
2017-01-01    XX.X
2017-01-02    YY.Y
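Stated as a formula (notation introduced here only for illustration: $D_d$ is the set of rows on day $d$ whose Col1 is 'B' or 'C', $w_i$ is the Col2 weight and $x_i$ the Col3 value of row $i$), the requested quantity for each day is:

$$
\text{WtdAvg}_d = \frac{\sum_{i \in D_d} w_i \, x_i}{\sum_{i \in D_d} w_i}
$$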
Filter the DataFrame to remove rows where Col1 is 'A', then perform a groupby using np.average:
# pd.TimeGrouper was removed in pandas 1.0; pd.Grouper(freq='D') is its replacement
df[df['Col1'] != 'A'].groupby(pd.Grouper(freq='D')) \
    .apply(lambda grp: np.average(grp['Col3'], weights=grp['Col2']))
The resulting output (using np.random.seed([3,1415]) as the random seed):
2017-01-01 11.975517
2017-01-02 12.411798
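To see that the filter-then-np.average approach computes the intended per-day weighted average, here is a minimal sketch on a small, hand-checkable frame. The frame `small` and all its values are hypothetical, chosen only so the arithmetic is easy to follow; they are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two days, with an 'A' row on each day that should be ignored
small = pd.DataFrame(
    {'Col1': ['A', 'B', 'C', 'A', 'B'],
     'Col2': [0.5, 1.0, 3.0, 4.0, 2.0],   # weights
     'Col3': [100, 10, 20, 100, 5]},      # values to average
    index=pd.to_datetime(['2017-01-01 00:00', '2017-01-01 01:00',
                          '2017-01-01 02:00', '2017-01-02 00:00',
                          '2017-01-02 01:00']))

# Same technique as the answer: drop 'A' rows, bin by calendar day, weighted-average each bin
wtd = (small[small['Col1'] != 'A']
       .groupby(pd.Grouper(freq='D'))
       .apply(lambda grp: np.average(grp['Col3'], weights=grp['Col2'])))

# Day 1: (1.0*10 + 3.0*20) / (1.0 + 3.0) = 17.5
# Day 2: (2.0*5) / 2.0 = 5.0
print(wtd)
```

The 'A' rows carry large Col3 values (100), so if the filter were missing the averages would be visibly pulled upward. Note that np.average raises ZeroDivisionError if a day's weights sum to zero, for example a day containing only 'A' rows.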
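Alternatively, compute the weighted average directly as sum(Col2 * Col3) / sum(Col2) for each day:
import numpy as np
import pandas as pd

np.random.seed([3,1415])
df = pd.DataFrame({'Col1' : 16 * ['A', 'B', 'C'],
                   'Col2' : np.random.rand(48),
                   'Col3' : np.random.randint(5, 20, 48)},
                  index = pd.date_range('2017-01-01', periods=48, freq='h'))
# drop the 'A' rows, then drop Col1 so only the numeric columns remain
d1 = df.query('Col1 != "A"').drop(columns='Col1')
# Prod = Col2 * Col3; sum both the weights and the products per day
d2 = d1.assign(Prod=d1.prod(axis=1)).groupby(pd.Grouper(freq='D'))[['Col2', 'Prod']].sum()
# weighted average = sum(weight * value) / sum(weight)
d2.Prod.div(d2.Col2)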
2017-01-01 11.975517
2017-01-02 12.411798
Freq: D, dtype: float64
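Both answers return a Series, while the question asks for a DataFrame with a WtdAvg column. A minimal sketch of one way to get that shape, reusing the sum-of-products idea without groupby.apply, is below. The helper column name `WX`, the use of `isin` for the filter, and grouping on `index.normalize()` (midnight of each timestamp) instead of pd.Grouper are choices made here for illustration, not part of the original answers.

```python
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
df = pd.DataFrame({'Col1': 16 * ['A', 'B', 'C'],
                   'Col2': np.random.rand(48),
                   'Col3': np.random.randint(5, 20, 48)},
                  index=pd.date_range('2017-01-01', periods=48, freq='h'))

# Keep only the 'B' and 'C' rows (equivalent to Col1 != 'A' for this data)
bc = df[df['Col1'].isin(['B', 'C'])]

# Per-day sums of weight*value (hypothetical helper column WX) and of the weights
sums = (bc.assign(WX=bc['Col2'] * bc['Col3'])
          .groupby(bc.index.normalize())[['WX', 'Col2']]
          .sum())

# Divide, then wrap the Series in a one-column DataFrame named as in the question
out = (sums['WX'] / sums['Col2']).to_frame('WtdAvg')
print(out)
```

Grouping on the normalized index keys each row by its calendar date, so no frequency-based binning is involved; days with no 'B'/'C' rows simply do not appear in the result.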