Weighted average, grouped by day, of 2 columns based on criteria in the 3rd column of a pandas DataFrame

I have a pandas DataFrame:

import numpy as np
import pandas as pd

# 48 hourly rows (two days); Col1 cycles A, B, C
df = pd.DataFrame({'Col1': 16 * ['A', 'B', 'C'],
                   'Col2': np.random.rand(48),
                   'Col3': np.random.randint(5, 20, 48)},
                  index=pd.date_range('2017-01-01', periods=48, freq='h'))

In [1]: df.tail()
Out [1]: 
                    Col1      Col2  Col3
2017-01-02 19:00:00    B  0.144572     7
2017-01-02 20:00:00    C  0.740500    11
2017-01-02 21:00:00    A  0.357077    19
2017-01-02 22:00:00    B  0.652536     9
2017-01-02 23:00:00    C  0.022437     8

I want to return a DataFrame that shows, for each date, the weighted average of Col3, using Col2 as the weights and including only the rows where Col1 is 'B' or 'C' (i.e. ignoring 'A'). The result would look something like this:

           WtdAvg
2017-01-01   XX.X
2017-01-02   YY.Y
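Concretely, a weighted average of Col3 with Col2 as the weights is sum(Col2 * Col3) / sum(Col2), taken over the B and C rows of each day. A minimal sketch with made-up numbers (illustrative only, not the question's data) shows that np.average computes exactly this quantity:

```python
import numpy as np

# Made-up B/C rows for a single day (illustrative values only)
weights = np.array([1.0, 3.0])  # plays the role of Col2
values = np.array([4.0, 8.0])   # plays the role of Col3

# Weighted average by hand: (1*4 + 3*8) / (1 + 3) = 28 / 4
manual = (weights * values).sum() / weights.sum()
# NumPy's built-in weighted average gives the same number
via_numpy = np.average(values, weights=weights)

print(manual, via_numpy)  # 7.0 7.0
```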

Filter the DataFrame to remove the rows where Col1 is 'A', then group by day and apply np.average to each group with Col2 as the weights:

(df[df['Col1'] != 'A']
    .groupby(pd.Grouper(freq='D'))  # pd.TimeGrouper was removed in pandas 1.0
    .apply(lambda grp: np.average(grp['Col3'], weights=grp['Col2'])))

The resulting output (calling np.random.seed([3,1415]) before building df):

2017-01-01    11.975517
2017-01-02    12.411798
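pd.TimeGrouper was deprecated in favour of pd.Grouper(freq=...) and has been removed from current pandas, so the daily bucketing is now written pd.Grouper(freq='D'). The sketch below runs the filter-then-apply approach on a small hypothetical frame (toy values chosen so the daily results can be checked by hand; they are not the question's random data) and names the result WtdAvg to match the desired output:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three rows per day at 8-hour spacing, Col1 cycling A/B/C
toy = pd.DataFrame(
    {"Col1": ["A", "B", "C", "A", "B", "C"],
     "Col2": [0.5, 1.0, 3.0, 2.0, 1.0, 1.0],  # weights
     "Col3": [10, 4, 8, 19, 5, 15]},          # values to average
    index=pd.date_range("2017-01-01", periods=6, freq="8h"),
)

# Drop the 'A' rows, bucket the rest by calendar day, and take the
# Col2-weighted average of Col3 inside each daily bucket
wtd_avg = (
    toy[toy["Col1"] != "A"]
    .groupby(pd.Grouper(freq="D"))
    .apply(lambda grp: np.average(grp["Col3"], weights=grp["Col2"]))
    .to_frame("WtdAvg")
)
print(wtd_avg)  # 2017-01-01 -> 7.0 ((1*4 + 3*8) / 4), 2017-01-02 -> 10.0 ((1*5 + 1*15) / 2)
```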
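Alternatively, compute the weighted average directly as sum(Col2 * Col3) / sum(Col2) for each day, without apply:

import numpy as np
import pandas as pd

np.random.seed([3,1415])
df = pd.DataFrame({'Col1': 16 * ['A', 'B', 'C'],
                   'Col2': np.random.rand(48),
                   'Col3': np.random.randint(5, 20, 48)},
                  index=pd.date_range('2017-01-01', periods=48, freq='h'))

# Keep only the B and C rows; Col1 is not needed after filtering
d1 = df.query('Col1 != "A"').drop(columns='Col1')
# Prod = Col2 * Col3 (weight times value); then sum Col2 and Prod per day
d2 = (d1.assign(Prod=d1.prod(axis=1))
        .groupby(pd.Grouper(freq='D'))[['Col2', 'Prod']]
        .sum())
# Weighted average per day = sum(w * x) / sum(w)
d2.Prod.div(d2.Col2)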

2017-01-01    11.975517
2017-01-02    12.411798
Freq: D, dtype: float64
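The sum-and-divide approach computes the same quantity as np.average but replaces the per-group Python call with built-in sums. A minimal sketch on a small hypothetical frame (made-up values, not the question's data) checks that the two routes agree. It uses list-style column selection [['Col2', 'Prod']] and pd.Grouper(freq='D'), because tuple-style selection such as ['Col2', 'Prod'] after groupby and pd.TimeGrouper are both rejected by current pandas:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three rows per day at 8-hour spacing, Col1 cycling A/B/C
toy = pd.DataFrame(
    {"Col1": ["A", "B", "C", "A", "B", "C"],
     "Col2": [0.5, 1.0, 3.0, 2.0, 1.0, 1.0],  # weights
     "Col3": [10, 4, 8, 19, 5, 15]},          # values to average
    index=pd.date_range("2017-01-01", periods=6, freq="8h"),
)

kept = toy[toy["Col1"] != "A"]

# Vectorised route: per-day sums of w*x and of w, then one division
sums = (
    kept.assign(Prod=kept["Col2"] * kept["Col3"])
    .groupby(pd.Grouper(freq="D"))[["Col2", "Prod"]]
    .sum()
)
vectorised = sums["Prod"] / sums["Col2"]

# apply route: np.average called once per daily group
via_apply = kept.groupby(pd.Grouper(freq="D")).apply(
    lambda grp: np.average(grp["Col3"], weights=grp["Col2"])
)

print(vectorised.tolist())                 # [7.0, 10.0]
print(np.allclose(vectorised, via_apply))  # True
```

The vectorised route typically scales better when there are many days, since no Python function runs per group. One behavioural difference: if a day's weights sum to zero, the division yields NaN or inf, whereas np.average raises ZeroDivisionError.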


