Weighted Mean as a Column in Pandas
I am trying to add a column that holds the weighted average of 4 value columns, each of which has its own weight column:
import pandas as pd

df = pd.DataFrame({
    'A': [2000, 1000, 2509, 2145],
    'A_Weight': [37, 47, 33, 16],
    'B': [2100, 1500, 2000, 1600],
    'B_weights': [17, 21, 6, 2],
    'C': [2500, 1400, 0, 2300],
    'C_weights': [5, 35, 0, 40],
    'D': [0, 1600, 2100, 2000],
    'D_weights': [0, 32, 10, 5],
})
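I would like the weighted average to be in a new column named "WA", but every time I try, I get NaN.
The formula I am using is ((A * A_Weight) + (B * B_weights) + (C * C_weights) + (D * D_weights)) / sum(all weights).
The desired DataFrame would have a new column with the following values, for example:
df['WA'] = [2071.19, 1323.70, 2363.20, 2214.60]
Thanks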
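A direct and simple way is as follows:
(Your weight column names are inconsistent: some end in "s" and some do not, some use an uppercase "W" and some a lowercase "w". That makes it inconvenient to select the columns as a group, for example with .filter().)
df['WA'] = (
    (df['A'] * df['A_Weight'])
    + (df['B'] * df['B_weights'])
    + (df['C'] * df['C_weights'])
    + (df['D'] * df['D_weights'])
) / (df['A_Weight'] + df['B_weights'] + df['C_weights'] + df['D_weights'])
Result: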
print(df)
      A  A_Weight     B  B_weights     C  C_weights     D  D_weights           WA
0  2000        37  2100         17  2500          5     0          0  2071.186441
1  1000        47  1500         21  1400         35  1600         32  1323.703704
2  2509        33  2000          6     0          0  2100         10  2363.204082
3  2145        16  1600          2  2300         40  2000          5  2214.603175
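If the inconsistent column names are the main obstacle, one option is to list the value/weight pairs explicitly and let NumPy do the arithmetic. The following is only a sketch and not part of the answer above: the `pairs` mapping and the variable names are assumptions made for illustration. `np.average` with `weights=` and `axis=1` computes sum(value * weight) / sum(weight) for each row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [2000, 1000, 2509, 2145],
    'A_Weight': [37, 47, 33, 16],
    'B': [2100, 1500, 2000, 1600],
    'B_weights': [17, 21, 6, 2],
    'C': [2500, 1400, 0, 2300],
    'C_weights': [5, 35, 0, 40],
    'D': [0, 1600, 2100, 2000],
    'D_weights': [0, 32, 10, 5],
})

# Hypothetical explicit mapping: value column -> its weight column.
pairs = {'A': 'A_Weight', 'B': 'B_weights', 'C': 'C_weights', 'D': 'D_weights'}

# Two arrays of the same shape (rows x pairs), in matching column order.
values = df[list(pairs.keys())].to_numpy()
weights = df[list(pairs.values())].to_numpy()

# Row-wise weighted average; raises ZeroDivisionError if a row's weights sum to 0.
df['WA'] = np.average(values, weights=weights, axis=1)
print(df['WA'])
```

Because the pairs are spelled out, this version does not depend on any naming pattern, at the cost of maintaining the mapping by hand.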
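A less direct way:
groupby + str.split to group the columns by prefix
prod to get the product of the columns within each group
sum to get the row-wise sum of those products
filter + sum to get the row-wise sum of the "weights" columns
# Note: groupby(..., axis=1) is deprecated since pandas 2.1.
df['WA'] = (
    df.groupby(df.columns.str.split('_').str[0], axis=1).prod().sum(axis=1)
    / df.filter(regex='_[wW]eight(s)?$').sum(axis=1)
)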
Result:
      A  A_Weight     B  B_weights     C  C_weights     D  D_weights           WA
0  2000        37  2100         17  2500          5     0          0  2071.186441
1  1000        47  1500         21  1400         35  1600         32  1323.703704
2  2509        33  2000          6     0          0  2100         10  2363.204082
3  2145        16  1600          2  2300         40  2000          5  2214.603175
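On recent pandas versions, grouping along columns with `axis=1` emits a deprecation warning that suggests transposing instead. The sketch below is one way the same idea might be written without `axis=1`; it is an adaptation, not the original answer's code, and the names `prefixes`, `weighted_sum` and `weight_total` are made up for illustration. It assumes `df` contains only the eight value and weight columns (no existing "WA" column).

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2000, 1000, 2509, 2145],
    'A_Weight': [37, 47, 33, 16],
    'B': [2100, 1500, 2000, 1600],
    'B_weights': [17, 21, 6, 2],
    'C': [2500, 1400, 0, 2300],
    'C_weights': [5, 35, 0, 40],
    'D': [0, 1600, 2100, 2000],
    'D_weights': [0, 32, 10, 5],
})

# Prefix of each column name: 'A_Weight' -> 'A', 'B' -> 'B', and so on.
prefixes = df.columns.str.split('_').str[0]

# Transpose so the original columns become rows, group those rows by prefix,
# and multiply within each group (value * weight). Summing over the groups
# then yields one weighted sum per original row.
weighted_sum = df.T.groupby(prefixes).prod().sum()

# Row-wise total of the weight columns, selected by their name suffix.
weight_total = df.filter(regex='_[wW]eight(s)?$').sum(axis=1)

df['WA'] = weighted_sum / weight_total
print(df['WA'])
```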
Another option for an old question:
Split the data into a numerator and a denominator:
numerator = df.filter(regex=r"[A-Z]$")
denominator = df.filter(like='_')
Convert the columns of denominator to a MultiIndex, which comes in handy when computing with numerator:
denominator.columns = denominator.columns.str.split('_', expand = True)
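To see why this helps, it may be useful to look at what the split produces. The snippet below is a small illustration using a standalone Index with the same weight column names; `weight_columns` and `multi` are names chosen here, not from the answer. Level 0 of the resulting MultiIndex holds the prefixes, which are the same labels as the columns of numerator, so an operation aligned on level 0 can pair each value column with its weight column.

```python
import pandas as pd

weight_columns = pd.Index(['A_Weight', 'B_weights', 'C_weights', 'D_weights'])

# expand=True spreads the split parts across levels, producing a MultiIndex.
multi = weight_columns.str.split('_', expand=True)

print(type(multi).__name__)
print(multi.get_level_values(0).tolist())  # the prefixes
print(multi.get_level_values(1).tolist())  # the inconsistent suffixes
```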
Multiply denominator by numerator, then divide the row-wise sum of the products by the row-wise sum of denominator:
outcome = numerator.mul(denominator, level=0, axis=1).sum(axis=1)
outcome = outcome.div(denominator.sum(axis=1))
df.assign(WA=outcome)
Result:
      A  A_Weight     B  B_weights     C  C_weights     D  D_weights           WA
0  2000        37  2100         17  2500          5     0          0  2071.186441
1  1000        47  1500         21  1400         35  1600         32  1323.703704
2  2509        33  2000          6     0          0  2100         10  2363.204082
3  2145        16  1600          2  2300         40  2000          5  2214.603175