I am trying to add a column with the weighted average of 4 columns with 4 columns of weights
df = pd.DataFrame.from_dict(dict([('A', [2000, 1000, 2509, 2145]),
('A_Weight', [37, 47, 33, 16]),
('B', [2100, 1500, 2000, 1600]),
('B_weights', [17, 21, 6, 2]),
('C', [2500, 1400, 0, 2300]),
('C_weights', [5, 35, 0, 40]),
('D', [0, 1600, 2100, 2000]),
('D_weights', [0, 32, 10, 5])]))
I want the weighted average to be in a new column named "WA" but every time I try it displays NaN
Desired Dataframe would be a new column with the following values as ex:
Formula I used (((A * A_weight)+(B * b_weight)+(C * C_weight)+(D * D_weight)) / sum(all weights)
df['WA'] = [2071.19,1323.70, 2363.20,2214.60 ]
Thank you
A straight-forward and simple way to do is as follows:
(Since your columns name for the weights are not consistently named, eg some with 's' and some without, some with capital 'W' and some with lower case 'w', it is not convenient to group columns eg by .filter()
)
df['WA'] = ( (df['A'] * df['A_Weight']) + (df['B'] * df['B_weights']) + (df['C'] * df['C_weights']) + (df['D'] * df['D_weights']) ) / (df['A_Weight'] + df['B_weights'] + df['C_weights'] + df['D_weights'])
Result:
print(df)
A A_Weight B B_weights C C_weights D D_weights WA
0 2000 37 2100 17 2500 5 0 0 2071.186441
1 1000 47 1500 21 1400 35 1600 32 1323.703704
2 2509 33 2000 6 0 0 2100 10 2363.204082
3 2145 16 1600 2 2300 40 2000 5 2214.603175
The not so straight-forward way:
str.split
groupby prod
sum
on axis 1.filter
+ sum
on axis 1 to get sum of "weights" columnsdf['WA'] = (
df.groupby(df.columns.str.split('_').str[0], axis=1).prod().sum(axis=1)
/ df.filter(regex='_[wW]eight(s)?$').sum(axis=1)
)
A A_Weight B B_weights C C_weights D D_weights WA
0 2000 37 2100 17 2500 5 0 0 2071.186441
1 1000 47 1500 21 1400 35 1600 32 1323.703704
2 2509 33 2000 6 0 0 2100 10 2363.204082
3 2145 16 1600 2 2300 40 2000 5 2214.603175
Another option to an old question:
Split data into numerator and denominator:
numerator = df.filter(regex=r"[A-Z]$")
denominator = df.filter(like='_')
Convert denominator
into a MultiIndex, comes in handy when computing with numerator
:
denominator.columns = denominator.columns.str.split('_', expand = True)
Multiply numerator
by denominator
, and divide the sum of the outcome with the sum of the denominator
:
outcome = numerator.mul(denominator, level=0, axis=1).sum(1)
outcome = outcome.div(denominator.sum(1))
df.assign(WA = outcome)
A A_Weight B B_weights C C_weights D D_weights WA
0 2000 37 2100 17 2500 5 0 0 2071.186441
1 1000 47 1500 21 1400 35 1600 32 1323.703704
2 2509 33 2000 6 0 0 2100 10 2363.204082
3 2145 16 1600 2 2300 40 2000 5 2214.603175
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.