Weighted Mean as a Column in Pandas

Question

I am trying to add a column holding the weighted average of 4 value columns, using 4 corresponding weight columns:

import pandas as pd

df = pd.DataFrame.from_dict(dict([('A', [2000, 1000, 2509, 2145]),
                                  ('A_Weight', [37, 47, 33, 16]),
                                  ('B', [2100, 1500, 2000, 1600]),
                                  ('B_weights', [17, 21, 6, 2]),
                                  ('C', [2500, 1400, 0, 2300]),
                                  ('C_weights', [5, 35, 0, 40]),
                                  ('D', [0, 1600, 2100, 2000]),
                                  ('D_weights', [0, 32, 10, 5])]))

I want the weighted average in a new column named "WA", but every attempt displays NaN.

The desired DataFrame would have a new column with the values shown below, computed with this formula:

((A * A_Weight) + (B * B_weights) + (C * C_weights) + (D * D_weights)) / sum(all weights)

df['WA'] = [2071.19, 1323.70, 2363.20, 2214.60]

Thank you

Answer 1

A straightforward way to do this is as follows:

(Your weight columns are not named consistently: some end in 's' and some do not, some use a capital 'W' and some a lowercase 'w'. That makes it inconvenient to select them as a group, e.g. with .filter().)

df['WA'] = ( (df['A'] * df['A_Weight']) + (df['B'] * df['B_weights']) + (df['C'] * df['C_weights']) + (df['D'] * df['D_weights']) ) / (df['A_Weight'] + df['B_weights'] + df['C_weights'] + df['D_weights'])

Result:

print(df)


      A  A_Weight     B  B_weights     C  C_weights     D  D_weights           WA
0  2000        37  2100         17  2500          5     0          0  2071.186441
1  1000        47  1500         21  1400         35  1600         32  1323.703704
2  2509        33  2000          6     0          0  2100         10  2363.204082
3  2145        16  1600          2  2300         40  2000          5  2214.603175
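If more value/weight pairs are added later, the same formula can be written once over arrays instead of spelling out every term. The following is only a sketch of that idea: the `pairs` dictionary is a hypothetical helper (not part of the answer above) that states explicitly which weight column belongs to which value column, which sidesteps the inconsistent naming.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2000, 1000, 2509, 2145], 'A_Weight': [37, 47, 33, 16],
    'B': [2100, 1500, 2000, 1600], 'B_weights': [17, 21, 6, 2],
    'C': [2500, 1400, 0, 2300], 'C_weights': [5, 35, 0, 40],
    'D': [0, 1600, 2100, 2000], 'D_weights': [0, 32, 10, 5],
})

# Hypothetical helper: explicit mapping of value column -> weight column
pairs = {'A': 'A_Weight', 'B': 'B_weights', 'C': 'C_weights', 'D': 'D_weights'}

# Two arrays of shape (rows, pairs) with columns in matching order
values = df[list(pairs)].to_numpy()
weights = df[list(pairs.values())].to_numpy()

# Row-wise sum of value * weight, divided by the row-wise sum of weights
df['WA'] = (values * weights).sum(axis=1) / weights.sum(axis=1)
print(df['WA'].round(2).tolist())
```

Note that a row whose weights are all zero would divide 0 by 0 and produce NaN; none of the rows in this sample data are like that.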

Answer 2

A less straightforward way:

Group the columns by prefix via str.split.
Get the product of each value/weight pair via groupby prod.
Get the row-wise sum of the products with sum on axis 1.
Use filter + sum on axis 1 to get the row-wise sum of the weight columns.
Divide the summed products by the summed weights.

df['WA'] = (
        df.groupby(df.columns.str.split('_').str[0], axis=1).prod().sum(axis=1)
        / df.filter(regex='_[wW]eight(s)?$').sum(axis=1)
)

      A  A_Weight     B  B_weights     C  C_weights     D  D_weights           WA
0  2000        37  2100         17  2500          5     0          0  2071.186441
1  1000        47  1500         21  1400         35  1600         32  1323.703704
2  2509        33  2000          6     0          0  2100         10  2363.204082
3  2145        16  1600          2  2300         40  2000          5  2214.603175
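As far as I know, the `axis` argument of `DataFrame.groupby` is deprecated in recent pandas releases (2.1 and later), so the column-wise groupby above may warn or fail depending on the installed version. A possible workaround, shown here as a sketch rather than as the answer's own method, is to transpose first so that the grouping happens over rows. The names `prefix`, `products`, `numerator` and `denominator` are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2000, 1000, 2509, 2145], 'A_Weight': [37, 47, 33, 16],
    'B': [2100, 1500, 2000, 1600], 'B_weights': [17, 21, 6, 2],
    'C': [2500, 1400, 0, 2300], 'C_weights': [5, 35, 0, 40],
    'D': [0, 1600, 2100, 2000], 'D_weights': [0, 32, 10, 5],
})

# 'A', 'A', 'B', 'B', ...: one group label per original column
prefix = df.columns.str.split('_').str[0]

# After transposing, the original columns are rows, so a plain row-wise
# groupby multiplies each value row with its weight row.
products = df.T.groupby(prefix).prod()

# Sum the four products for each original row (now a column of `products`)
numerator = products.sum(axis=0)

# Row-wise sum of the weight columns, matched by name pattern
denominator = df.filter(regex='_[wW]eights?$').sum(axis=1)

df['WA'] = numerator / denominator
print(df['WA'].round(2).tolist())
```

Both `numerator` and `denominator` are indexed by the original row labels, so the division aligns row by row.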

Answer 3

Another option for an old question:

Split data into numerator and denominator:

numerator = df.filter(regex=r"[A-Z]$")
denominator = df.filter(like='_')

Convert the denominator's columns into a MultiIndex, which comes in handy when multiplying it with the numerator:

denominator.columns = denominator.columns.str.split('_', expand = True)

Multiply the numerator by the denominator, then divide the row-wise sum of the products by the row-wise sum of the denominator:

outcome = numerator.mul(denominator, level=0, axis=1).sum(1)
outcome = outcome.div(denominator.sum(1))
df = df.assign(WA=outcome)  # assign returns a new DataFrame, so keep the result

      A  A_Weight     B  B_weights     C  C_weights     D  D_weights           WA
0  2000        37  2100         17  2500          5     0          0  2071.186441
1  1000        47  1500         21  1400         35  1600         32  1323.703704
2  2509        33  2000          6     0          0  2100         10  2363.204082
3  2145        16  1600          2  2300         40  2000          5  2214.603175
