
Weighted Mean as a Column in Pandas

I am trying to add a column containing the weighted average of 4 value columns, using 4 corresponding columns of weights.

import pandas as pd

df = pd.DataFrame.from_dict(dict([('A', [2000, 1000, 2509, 2145]),
                                  ('A_Weight', [37, 47, 33, 16]),
                                  ('B', [2100, 1500, 2000, 1600]),
                                  ('B_weights', [17, 21, 6, 2]),
                                  ('C', [2500, 1400, 0, 2300]),
                                  ('C_weights', [5, 35, 0, 40]),
                                  ('D', [0, 1600, 2100, 2000]),
                                  ('D_weights', [0, 32, 10, 5])]))

I want the weighted average to go in a new column named "WA", but every time I try, it displays NaN.

The formula I used is ((A * A_Weight) + (B * B_weights) + (C * C_weights) + (D * D_weights)) / sum(all weights).

The desired DataFrame would have a new column with the following values (rounded to two decimals):

df['WA'] = [2071.19, 1323.70, 2363.20, 2214.60]

Thank you
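The question doesn't show the code that produced NaN, so this is only a guess at one common cause: arithmetic between two DataFrames aligns on column labels before operating, and since 'A' never matches 'A_Weight' (and so on), every cell comes out NaN. The sketch below reproduces that and shows one fix, stripping the labels with `.to_numpy()` so the multiplication is positional. The names `values`, `weights`, `aligned`, `products` and `wa` are illustrative, not from the question.

```python
import pandas as pd

# The question's data, built from a plain dict (insertion order is preserved)
df = pd.DataFrame({
    'A': [2000, 1000, 2509, 2145], 'A_Weight': [37, 47, 33, 16],
    'B': [2100, 1500, 2000, 1600], 'B_weights': [17, 21, 6, 2],
    'C': [2500, 1400, 0, 2300], 'C_weights': [5, 35, 0, 40],
    'D': [0, 1600, 2100, 2000], 'D_weights': [0, 32, 10, 5],
})

values = df[['A', 'B', 'C', 'D']]
weights = df[['A_Weight', 'B_weights', 'C_weights', 'D_weights']]

# DataFrame * DataFrame aligns on column labels first. No label matches,
# so the result is an 8-column frame (the union of labels) that is all NaN.
aligned = values * weights

# Multiplying by a bare ndarray skips label alignment: element (i, j) of
# values is multiplied by element (i, j) of weights.
products = values * weights.to_numpy()
wa = products.sum(axis=1) / weights.sum(axis=1)
```

This only works because the weight columns are listed in the same order as the value columns; position is all that pairs them once the labels are gone.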

A straightforward way to do it is as follows:

(Since the weight columns are not consistently named, e.g. some end in 's' and some don't, and some use a capital 'W' while others use a lowercase 'w', it is not convenient to select them as a group, e.g. with .filter().)

df['WA'] = ( (df['A'] * df['A_Weight']) + (df['B'] * df['B_weights']) + (df['C'] * df['C_weights']) + (df['D'] * df['D_weights']) ) / (df['A_Weight'] + df['B_weights'] + df['C_weights'] + df['D_weights'])

Result:

print(df)


      A  A_Weight     B  B_weights     C  C_weights     D  D_weights           WA
0  2000        37  2100         17  2500          5     0          0  2071.186441
1  1000        47  1500         21  1400         35  1600         32  1323.703704
2  2509        33  2000          6     0          0  2100         10  2363.204082
3  2145        16  1600          2  2300         40  2000          5  2214.603175
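If you'd rather not spell out every product, NumPy's `np.average` accepts a `weights` array with the same shape as the data and averages along a given axis, which is exactly sum(v * w) / sum(w) per row when `axis=1`. A sketch, assuming the value and weight columns are listed by hand in matching order (`value_cols` and `weight_cols` are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [2000, 1000, 2509, 2145], 'A_Weight': [37, 47, 33, 16],
    'B': [2100, 1500, 2000, 1600], 'B_weights': [17, 21, 6, 2],
    'C': [2500, 1400, 0, 2300], 'C_weights': [5, 35, 0, 40],
    'D': [0, 1600, 2100, 2000], 'D_weights': [0, 32, 10, 5],
})

value_cols = ['A', 'B', 'C', 'D']
# Listed explicitly, in the same order as value_cols, because the
# weight names are inconsistent (A_Weight vs B_weights).
weight_cols = ['A_Weight', 'B_weights', 'C_weights', 'D_weights']

# Row-wise weighted mean: sum(values * weights) / sum(weights) along axis=1
df['WA'] = np.average(df[value_cols], weights=df[weight_cols], axis=1)
```

One behavioural difference from the plain pandas division: `np.average` raises `ZeroDivisionError` if any row's weights sum to zero, whereas the division above would quietly give NaN or inf for that row.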

A less straightforward way:

  1. Group the columns by their prefix (the part before '_') via str.split.
  2. Multiply the columns within each group via groupby + prod, giving value × weight for each row.
  3. Sum those products across each row with sum on axis 1.
  4. Use filter + sum on axis 1 to get the sum of the weight columns.
  5. Divide the summed products by the summed weights.
df['WA'] = (
        df.groupby(df.columns.str.split('_').str[0], axis=1).prod().sum(axis=1)
        / df.filter(regex='_[wW]eight(s)?$').sum(axis=1)
)
      A  A_Weight     B  B_weights     C  C_weights     D  D_weights           WA
0  2000        37  2100         17  2500          5     0          0  2071.186441
1  1000        47  1500         21  1400         35  1600         32  1323.703704
2  2509        33  2000          6     0          0  2100         10  2363.204082
3  2145        16  1600          2  2300         40  2000          5  2214.603175
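`DataFrame.groupby(..., axis=1)` was deprecated in pandas 2.1, so the code above warns on recent versions. A sketch of the same five steps that instead groups the rows of the transposed frame (i.e. the original columns) and transposes back; `prefix`, `row_products` and `weight_sum` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2000, 1000, 2509, 2145], 'A_Weight': [37, 47, 33, 16],
    'B': [2100, 1500, 2000, 1600], 'B_weights': [17, 21, 6, 2],
    'C': [2500, 1400, 0, 2300], 'C_weights': [5, 35, 0, 40],
    'D': [0, 1600, 2100, 2000], 'D_weights': [0, 32, 10, 5],
})

# 'A' and 'A_Weight' both map to prefix 'A', and so on
prefix = df.columns.str.split('_').str[0]

# Group the transposed rows by prefix and multiply within each group,
# then transpose back: columns A..D now hold A*A_Weight, B*B_weights, ...
row_products = df.T.groupby(prefix).prod().T

weight_sum = df.filter(regex='_[wW]eight(s)?$').sum(axis=1)
df['WA'] = row_products.sum(axis=1) / weight_sum
```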

Another option for an old question:

Split the data into the numerator (value columns) and the denominator (weight columns):

numerator = df.filter(regex=r"[A-Z]$")
denominator = df.filter(like='_')
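To check what the two filters pick up on the question's `df`: `regex=r"[A-Z]$"` keeps columns whose name ends in an uppercase letter, and `like='_'` keeps columns whose name contains an underscore. Note the regex relies on the value columns being single capitals; a weight column ending in a capital (a hypothetical 'A_W', say) would slip into the numerator.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [2000, 1000, 2509, 2145], 'A_Weight': [37, 47, 33, 16],
    'B': [2100, 1500, 2000, 1600], 'B_weights': [17, 21, 6, 2],
    'C': [2500, 1400, 0, 2300], 'C_weights': [5, 35, 0, 40],
    'D': [0, 1600, 2100, 2000], 'D_weights': [0, 32, 10, 5],
})

numerator = df.filter(regex=r"[A-Z]$")   # names ending in a capital letter
denominator = df.filter(like='_')        # names containing an underscore

print(list(numerator.columns))    # ['A', 'B', 'C', 'D']
print(list(denominator.columns))  # ['A_Weight', 'B_weights', 'C_weights', 'D_weights']
```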

Convert the denominator's columns into a MultiIndex; this comes in handy when computing with the numerator:

denominator.columns = denominator.columns.str.split('_', expand = True)
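Here is what that split produces for the weight column names: with `expand=True`, `Index.str.split` returns a two-level MultiIndex, and level 0 holds the prefixes 'A' to 'D', which are the labels that `level=0` matches against the numerator's columns in the next step. `weight_names` and `multi` are illustrative names.

```python
import pandas as pd

weight_names = pd.Index(['A_Weight', 'B_weights', 'C_weights', 'D_weights'])

# expand=True turns the split pieces into the levels of a MultiIndex
multi = weight_names.str.split('_', expand=True)

print(list(multi))                      # [('A', 'Weight'), ('B', 'weights'), ('C', 'weights'), ('D', 'weights')]
print(list(multi.get_level_values(0)))  # ['A', 'B', 'C', 'D']
```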

Multiply the numerator by the denominator, then divide the row sums of the result by the row sums of the denominator:

outcome = numerator.mul(denominator, level=0, axis=1).sum(1)
outcome = outcome.div(denominator.sum(1))
df.assign(WA = outcome)
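One thing to keep in mind: `df.assign` returns a new DataFrame and leaves `df` untouched, so the table below is displayed but not stored. To keep the column, rebind `df` (or use `df['WA'] = outcome`). A minimal sketch on a two-pair excerpt of the question's data; `wa` and `result` are illustrative names.

```python
import pandas as pd

# Two value/weight pairs from the question, enough to show assign's behaviour
df = pd.DataFrame({'A': [2000, 1000], 'A_Weight': [37, 47],
                   'B': [2100, 1500], 'B_weights': [17, 21]})
wa = (df['A'] * df['A_Weight'] + df['B'] * df['B_weights']) / (df['A_Weight'] + df['B_weights'])

result = df.assign(WA=wa)                          # new DataFrame with the extra column
print('WA' in df.columns, 'WA' in result.columns)  # False True
df = df.assign(WA=wa)                              # rebind df to keep the column
```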

      A  A_Weight     B  B_weights     C  C_weights     D  D_weights           WA
0  2000        37  2100         17  2500          5     0          0  2071.186441
1  1000        47  1500         21  1400         35  1600         32  1323.703704
2  2509        33  2000          6     0          0  2100         10  2363.204082
3  2145        16  1600          2  2300         40  2000          5  2214.603175

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM