Weighted average on a DataFrame column with provided weighted rates

    Year    Week_Number DC_Zip  Asin_code
0   2016    1   12206   NaN
1   2016    1   29306   NaN
2   2016    1   33426   NaN
3   2016    1   37206   NaN
4   2016    1   45216   NaN
5   2016    1   60160   NaN
6   2016    1   76110   NaN
7   2016    1   80215   NaN
8   2016    1   84105   NaN
9   2016    1   85034   NaN
10  2016    1   93711   NaN
11  2016    1   98433   NaN
12  2016    2   12206   21.0
13  2016    2   29306   10.0
14  2016    2   33426   11.0
15  2016    2   37206   1.0
16  2016    2   45216   5.0
17  2016    2   60160   7.0
18  2016    2   76110   12.0
19  2016    2   80215   NaN
20  2016    2   84105   2.0
21  2016    2   85034   1.0
22  2016    2   93711   23.0
23  2016    2   98433   7.0
24  2016    3   12206   95.0
25  2016    3   29306   26.0
26  2016    3   33426   51.0
27  2016    3   37206   18.0
28  2016    3   45216   34.0
29  2016    3   60160   30.0
... ... ... ... ...
2778    2020    29  76110   33.0
2779    2020    29  80215   5.0
2780    2020    29  84105   3.0
2781    2020    29  85034   8.0
2782    2020    29  93711   53.0
2783    2020    29  98433   15.0
2784    2020    30  12206   75.0
2785    2020    30  29306   27.0
2786    2020    30  33426   34.0
2787    2020    30  37206   12.0
2788    2020    30  45216   14.0
2789    2020    30  60160   28.0
2790    2020    30  76110   47.0
2791    2020    30  80215   11.0
2792    2020    30  84105   3.0
2793    2020    30  85034   17.0
2794    2020    30  93711   62.0
2795    2020    30  98433   13.0
2796    2020    31  12206   109.0
2797    2020    31  29306   30.0
2798    2020    31  33426   31.0
2799    2020    31  37206   14.0
2800    2020    31  45216   23.0
2801    2020    31  60160   21.0
2802    2020    31  76110   25.0
2803    2020    31  80215   7.0
2804    2020    31  84105   4.0
2805    2020    31  85034   8.0
2806    2020    31  93711   71.0
2807    2020    31  98433   9.0
2808 rows × 4 columns

This is the sales data I am working with. I need to compute a weighted average of Asin_code using the weighted rates [5, 5, 20, 30, 40] for the years 2016, 2017, 2018, 2019 and 2020 respectively. I need a function that returns a column containing the weighted average of Asin_code. NaN values should be dropped. The weighted rates should also be easy to change later so that we can explore other patterns in the data. Any help would be appreciated.

I am trying the following code:

for i in range(len(df.Asin_code)):          
 df["Weighted_avg"]=rate[0]*df.Asin_code[i]/df.Asin_code.loc[(df.Year==2016)].sum()

I am just having difficulty consolidating the data across all 5 years.

It becomes much simpler if you define your weights as a dict instead of a list; then a simple use of apply() works.

# define weights for year as a dict
wr = {2016:5, 2017:5, 2018:20, 2019:30, 2020:40}

df["Weighted_avg"] = df.apply(lambda r: 
         # numerator is weight * Asin_code[i]
         ( r["Asin_code"] * wr[r["Year"]]
            /
         # denominator is sum(Asin_code for year)
         df.Asin_code.loc[(df.Year==r["Year"])].sum() ), axis=1)

Output

  Idx  Year  Week_Number DC_Zip  Asin_code  Weighted_avg
   25  2016            3  29306       26.0      0.367232
   26  2016            3  33426       51.0      0.720339
   27  2016            3  37206       18.0      0.254237
   28  2016            3  45216       34.0      0.480226
   29  2016            3  60160       30.0      0.423729
 2778  2020           29  76110       33.0      1.625616
 2779  2020           29  80215        5.0      0.246305
 2780  2020           29  84105        3.0      0.147783
 2781  2020           29  85034        8.0      0.394089
 2782  2020           29  93711       53.0      2.610837
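The apply() call above recomputes the yearly total for every row, which gets slow on larger frames. The same per-row value (weight * Asin_code / yearly total) can be sketched in vectorized form as below. The function name `add_weighted_avg` and the small `sample` frame are hypothetical and only stand in for the real sales data; the column names are taken from the question.

```python
import numpy as np
import pandas as pd


def add_weighted_avg(df, weights, value_col="Asin_code", year_col="Year"):
    """Return a copy of df with weight * value / yearly total for each row."""
    out = df.copy()
    # Look up each row's weight from its year
    w = out[year_col].map(weights)
    # Yearly totals broadcast back onto every row; sum skips NaN by default
    yearly_total = out.groupby(year_col)[value_col].transform("sum")
    out["Weighted_avg"] = out[value_col] * w / yearly_total
    return out


# Hypothetical miniature of the sales data
sample = pd.DataFrame({
    "Year": [2016, 2016, 2016, 2020, 2020],
    "Asin_code": [np.nan, 10.0, 30.0, 25.0, 75.0],
})
wr = {2016: 5, 2017: 5, 2018: 20, 2019: 30, 2020: 40}
result = add_weighted_avg(sample, wr)
```

Because the weights are passed in as an argument, a different rate dict can be tried without touching the function. Rows where Asin_code is NaN keep a NaN result; they can be removed afterwards with `result.dropna(subset=["Asin_code"])`.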

Supplementary update

Updated request:

weighted_average[at index 1] = rate[for year 2016] * Asin_code[at first index of 2016]
                             + rate[for year 2017] * Asin_code[at first index of 2017]
                             + rate[for year 2018] * Asin_code[at first index of 2018]
                             + rate[for year 2019] * Asin_code[at first index of 2019]
                             + rate[for year 2020] * Asin_code[at first index of 2020]

df.dropna().groupby("Year").agg({"Asin_code":"first"}).reset_index()\
    .assign(wa=lambda dfa: 
            dfa.apply(lambda r: r["Asin_code"]*wr[r['Year']],axis=1))["wa"].sum()
df["Weighted_avg"] = df.apply(lambda r: ( (r["Asin_code"] *wr[r["Year"]]).sum(axis = 0)), axis=1)

Output

12  2016    2   12206   21.0    105.0
13  2016    2   29306   10.0    50.0
14  2016    2   33426   11.0    55.0
15  2016    2   37206   1.0     5.0
16  2016    2   45216   5.0     25.0
17  2016    2   60160   7.0     35.0
18  2016    2   76110   12.0    60.0
19  2016    2   80215   NaN     NaN
20  2016    2   84105   2.0     10.0
21  2016    2   85034   1.0     5.0
22  2016    2   93711   23.0    115.0
23  2016    2   98433   7.0     35.0
24  2016    3   12206   95.0    475.0
25  2016    3   29306   26.0    130.0
26  2016    3   33426   51.0    255.0
27  2016    3   37206   18.0    90.0
28  2016    3   45216   34.0    170.0
29  2016    3   60160   30.0    150.0
... ... ... ... ... ...
2778    2020    29  76110   33.0    1320.0
2779    2020    29  80215   5.0     200.0
2780    2020    29  84105   3.0     120.0
2781    2020    29  85034   8.0     320.0
2782    2020    29  93711   53.0    2120.0
2783    2020    29  98433   15.0    600.0
2784    2020    30  12206   75.0    3000.0
2785    2020    30  29306   27.0    1080.0
2786    2020    30  33426   34.0    1360.0
2787    2020    30  37206   12.0    480.0
2788    2020    30  45216   14.0    560.0
2789    2020    30  60160   28.0    1120.0
2790    2020    30  76110   47.0    1880.0
2791    2020    30  80215   11.0    440.0
2792    2020    30  84105   3.0     120.0
2793    2020    30  85034   17.0    680.0
2794    2020    30  93711   62.0    2480.0
2795    2020    30  98433   13.0    520.0
2796    2020    31  12206   109.0   4360.0
2797    2020    31  29306   30.0    1200.0
2798    2020    31  33426   31.0    1240.0
2799    2020    31  37206   14.0    560.0
2800    2020    31  45216   23.0    920.0
2801    2020    31  60160   21.0    840.0
2802    2020    31  76110   25.0    1000.0
2803    2020    31  80215   7.0     280.0
2804    2020    31  84105   4.0     160.0
2805    2020    31  85034   8.0     320.0
2806    2020    31  93711   71.0    2840.0
2807    2020    31  98433   9.0     360.0

Got my solution with this.
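For readers who want the updated request for every position rather than only the first one, here is a minimal sketch. It rests on one reading of the request, namely that "index k of a year" means the k-th non-NaN row of that year in the existing row order. The function name `weighted_sum_by_position`, the helper columns `pos` and `weighted`, and the `sample` frame are all hypothetical.

```python
import numpy as np
import pandas as pd


def weighted_sum_by_position(df, weights, value_col="Asin_code", year_col="Year"):
    """Combine the k-th non-NaN value of every year into one weighted sum."""
    clean = df.dropna(subset=[value_col]).copy()
    # Position of each row within its own year: 0, 1, 2, ...
    clean["pos"] = clean.groupby(year_col).cumcount()
    # Per-row product rate[year] * value
    clean["weighted"] = clean[value_col] * clean[year_col].map(weights)
    # Add up the products that share the same position across years
    return clean.groupby("pos")["weighted"].sum()


# Hypothetical miniature of the sales data
sample = pd.DataFrame({
    "Year": [2016, 2016, 2016, 2017, 2017],
    "Asin_code": [np.nan, 21.0, 10.0, 4.0, 6.0],
})
wr = {2016: 5, 2017: 5, 2018: 20, 2019: 30, 2020: 40}
combined = weighted_sum_by_position(sample, wr)
```

In this sketch a year with fewer non-NaN rows than the others simply contributes nothing at the later positions, which may or may not be the behaviour you want.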
