简体   繁体   中英

need to calculate weighted average across two levels

*****i have 4 million observations. i have a variable X which has 100 unique observations with 5 digit codes assigned to it(ex 10420, 49660, 52739,65231,etc.) and arranged in ascending order. in each of these unique X's, i have variable Y which has codes from 1-4. now i need to calculate weighted average by each codes of Y and also weighted average by each codes of Y across X. So there are two stages: inner stage is to calculate weighted average by each codes of Y for each X's and outer stage is to calculate weighted average by each codes of Y for overall X**.***

X      wrkallmt  medtt_all  Y
-----  --------  ---------  --
10420  15        25.4       1
10420  50        5.7        4
10420  70        5.9        3
10420  110       5.9        4
10420  110       15         2
12630  30        15.8       4
12630  15        10.9       2
12630  85        10.5       2
12630  20        20.7       1
12630  20        23.5       3
13560  20        20.3       3
13560  20        20.8       3
13560  30        20.9       4
13560  20        25.5       2
13560  25        30.6       4
13560  20        45.3       1
13560  20        40.4       4
13560  20        30.9       1
13560  10        10.8       2

inner stage: for each X's suppose x = 10420 weighted average for code 1 is (15*25.4)/15 , code 4 is ((50*5.7)+(110*5.9))/(50+110) . outer stage: weighted average for code 1 is (weithtedavg of code 1 in 10420 + weightedavg of code 1 in 12630 + weightedavg of code 1 in 13560)/(sum of wrkallmt in code 1 across x)


I can only think of using nested DO-LOOP. i figured out the inner stage but cannot construct the outer stage:

data WORK.exp3;
   SET WORK.ABC;
   do X = ??;
      A = 0;
      B = 0;
      wgtavg_all = 0;
      do Y = 1 to 4;
         A = sum(wrkallmt*medtt_all);
         B = sum(wrkallmt);
         wgtavg_all = A/B;
         output;
      end;
      output;
   end;
run; 

I'm using SAS 9.3. Can i use proc sql using group by? but in that case i need to use group by two times to get the desired result.

There is certainly more than one way to do this, but see if this is close to what you want. This requires 2 sorts, but takes advantage of the BY statement (and the LAST temp variable) in the DATA STEP.

data have;
 length x y 8.;
 Input X wrkallmt medtt_all Y;
 datalines;
10420 15 25.4 1
10420 50 5.7 4
10420 70 5.9 3
10420 110 5.9 4
10420 110 15 2
12630 30 15.8 4
12630 15 10.9 2
12630 85 10.5 2
12630 20 20.7 1
12630 20 23.5 3
13560 20 20.3 3
13560 20 20.8 3
13560 30 20.9 4
13560 20 25.5 2
13560 25 30.6 4
13560 20 45.3 1
13560 20 40.4 4
13560 20 30.9 1
13560 10 10.8 2
;

proc sort data=have out=sortedx;
 by x y;
run;

Data need1(keep=x y weighted_avg_combined_y);
 retain a b 0;
 set sortedx;
 by x y;

 a = a + (wrkallmt * medtt_all);
 b = b + wrkallmt;

 if last.y then do;
  weighted_avg_combined_y=a/b;
  output ;
 end;

run;

Proc sort data=have out=sortedy;
 by y x;
run;

Data need2(keep=x y weighted_avg_all_y);
 retain a b 0;
 set sortedy;
 by y x;

 a = a + (wrkallmt * medtt_all);
 b = b + wrkallmt;

 if last.y then do;
  weighted_avg_all_y=a/b;
  output ;
 end;

run;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM