Select top 3 observations based upon multiple fields in SAS?

I have an extremely large SAS dataset, and I'd like to sum the top 3 records, where "top" is determined by multiple fields.

An example of the data:

[Image: example data, including the 'Sum' column described below]

Assume the data is sorted correctly, i.e. by Ref, Date1 (descending), Time (descending), Date2 (descending). The 'Sum' field doesn't exist in the dataset (see below).

Using SAS, I need to sum the 3 most recent values (based on Date1 and Time) of each Ref, for each instance of Date2. In the example data, the 'Sum' field shows how the values need to be summed, i.e. sum all the 1s together, all the 2s together, etc.

Apologies for the poor explanation; I've been attempting to do this for a few days to no avail!

Many thanks.

This should do the trick. You need to use by-group processing, which is enabled with the by statement. You can then use the first. and last. automatic variables to tell when you've reached the start or end of each group. The retain statement tells SAS which variables should keep their values across observations.
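
As a minimal sketch of how these pieces behave, the step below copies the first. and last. flags into ordinary variables so that they appear in the output. The dataset demo and the variables grp, x, is_first, is_last and running are hypothetical names chosen for illustration; they are not part of the question's data.

```sas
/* Hypothetical demo data, already in grp order */
data demo;
  input grp $ x;
datalines;
a 1
a 2
b 3
;
run;

data flags;
  set demo;
  by grp;                 /* creates temporary first.grp and last.grp flags */
  retain running;         /* keep running from one observation to the next  */
  is_first = first.grp;   /* copy the flags, which are otherwise dropped    */
  is_last  = last.grp;
  if first.grp then running = 0;   /* reset at the start of each group */
  running = running + x;
run;

proc print data=flags noobs;
run;
```

For this input, running should read 1, 3, 3, with is_first set on the first and third rows and is_last set on the second and third.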

Sample Data:

data tmp;
  informat date1 date2 ddmmyy10.;
  input ref 
        date1 
        date2 
        time
        value
        ;
datalines;
11 03/01/2014 01/01/2014 9 345
11 03/01/2014 01/01/2014 8 322
11 03/01/2014 01/01/2014 7 6546
11 01/01/2014 31/12/2013 6 34
11 01/01/2014 31/12/2013 5 33
22 02/01/2014 01/01/2014 4 234
22 02/01/2014 01/01/2014 3 66
22 01/01/2014 01/01/2014 2 234
33 01/01/2014 01/01/2014 1 2
33 01/01/2014 31/12/2014 0 45
;
run;

Then make sure the data is sorted correctly so we can use by-group processing:

proc sort data=tmp;
  by ref date1 date2 descending time;
run;

Because the sum() function only adds to the running total while the counter is <= 3, you get the sum of the top 3 values for each by-group. When the end of the group is reached, a record is output.

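data tmp2;
  set tmp;
  by ref date1 date2;
  retain counter total .;   /* keep both values across observations */

  /* start of a new ref/date1/date2 group: reset the accumulators */
  if first.date2 then do;
    total = 0;
    counter = 1;
  end;

  /* only the first 3 records of the group contribute to the total */
  if counter le 3 then do;
    total = sum(total,value);
  end;

  /* last.ref or last.date1 always implies last.date2, so this one test suffices */
  if last.date2 then do;
    output;
  end;

  counter = counter+1;

run;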
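
To check the result, the output can be printed with the dates formatted for reading. This is only a sketch: it assumes tmp2 was created by the step above, and the choice of columns is for illustration.

```sas
/* Print one row per ref/date1/date2 group with its capped total */
proc print data=tmp2 noobs;
  var ref date1 date2 total;
  format date1 date2 ddmmyy10.;   /* show the dates as dd/mm/yyyy */
run;
```

With the sample data, this should list six rows with totals 67, 7213, 234, 300, 2 and 45. No group in the sample has more than three records, so the three-record cap is not exercised there.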
