
SAS/SQL group by and keeping all rows

I have a table like this that tracks the behavior of some accounts over time. Here there are two accounts, with acc_id 1 and 22:

acc_id   date    mob
  1      Dec 13   -1
  1      Jan 14    0
  1      Feb 14    1
  1      Mar 14    2
  22     Mar 14    10
  22     Apr 14    11
  22     May 14    12

I would like to create a column orig_date that, for every row of an acc_id, equals the date of that account's mob=0 row, or the minimum date within the acc_id group if the account has no mob=0 row.

Therefore the expected output is:

acc_id   date    mob   orig_date
  1      Dec 13   -1     Jan 14
  1      Jan 14    0     Jan 14
  1      Feb 14    1     Jan 14
  1      Mar 14    2     Jan 14
  22     Mar 14    10    Mar 14
  22     Apr 14    11    Mar 14
  22     May 14    12    Mar 14

The second account has no mob=0 observation, so its orig_date is set to min(date) for the group.

Is there a way to achieve this in SAS, preferably in a single PROC SQL step?

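Here is a DATA step approach. It assumes HAVE is sorted by acc_id and, within each account, by date, so that the first row of each group holds the earliest date: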

data have;
input acc_id date $ mob;
datalines;
1  Dec13 -1
1  Jan14  0
1  Feb14  1
1  Mar14  2
22 Mar14  10
22 Apr14  11
22 May14  12
;

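data want;
    /* Pass 1: read the whole ACC_ID group. Start with the first row's DATE
       (the earliest, given the sort order) and overwrite it with the
       DATE of the MOB=0 row if there is one */
    do until (last.acc_id);
        set have;
        by acc_id;
        if first.acc_id then orig_date=date;
        if mob=0 then orig_date=date;
    end;
    /* Pass 2: re-read the same group and output every row with ORIG_DATE attached */
    do until (last.acc_id);
        set have;
        by acc_id;
        output;
    end;
run;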
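The DATA step above takes the first row of each account as its minimum date, so it depends on HAVE being sorted by date within acc_id, and it reads DATE as text. A sketch of a variant that mirrors the "minimum date, unless there is a MOB=0 date" rule directly, assuming DATE is read as a numeric SAS date with the MONYY5. informat (MIN_DATE and ZERO_DATE are hypothetical helper variables, dropped at the end). It still needs HAVE sorted by acc_id for the BY statement, but not by date within acc_id:

```sas
/* Assumption: DATE is stored as a numeric SAS date.
   MONYY5. reads values such as Dec13 as the first day of that month. */
data have;
  input acc_id date :monyy5. mob;
  format date date9.;
  datalines;
1  Dec13 -1
1  Jan14  0
1  Feb14  1
1  Mar14  2
22 Mar14 10
22 Apr14 11
22 May14 12
;

data want;
  /* Pass 1: scan the ACC_ID group, tracking the earliest DATE
     and the DATE of the MOB=0 row (hypothetical helper variables) */
  do until (last.acc_id);
    set have;
    by acc_id;
    min_date = min(min_date, date);   /* MIN() ignores the initial missing value */
    if mob=0 then zero_date = date;
  end;
  /* Prefer the MOB=0 date; fall back to the group minimum */
  orig_date = coalesce(zero_date, min_date);
  /* Pass 2: re-read the same group and write every row with ORIG_DATE */
  do until (last.acc_id);
    set have;
    by acc_id;
    output;
  end;
  format orig_date date9.;
  drop min_date zero_date;
run;
```

MIN_DATE and ZERO_DATE are not retained, so they are reset to missing at the start of each DATA step iteration, that is, once per acc_id group. ZERO_DATE stays missing for accounts without a MOB=0 row, which is exactly when COALESCE() falls back to MIN_DATE.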

This is fairly simple: calculate the minimum date in two ways (once over only the MOB=0 rows, once over all rows) and use COALESCE() to pick the first non-missing one.

First, let's turn your printout into an actual dataset.

data have ;
  input acc_id date :anydtdte. mob ;
  format date date9.;
cards;
1      Dec13   -1
1      Jan14    0
1      Feb14    1
1      Mar14    2
22     Mar14    10
22     Apr14    11
22     May14    12
;

To find the DATE when MOB=0, use a CASE expression. PROC SQL will automatically remerge the MIN() aggregate results, calculated at the ACC_ID level, back onto all of the detail rows (and write a NOTE to the log saying that it did so).

proc sql ;
create table want as
select *
     , coalesce( min(case when mob=0 then date else . end)
               , min(date)
               ) as orig_date format=date9.
from have
group by acc_id
order by acc_id, date 
;
quit;

Result:

Obs    acc_id         date    mob    orig_date

 1        1      01DEC2013     -1    01JAN2014
 2        1      01JAN2014      0    01JAN2014
 3        1      01FEB2014      1    01JAN2014
 4        1      01MAR2014      2    01JAN2014
 5       22      01MAR2014     10    01MAR2014
 6       22      01APR2014     11    01MAR2014
 7       22      01MAY2014     12    01MAR2014
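The query above relies on PROC SQL remerging summary statistics back onto the detail rows, which is not standard SQL behavior. A sketch of the same "two minimums plus COALESCE" logic written without the remerge, computing the group-level values in an inline view and joining them back on ACC_ID (the aliases H and G and the columns ZERO_DATE and MIN_DATE are illustrative names, not part of the original answer):

```sas
/* Assumption: DATE is a numeric SAS date; MONYY5. reads Dec13 as 01DEC2013 */
data have;
  input acc_id date :monyy5. mob;
  format date date9.;
  datalines;
1  Dec13 -1
1  Jan14  0
1  Feb14  1
1  Mar14  2
22 Mar14 10
22 Apr14 11
22 May14 12
;

proc sql;
  create table want as
  select h.*
       , coalesce(g.zero_date, g.min_date) as orig_date format=date9.
  from have as h
       inner join
       /* One row per ACC_ID: the MOB=0 date (missing if none) and the earliest date */
       (select acc_id
             , min(case when mob=0 then date else . end) as zero_date
             , min(date) as min_date
        from have
        group by acc_id) as g
       on h.acc_id = g.acc_id
  order by h.acc_id, h.date;
quit;
```

Because every aggregate is computed inside the grouped inline view, the outer SELECT has no GROUP BY, and no remerge is needed; the trade-off is a slightly longer query in exchange for SQL that does not depend on this SAS-specific behavior.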
