SAS - calculate the number of amendments

I have a log file that contains different versions of some records. What is the most efficient way in SAS to calculate the number of amendments for each variable (in a user-defined list), by record, for a reasonably big file?

For example:

%let vars=Var1 Var2 Var4;

Record_ID Var1 Var2 VarThree Var4   
1 A A A A  
1 A A A B  
1 A A A A  
2 A A A A  
2 A B B A  
2 A B C B  
2 A B B A  

I want to get something like:

ID Var No  
1 Var1 0  
1 Var2 0  
1 Var4 2  
2 Var1 0  
2 Var2 1  
2 Var4 2  

The following solution takes two steps to get the layout you want: 1. count the changes, 2. transpose.

data have;
  input (id var1-var4) ($);
  cards;
1 A A A A
1 A A A B
1 A A A A
2 A A A A
2 A B B A
2 A B C B
2 A B B A
;


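data _want;
  set have(rename=(var1-var4=v1-v4));
  by id;
  /* One LAG call per variable: a single LAG call inside a loop shares */
  /* one queue and would compare against the previous array element.   */
  length p1-p4 $8;
  p1=lag(v1); p2=lag(v2); p3=lag(v3); p4=lag(v4);
  array v   v1-v4;     /* current values                      */
  array p   p1-p4;     /* same variables on the previous row  */
  array cnt var1-var4; /* change counters, output as var1-var4 */
  retain var1-var4;
  do i=1 to dim(v);
    if first.id then cnt[i]=0;
    else cnt[i]=cnt[i]+(v[i] ne p[i]);
  end;
  if last.id;
  drop v1-v4 p1-p4 i;
run;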

PROC TRANSPOSE DATA=_want
   OUT=want(rename=(col1=no))
   NAME=var;
   BY id;
   VAR var1 var2 var4;   /* only the variables in the requested list */
RUN;
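Why one LAG call per variable matters: SAS keeps a separate queue for each LAG call written in the program, not for each variable, so a single LAG call executed inside a loop over an array returns whatever was passed on its previous execution, i.e. the previous array element (or the last element of the previous row). A minimal sketch that shows the effect; the data set `lagdemo` and its variables are made up for illustration:

```sas
/* Hypothetical demo data: two numeric variables, two rows */
data lagdemo;
  input a b;
  array x[2] a b;
  array p[2] prev_a prev_b;   /* creates numeric prev_a, prev_b */
  do i = 1 to dim(x);
    /* One LAG call site executed twice per row: its single queue */
    /* receives a, then b, so it does NOT track each variable.    */
    p[i] = lag(x[i]);
  end;
  put a= b= prev_a= prev_b=;
  drop i;
  datalines;
1 2
3 4
;
run;
```

On the second row the log shows `prev_a=2` (the previous row's b) and `prev_b=3` (this row's a), not the previous row's values 1 and 2.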

I used FIRST. variables to count the runs, but LAG seems easier. I suppose that if the VARS were of mixed type, that would present a problem for an array of LAGs, although not an insurmountable one. I added some code to handle the user-defined list of variables, which would be a start toward the requirement.

data log;
   input Record_ID (Var1-Var4)(:$1.);
   cards;
 1 A A A A
 1 A A A B
 1 A A A A
 2 A A A A
 2 A B B A
 2 A B C B
 2 A B B A 
 ;;;;
   run;
proc print;
   run;
%macro main(data=log,id=record_id,vars=var1-var2 var4);
   proc transpose data=&data(obs=0) out=vars;
      var &vars;
      run;
   proc sql noprint;
      select catx(' ',"set &data(keep=&id",_name_,"); by notsorted &id",_name_,';') 
         into :stmts separated by ' '
      from vars;
      quit;
   %put NOTE: &=sqlobs %bquote(&=stmts);

   data report(keep=&id varname count);
      do until(last.&id);
         &stmts;
         array _f[*] 'first.'n:;
         array _n[%eval(&sqlobs+1)] n0-n&sqlobs;
         drop n0;
         do j = 2 to dim(_f);
            _n[j] + _f[j];
            end;
         end;
      length varname $32;      
      do j=2 to dim(_f);
         varname = scan(vname(_f[j]),-1);
         count   = _n[j]-1;
         output;
         end;
      call missing(of _n[*]);
      run;
   proc print;
      run;
   %mend main;
options mprint;
%main();
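The mixed-type problem with an array of LAGs can be sidestepped by letting the macro language write one LAG call per variable in the list, so each variable gets its own queue and keeps its own type. A sketch under stated assumptions: the macro `%count_changes`, its parameters and the `_prev`/`_chg` work variables are hypothetical, it assumes a single ID variable and data already sorted by it, and it repeats the question's data so it runs on its own:

```sas
data log;
  input Record_ID (Var1-Var4)(:$1.);
  datalines;
1 A A A A
1 A A A B
1 A A A A
2 A A A A
2 A B B A
2 A B C B
2 A B B A
;
run;

/* Hypothetical macro: count value changes per ID for a space-separated list */
%macro count_changes(data=, id=, vars=, out=changes);
  %local i n;
  %let n = %sysfunc(countw(&vars, %str( )));
  data &out(keep=&id varname changes);
    set &data;
    by &id;
    length varname $32;
    %do i = 1 %to &n;
      /* One generated LAG call per variable: its own queue holds the */
      /* previous row's value of this variable, character or numeric. */
      _prev&i = lag(%scan(&vars, &i, %str( )));
      if first.&id then _chg&i = 0;
      else _chg&i + (%scan(&vars, &i, %str( )) ne _prev&i);  /* sum statement retains */
    %end;
    if last.&id then do;
      /* one output row per variable, as in the requested layout */
      %do i = 1 %to &n;
        varname = "%scan(&vars, &i, %str( ))";
        changes = _chg&i;
        output;
      %end;
    end;
  run;
%mend count_changes;

%count_changes(data=log, id=record_id, vars=Var1 Var2 Var4, out=changes);
proc print data=changes;
run;
```

The call at the end uses the same list as the question, so it should reproduce the counts in the expected output there.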

Assuming the data aren't too big, here is the transpose and then count approach that came to my mind first.

data have;
  input (id var1-var4) ($);
  rowid=_n_;
cards;
1 A A A A
1 A A A B
1 A A A A
2 A A A A
2 A B B A
2 A B C B
2 A B B A 
;

%let vars=Var1 Var2 Var4;

proc transpose data=have out=h(keep=id _name_ col1
                               rename=(_name_=Var col1=Value)
                               );
 var &vars;
 by rowid id;
run;

NOTE: There were 7 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.H has 21 observations and 3 variables.

proc sort data=h equals;
  by id Var;
run;

NOTE: There were 21 observations read from the data set WORK.H.
NOTE: The data set WORK.H has 21 observations and 3 variables.

data want(keep=id Var NumberOfChanges);
  set h;
  by id Var Value notsorted;
  if first.Var then NumberOfChanges=0;
  else if first.Value then NumberOfChanges+1;  /* sum statement: retained counter */
  if last.Var;

  put (ID Var NumberofChanges)(=);
run;

id=1 Var=var1 NumberOfChanges=0
id=1 Var=var2 NumberOfChanges=0
id=1 Var=var4 NumberOfChanges=2
id=2 Var=var1 NumberOfChanges=0
id=2 Var=var2 NumberOfChanges=1
id=2 Var=var4 NumberOfChanges=2
NOTE: There were 21 observations read from the data set WORK.H.
NOTE: The data set WORK.WANT has 6 observations and 3 variables.
