SAS - calculate the number of amendments
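I have a log file that contains several versions of some records. What is the most efficient way in SAS to count, on a reasonably large file, the number of amendments per record for each variable in a user-defined list?
For example: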
%let vars=Var1 Var2 Var4;
Record_ID Var1 Var2 VarThree Var4
1 A A A A
1 A A A B
1 A A A A
2 A A A A
2 A B B A
2 A B C B
2 A B B A
I would like to get something like this:
ID Var No
1 Var1 0
1 Var2 0
1 Var4 2
2 Var1 0
2 Var2 1
2 Var4 2
The following solution produces the desired layout in two steps: 1. compute the change counts; 2. transpose.
data have;
input (id var1-var4) ($);
cards;
1 A A A A
1 A A A B
1 A A A A
2 A A A A
2 A B B A
2 A B C B
2 A B B A
;
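data _want;
  set have(rename=(var1-var4=v1-v4));
  by id;
  array v v1-v4;                  /* original values, renamed */
  array cnt var1-var4;            /* change counters, reusing the original names */
  array prev[4] $8 _temporary_;   /* previous row's values (one LAG call in a loop would share a single queue) */
  retain var1-var4;
  do i = 1 to dim(v);
    if first.id then cnt[i] = 0;
    else cnt[i] = cnt[i] + (v[i] ne prev[i]);
    prev[i] = v[i];
  end;
  if last.id;
  drop i v1-v4;
run;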
proc transpose data=_want
               out=want(rename=(col1=no))
               name=var;
  by id;
  var var1 var2 var4;   /* the requested list; add var3 if it is needed too */
run;
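A note on LAG inside loops: LAG keeps one queue per call site, not per variable. When a single LAG call sits inside a loop over several variables, each iteration receives the value pushed by the previous iteration (the previous variable on the same row), not the same variable on the previous row. The minimal sketch below shows this on one row; `lag_demo`, `v1-v3` and `prev1-prev3` are illustrative names only.

```sas
/* One row with three values; a DATA step without SET runs once */
data lag_demo;
  array v[3] $1 v1-v3 ('A' 'B' 'C');
  array p[3] $1 prev1-prev3;
  do i = 1 to 3;
    p[i] = lag(v[i]);   /* the same LAG call each iteration, so one shared queue */
  end;
  drop i;
run;
/* prev1 is blank, prev2='A', prev3='B': each value comes from the previous variable */
```

This is why comparing each variable against a retained array of previous values (for example a `_temporary_` array) is safer than `lag(v)` inside `do over`.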
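I used FIRST. variables to count the runs. LAG may look simpler, but if the VARS are of mixed type it brings a set of lag-related problems; not insurmountable, but awkward. I also added code to handle a user-defined variable list, as the question requires.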
data log;
input Record_ID (Var1-Var4)(:$1.);
cards;
1 A A A A
1 A A A B
1 A A A A
2 A A A A
2 A B B A
2 A B C B
2 A B B A
;;;;
run;
proc print;
run;
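%macro main(data=log,id=record_id,vars=var1-var2 var4);
  /* Expand the variable list (which may contain ranges) into one _NAME_ per variable */
  proc transpose data=&data(obs=0) out=vars;
    var &vars;
  run;
  /* Build one SET + BY NOTSORTED statement pair per variable */
  proc sql noprint;
    select catx(' ',"set &data(keep=&id",_name_,"); by notsorted &id",_name_,';')
      into :stmts separated by ' '
      from vars;
  quit;
  %put NOTE: &=sqlobs %bquote(&=stmts);
  data report(keep=&id varname count);
    /* DOW loop: read all rows of one ID before writing output */
    do until(last.&id);
      &stmts;
      array _f[*] 'first.'n:;
      array _n[%eval(&sqlobs+1)] n0-n&sqlobs;
      drop n0;
      /* each FIRST. flag marks the start of a new run of equal values */
      do j = 2 to dim(_f);
        _n[j] + _f[j];
      end;
    end;
    length varname $32;
    do j=2 to dim(_f);
      varname = scan(vname(_f[j]),-1);
      count = _n[j]-1;   /* number of changes = number of runs - 1 */
      output;
    end;
    call missing(of _n[*]);
  run;
  proc print;
  run;
%mend main;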
options mprint;
%main();
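The core idea of the macro, reduced to a single variable: with `BY ... NOTSORTED`, `FIRST.Var4` is 1 at the start of every run of equal values within a `Record_ID` (including the first row of each ID), so the number of changes is the number of runs minus one. A minimal sketch, assuming only `Var4` is of interest; `runs_var4`, `runs` and `changes` are illustrative names.

```sas
data log;
  input Record_ID (Var1-Var4)(:$1.);
  cards;
1 A A A A
1 A A A B
1 A A A A
2 A A A A
2 A B B A
2 A B C B
2 A B B A
;
run;

data runs_var4(keep=Record_ID changes);
  set log;
  by Record_ID Var4 notsorted;   /* groups are consecutive runs of Var4 */
  if first.Record_ID then runs = 0;
  runs + first.Var4;             /* sum statement: RUNS is retained */
  if last.Record_ID then do;
    changes = runs - 1;          /* k runs means k - 1 changes */
    output;
  end;
run;
```

Both IDs end with `changes=2`, matching the `Var4` rows of the requested output.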
Assuming the data are not too large, this transpose-then-count approach is what first came to mind.
data have;
input (id var1-var4) ($);
rowid=_n_;
cards;
1 A A A A
1 A A A B
1 A A A A
2 A A A A
2 A B B A
2 A B C B
2 A B B A
;
%let vars=Var1 Var2 Var4;

/* one row per original row and variable: 7 x 3 = 21 observations in WORK.H */
proc transpose data=have out=h(keep=id _name_ col1
                               rename=(_name_=Var col1=Value));
  var &vars;
  by rowid id;
run;

/* EQUALS keeps the original row order within each id/Var group */
proc sort data=h equals;
  by id Var;
run;

data want(keep=id Var NumberOfChanges);
  set h;
  by id Var Value notsorted;
  if first.Var then NumberOfChanges=0;
  else if first.Value then NumberOfChanges+1;
  if last.Var;
  put (ID Var NumberofChanges)(=);
run;

The PUT statement writes:
id=1 Var=var1 NumberOfChanges=0
id=1 Var=var2 NumberOfChanges=0
id=1 Var=var4 NumberOfChanges=2
id=2 Var=var1 NumberOfChanges=0
id=2 Var=var2 NumberOfChanges=1
id=2 Var=var4 NumberOfChanges=2
NOTE: The data set WORK.WANT has 6 observations and 3 variables.
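Since the question mentions a reasonably large file, transposing can also be avoided altogether. The following one-pass alternative is not taken from the answers above; it is a sketch under stated assumptions. It keeps each variable's previous value in a `_temporary_` array and writes the long ID/Var/No layout directly at `LAST.id`. It assumes that every variable in `&vars` is character with a length of at most 8, that there are at most 100 of them, and that the data are grouped by `id`; `want_onepass` is a hypothetical name.

```sas
%let vars=var1 var2 var4;   /* user-defined list, without quotes */

data have;
  input (id var1-var4) ($);
  cards;
1 A A A A
1 A A A B
1 A A A A
2 A A A A
2 A B B A
2 A B C B
2 A B B A
;
run;

data want_onepass(keep=id var no);
  set have;
  by id;
  length var $32;
  array v[*] &vars;                /* assumed: all character variables */
  array prev[100] $8 _temporary_;  /* previous row's values; 100 is an assumed upper bound */
  array cnt[100] _temporary_;      /* change counters, retained automatically */
  do i = 1 to dim(v);
    if first.id then cnt[i] = 0;
    else cnt[i] = cnt[i] + (v[i] ne prev[i]);
    prev[i] = v[i];
  end;
  /* one output row per variable at the end of each ID */
  if last.id then do i = 1 to dim(v);
    var = vname(v[i]);
    no  = cnt[i];
    output;
  end;
run;
```

The design trades the transpose and sort for a fixed-size temporary array; mixed-type lists would need a separate character and numeric array.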