Problems aggregating data by variable in SAS
I have data that looks like this:
ID   FileSource    Age  MamUlt  ProcDate  Name
223  Facility      35   M       19591     SWEDISH
223  Facility      35   M       19592     SWEDISH
223  Facility      35   U       19592     SWEDISH
223  Facility      35   U       19593     SWEDISH
223  Non-Facility  35   M       19594     RADIA
223  Non-Facility  35   U       19594     RADIA
What I would like to do is collapse the data (for each ID in the dataset) into the following form:
ID   Age  MAMs  ULTs  SameDate
223  35   3     3     2
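So, for each ID, I need to show the total number of times "M" and "U" occur, as well as the number of times they occur on the same date; twice in this example.
Here is what I have so far: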
data ImageTotals;
  set ImageClaims;
  by ID;
  retain ID MAMs ULTs SameDate;
  if first.ID then do;
    MAMs = 0;
    ULTs = 0;
    MamDate = .;
    UltDate = .;
    SameDate = 0;
  end;
  if MamUlt = "M" then do; MAMs = MAMs + 1; MamDate = ProcDate; end;
  if MamUlt = "U" then do; ULTs = ULTs + 1; UltDate = ProcDate; end;
  if MamDate = UltDate and MamDate ^= . then do; SameDate = SameDate+1; end;
  if last.ID;
  keep ID MAMs ULTs SameDate;
run;
Any suggestions? This takes care of the counts, but not the SameDate problem (it is still zero for this example).
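You can use a DOW loop to do the aggregation in a data step. The data must be sorted by ID and PROCDATE. Within each date, count the number of times M or U occurs. You can then roll those per-date counts up to the ID level, and also test whether both occurred on the same date. Simply keeping the AGE variable leaves it with the value from the last record for that ID. Note that the code below calls the M/U variable proc, as in the test data posted further down; with the data exactly as shown in the question it would be MamUlt.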
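data counts ;
  /* outer DOW loop: one data step iteration per ID, so MAMs, ULTs and SameDate start out missing */
  do until (last.id);
    m=0;
    u=0;
    /* inner DOW loop: count the M and U rows within a single ProcDate */
    do until (last.procdate);
      set imageclaims;
      by id procdate;
      m= sum(m,proc='M');
      u= sum(u,proc='U');
    end;
    MAMs=sum(mams,m);
    ULTs=sum(ults,u);
    SameDate=sum(samedate,m and u);  /* adds 1 when both M and U occurred on this date */
  end;
  keep id age mams ults samedate ;
run;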
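The DOW loop above relies on BY-group processing, so the input has to be in ID, PROCDATE order before the step runs. A minimal sketch of that preliminary step, assuming the dataset is named imageclaims as in the code above and that sorting it in place is acceptable:

```sas
/* Sort in place so that BY id procdate; in the data step is valid */
proc sort data=imageclaims;
  by id procdate;
run;
```

If the original row order needs to be preserved, an OUT= dataset on PROC SORT could be used instead, with the data step reading from that copy.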
I think this may be more of a SQL problem (not my specialty), but since you started with a DATA step solution, I took a stab at both. I also added some more test data.
data ImageClaims;
input id age Proc $1. ProcDate;
cards;
223 35 M 19591
223 35 M 19592
223 35 U 19592
223 35 U 19593
223 35 M 19594
223 35 U 19594
224 35 M 19591
224 35 M 19592
224 35 M 19593
224 35 M 19593
224 35 M 19594
224 35 U 19595
225 35 M 19592
225 35 U 19592
225 35 U 19593
225 35 M 19593
225 35 M 19594
225 35 U 19594
;
run;
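For the DATA step, create counters for MAMs, ULTs, and MAMULTs (a mammogram and an ultrasound on the same date). Note that because I use the sum statement for these counters (MAMs++1), they are implicitly retained.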
data ImageTotals (keep=id Age MAMs ULTs MAMULTs);
  set ImageClaims;
  by ID ProcDate;
  retain HaveMam HaveUlt; *Count vars are implicitly retained by sum statement;
  if first.ID then do;
    MAMs=0; *count of mammograms;
    ULTs=0; *count of ultrasounds;
    MAMULTs=0; *count of mammograms and ultrasounds on same date;
  end;
  if first.ProcDate then do;
    HaveMam=0; *indicator for have a mammogram or not on that date;
    HaveUlt=0; *indicator for have an ultrasound or not on that date;
  end;
  if Proc='M' then do;
    HaveMam=1; *set mammogram indicator (for that date);
    MAMs++1; *increment counter;
  end;
  else if Proc='U' then do;
    HaveUlt=1; *set ultrasound indicator (for that date);
    ULTs++1; *increment counter;
  end;
  if last.ProcDate then do;
    MAMULTs++(HaveMam=1 and HaveUlt=1); *increment MamUlts counter if had both on same date;
  end;
  if last.id;
run;
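For the SQL solution, I use a subquery that counts MAMs, ULTs, and MAMULTs by ID and ProcDate, and then an outer query that sums them by ID. There may be a better SQL solution, but I think this works.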
proc sql;
  create table ImageTotals as
  select id
        ,max(age) as age /*arbitrary use of max; age is constant within id*/
        ,sum(MAMs) as MAMs
        ,sum(ULTs) as ULTs
        ,sum(MAMULTs) as MAMULTs
  from (
    select id
          ,procdate
          ,max(age) as age
          ,sum(Proc='M') as MAMs
          ,sum(Proc='U') as ULTs
          ,count(distinct(Proc))=2 as MAMULTs
    from ImageClaims
    group by id,ProcDate
  )
  group by id
  ;
quit;
proc print;
run;
The Work.ImageTotals dataset produced by either step is:
Obs  id   age  MAMs  ULTs  MAMULTs
1    223  35   3     3     2
2    224  35   5     1     0
3    225  35   3     3     3
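I think this could be solved with proc sql (count / group by) once you take Q's advice on board, unless I have misunderstood the complexity here... I was going to post some code, but will let you have a crack at it first...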
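As for why SameDate stays at zero in the question's own DATA step: a likely cause is that MamDate and UltDate are not listed in the RETAIN statement. Variables created by assignment in a data step are reset to missing at the start of every iteration unless they are retained, so the two dates are never both non-missing on the same row. A minimal sketch of the question's step with that one change, assuming the data is sorted by ID and ProcDate and uses the question's MamUlt variable:

```sas
data ImageTotals;
  set ImageClaims;
  by ID;
  /* MamDate and UltDate added to RETAIN so they carry over between rows */
  retain MAMs ULTs SameDate MamDate UltDate;
  if first.ID then do;
    MAMs = 0;
    ULTs = 0;
    SameDate = 0;
    MamDate = .;
    UltDate = .;
  end;
  if MamUlt = "M" then do; MAMs = MAMs + 1; MamDate = ProcDate; end;
  if MamUlt = "U" then do; ULTs = ULTs + 1; UltDate = ProcDate; end;
  /* checked on every row, so repeated M or U rows on one date can be counted more than once */
  if MamDate = UltDate and MamDate ^= . then SameDate = SameDate + 1;
  if last.ID;
  keep ID Age MAMs ULTs SameDate;
run;
```

Because the comparison runs on every row, an ID with more than one M or U on the same date could be over-counted; the per-date approaches in the answers above avoid that by evaluating the condition once per ProcDate.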