Problems aggregating data by variable in SAS

I have data that looks like this:

ID  FileSource      Age MamUlt  ProcDate    Name
223 Facility        35  M       19591       SWEDISH
223 Facility        35  M       19592       SWEDISH
223 Facility        35  U       19592       SWEDISH
223 Facility        35  U       19593       SWEDISH
223 Non-Facility    35  M       19594       RADIA
223 Non-Facility    35  U       19594       RADIA

What I want to do is collapse the data (for each ID in the dataset) into the following form:

ID   Age MAMs ULTs SameDate 
223  35  3    3    2

So, for each ID, I need the total number of times "M" and "U" appear, plus the number of times they appear on the same date; in this example, twice.

Here is what I have so far:

data ImageTotals;
    set ImageClaims;
    by ID;
    retain ID MAMs ULTs SameDate;

    if first.ID then do;
        MAMs = 0;
        ULTs = 0;
        MamDate = .;
        UltDate = .;
        SameDate = 0;
    end;

    if MamUlt = "M" then do; MAMs = MAMs + 1; MamDate = ProcDate; end;
    if MamUlt = "U" then do; ULTs = ULTs + 1; UltDate = ProcDate; end;
    if MamDate = UltDate and MamDate ^= . then do; SameDate = SameDate+1; end;

    if last.ID;
    keep ID MAMs ULTs SameDate;
run;

Any suggestions? This solves the counting problem, but not the SameDate problem (it is still zero for this example).

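You can do the aggregation in a data step using a DOW loop. The data must be sorted by ID and PROCDATE. Within each date, count the number of times M or U appears. You can then use those per-date counts to aggregate at the ID level, and also to test whether both appeared on the same date. The AGE variable is simply kept, so it will have the value from the last record for that ID.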

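data counts ;
  /* Outer DOW loop: one pass per ID; the implicit OUTPUT writes one row per ID */
  do until (last.id);
    /* reset the per-date counts */
    m=0;
    u=0;
    /* Inner DOW loop: read all rows for one ProcDate */
    do until (last.procdate);
      set imageclaims;
      by id procdate;
      /* proc holds the M/U flag (MamUlt in the question's data); a true comparison adds 1 */
      m= sum(m,proc='M');
      u= sum(u,proc='U');
    end;
    MAMs=sum(mams,m);
    ULTs=sum(ults,u);
    /* (m and u) is 1 only when both occurred on this date */
    SameDate=sum(samedate,m and u);
  end;
  keep id age mams ults samedate ;
run;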
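
Since the DOW loop depends on BY-group processing, the input has to be in ID/ProcDate order before the step runs. A minimal sketch of that sort, assuming the dataset is named imageclaims as in the step above and that sorting it in place is acceptable:

```sas
/* Sort in place so that BY id procdate works in the DATA step. */
/* Assumption: overwriting the original row order is fine;      */
/* otherwise add OUT= to write the sorted copy elsewhere.       */
proc sort data=imageclaims;
  by id procdate;
run;
```

Without the sort, a BY statement on unsorted data would normally stop the step with an error saying the BY variables are not properly sorted.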

I think this may be more of an SQL problem (not my specialty), but since you started with a DATA step solution, I took a stab at both. I also added more test data.

data ImageClaims;
  input id age Proc $1. ProcDate;
  cards;
223 35 M 19591
223 35 M 19592
223 35 U 19592
223 35 U 19593
223 35 M 19594
223 35 U 19594
224 35 M 19591
224 35 M 19592
224 35 M 19593
224 35 M 19593
224 35 M 19594
224 35 U 19595
225 35 M 19592
225 35 U 19592
225 35 U 19593
225 35 M 19593
225 35 M 19594
225 35 U 19594
;
run;

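For the DATA step, create counters for MAMs, ULTs, and MAMULTs (a Mam and an Ult on the same day). Note that because I use the sum statement for these counters (MAMs++1), they are implicitly retained.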

data ImageTotals (keep=id Age MAMs ULTs MAMULTs);
  set ImageClaims;
  by ID ProcDate;
  retain HaveMam HaveUlt; *Count vars are implicitly retained by sum statement;
  if first.ID then do;
    MAMs=0;    *count of mammograms;
    ULTs=0;    *count of ultrasounds;
    MAMULTs=0; *count of mammograms and ultrasounds on same date;
  end;
  if first.ProcDate then do;
    HaveMam=0;  *indicator for have a mammogram or not on that date;
    HaveUlt=0;  *indicator for have an ultrasound or not on that date;
  end;

  if Proc='M' then do;
    HaveMam=1;  *set mammogram indicator (for that date);
    MAMs++1;    *increment counter;
  end;
  else if Proc='U' then do;
    HaveUlt=1;  *set ultrasound indicator (for that date);
    ULTs++1;    *increment counter;
  end;

  if last.ProcDate then do;
    MAMULTs++(HaveMam=1 and HaveUlt=1); *increment MamUlts counter if had both on same date;
  end;

  if last.id;
run;
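
The implicit retain mentioned above is easy to see in isolation. The following is a small sketch, not part of the original answer; the dataset SumDemo and the variables x, Total, and NotRetained are hypothetical names used only for illustration:

```sas
data SumDemo;
  input x;
  /* Sum statement: Total is initialized to 0 and retained across rows */
  Total + x;
  /* Plain assignment: NotRetained is reset to missing on every iteration, */
  /* so SUM() only ever sees the current x                                  */
  NotRetained = sum(NotRetained, x);
  cards;
1
2
3
;

proc print data=SumDemo;
run;
```

Total accumulates to 1, 3, 6, while NotRetained simply repeats x (1, 2, 3). `MAMs++1` in the step above is the same sum statement with `+1` as the expression; `MAMs + 1;` is the more common spelling.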

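For the SQL solution, I use a subquery that counts MAMs, ULTs, and MAMULTs by ID and ProcDate, and then the outer query sums them by ID. There may be a better SQL solution, but I think this works.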

proc sql;
  create table ImageTotals as
    select id
          ,max(age) as age  /*arbitrary use of max age is constant within id*/
          ,sum(MAMs) as MAMs
          ,sum(ULTs) as ULTs
          ,sum(MAMULTs) as MAMULTs
    from (
          select id
                ,procdate
                ,max(age) as age
                ,sum(Proc='M') as MAMs
                ,sum(Proc='U') as ULTs
                ,count(distinct(Proc))=2 as MAMULTs
          from ImageClaims
          group by id,ProcDate
          )
    group by id
  ;
quit;

proc print;
run;

The Work.ImageTotals dataset produced by either step is:

Obs     id    age    MAMs    ULTs    MAMULTs

 1     223     35      3       3        2
 2     224     35      5       1        0
 3     225     35      3       3        3
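
To check where these totals come from, the inner query can be run on its own to show the per-date counts before they are summed. This is only a sketch for inspection, assuming the ImageClaims test data above; it prints the result instead of creating a table:

```sas
proc sql;
  /* One row per id/ProcDate: counts of M and U, plus a 1/0 flag */
  /* that is 1 when both procedure types occur on that date      */
  select id
        ,procdate
        ,sum(Proc='M') as MAMs
        ,sum(Proc='U') as ULTs
        ,count(distinct(Proc))=2 as MAMULTs
  from ImageClaims
  group by id,ProcDate
  ;
quit;
```

For id 223, the dates 19592 and 19594 each have one M and one U, so MAMULTs is 1 on those two rows and 0 on the others, which gives the total of 2 shown above.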

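I think you can solve this with proc sql (count / group by) once you take Q's suggestion, unless I'm misunderstanding the complexity here... I was going to post some code, but I'll let you take a crack at it first...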

