
SAS: repeat the last value by ID

I have this dataset:

data temp;
input ID date :ddmmyy10. type;
format date ddmmyy10.;
  datalines;
 1 10/11/2006   1      
 1 10/12/2006   2      
 1 15/01/2007   2      
 1 20/01/2007   3    
 2 10/08/2008   1        
 2 11/09/2008   1        
 2 17/10/2008   1        
 2 12/11/2008   2    
 2 10/12/2008   3       
 ;

I would like to create a new column that repeats the last date within each ID:

data temp;
input ID date :ddmmyy10. type last_date :ddmmyy10.;
format date last_date ddmmyy10.;
  datalines;
 1 10/11/2006   1        20/01/2007
 1 10/12/2006   2        20/01/2007
 1 15/01/2007   2        20/01/2007
 1 20/01/2007   3        20/01/2007
 2 10/08/2008   1        10/12/2008
 2 11/09/2008   1        10/12/2008
 2 17/10/2008   1        10/12/2008
 2 12/11/2008   2        10/12/2008
 2 10/12/2008   3        10/12/2008
 ;

I have tried this code, but it doesn't work:

  data temp;
  set temp;
  IF last.ID then last_date= .;
  RETAIN last_date;
  if   missing(last_date) then last_date= date;
  run;

Thank you in advance for your help!

The first thing is that the FIRST.ID and LAST.ID variables are not created in the data step unless you include the variable ID in a BY statement.
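To illustrate the first point, here is a minimal sketch that copies the automatic FIRST.ID and LAST.ID flags into ordinary variables so they can be inspected. It assumes the input dataset is called HAVE and is already sorted by ID; the names FLAGS, IS_FIRST and IS_LAST are made up for this example.

```sas
/* Sketch only: FLAGS, IS_FIRST and IS_LAST are hypothetical names. */
/* Assumes HAVE is already sorted by ID.                            */
data flags;
   set have;
   by id;                 /* without this BY statement, FIRST.ID and LAST.ID do not exist */
   is_first = first.id;   /* 1 on the first observation of each ID, otherwise 0 */
   is_last  = last.id;    /* 1 on the last observation of each ID, otherwise 0  */
run;
```

FIRST.ID and LAST.ID are temporary and are not written to the output dataset, which is why they are copied into regular variables here.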

The second is that to attach the last date to each observation you need to process the data twice. Your current code (if the BY statement is added) will only put the group's last date into LAST_DATE on the last observation of each BY group.

One way to do this is to re-sort the data by descending date within each BY group; you can then use BY ID together with FIRST.ID and RETAIN.

proc sort data=have;
   by id descending date;
run;
data want;
   set have;
   by id descending date;
   if first.id then last_date=date;
   retain last_date;
   format last_date ddmmyy10.;
run;
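Note that WANT is left in descending date order within each ID after this step. If the ascending order of the input matters, a second sort should restore it. This is a sketch that assumes the original data was ordered by ID and then by ascending DATE:

```sas
/* Sketch: put WANT back into ascending date order within each ID. */
/* Assumes the original order was BY ID DATE.                      */
proc sort data=want;
   by id date;
run;
```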

Here is a way to keep the original sort order, using what is called a double DOW loop. By placing the SET and BY statements inside a DO loop, you can read all of the observations for a group in a single iteration of the data step. You then add a second DO loop that re-reads the same BY group, uses the value calculated in the first loop, and writes out the observations.

data want;
do until (last.id);
  set have;
  by id;
end;
last_date=date ;
format last_date ddmmyy10.;
do until (last.id);
  set have;
  by id; 
  output;
end;
run;
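The loop above takes DATE from the last observation read in each group, so it relies on the dates being in ascending order within each ID. If that cannot be guaranteed, a possible variation is to track the maximum date in the first loop instead. This is a sketch under that assumption, not part of the original answer; it still requires HAVE to be sorted (or at least grouped) by ID.

```sas
/* Sketch: double DOW loop that keeps the largest DATE per ID,      */
/* for the assumed case where dates are not sorted within each ID.  */
data want;
   /* First loop: scan the whole BY group and keep the maximum date. */
   do until (last.id);
      set have;
      by id;
      last_date = max(last_date, date);  /* MAX ignores missing values */
   end;
   format last_date ddmmyy10.;
   /* Second loop: re-read the same BY group and write each row out. */
   do until (last.id);
      set have;
      by id;
      output;
   end;
run;
```

Because LAST_DATE is neither retained nor read from HAVE, it is reset to missing at the start of each data step iteration, so each group starts its maximum from scratch.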

Two other ways are:

  • PROC SQL joining a subselect, or
  • PROC MEANS + DATA/MERGE

SQL

 proc sql;
   create table want as
   select have.*, id_group.max_date as last_date format=ddmmyy10.
   from
     have
   join 
     ( select id, max(date) as max_date
       from have
       group by id
     ) as id_group
   on
     have.id = id_group.id
   ;
 quit;

MEANS

proc means noprint data=have;
  by id;
  var date;
  output out=maxdates(keep=id last_date) max(date)=last_date;
run;

data want;
  merge have maxdates;
  by id;
run;
