Transpose rows to columns
My attempt (shown below) gives the wrong result because the output only includes idnumber 2 and 4.
Data:
DATA WORK.transpose_csv;
  LENGTH
    idnumber 8
    start_end $ 5
    date 8 ;
  FORMAT
    idnumber BEST1.
    start_end $CHAR5.
    date YYMMDD10. ;
  INFORMAT
    idnumber BEST1.
    start_end $CHAR5.
    date YYMMDD10. ;
  INPUT
    idnumber : ?? BEST1.
    start_end : $CHAR5.
    date : ?? YYMMDD10. ;
  DATALINES;
2 start 1994-05-01
2 end 1996-11-04
4 start 1979-07-18
5 start 2005-02-01
5 end 2009-09-17
5 start 2010-10-01
5 end 2012-10-06
;
run;
My best try:
proc transpose data=transpose_csv
               out =wide;
  by idnumber;
  id start_end ;
run;
As shown by this post, it can be easily done in R, but I need to do this in SAS: Spread with duplicate identifiers (using tidyverse and %>%)
The problem with proc transpose here is that you can have multiple events for a particular idnumber. If you are able to change the source data to add an extra id variable, e.g. event_id, then it would make the task much easier.
You can either continue with proc transpose as below, followed by a data step to bring the start / end dates onto one row, or just do it in a single data step and hard-code some values. There are other methods as well, such as a hash solution that would probably work well for this type of problem.
Edit: Added a third method that first creates an event_id, which makes the subsequent proc transpose easy.
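The duplication is easy to confirm before choosing a method. The following is a minimal sketch (not part of the original answer) that counts rows per idnumber and start_end value in the transpose_csv dataset from the question; the column name n_events is made up for illustration.

```sas
/* list every idnumber / start_end combination that occurs more than once */
proc sql;
  select idnumber, start_end, count(*) as n_events
  from transpose_csv
  group by idnumber, start_end
  having count(*) > 1;
quit;
```

For the sample data this should list idnumber 5 twice, once for start and once for end, which is why by idnumber combined with id start_end cannot build a single row for that idnumber.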
/* source data */
DATA WORK.transpose_csv;
  LENGTH
    idnumber 8
    start_end $ 5
    date 8 ;
  FORMAT
    idnumber BEST1.
    start_end $CHAR5.
    date YYMMDD10. ;
  INFORMAT
    idnumber BEST1.
    start_end $CHAR5.
    date YYMMDD10. ;
  INPUT
    idnumber : ?? BEST1.
    start_end : $CHAR5.
    date : ?? YYMMDD10. ;
  DATALINES;
2 start 1994-05-01
2 end 1996-11-04
4 start 1979-07-18
5 start 2005-02-01
5 end 2009-09-17
5 start 2010-10-01
5 end 2012-10-06
;
run;
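/* method1 */
proc transpose data=transpose_csv
               out =wide1 (drop=_: start_end);
  /* notsorted: each consecutive run of identical idnumber/start_end
     values forms its own BY group */
  by idnumber start_end notsorted;
  id start_end ;
run;
data wide2;
  set wide1;
  by idnumber;
  retain _start;
  /* remember the most recent start date */
  if not missing(start) then _start=start;
  /* write a row when an end date arrives, or when the idnumber finishes without one */
  if not missing(end) or last.idnumber then do;
    start=_start;
    output;
  end;
  drop _start;
run;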
/* method2 */
data wide3;
  set transpose_csv;
  by idnumber;
  retain start;
  format start end yymmdd10.;
  /* the values 'start' and 'end' are hard-coded here */
  if start_end='start' then start=date;
  if start_end='end' then do;
    end=date;
    output;
  end;
  /* an idnumber whose last event is a start has no end date yet */
  else if last.idnumber then output;
  drop start_end date;
run;
/* method3 */
data transpose_csv1;
  set transpose_csv;
  by idnumber;
  /* event_id counts the start rows seen so far within each idnumber */
  if first.idnumber then event_id=0;
  event_id+(start_end='start');
run;
proc transpose data=transpose_csv1
               out =wide4 (drop=_: event_id);
  by idnumber event_id;
  id start_end ;
run;
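As a sanity check, the outputs of the three methods can be compared with one another. This is a sketch that was not part of the original answer; it assumes wide2, wide3 and wide4 were created by the code above, and it relies on proc compare matching observations by position, since no id statement is given.

```sas
/* compare method 1 against method 2, then method 1 against method 3 */
proc compare base=wide2 compare=wide3;
run;
proc compare base=wide2 compare=wide4;
run;
```

If the methods agree, the reports should show no unequal values for idnumber, start and end; differences in variable attributes such as formats may still be listed.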