Transpose rows to columns

My input data:

[image: input data]

Preferred output data:

[image: preferred output]

My best try:

This is wrong because it only includes idnumber 2 and 4.

[image: output of my best try]

Data:

    DATA WORK.transpose_csv;
    LENGTH
        idnumber           8
        start_end        $ 5
        date               8 ;
    FORMAT
        idnumber         BEST1.
        start_end        $CHAR5.
        date             YYMMDD10. ;
    INFORMAT
        idnumber         BEST1.
        start_end        $CHAR5.
        date             YYMMDD10. ;
    INPUT
        idnumber         : ?? BEST1.
        start_end        : $CHAR5.
        date             : ?? YYMMDD10. ;
    DATALINES;
    2 start 1994-05-01
    2 end 1996-11-04
    4 start 1979-07-18
    5 start 2005-02-01
    5 end 2009-09-17
    5 start 2010-10-01
    5 end 2012-10-06
    ;
    run;

My best try:

    proc transpose data=transpose_csv
                   out =wide;
                   by idnumber;
                   id start_end ;
    run;

As shown by this post, it can be easily done in R, but I need to do this in SAS: "Spread with duplicate identifiers (using tidyverse and %>%)"

The problem with proc transpose here is that a particular idnumber can have multiple events. If you are able to change the source data to add an extra id variable, e.g. event_id, the task becomes much easier.
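To see which idnumber values are affected, a quick check along these lines can help. It is only a sketch run against the transpose_csv dataset from the question; the column name n_events is made up for illustration.

```sas
/* List every idnumber / start_end combination that occurs more than once. */
/* These are the BY groups where "id start_end" yields duplicate ID values. */
proc sql;
    select idnumber,
           start_end,
           count(*) as n_events   /* n_events is an illustrative name */
    from transpose_csv
    group by idnumber, start_end
    having count(*) > 1;
quit;
```

With the sample data, only idnumber 5 should be listed, since it has two start rows and two end rows.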

You can either continue with proc transpose as below, followed by a data step to bring the start and end dates onto one row, or do it all in a single data step and hard-code some values. There are other methods as well, such as a hash solution, which would probably work well for this type of problem.

Edit: Added a third method that first creates an event_id, which makes the subsequent proc transpose easy.

/* source data */
DATA WORK.transpose_csv;
LENGTH
    idnumber           8
    start_end        $ 5
    date               8 ;
FORMAT
    idnumber         BEST1.
    start_end        $CHAR5.
    date             YYMMDD10. ;
INFORMAT
    idnumber         BEST1.
    start_end        $CHAR5.
    date             YYMMDD10. ;
INPUT
    idnumber         : ?? BEST1.
    start_end        : $CHAR5.
    date             : ?? YYMMDD10. ;
DATALINES;
2 start 1994-05-01
2 end 1996-11-04
4 start 1979-07-18
5 start 2005-02-01
5 end 2009-09-17
5 start 2010-10-01
5 end 2012-10-06
;
run;

/* method1: transpose each row on its own, then collapse start/end pairs */
proc transpose data=transpose_csv
               out =wide1 (drop=_: start_end);
    by idnumber start_end notsorted;
    id start_end;
run;

data wide2;
    set wide1;
    by idnumber;
    retain _start;                              /* carry the latest start date forward */
    if not missing(start) then _start=start;
    if not missing(end) or last.idnumber then do;
        start=_start;
        output;                                 /* one row per start/end pair */
    end;
    drop _start;
run;


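/* method2: a single data step that pairs each start with the following end */
data wide3;
    set transpose_csv;
    by idnumber;
    retain start;                       /* carry the latest start date forward */
    format start end yymmdd10.;
    if start_end='start' then start=date;
    if start_end='end' then do;
        end=date;
        output;                         /* complete start/end pair */
    end;
    else if last.idnumber then output;  /* start with no matching end */
    drop start_end date;
run;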

/* method3: number the events within each idnumber, then transpose by event */
data transpose_csv1;
    set transpose_csv;
    by idnumber;
    if first.idnumber then event_id=0;
    event_id+(start_end='start');   /* sum statement: increments on each 'start' row */
run;

proc transpose data=transpose_csv1
               out =wide4 (drop=_: event_id);
    by idnumber event_id;
    id start_end;
run;
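
As a sanity check that is not part of the original answer, the three results can be compared with each other. This is a minimal sketch that assumes wide2, wide3 and wide4 were created by the steps above.

```sas
/* Compare the outputs of the three methods; each should hold the same
   idnumber / start / end rows in the same order. */
proc compare base=wide2 compare=wide3;
run;

proc compare base=wide2 compare=wide4;
run;

/* Print one of the results for visual inspection. */
proc print data=wide4 noobs;
run;
```

For the sample data, each dataset should have four rows: one for idnumber 2, one for idnumber 4 with a missing end date, and two for idnumber 5.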
