
How do I find first row of last group in SAS, where ordering matters?

I'm new to SAS and would appreciate some help with this. A PROC SQL approach would be fine as well.

My dataset has IDs, a time variable, and a flag. After I sort by id and time, I need to find the first flagged observation of the last flagged group/streak. As in:

ID TIME FLAG
1   2    1
1   3    1
1   4    1
1   5    0
1   6    1
1   7    0
1   8    1
1   9    1
1  10    1
2   2    0
2   3    1
2   4    1
2   5    1
2   6    1
2   7    1

Here I want my script to return the row where time is 8 for ID 1, as it is the first observation from the last "streak", or flagged group. For ID 2 it should be where time is 3.

Desired output:

ID TIME FLAG
1   8    1
2   3    1

I'm trying to wrap my head around using first. and last. here. I suppose the problem is that I treat flagged streaks that are separated in time as different groups, while SAS groups the rows only by the value of flag, so a simple "take first. from last." is not sufficient.

I was also thinking of collapsing the flags to a string and using a regex lookahead, but I couldn't come up with either the method or the pattern.

I would just code a double DOW loop. The first loop reads all the records for one ID and works out which observation you want to output; the second loop reads the same records again and outputs the selected observation.

You can use the NOTSORTED keyword on the BY statement so that SAS sets FIRST.FLAG at the start of every consecutive run of equal FLAG values within an ID, even though the data are not sorted by FLAG.

data have;
  input ID TIME FLAG;
cards;
1   2    1
1   3    1
1   4    1
1   5    0
1   6    1
1   7    0
1   8    1
1   9    1
1  10    1
2   2    0
2   3    1
2   4    1
2   5    1
2   6    1
2   7    1
;

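data want;
  /* Pass 1: read one ID and remember where its last flagged streak starts. */
  do obs=1 by 1 until(last.id);
    set have;
    by id flag notsorted;
    /* FIRST.FLAG is 1 at the start of every run, so require FLAG=1 too;
       otherwise a trailing run of zeros would be picked. */
    if first.flag and flag=1 then want=obs;
  end;
  /* Pass 2: re-read the same rows and output only the remembered one. */
  do obs=1 to obs;
    set have;
    if obs=want then output;
  end;
  drop obs want;
run;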
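
To see what NOTSORTED does to FIRST.FLAG on this data, a small diagnostic step along these lines can help. This is only a sketch; the variable name streak_start is made up for illustration and is not part of the answer above.

```sas
/* Diagnostic sketch: write each row of HAVE to the log together with FIRST.FLAG. */
data _null_;
  set have;
  by id flag notsorted;
  streak_start = first.flag;  /* hypothetical helper: 1 on the first row of each run */
  put id= time= flag= streak_start=;
run;
```

For ID 1, streak_start should be 1 at times 2, 5, 6, 7 and 8, that is, whenever FLAG changes or a new ID begins.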

Loop through the dataset by id and use the lag function to compare the current value of flag with the previous one. A flagged streak starts when the current value is 1 and either the previous value is 0 or this is the first observation for that ID; in that case, write the value of time to a retained variable. Output only on the last observation of each id. At that point the retained variable holds the time of the first flagged observation of the last flagged group:

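data result;
 set have;
 by id;
 retain firstflagged;
 /* Call LAG on every row so its queue always holds the previous row's flag. */
 prevflag = lag(flag);
 /* On the first row of an ID, ignore prevflag (it belongs to the previous ID). */
 if first.id and flag = 1 then firstflagged = time;
 else if first.id and flag = 0 then firstflagged = .;
 /* A 0 -> 1 transition starts a new flagged streak. */
 else if flag = 1 and prevflag = 0 then firstflagged = time;
 /* Skip IDs with no flagged rows; the reported row is flagged by construction. */
 if last.id and not missing(firstflagged) then do;
  flag = 1;
  output;
 end;
 keep id firstflagged flag;
 rename firstflagged = time;
run;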
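
Since the question says a PROC SQL approach would also be acceptable, here is a rough sketch of one. It assumes that each (ID, TIME) pair is unique and that TIME is never missing; the table name want_sql and the aliases a, b and c are made up for illustration. A flagged row counts as the start of a streak when the row immediately before it within the same ID is not flagged (or does not exist), and the latest such start per ID is kept. The nested correlated subqueries are likely to be slow on large tables, so the data step solutions above are probably preferable there.

```sas
proc sql;
  create table want_sql as
  select a.id,
         max(a.time) as time,   /* latest streak start per ID */
         1 as flag
  from have as a
  where a.flag = 1
    /* keep only streak starts: no flagged row directly before this one */
    and not exists (
      select 1
      from have as b
      where b.id = a.id
        and b.flag = 1
        and b.time = (select max(c.time)
                      from have as c
                      where c.id = a.id
                        and c.time < a.time)
    )
  group by a.id;
quit;
```

On the sample data, the streak starts for ID 1 are at times 2, 6 and 8, and the only one for ID 2 is at time 3, so the query should pick 8 and 3.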
