How do I find the first row of the last group in SAS, where ordering matters?

I am new to SAS and would appreciate help with this; a PROC SQL approach would work as well.

My dataset has IDs, a time variable, and a flag. After sorting by id and time, I need to find, for each ID, the first flagged observation of the last flagged group (streak). For example:

ID TIME FLAG
1   2    1
1   3    1
1   4    1
1   5    0
1   6    1
1   7    0
1   8    1
1   9    1
1  10    1
2   2    0
2   3    1
2   4    1
2   5    1
2   6    1
2   7    1

Here I want my script to return the row where TIME is 8 for ID 1, as it is the first observation of the last flagged group, or "streak". For ID 2 it should be the row where TIME is 3.

Desired output:

ID TIME FLAG
1   8    1
2   3    1

I'm trying to wrap my head around using first. and last. here, but I suppose the problem is that I view flagged groups/streaks that are separated in time as different groups, while SAS groups observations only by the value of flag, so a simple "take first. from last." is not sufficient.

I was also thinking of collapsing the flags into a string and using a regex lookahead, but I couldn't come up with either the method or the pattern.

I would just code a double DOW loop. The first loop lets you calculate which observation for this ID you want to output, and the second reads through the records again and outputs the selected observation.

You can use the NOTSORTED keyword on the BY statement to have SAS calculate the FIRST.FLAG variable. With NOTSORTED, FIRST.FLAG is set every time the value of FLAG changes within an ID, so each streak gets its own FIRST.FLAG.

data have;
  input ID TIME FLAG;
cards;
1   2    1
1   3    1
1   4    1
1   5    0
1   6    1
1   7    0
1   8    1
1   9    1
1  10    1
2   2    0
2   3    1
2   4    1
2   5    1
2   6    1
2   7    1
;

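data want;
  /* First pass over the ID: remember where the last flagged streak starts */
  do obs=1 by 1 until(last.id);
    set have;
    by id flag notsorted;
    /* test FLAG too, so a trailing run of 0s does not overwrite WANT */
    if first.flag and flag=1 then want=obs;
  end;
  /* Second pass: re-read the same rows and output only the selected one */
  do obs=1 to obs;
    set have;
    if obs=want then output;
  end;
  drop obs want;
run;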
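
To see what BY ... NOTSORTED actually produces, a small diagnostic step like the one below can help. It is only an illustrative sketch: it assumes the HAVE dataset created above, and the variable name first_flag is made up for the example. It writes one line per observation to the log.

```sas
/* Illustration only: show the FIRST.FLAG value SAS computes for each row. */
/* Assumes the HAVE dataset above; first_flag is a hypothetical name.      */
data _null_;
  set have;
  by id flag notsorted;
  first_flag = first.flag;   /* copy the automatic variable so PUT can name it */
  put id= time= flag= first_flag=;
run;
```

For ID 1 in the sample data, first_flag is 1 at TIME 2, 5, 6, 7 and 8, that is, at the start of every run of equal FLAG values, whether that run is flagged or not.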

Loop through the dataset by id. Use the lag function to compare the current value of flag with the previous one. If the current value is 1 and the previous value is 0, or the current value is 1 on the first observation for that ID, write the value of time to a retained variable. Only output the last observation for each id. At that point the retained variable contains the time of the first flagged observation of the last flagged group:

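data result;
 set have;
 by id;
 retain firstflagged;
 /* call LAG on every row so its queue always holds the previous row's flag */
 prevflag = lag(flag);
 /* first row of an ID: start a streak if flagged, otherwise reset */
 if first.id and flag = 1 then firstflagged = time;
 else if first.id and flag = 0 then firstflagged = .;
 /* a 0 -> 1 transition marks the start of a new streak */
 else if flag = 1 and prevflag = 0 then firstflagged = time;
 if last.id then output;
 keep id firstflagged flag;
 rename firstflagged = time;
run;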
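
Since the question mentions that PROC SQL would also be acceptable, here is a rough sketch of the same idea in SQL. It is not taken from either answer and rests on assumptions: TIME is non-missing and unique within each ID, and the output table name want_sql is made up for the example. A flagged row counts as the start of a streak when the row immediately before it (the one with the largest earlier TIME for the same ID) is not flagged or does not exist; the latest such start per ID is kept.

```sas
/* Sketch only: assumes HAVE as above, with TIME non-missing and unique per ID. */
/* want_sql is a hypothetical output name.                                      */
proc sql;
  create table want_sql as
  select a.id,
         max(a.time) as time,   /* latest streak start for this ID */
         1 as flag
  from have as a
  where a.flag = 1
    /* keep only flagged rows whose immediate predecessor is NOT flagged */
    and not exists (
      select 1
      from have as b
      where b.id = a.id
        and b.flag = 1
        and b.time = (select max(c.time)
                      from have as c
                      where c.id = a.id
                        and c.time < a.time)
    )
  group by a.id;
quit;
```

The nested correlated subqueries make this noticeably more expensive than a single DATA step pass on large tables, so it is probably best treated as an alternative for small data or for environments where only SQL is available.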
