Dropping observations after a certain condition is met in SAS
This is an extension of an earlier question (Drop observations once condition is met by multiple variables).
I have the following data. I tried to adapt one of the existing answers to solve my problem, but could not get what I want. Here is what I have in my data:
Have:
id Date Evt_Type Flag Amt1 Amt2
101 2/2/2019 Fee 5
101 2/3/2019 REF1 Y 5
101 2/4/2019 Fee 10
101 2/6/2019 REF2 Y 10
101 2/7/2019 Fee 4
101 2/8/2019 REF1
102 2/2/2019 Fee 25
102 2/2/2019 REF1 N 25
103 2/3/2019 Fee 10
103 2/4/2019 REF1 Y 10
103 2/5/2019 Fee 10
Want:
id Date Evt_Type Flag Amt1 Amt2
101 2/2/2019 Fee 5
101 2/3/2019 REF1 Y 5
101 2/4/2019 Fee 10
101 2/6/2019 REF2 Y 10
101 2/7/2019 Fee 4
101 2/8/2019 REF1
102 2/2/2019 Fee 25
102 2/2/2019 REF1 N 25
103 2/3/2019 Fee 10
103 2/4/2019 REF1 Y 10
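I tried the following:
data want;
  _max_n_with_Y = 1e12;                    /* sentinel: keep every row when no Y */
  do _n_ = 1 by 1 until (last.id);
    set have;
    by id;
    if flag='Y' then _max_n_with_Y = _n_;  /* row number of the last Y row */
  end;
  /* second pass over the same id group: keep rows up to the last Y row */
  do _n_ = 1 to _n_;
    set have;
    if _n_ <= _max_n_with_Y then OUTPUT;
  end;
  drop _:;
run;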
Any help is appreciated.
Thanks
The important 'landmark' is the row with flag='Y'.
The extra criteria for outputting rows after the landmark complicate the state machine that tracks (or computes) the row number (_n_) of the last row to output for the group.
The flag='Y' state is easy to detect. The post-Y state can be examined with unconditional use of LAG. SAS IF statements do not have short-circuit evaluation, so as long as LAG is not placed in a subordinate THEN clause, the LAG queue advances on every row, which is what this task needs.
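To see why the placement of LAG matters, here is a minimal sketch (the data set lag_demo and its variables are hypothetical, not from the question). LAG returns the value stored the last time that particular LAG call executed, so a LAG that runs only under a condition skips rows.

```sas
/* Hypothetical demo: the same LAG call, executed always vs. conditionally */
data lag_demo;
  input x @@;
  prev_always = lag(x);          /* executed on every row: previous row's x  */
  if mod(_n_, 2) = 0 then
    prev_cond = lag(x);          /* executed on even rows only: previous even
                                    row's x, not the previous row's x        */
  datalines;
1 2 3 4 5 6
;
run;
```

On row 4, prev_always is 3 (the previous row) while prev_cond is 2 (the previous even row), because the conditional LAG call only ran on rows 2, 4 and 6.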
Example:
data have;
attrib
id format=4.
date informat=mmddyy10. format=mmddyy10.
evt_type length=$4
flag length=$1
amt1 amt2 format=4.
;
input id Date Evt_Type Flag Amt1 Amt2;
/* '.' marks a missing value in the data lines below */
datalines;
101 2/2/2019 Fee . 5 .
101 2/3/2019 REF1 Y . 5
101 2/4/2019 Fee . 10 .
101 2/6/2019 REF2 Y . 10
101 2/7/2019 Fee . 4 .
101 2/8/2019 REF1 . . .
102 2/2/2019 Fee . 25 .
102 2/2/2019 REF1 N 25 .
103 2/3/2019 Fee . 10 .
103 2/4/2019 REF1 Y . 10
103 2/5/2019 Fee . 10 .
;
data want;
  /* pass 1 (DOW loop): read one id group, recording the row number of the
   * last Y row (_y_n) and of the last row to keep (_upto_n)
   */
  _y_n = 1e12;                   /* sentinel: no Y row means keep every row */
  do _n_ = 1 by 1 until (last.id);
    set have;
    by id;
    if flag='Y' then _y_n = _n_;
    /* rule: post-Y output of two rows should occur only once, at the rows
     * immediately following the Y row
     */
    if _n_ = _y_n + 2            /* is this row 2 after a Y */
    and lag(evt_type) = 'Fee'    /* is first row after Y Fee */
    and evt_type =: 'REF'        /* is second row after Y REF# */
    then
      _upto_n = _n_;
  end;
  _upto_n = max (_upto_n, _y_n);
  /* pass 2: re-read the same group and output rows 1 through _upto_n */
  do _n_ = 1 to _n_;
    set have;
    if _n_ <= _upto_n then OUTPUT;
  end;
  drop _:;
run;
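If you prefer not to depend on how SAS evaluates the compound IF condition, one option is to assign the LAG result to a variable on every iteration before testing it. This is a sketch of that variant, assuming the have data set from the example above; the output name want2 and the variable _prev_type are hypothetical.

```sas
data want2;
  _y_n = 1e12;                       /* sentinel: no Y row means keep all rows */
  do _n_ = 1 by 1 until (last.id);
    set have;
    by id;
    _prev_type = lag(evt_type);      /* LAG runs on every row, so the queue
                                        always holds the previous row's value */
    if flag='Y' then _y_n = _n_;
    if _n_ = _y_n + 2 and _prev_type = 'Fee' and evt_type =: 'REF'
      then _upto_n = _n_;
  end;
  _upto_n = max(_upto_n, _y_n);
  do _n_ = 1 to _n_;                 /* second pass over the same group */
    set have;
    if _n_ <= _upto_n then output;
  end;
  drop _:;                           /* also drops _prev_type */
run;
```

For the sample data this keeps the same 10 rows as want: all 6 rows of id 101, both rows of id 102, and the first 2 rows of id 103.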
Note, regarding:
if _n_ = _y_n + 2 /* is this row 2 after a Y */
and lag(evt_type) = 'Fee' /* is first row after Y Fee */
and evt_type =: 'REF' /* is second row after Y REF# */
then
_upto_n = _n_;
For the row two rows after the Y row:
LAG2(<var>) is the <var> value from the Y row
LAG (<var>) is the <var> value from the Y row+1
<var> is the <var> value from the Y row+2, which is the current row
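The LAG2/LAG mapping above can also be checked row by row. This is a minimal sketch, assuming the have data set from the example above; the data set pattern_check and the variable is_y_fee_ref are hypothetical. It only marks rows that are a REF# row preceded by a Fee row and, one row earlier, a Y row within the same id; it does not drop anything.

```sas
data pattern_check;
  set have;
  /* assigned unconditionally so the LAG queues advance on every row */
  _id2   = lag2(id);         /* id from the Y row candidate (2 rows back)   */
  _flag2 = lag2(flag);       /* flag from 2 rows back                       */
  _type1 = lag(evt_type);    /* evt_type from 1 row back                    */
  /* 1 when: same id two rows back, that row is Y, previous row is Fee,
     current row is REF# */
  is_y_fee_ref = (_id2 = id and _flag2 = 'Y' and _type1 = 'Fee'
                  and evt_type =: 'REF');
  drop _:;
run;
```

For the sample data the marker is 1 on rows 4 and 6 of id 101 and 0 everywhere else.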