Dropping observations after a certain condition is met in SAS

This is an extension of an earlier question (Drop observations once condition is met by multiple variables).

I have the following data and tried to apply one of the existing answers to my problem, but could not get the result I want. Here is what my data looks like:

  • Amt1 is populated when the Evt_type is Fee.
  • Amt2 is populated when the Evt_type is REF1/REF2.
  • I don't want to display any observations after the last Flag='Y', with the one exception described in the last item.
  • If there is no Flag='Y', then I want all the observations for that id (e.g. id=102).
  • After the last Flag='Y', I do want to keep the next two rows for that id if they are a Fee followed by REF1/REF2 (e.g. id=101). If the Fee is not followed by REF1/REF2, I don't want it (e.g. id=103).

Have:

   id   Date        Evt_Type   Flag   Amt1   Amt2
  101  2/2/2019      Fee              5
  101  2/3/2019      REF1      Y             5
  101  2/4/2019      Fee              10
  101  2/6/2019      REF2      Y             10
  101  2/7/2019      Fee               4
  101  2/8/2019      REF1
  102  2/2/2019      Fee              25
  102  2/2/2019      REF1      N      25
  103  2/3/2019      Fee              10
  103  2/4/2019      REF1      Y             10
  103  2/5/2019      Fee              10

Want:

  id   Date        Evt_Type   Flag   Amt1   Amt2
 101  2/2/2019      Fee              5
 101  2/3/2019      REF1      Y             5
 101  2/4/2019      Fee              10
 101  2/6/2019      REF2      Y             10
 101  2/7/2019      Fee               4
 101  2/8/2019      REF1
 102  2/2/2019      Fee              25
 102  2/2/2019      REF1      N      25
 103  2/3/2019      Fee              10
 103  2/4/2019      REF1      Y             10

I tried the following:

data want;
  _max_n_with_Y = 1e12;

  do _n_ = 1 by 1 until (last.id);
    set have;
    by id;
    if flag='Y' then _max_n_with_Y = _n_;
  end;

  do _n_ = 1 to _n_;
    set have;
    if _n_ <= _max_n_with_Y then OUTPUT;
  end;

  drop _:;
run;

Any help is appreciated.

Thanks

The important 'landmark' is the row with flag='Y'.

The extra criteria for outputting rows after the landmark complicate the state machine that has to track (or compute) the row number ( _n_ ) of the last row to output for the group.

The flag='Y' state is easy to detect. LAG, called unconditionally, can be used to examine the rows after the Y row. SAS IF statements do not have short circuit evaluation, so as long as the LAG call is not inside a subordinate THEN clause, the LAG queues will be appropriate for the task.
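To see why the placement of LAG matters, here is a minimal sketch on hypothetical toy data (the data set names demo and lag_demo and the variables seq, val, prev_val and prev_val_cond are made up for illustration). Each LAG call keeps its own queue, and that queue is only updated when the call actually executes:

```sas
/* Hypothetical toy data: five rows, not part of the question's data */
data demo;
  input seq val;
  datalines;
1 10
2 20
3 30
4 40
5 50
;
run;

data lag_demo;
  set demo;

  /* Unconditional: this LAG executes on every row, so its queue
     always returns the value from the previous row */
  prev_val = lag(val);

  /* Conditional: this LAG executes only on even rows, so it returns
     the value stored the last time it ran (the previous even row) */
  if mod(seq, 2) = 0 then prev_val_cond = lag(val);
run;

proc print data=lag_demo noobs;
run;
```

On the row with seq=4, prev_val should be 30 (the previous row), while prev_val_cond should be 20 (the value stored when the conditional LAG last ran, on seq=2). Assigning the LAG result to a variable before any IF, as prev_val does, is one way to make sure the call runs on every row.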

Example:

data have;
attrib
  id format=4.
  date informat=mmddyy10. format=mmddyy10.
  evt_type length=$4
  flag length=$1
  amt1 amt2 format=4.
;
input
   id   Date        Evt_Type   Flag   Amt1   Amt2; datalines;
  101  2/2/2019      Fee       .      5      .
  101  2/3/2019      REF1      Y      .      5
  101  2/4/2019      Fee       .      10     .
  101  2/6/2019      REF2      Y      .      10
  101  2/7/2019      Fee       .       4     .
  101  2/8/2019      REF1      .      .      .
  102  2/2/2019      Fee       .      25     .
  102  2/2/2019      REF1      N      25     .
  103  2/3/2019      Fee       .      10     .
  103  2/4/2019      REF1      Y      .      10
  103  2/5/2019      Fee       .      10     .
;

data want;
  _y_n = 1e12;

  do _n_ = 1 by 1 until (last.id);
    set have;
    by id;

    if flag='Y' then _y_n = _n_;

    /* rule: post Y output of two rows should only occur once, and at the rows
     * immediately succeeding the Y row
     */
    if _n_ = _y_n + 2            /* is this row 2 after a Y */
      and lag(evt_type) = 'Fee'  /* is first row after Y Fee */
      and evt_type =: 'REF'      /* is second row after Y REF# */
    then 
      _upto_n = _n_;
  end;

  _upto_n = max (_upto_n, _y_n);

  do _n_ = 1 to _n_;
   set have;
   if _n_ <= _upto_n then OUTPUT;
  end;

  drop _:;
run;

Note, regarding:

if _n_ = _y_n + 2            /* is this row 2 after a Y */
  and lag(evt_type) = 'Fee'  /* is first row after Y Fee */
  and evt_type =: 'REF'      /* is second row after Y REF# */
then 
  _upto_n = _n_;

For the second row after the Y row:

  LAG2(<var>) is the <var> value from the Y row
  LAG (<var>) is the <var> value from the Y row+1
       <var>  is the <var> value from the Y row+2, which is the current row
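The alignment above can be inspected directly. The following is a small sketch, assuming a toy copy of the id=101 rows; the names lag_src, lag_view, flag_2_back, type_1_back and matches_rule are hypothetical and not part of the original answer:

```sas
/* Toy copy of the id=101 rows; a "." in list input is read as a blank flag */
data lag_src;
  length evt_type $4 flag $1;
  input id evt_type flag;
  datalines;
101 Fee  .
101 REF1 Y
101 Fee  .
101 REF2 Y
101 Fee  .
101 REF1 .
;
run;

data lag_view;
  set lag_src;

  flag_2_back = lag2(flag);      /* flag from two rows earlier     */
  type_1_back = lag(evt_type);   /* evt_type from the previous row */
  /* the current row's evt_type needs no LAG */

  /* 1 when the row two back is a Y row, the previous row is a Fee,
     and the current row is a REF# */
  matches_rule = (flag_2_back = 'Y'
                  and type_1_back = 'Fee'
                  and evt_type =: 'REF');
run;

proc print data=lag_view noobs;
run;
```

On the last row, the three values line up as Y, Fee and REF1. Note that LAG queues are not reset at BY-group boundaries, so with several ids a lagged value can come from the previous group. In the answer's step, the `_n_ = _y_n + 2` test guards against this, because `_y_n` is only set from a Y row within the current group.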
