
SAS to delete observations that meet condition within group

I want to delete records in the Have dataset that meet all of the following conditions. ID_num here stands for the 3-digit part of the ID field.

  • ID = Mxxx
  • Type = blood
  • located before any of the following records within each group of (ID_num, Drug):
    • ID = Mxxx and Type = milk
    • ID = Infxxx

Below are Have and the desired output.

data have;
     input ID $ Type $ Drug $;
     cards;
M001    blood A
M001    blood A
M001    blood A
M001    blood B
M001    blood B
M001    milk  B
M001    blood C
M001    blood C
M002    blood A
M002    blood A
Inf002  blood A
M002    blood A
M002    blood B
M002    milk  C
Inf003  blood B
M003    blood B
;
run;
data want;
     input ID $ Type $ Drug $;
     cards;
M001    milk   B
Inf002  blood  A
M002    blood  A
M002    milk   C
Inf003  blood  B
M003    blood  B
;
run;

For example, the M002 (blood, drug A) observation directly below the Inf002 (drug A) observation stays because it occurs after an infant sample in the same drug group. The two M002 (blood, A) observations above it, however, should be deleted because they occur before the first infant sample in the same drug group. Likewise, the two M001 (blood, C) observations following M001 (milk, B) should be deleted because they belong to a different drug group.

Edit: group by (gp, Drug).

Key points

  1. Extract the ID grouping number (gp in the code) using a SAS regular expression (prxmatch(patt, var) here).

  2. The keep condition can be examined row by row while grouping by (gp, Drug). The start of a new (gp, Drug) group is identified by FIRST.drug, which is set whenever either gp or Drug changes.

    • The dataset must be sorted before the BY statement is used. By default, PROC SORT keeps the relative order of observations that share the same BY values (the EQUALS option), so the original ordering within each group is preserved.
    • The original ordering can be tracked by recording _n_ in the regex parsing phase.
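
The two key points can be tried out in isolation with a minimal sketch. The rows below are made up for illustration and are not the full Have data, and the names demo, demo_flags, and new_group are hypothetical. The sketch assumes the 3-digit block is the first run of three digits in ID.

```sas
* Tiny illustrative input (hypothetical rows);
data demo;
    length ID $ 8 Drug $ 1 gp $ 3;
    input ID $ Drug $;
    pos = prxmatch('/\d{3}/', ID);            /* position of the first 3-digit run */
    if pos > 0 then gp = substr(ID, pos, 3);  /* take exactly 3 characters */
    n = _n_;                                  /* remember the original row order */
    cards;
M001   A
M001   B
Inf001 B
M002   A
;
run;

proc sort data=demo;
    by gp Drug;
run;

* Show where each (gp, Drug) group starts;
data demo_flags;
    set demo;
    by gp Drug;
    new_group = first.Drug;   /* 1 on the first row of each (gp, Drug) group */
run;

proc print data=demo_flags;
    var n ID Drug gp new_group;
run;
```

Here M001 and Inf001 share gp = 001, so the two drug B rows fall into one group and only the first of them gets new_group = 1.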

Code

* "have" is in your post;
data tmp;
    set have;
    pos = prxmatch('/\d{3}/', ID);  * position of the first 3-digit run;
    gp = substr(ID, pos, 3);  * group number, third argument is the length;
    mi = substr(ID, 1, 1);  * mother or infant;
    n = _n_; * keep track of the original ordering;
    drop pos;
run;

proc sort data=tmp out=tmp;
    by gp drug;
run;

data want(drop=flag_keep gp mi);
    set tmp;
    by gp drug;
    * state variables;
    retain flag_keep 0;
    if FIRST.drug then flag_keep = 0;
    * mark keep;
    if (flag_keep = 1) or (mi = "I") or ((mi = "M") and (Type = "milk"))
        then flag_keep = 1;
    if flag_keep = 1 then output;
run;

proc sort data=want out=want;
    by n;
run;

Result: the original row number n is shown for clarity.

   ID      Type   Drug  n
1  M001    milk   B     6    
2  Inf002  blood  A     11    
3  M002    blood  A     12    
4  M002    milk   C     14    
5  Inf003  blood  B     15    
6  M003    blood  B     16
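
One way to check the result against the desired output from the question is PROC COMPARE. This is only a sketch: it assumes the want dataset produced by the code above still exists in the session, and it stores the question's desired output under the hypothetical name expected so the two do not clash.

```sas
* Desired output copied from the question, stored under a hypothetical name;
data expected;
    input ID $ Type $ Drug $;
    cards;
M001    milk   B
Inf002  blood  A
M002    blood  A
M002    milk   C
Inf003  blood  B
M003    blood  B
;
run;

* Compare row by row, ignoring the helper column n;
proc compare base=expected compare=want(drop=n);
run;
```

The comparison is positional, so it relies on want having been sorted back into its original order by n.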
