SAS: delete observations within a group that meet a condition

Question

I want to delete the records in the Have dataset that meet all of the following conditions. Here ID_num stands for the 3-digit part of the ID field.

- ID = Mxxx
- Type = blood
- The record is located before any of the following records within each group of (ID_num, Drug):
  - ID = Mxxx and Type = milk
  - ID = Infxxx

Below are Have and the desired output.

data have;
     input ID $ Type $ Drug $;
     cards;
M001    blood A
M001    blood A
M001    blood A
M001    blood B
M001    blood B
M001    milk  B
M001    blood C
M001    blood C
M002    blood A
M002    blood A
Inf002  blood A
M002    blood A
M002    blood B
M002    milk  C
Inf003  blood B
M003    blood B
;
run;

data want;
     input ID $ Type $ Drug $;
     cards;
M001    milk   B
Inf002  blood  A
M002    blood  A
M002    milk   C
Inf003  blood  B
M003    blood  B
;
run;

For example, the M002 (blood, drug A) observation below the Inf002 (drug A) observation stays because it occurs after an infant sample in the same drug group. But the two M002 (blood, A) observations above it should be deleted, as they occur before the first infant sample in the same drug group. Conversely, the two M001 (blood, C) observations following M001 (milk, B) should be deleted because the drug groups are different.

Answer 1

Edit: group by (gp, Drug).

Keys

Extract the ID grouping number (gp in the code) using a SAS regular expression (prxmatch(patt, var) here).
The keep condition can be examined row by row within each (gp, Drug) group. The start of a new group is identified by FIRST.drug, which is set whenever gp or Drug changes.
- The dataset must be sorted before the BY statement is used. PROC SORT keeps the relative order of observations with equal BY values by default (the EQUALS option), so the original ordering within each group is preserved.
- The original ordering can be tracked by recording _n_ in the regex parsing phase.
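
As a small illustration of the extraction step, the sketch below runs prxmatch and substr on two sample ID values and writes the results to the log. The delimited pattern '/\d{3}/' and the fixed length of 3 are choices made for this sketch; it assumes every ID contains exactly one 3-digit run.

```sas
* Sketch only: show where the 3-digit part starts and what gets extracted;
data _null_;
    length ID $8;
    do ID = 'M001', 'Inf002';
        pos = prxmatch('/\d{3}/', ID);  * position of the first 3-digit run;
        gp  = substr(ID, pos, 3);       * third argument is a length, not an end position;
        put ID= pos= gp=;
    end;
run;
```

Because the third argument of substr is a length, passing 3 extracts exactly the digits regardless of how long the prefix (M or Inf) is.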

Code

* "have" is in your post;
data tmp;
    set have;
    pos = prxmatch('/(\d{3})/', ID);  * position of the 3-digit part;
    gp = substr(ID, pos, 3);  * group number (third argument is a length);
    mi = substr(ID, 1, 1);  * mother or infant;
    n = _n_; * keep track of the original ordering;
    drop pos;
run;

proc sort data=tmp out=tmp;
    by gp drug;
run;

data want(drop=flag_keep gp mi);
    set tmp;
    by gp drug;
    * state variables;
    retain flag_keep 0;
    if FIRST.drug then flag_keep = 0;
    * mark keep;
    if (flag_keep = 1) or (mi = "I") or ((mi = "M") and (Type = "milk"))
        then flag_keep = 1;
    if flag_keep = 1 then output;
run;

proc sort data=want out=want;
    by n;
run;

Result: the original row number n is shown for clarity.

   ID      Type   Drug  n
1  M001    milk   B     6    
2  Inf002  blood  A     11    
3  M002    blood  A     12    
4  M002    milk   C     14    
5  Inf003  blood  B     15    
6  M003    blood  B     16
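
One way to check the result against the desired output is PROC COMPARE. This is only a sketch: it assumes the desired output from the question has been saved under a hypothetical name, expected, so that it does not collide with the want dataset created above.

```sas
* Sketch only: "expected" is a hypothetical copy of the desired output in the question;
proc compare base=expected compare=want(drop=n);
run;
```

If the two datasets agree, the PROC COMPARE report should show no unequal values for ID, Type and Drug.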
