SAS to delete observations that meet condition within group
I want to delete records in the Have dataset which meet all of the following conditions. ID_num here stands for the 3-digit part of the ID field.
Below are Have and the desired output.
data have;
input ID $ Type $ Drug $;
cards;
M001 blood A
M001 blood A
M001 blood A
M001 blood B
M001 blood B
M001 milk B
M001 blood C
M001 blood C
M002 blood A
M002 blood A
Inf002 blood A
M002 blood A
M002 blood B
M002 milk C
Inf003 blood B
M003 blood B
;
run;
data want;
input ID $ Type $ Drug $;
cards;
M001 milk B
Inf002 blood A
M002 blood A
M002 milk C
Inf003 blood B
M003 blood B
;
run;
For example, the M002 (blood, A) observation directly below the Inf002 (blood, A) observation stays, because it occurs after an infant sample in the same drug group. But the two M002 (blood, A) observations above it should be deleted, as they occur before the first infant sample in the same drug group. Conversely, the two M001 (blood, C) observations following M001 (milk, B) should be deleted, as the drug groups are different.
Edit: group by (gp, Drug).
Extract the ID grouping number (gp in the code) using a SAS regular expression (prxmatch(patt, var) here).
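Instead of finding a position with PRXMATCH and then calling SUBSTR, you can read the digits straight out of a capture buffer. The sketch below is a self-contained illustration and is not part of the answer. The datasets ids and gp_check and the variables re and gp2 are hypothetical names. PRXPOSN returns the text of capture buffer 1. COMPRESS with the 'kd' modifiers (keep digits) is included as a regex-free alternative, which should give the same result for IDs shaped like those in the question.

```sas
/* Hypothetical sample IDs in the same shape as the question's ID column */
data ids;
   input ID $;
   datalines;
M001
Inf002
M003
;
run;

data gp_check;
   set ids;
   length gp gp2 $3;
   retain re;                                   /* compiled pattern id, kept across rows */
   if _n_ = 1 then re = prxparse('/(\d{3})/');  /* capture buffer 1 = the 3 digits */
   if prxmatch(re, ID) then gp = prxposn(re, 1, ID);
   gp2 = compress(ID, , 'kd');                  /* regex-free: keep only the digits */
   drop re;
run;

proc print data=gp_check;
run;
```

Compiling the pattern once with PRXPARSE and retaining its id avoids re-parsing the regex on every row.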
The keep condition can be examined row by row while grouped by (gp, Drug). FIRST.drug identifies the start of each new (gp, Drug) group, including any change in gp.
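To see how FIRST.drug behaves, the sketch below runs a BY-group step over a few hypothetical rows modelled on Have. In these rows, gp is already filled in and the rows are already in (gp, Drug) order. The variable first_drug is a hypothetical copy of the automatic FIRST.drug flag, which is not written to the output on its own. On these rows, first_drug should be 1 on rows 1, 4 and 5 and 0 on rows 2 and 3. Row 5 shows that a change in gp alone also sets FIRST.drug.

```sas
/* Hypothetical rows modelled on Have, with gp already extracted
   and already sorted by gp and Drug */
data demo;
   input ID $ gp $ Drug $;
   datalines;
M002 002 A
M002 002 A
Inf002 002 A
M002 002 B
Inf003 003 B
;
run;

data first_flags;
   set demo;
   by gp Drug;
   first_drug = first.drug;  /* FIRST.drug itself is not written to the output */
run;

proc print data=first_flags;
run;
```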
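The dataset must be sorted before it can be processed with a BY statement. Since PROC SORT is stable by default (the EQUALS option), rows within the same (gp, Drug) group keep their original relative order.
Use _n_ in the regex parsing phase to keep track of the original ordering.
* "have" is in your post;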
data tmp;
set have;
length gp $3 mi $1;
pos = prxmatch('/\d{3}/', ID); * position of the 3-digit group number;
gp = substr(ID, pos, 3); * group number: 3 characters starting at pos;
mi = substr(ID, 1, 1); * mother (M) or infant (I);
n = _n_; * keep track of the original ordering;
drop pos;
run;
proc sort data=tmp out=tmp;
by gp drug;
run;
data want(drop=flag_keep gp mi);
set tmp;
by gp drug;
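* flag_keep is retained across rows and reset at the start of every (gp, Drug) group;
retain flag_keep 0;
if FIRST.drug then flag_keep = 0;
* from the first infant record or mother milk record onwards, keep every row of the group;
if (flag_keep = 1) or (mi = "I") or ((mi = "M") and (Type = "milk"))
then flag_keep = 1;
if flag_keep = 1 then output;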
run;
proc sort data=want out=want;
by n;
run;
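Both PROC SORT steps above rely on tied rows keeping their input order. Here is a minimal way to check that behaviour on hypothetical mini data. After sorting by gp and Drug with EQUALS (already the default, spelled out here only for emphasis), the tied (002, A) rows should come out with n = 1, 3, 4, followed by the (002, B) row with n = 2. The dataset names demo_sort and demo_sorted are hypothetical.

```sas
/* Hypothetical mini data; n records the input order, as in the answer */
data demo_sort;
   input ID $ gp $ Drug $;
   n = _n_;
   datalines;
M002 002 A
M002 002 B
Inf002 002 A
M002 002 A
;
run;

/* EQUALS is the default; it is written out here only to make the point visible */
proc sort data=demo_sort out=demo_sorted equals;
   by gp Drug;
run;

proc print data=demo_sorted;
run;
```

Specifying NOEQUALS instead would no longer guarantee this order for tied rows. The keep logic needs the order preserved, because "before the first infant sample" is defined by row order within each group.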
Result: the original row number n is shown for clarity.
Obs  ID      Type   Drug   n
  1  M001    milk   B      6
  2  Inf002  blood  A     11
  3  M002    blood  A     12
  4  M002    milk   C     14
  5  Inf003  blood  B     15
  6  M003    blood  B     16