SAS - Find number of observations within multiple BY groups and delete specific observations
I am trying to find a way to count the number of observations within BY groups defined by multiple (more than two) variables. After that, I want to delete the observations that belong to any group whose count is less than two.
Here is an example of what I am trying to do:
VAR1 VAR2 VAR3
a a 1
a a 2
a b 1
a b 2
b a 1
b a 2
b b 1
b b 2
c a 1
c b 1
d a 1
Here, I would like to make sure that there are exactly two distinct values of VAR3 for each VAR1/VAR2 pair.
In this example, I want to delete the last three observations, because each of those VAR1/VAR2 pairs has only one value.
Is there a simple way to do this?
I have tried:
data want;
set have;
by VAR1 VAR2 VAR3;
if first.VAR3 = last.VAR3 then delete;
run;
But that did not work, as it deleted observations with the same VAR3 within the same VAR1. I need help building something more robust.
In the end, I want this:
VAR1 VAR2 VAR3
a a 1
a a 2
a b 1
a b 2
b a 1
b a 2
b b 1
b b 2
I would appreciate any help. Thank you.
EDIT:
To clarify what I need: I'd like to check whether VAR3 contains both of the values 1 AND 2 for each combination of VAR1 and VAR2 present. Otherwise, the entries should be deleted if the group contains only one of the values or neither.
Thank you.
Since your condition depends on all of the values in the VAR1*VAR2 group, you probably want to use a double DOW loop. In the first loop, calculate flags; in the second loop, use those flags to decide which observations to write.
data have;
  input VAR1 $ VAR2 $ VAR3 @@;
cards;
a a 1 a a 2 a b 1 a b 2 b a 1 b a 2 b b 1 b b 2 c a 1 c b 1 d a 1
;
data want;
  /* First pass: read one VAR1*VAR2 group and flag which VAR3 values occur. */
  do until(last.var2);
    set have;
    by VAR1 VAR2 VAR3;
    if var3=1 then any1=1;
    else if var3=2 then any2=1;
    else anyother=1;
  end;
  /* Second pass: re-read the same group and output it only if it has both 1 and 2 and nothing else. */
  do until(last.var2);
    set have;
    by VAR1 VAR2 VAR3;
    if any1 and any2 and not anyother then output;
  end;
  drop any1 any2 anyother;
run;
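The same group-level check can also be sketched in PROC SQL, as an alternative to the DOW loop rather than a replacement for it. This assumes the `have` data set created above; the output name `want_sql` is made up for illustration. It relies on PROC SQL evaluating a comparison such as `VAR3 = 1` to 1 or 0, and on PROC SQL remerging the group-level summary back onto the detail rows (SAS writes a NOTE about remerging to the log).

```sas
proc sql;
  /* want_sql is a hypothetical output name */
  create table want_sql as
  select VAR1, VAR2, VAR3
  from have
  group by VAR1, VAR2
  /* keep a VAR1/VAR2 group only if it has a 1, has a 2, and has nothing else */
  having sum(VAR3 = 1) > 0
     and sum(VAR3 = 2) > 0
     and sum(VAR3 not in (1, 2)) = 0
  order by VAR1, VAR2, VAR3;
quit;
```

Unlike the DATA step version, this does not require `have` to be sorted beforehand, since the grouping is done by the SQL procedure.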
Something like this:
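data have;
  input VAR1 $ VAR2 $;
datalines;
a a
a a
a b
a b
b a
b a
b b
b b
c a
c b
d a
;
proc sort data=have;
  by var1 var2;
run;
data want;
  set have;
  by var1 var2;
  /* var3 is a running row counter that restarts at 1 for each VAR1/VAR2 group. */
  if first.var1 or first.var2 then var3=1;
  else var3+1;
  /* A row that is both first and last in its group is the only row in it; drop it. */
  if (first.var1 and last.var1) or (first.var2 and last.var2) then delete;
run;
proc print;
run;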
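Note that the step above counts rows per VAR1/VAR2 pair, which matches the number of distinct VAR3 values only when VAR3 never repeats within a pair. If repeats are possible, a minimal sketch along the following lines may be closer to the "exactly two distinct values" wording of the question. It assumes a `have` data set that already contains VAR3, as in the question (not the two-column one built in this answer), and `want_distinct` is a made-up output name.

```sas
proc sql;
  /* want_distinct is a hypothetical output name */
  create table want_distinct as
  select *
  from have
  group by VAR1, VAR2
  /* count(distinct ...) ignores repeated VAR3 values within a pair */
  having count(distinct VAR3) = 2
  order by VAR1, VAR2, VAR3;
quit;
```

This only checks how many distinct values there are, not which ones; checking specifically for the values 1 and 2 needs an explicit test on those values.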