I'm looking for a solution to the following problem. I'm using SAS, so either a basic SQL or a DATA step approach is welcome. The solution may be simple, but I'm fairly new to SAS and can't find one.
I have a dataset and want to remove a second-level subgroup based on a condition. Let me explain with an example. The main group is ColA and the subgroup is ColB. The condition is: when any value of ColC within a subgroup is 1, remove that whole subgroup from its main group.
ColA | ColB | ColC
1 | a | 0
1 | a | 1
1 | b | 0
1 | b | 0
2 | a | 0
2 | a | 0
2 | b | 0
2 | b | 0
3 | a | 0
3 | a | 0
3 | b | 1
3 | b | 0
Expected output:
ColA | ColB | ColC
1 | b | 0
1 | b | 0
2 | a | 0
2 | a | 0
2 | b | 0
2 | b | 0
3 | a | 0
3 | a | 0
I tried approaches like:
select * from data
group by ColA, ColB having ColC <> 1
I thought this would group by the two columns and select all groups without ColC = 1, but it only "removes" the rows with ColC = 1.
Another approach is something like this:
select * from data
where ColA in (select ColA from data where ColC <> 1)
But of course, I can't reach the subgroups with this. I was also thinking about a join, but I'm not sure how to do it.
You can use not exists with a correlated subquery:
select d.*
from data d
where not exists (select 1
                  from data d2
                  where d2.cola = d.cola and d2.colb = d.colb and d2.colc = 1
                 );
This keeps all combinations of cola / colb that do not have a 1 in colc.
This can also be adapted to a delete, but you seem to want a filtered result set.
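If the rows should be removed from the table itself, the same test can drive a delete. The following is a minimal, untested sketch: it assumes the table is named have, and it first copies the offending keys into a helper table (flagged, a hypothetical name) so that the delete does not have to query the table it is modifying.

```sas
proc sql;
   /* flagged is a hypothetical helper table holding the ColA/ColB pairs to drop */
   create table flagged as
   select distinct cola, colb
   from have
   where colc = 1;

   /* remove every row whose ColA/ColB pair appears in flagged */
   delete from have as d
   where exists (select 1
                 from flagged as f
                 where f.cola = d.cola and f.colb = d.colb);
quit;
```

Unlike the select above, this changes have in place, so it is worth running against a copy first.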
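The having clause in SQL lets you filter a query by a summary function. The query below keeps only the rows whose ColA / ColB group has a ColC sum of 0. Because select * is combined with group by, SAS remerges the summary statistic with the detail rows, so every row of a qualifying group is kept.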
proc sql noprint;
create table want as
select *
from have
group by ColA, ColB
having sum(ColC) = 0
;
quit;
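The sum(ColC) = 0 test relies on ColC holding only 0 and 1. If ColC could hold other values (an assumption, since the question only shows 0 and 1), one variant is to aggregate the comparison itself: in SAS, ColC = 1 evaluates to 1 or 0, so max(ColC = 1) is 1 for any group that contains a 1. A sketch, writing to a hypothetical table want2:

```sas
proc sql noprint;
   create table want2 as
   select *
   from have
   group by ColA, ColB
   having max(ColC = 1) = 0   /* no row in the group has ColC = 1 */
   ;
quit;
```

On the sample data this should keep the same rows as the sum version, since there ColC is only ever 0 or 1.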
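Here is a DATA step approach using a double DoW loop. The first loop scans one ColA / ColB group and clears the flag c if any ColC is 1; the second loop re-reads the same rows and outputs them only if the flag is still set. The input must be sorted by ColA and ColB: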
data have;
infile datalines dlm='|';
input ColA ColB $ ColC;
datalines;
1 | a | 0
1 | a | 1
1 | b | 0
1 | b | 0
2 | a | 0
2 | a | 0
2 | b | 0
2 | b | 0
3 | a | 0
3 | a | 0
3 | b | 1
3 | b | 0
;
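data want (drop=c);
   c = 1;                               /* assume the group is clean */
   do _n_ = 1 by 1 until (last.ColB);   /* pass 1: scan one ColA/ColB group */
      set have;
      by ColA ColB;
      if ColC = 1 then c = 0;           /* group contains a 1: flag it */
   end;
   do _n_ = 1 to _n_;                   /* pass 2: re-read the same rows */
      set have;
      if c then output;
   end;
run;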
A simple way to do it with common code:
proc sort data=have;
   by cola colb;
run;
data want;
   merge have (in=in1 where=(colc=1))
         have (in=in2)
   ;
   by cola colb;
   if ^in1;
run;
The first HAVE contributes only the records with COLC=1. Because the merge is by COLA and COLB, IN1 is true for every record of a group that contains such a record, so the IF statement drops the whole group, which is the goal.
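There is also a hash object approach. The hash table is loaded with the ColA / ColB keys of the rows where ColC = 1; h.check() returns 0 when the current key is found, so the subsetting IF keeps only the rows whose key is absent: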
data want;
   if _n_ = 1 then do;
      declare hash h (dataset : 'have(where=(ColC=1))');
      h.definekey ('ColA', 'ColB');
      h.definedone();
   end;
   set have;
   if h.check();
run;