简体   繁体   中英

SQL: Retrieve users who are in exactly two groups

My table looks like the following:

user_id group probability
123 group1 0.9
123 group2 0.6
45 group2 0.8
567 group2 0.56
567 group3 0.78
567 group1 0.90

I need to extract users who are just in group 1 and 2, which means I only need to retrieve user 123. I have written my query like the following:

with two_groups as (
select user_id
from table1
where group in ('group1', 'group2')
group by 1
having max(group) <> min(group) and count(user_id) = 2
)
select *
from two_groups
join table1 using (user_id)

The reason that I am joining it back to the table1 is because I could not add the group and probability columns as a field in the "two_groups" subquery because I didn't want to group by them.

So, the problem is with the query I have, it still retrieves user id 567. However, I don't want it to be extracted because it is in group3 as well. What Can I do to just extract the users being in just exactly two groups?

Thank you!

You should try with a proper inner join

select *
from 
   two_groups t1 inner join 
   table1 t2 on t1.user_id = t2.user_id

Perhaps:

with two_groups as (
  select user_id
  from table1
  group by 1
  having min(group) = 'group1' and
         max(group) = 'group2' and
         count(distinct group) = 2
)
select *
from two_groups
join table1 using (user_id)

This method wouldn't extend to "member of exactly three groups" though.

Also:

select *
from   table1 t1
where  exists (select 1 from table1 t2 where t2.user_id = t1.user_id and t2.group = 'group1')
and    exists (select 1 from table1 t2 where t2.user_id = t1.user_id and t2.group = 'group2')
and    not exists (select 1 from table1 t2 where t2.user_id = t1.user_id and t2.group not in ('group1', 'group2'))

You can join what you have with select count( ), user_id from table group by user_id having count( ) = 2. That will give you those who are in exactly 2 groups.

Would this work? This first query is your original CTE, and should get all users that have a row in both groups The second query removed any users that also have a row in a group other than 1 or 2.

select user_id
from table1
where group in ('group1', 'group2')
group by user_id
having max(group) <> min(group)
    and count(user_id) = 2

EXCEPT DISTINCT

select user_id
from table1
where group not in ('group1', 'group2')

Consider below (BigQuery)

select *
from your_table
where true 
qualify 2 = countif(`group` in ('group1', 'group2')) over(partition by user_id) 
and 0 = countif(not `group` in ('group1', 'group2')) over(partition by user_id)          

if applied to sample data in your question - output is

在此处输入图像描述

might consider below

WITH TEMP_USER AS
(
    SELECT "123" AS USER_ID,"group1" AS GROUP_NAME, 0.9 AS PROBABILITY UNION ALL
    SELECT "123" AS USER_ID,"group2" AS GROUP_NAME, 0.6 AS PROBABILITY UNION ALL
    SELECT "45" AS USER_ID,"group2" AS GROUP_NAME, 0.8 AS PROBABILITY UNION ALL
    SELECT "567" AS USER_ID,"group2" AS GROUP_NAME, 0.56 AS PROBABILITY UNION ALL
    SELECT "567" AS USER_ID,"group3" AS GROUP_NAME, 0.78 AS PROBABILITY UNION ALL
    SELECT "567" AS USER_ID,"group1" AS GROUP_NAME, 0.90 AS PROBABILITY 

)

SELECT * FROM TEMP_USER U
WHERE EXISTS 
(
SELECT * FROM
    (
        SELECT USER_ID,STRING_AGG(DISTINCT GROUP_NAME ORDER BY GROUP_NAME) AS NAME 
        FROM TEMP_USER 
        GROUP BY USER_ID
        HAVING NAME="group1,group2"
    ) G 
WHERE U.USER_ID=G.USER_ID
);

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM