简体   繁体   中英

Find rows that have same value in one column and other values in another column?

I have a PostgreSQL database that stores users in a users table and conversations they take part in a conversation table. Since each user can take part in multiple conversations and each conversation can involve multiple users, I have a conversation_user linking table to track which users are participating in each conversation:

# conversation_user
id  |  conversation_id | user_id
----+------------------+--------
1   |                1 |      32
2   |                1 |       3
3   |                2 |      32
4   |                2 |       3
5   |                2 |       4

In the above table, user 32 is having one conversation with just user 3 and another with both 3 and user 4. How would I write a query that would show that there is a conversation between just user 32 and user 3?

I've tried the following:

SELECT conversation_id AS cid,
       user_id
FROM conversation_user
GROUP BY cid HAVING count(*) = 2
AND (user_id = 32
     OR user_id = 3);

SELECT conversation_id AS cid,
   user_id
FROM conversation_user
GROUP BY (cid HAVING count(*) = 2
AND (user_id = 32
     OR user_id = 3));

SELECT conversation_id AS cid,
       user_id
FROM conversation_user
WHERE (user_id = 32)
  OR (user_id = 3)
GROUP BY cid HAVING count(*) = 2;

These queries throw an error that says that user_id must appear in the GROUP BY clause or be used in an aggregate function. Putting them in an aggregate function (eg MIN or MAX ) doesn't sound appropriate. I thought that my first two attempts were putting them in the GROUP BY clause.

What am I doing wrong?

This is a case of relational division . We have assembled an arsenal of techniques under this related question:

The special difficulty is to exclude additional users. There are basically 4 techniques.

I suggest LEFT JOIN / IS NULL :

SELECT cu1.conversation_id
FROM        conversation_user cu1
JOIN        conversation_user cu2 USING (conversation_id)
LEFT   JOIN conversation_user cu3 ON cu3.conversation_id = cu1.conversation_id
                                 AND cu3.user_id NOT IN (3,32)
WHERE  cu1.user_id = 32
AND    cu2.user_id = 3
AND    cu3.conversation_id IS NULL;

Or NOT EXISTS :

SELECT cu1.conversation_id
FROM   conversation_user cu1
JOIN   conversation_user cu2 USING (conversation_id)
WHERE  cu1.user_id = 32
AND    cu2.user_id = 3
AND NOT EXISTS (
   SELECT 1
   FROM   conversation_user cu3
   WHERE  cu3.conversation_id = cu1.conversation_id
   AND    cu3.user_id NOT IN (3,32)
   );

Both queries do not depend on a UNIQUE constraint for (conversation_id, user_id) , which may or may not be in place. Meaning, the query even works if user_id 32 (or 3) is listed more than once for the same conversation. You would get duplicate rows in the result, though, and need to apply DISTINCT or GROUP BY .
The only condition is the one you formulated:

... a query that would show that there is a conversation between just user 32 and user 3?

Audited query

The query you linked in the comment wouldn't work. You forgot to exclude other participants. Should be something like:

SELECT *  -- or whatever you want to return
FROM   conversation_user cu1
WHERE  cu1.user_id = 32
AND    EXISTS (
   SELECT 1
   FROM   conversation_user cu2
   WHERE  cu2.conversation_id = cu1.conversation_id 
   AND    cu2.user_id = 3
   )
AND NOT EXISTS (
   SELECT 1
   FROM   conversation_user cu3
   WHERE  cu3.conversation_id = cu1.conversation_id
   AND    cu3.user_id NOT IN (3,32)
   );

Which is similar to the other two queries, except that it will not return multiple rows if user_id = 3 is linked multiple times.

You can use conditional aggregation to select all cids that only have 2 specific particpants

select cid from conversation_user
group by cid
having count(*) = 2
and count(case when user_id not in (32,3) then 1 end) = 0

If (cid,user_id) is not unique then replace having count(*) = 2 with having count(distinct user_id) = 2

Because you want conversations with just 2 users, you can use a self outer join on other users and filter out hits:

To find all 2-user conversations and they're between:

SELECT
    a.conversation_id cid,
    a.user_id user_id_1,
    b.user_id user_id_2
FROM conversation_user a
JOIN conversation_user b ON b.cid = a.cid
  AND b.user_id > a.user_id
LEFT JOIN conversation_user c ON c.cid = a.cid
  AND c.user_id NOT IN (a.user_id, b.user_id)
WHERE c.cid IS NULL -- only return misses on join to others

To find all 2-user conversations for a particular user, just add:

AND a.user_id = 32

if you just want confirmation.

 select conversation_id 
   from  conversation_users 
   group by conversation_id
   having bool_and ( user_id in (3,32))
      and count(*) = 2;

if you want full details, you can use a window function and a CTE like this:

 with a as (
   select *
      ,not bool_and( user_id in (3,32) )
         over  ( partition by conversation_id) 
       and 2 = count(user_id)
         over  ( partition by conversation_id)
           as conv_candidates 
   from conversation_users 
   ) 
 select * from a where conv_candidates;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM