
Improving query that counts distinct values that have particular values in another column

Say I have a table in the format:

| id | category |
|----|----------|
| 10 | A        |
| 10 | B        |
| 10 | C        |
| 2  | C        |

I want to count the number of distinct ids that have all three values A, B, and C in the category column. In this case, the query should return 1, since this is true only for id = 10.

My intuition is to write the following query to get this value:

SELECT 
    COUNT(DISTINCT id), 
    SUM(CASE WHEN category = 'A' THEN 1 ELSE 0 END) AS A,
    SUM(CASE WHEN category = 'B' THEN 1 ELSE 0 END) AS B,
    SUM(CASE WHEN category = 'C' THEN 1 ELSE 0 END) AS C
FROM 
    table 
GROUP BY 
    id
HAVING
    A >= 1
    AND 
    B >= 1
    AND
    C >= 1

This feels a bit overwrought though -- is there a simpler way to achieve the desired outcome?

I assume this is part of a larger table, that the same id and category pair can appear multiple times (the rows still being distinct because of other fields), and that you know how many categories you're looking for.

SELECT ID, COUNT(ID)
FROM (
    SELECT DISTINCT ID, CATEGORY
    FROM TABLE
) AS T -- most databases require an alias on a derived table
GROUP BY ID
HAVING COUNT(ID) = 3 -- or however many categories you want

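The subquery removes the extraneous columns and forces each id to show up exactly once per category. The outer query then counts how many times each id shows up and keeps the ones that show up 3 times, or however many categories you want. Note that this matches any 3 distinct categories, so if the table contains categories other than A, B, and C, add a WHERE CATEGORY IN ('A', 'B', 'C') filter to the subquery.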
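As a rough way to try this out, here is a minimal sketch that runs the DISTINCT-subquery approach against an in-memory SQLite database from Python. The table name `items`, the extra `note` column, and the `pairs` alias are assumptions made up for the illustration, and the subquery adds a category filter that the query above does not have.

```python
import sqlite3

# Hypothetical table "items"; the "note" column stands in for the
# "other fields" that allow an (id, category) pair to repeat.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, category TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (10, "A", "x"),
        (10, "A", "y"),  # repeated (id, category) pair
        (10, "B", "x"),
        (10, "C", "x"),
        (2, "C", "x"),
    ],
)

query = """
SELECT id, COUNT(id) AS n_categories
FROM (
    -- one row per (id, category) pair, restricted to the wanted categories
    SELECT DISTINCT id, category
    FROM items
    WHERE category IN ('A', 'B', 'C')
) AS pairs
GROUP BY id
HAVING COUNT(id) = 3
"""

matching_ids = conn.execute(query).fetchall()
print(matching_ids)
```

With this sample data only id 10 survives the HAVING clause, even though its category A row is duplicated. This form lists the matching ids; to get the single number asked for in the question, the result would still need to be counted.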

You are close, but you need two levels of aggregation. Assuming no duplicate rows:

SELECT COUNT(*)
FROM (SELECT id
      FROM t
      WHERE Category IN ('A', 'B', 'C') 
      GROUP BY id
      HAVING COUNT(*) = 3
     ) t;
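If duplicate rows cannot be ruled out, one possible variation is to compare COUNT(DISTINCT category) instead of COUNT(*) in the inner query. The sketch below tries this against an in-memory SQLite database from Python; the helper name `count_ids_with_all`, the `matched` alias, and the sample rows are assumptions for illustration, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (10, "A"), (10, "B"), (10, "C"),
        (2, "C"), (2, "C"), (2, "C"),  # duplicates: three rows, one category
    ],
)


def count_ids_with_all(conn, categories):
    """Count ids that have every one of the given categories (hypothetical helper)."""
    wanted = sorted(set(categories))  # ignore repeated category names
    placeholders = ", ".join("?" for _ in wanted)
    query = f"""
    SELECT COUNT(*)
    FROM (SELECT id
          FROM t
          WHERE category IN ({placeholders})
          GROUP BY id
          -- DISTINCT keeps duplicate rows from inflating the count
          HAVING COUNT(DISTINCT category) = ?
         ) AS matched
    """
    (n,) = conn.execute(query, (*wanted, len(wanted))).fetchone()
    return n


print(count_ids_with_all(conn, ["A", "B", "C"]))
```

Here id 2 has three rows but only one distinct category, so a plain COUNT(*) = 3 check would wrongly include it, while the DISTINCT version does not.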
