简体   繁体   中英

Count distinct over partition by

I am trying to do a distinct count of names partitioned over their roles. So, in the example below: I have a table with the names and the person's role.

I would like a role count column that gives the total number of distinct people in that role. For example, the role manager comes up four times but there are only 3 distinct people for that role - Sam comes up again on a different date.

If I remove the date column, it works fine using:

select
a.date,
a.Name,
a.Role,
count(a.Role) over (partition by a.Role) as Role_Count

from table a

group by a.date, a.name, a.role

Including the date column then makes it count the total roles rather than by distinct name (which I know I haven't identified in the partition). Giving 4 managers and 3 analysts.

How do I fix this?

Desired output:

Date Name Role Role_Count
01/01 Sam Manager 3
02/01 Sam Manager 3
01/01 John Manager 3
01/01 Dan Manager 3
01/01 Bob Analyst 2
02/01 Bob Analyst 2
01/01 Mike Analyst 2

Current output:

Date Name Role Role_Count
01/01 Sam Manager 4
02/01 Sam Manager 4
01/01 John Manager 4
01/01 Dan Manager 4
01/01 Bob Analyst 3
02/01 Bob Analyst 3
01/01 Mike Analyst 3

Unfortunately, COUNT(DISTINCT is not available as a window aggregate. But we can use a combination of DENSE_RANK and MAX to simulate it:

select

a.Name,
a.Role,
MAX(rnk) OVER (PARTITION BY date, Role) as Role_Count

from (
    SELECT *,
        DENSE_RANK() OVER (PARTITION BY date, Role ORDER BY Name) AS rnk
    FROM table
) a

If Name may have nulls then we need to take that into account:

select

a.Name,
a.Role,
MAX(CASE WHEN Name IS NOT NULL THEN rnk END) OVER (PARTITION BY date, Role) as Role_Count

from (
    SELECT *,
        DENSE_RANK() OVER (PARTITION BY date, Role, CASE WHEN Name IS NULL THEN 0 ELSE 1 END ORDER BY Name) AS rnk
    FROM table
) a

Unfortunately, SQL Server (and other databases as well) don't support COUNT(DISTINCT) as a window function. Fortunately, there is a simple trick to work around this -- the sum of DENSE_RANK() s minus one:

select a.Name, a.Role,
       (dense_rank() over (partition by a.Role order by a.Name asc) +
        dense_rank() over (partition by a.Role order by a.Name desc) -
        1
       ) as distinct_names_in_role
from table a
group by a.name, a.role

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM