Count distinct over a partition

Question

I am trying to do a distinct count of names partitioned by their roles. In the example below, I have a table with names and each person's role.

I would like a Role_Count column that gives the total number of distinct people in each role. For example, the Manager role comes up four times, but there are only 3 distinct people in that role: Sam comes up again on a different date.

If I remove the date column, it works fine using:

select
    a.date,
    a.Name,
    a.Role,
    count(a.Role) over (partition by a.Role) as Role_Count
from table a
group by a.date, a.name, a.role

Including the date column makes it count the total rows for each role rather than the distinct names (which I know I haven't identified in the partition), giving 4 managers and 3 analysts.

How do I fix this?

Desired output:

Date	Name	Role	Role_Count
01/01	Sam	Manager	3
02/01	Sam	Manager	3
01/01	John	Manager	3
01/01	Dan	Manager	3
01/01	Bob	Analyst	2
02/01	Bob	Analyst	2
01/01	Mike	Analyst	2

Current output:

Date	Name	Role	Role_Count
01/01	Sam	Manager	4
02/01	Sam	Manager	4
01/01	John	Manager	4
01/01	Dan	Manager	4
01/01	Bob	Analyst	3
02/01	Bob	Analyst	3
01/01	Mike	Analyst	3

Answer 1

Unfortunately, COUNT(DISTINCT ...) is not available as a window aggregate. But we can use a combination of DENSE_RANK and MAX to simulate it:

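select
    a.date,
    a.Name,
    a.Role,
    -- the highest rank within a role equals its number of distinct names
    MAX(a.rnk) OVER (PARTITION BY a.Role) as Role_Count
from (
    SELECT *,
        -- partition by Role only, so the same name on another date shares a rank
        DENSE_RANK() OVER (PARTITION BY Role ORDER BY Name) AS rnk
    FROM table
) a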
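To check the DENSE_RANK plus MAX approach against the question's sample rows, here is a minimal sketch that runs it in an in-memory SQLite database from Python. SQLite stands in for SQL Server here, which is an assumption (window functions need SQLite 3.25 or later), and the table name `staff` is hypothetical because `table` is a reserved word. The sketch partitions by Role alone, which is what the desired output calls for.

```python
import sqlite3

# Sample rows from the question; "staff" is a hypothetical table name.
rows = [
    ("01/01", "Sam", "Manager"),
    ("02/01", "Sam", "Manager"),
    ("01/01", "John", "Manager"),
    ("01/01", "Dan", "Manager"),
    ("01/01", "Bob", "Analyst"),
    ("02/01", "Bob", "Analyst"),
    ("01/01", "Mike", "Analyst"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (date TEXT, Name TEXT, Role TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", rows)

# Inner query: rank names within each role (duplicate names share a rank).
# Outer query: the highest rank per role is the distinct-name count.
query = """
SELECT a.date, a.Name, a.Role,
       MAX(a.rnk) OVER (PARTITION BY a.Role) AS Role_Count
FROM (
    SELECT *,
           DENSE_RANK() OVER (PARTITION BY Role ORDER BY Name) AS rnk
    FROM staff
) a
ORDER BY a.Role DESC, a.Name, a.date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)  # every Manager row carries 3, every Analyst row carries 2
```

All seven input rows are kept, so the date column survives alongside the per-role count.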

If Name may contain nulls, then we need to take that into account:

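select
    a.date,
    a.Name,
    a.Role,
    -- ignore the rank given to NULL names when taking the maximum
    MAX(CASE WHEN a.Name IS NOT NULL THEN a.rnk END) OVER (PARTITION BY a.Role) as Role_Count
from (
    SELECT *,
        -- rank NULL names in their own sub-partition so they do not shift the other ranks
        DENSE_RANK() OVER (PARTITION BY Role, CASE WHEN Name IS NULL THEN 0 ELSE 1 END ORDER BY Name) AS rnk
    FROM table
) a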
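The effect of the NULL handling can be seen by adding one row with a missing name. The sketch below is an illustration under assumptions: it uses an in-memory SQLite database (3.25 or later) rather than SQL Server, a hypothetical table name `staff`, and an invented extra row dated 03/01. Both engines sort NULL first in ascending order, so a plain DENSE_RANK gives NULL rank 1 and inflates the count by one.

```python
import sqlite3

# Question's sample rows plus one invented Manager row with a NULL name.
rows = [
    ("01/01", "Sam", "Manager"),
    ("02/01", "Sam", "Manager"),
    ("01/01", "John", "Manager"),
    ("01/01", "Dan", "Manager"),
    ("03/01", None, "Manager"),
    ("01/01", "Bob", "Analyst"),
    ("02/01", "Bob", "Analyst"),
    ("01/01", "Mike", "Analyst"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (date TEXT, Name TEXT, Role TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", rows)

# naive_rnk ranks NULL together with real names;
# rnk ranks NULL and non-NULL names in separate sub-partitions.
query = """
SELECT DISTINCT a.Role,
       MAX(a.naive_rnk) OVER (PARTITION BY a.Role) AS naive_count,
       MAX(CASE WHEN a.Name IS NOT NULL THEN a.rnk END)
           OVER (PARTITION BY a.Role) AS null_aware_count
FROM (
    SELECT *,
           DENSE_RANK() OVER (PARTITION BY Role ORDER BY Name) AS naive_rnk,
           DENSE_RANK() OVER (
               PARTITION BY Role, CASE WHEN Name IS NULL THEN 0 ELSE 1 END
               ORDER BY Name
           ) AS rnk
    FROM staff
) a
"""
counts = {role: (naive, aware) for role, naive, aware in conn.execute(query)}
print(counts)  # Manager: naive 4, null-aware 3; Analyst: 2 and 2
```

This matches COUNT(DISTINCT Name), which also skips NULL.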

Answer 2

Unfortunately, SQL Server (and other databases as well) doesn't support COUNT(DISTINCT) as a window function. Fortunately, there is a simple trick to work around this: add the ascending and descending DENSE_RANK() values over the same partition and subtract one:

select a.date, a.Name, a.Role,
       (dense_rank() over (partition by a.Role order by a.Name asc) +
        dense_rank() over (partition by a.Role order by a.Name desc) -
        1
       ) as distinct_names_in_role
from table a
group by a.date, a.name, a.role
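Why this works: within a role with n distinct names, the name at ascending rank k has descending rank n - k + 1, so the two ranks always add up to n + 1. A minimal sketch on the question's sample rows follows, assuming an in-memory SQLite database (3.25 or later) in place of SQL Server and a hypothetical table name `staff`. It omits GROUP BY so that every input row, including its date, is returned.

```python
import sqlite3

# Sample rows from the question; "staff" is a hypothetical table name.
rows = [
    ("01/01", "Sam", "Manager"),
    ("02/01", "Sam", "Manager"),
    ("01/01", "John", "Manager"),
    ("01/01", "Dan", "Manager"),
    ("01/01", "Bob", "Analyst"),
    ("02/01", "Bob", "Analyst"),
    ("01/01", "Mike", "Analyst"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (date TEXT, Name TEXT, Role TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", rows)

# Ascending rank + descending rank - 1 = number of distinct names in the role.
query = """
SELECT a.date, a.Name, a.Role,
       DENSE_RANK() OVER (PARTITION BY a.Role ORDER BY a.Name ASC)
     + DENSE_RANK() OVER (PARTITION BY a.Role ORDER BY a.Name DESC)
     - 1 AS distinct_names_in_role
FROM staff a
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)  # Manager rows carry 3, Analyst rows carry 2
```

Note that a NULL name would be ranked like any other value here, so it would count as one extra distinct name.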

Answer 1: score 2, posted 2021-02-24 10:23:54
Answer 2: score 2, accepted, posted 2021-02-24 12:04:35
