Count distinct over partition by
I am trying to get a distinct count of names partitioned by role. In the example below, I have a table with names and each person's role.
I would like a Role_Count column that gives the total number of distinct people in each role. For example, the Manager role appears four times, but there are only 3 distinct people in that role, because Sam appears again on a different date.
If I remove the date column, it works fine using:
select
    a.date,
    a.Name,
    a.Role,
    count(a.Role) over (partition by a.Role) as Role_Count
from table a
group by a.date, a.name, a.role
Including the date column makes it count every row for the role rather than each distinct name (which I know I haven't specified in the partition), giving 4 managers and 3 analysts.
How do I fix this?
Desired output:
Date | Name | Role | Role_Count |
---|---|---|---|
01/01 | Sam | Manager | 3 |
02/01 | Sam | Manager | 3 |
01/01 | John | Manager | 3 |
01/01 | Dan | Manager | 3 |
01/01 | Bob | Analyst | 2 |
02/01 | Bob | Analyst | 2 |
01/01 | Mike | Analyst | 2 |
Current output:
Date | Name | Role | Role_Count |
---|---|---|---|
01/01 | Sam | Manager | 4 |
02/01 | Sam | Manager | 4 |
01/01 | John | Manager | 4 |
01/01 | Dan | Manager | 4 |
01/01 | Bob | Analyst | 3 |
02/01 | Bob | Analyst | 3 |
01/01 | Mike | Analyst | 3 |
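Unfortunately, COUNT(DISTINCT ...) is not available as a window aggregate in SQL Server. But we can use a combination of DENSE_RANK and MAX to simulate it. DENSE_RANK numbers the distinct names within a partition 1, 2, 3, and so on, so the highest rank in the partition equals the number of distinct names: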
select
    a.date,
    a.Name,
    a.Role,
    MAX(a.rnk) OVER (PARTITION BY a.Role) as Role_Count
from (
    SELECT *,
        -- partition by Role only, so the count spans all dates
        DENSE_RANK() OVER (PARTITION BY Role ORDER BY Name) AS rnk
    FROM table
) a
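To check the DENSE_RANK and MAX approach against the sample data from the question, here is a minimal sketch that runs the query through Python's built-in sqlite3 module instead of SQL Server. That substitution is an assumption made only to keep the example self-contained; it needs SQLite 3.25 or later for window functions. The table name roles, the column name dt, and the function role_counts are hypothetical. The query partitions by role alone, so the count spans all dates, as the desired output requires.

```python
import sqlite3

# Sample rows from the question: (date, name, role).
ROWS = [
    ("01/01", "Sam", "Manager"),
    ("02/01", "Sam", "Manager"),
    ("01/01", "John", "Manager"),
    ("01/01", "Dan", "Manager"),
    ("01/01", "Bob", "Analyst"),
    ("02/01", "Bob", "Analyst"),
    ("01/01", "Mike", "Analyst"),
]

# DENSE_RANK numbers the distinct names per role; MAX picks the highest rank.
QUERY = """
SELECT a.dt, a.name, a.role,
       MAX(a.rnk) OVER (PARTITION BY a.role) AS role_count
FROM (
    SELECT dt, name, role,
           DENSE_RANK() OVER (PARTITION BY role ORDER BY name) AS rnk
    FROM roles
) a
"""


def role_counts(rows):
    """Return (dt, name, role, role_count) tuples for the given rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE roles (dt TEXT, name TEXT, role TEXT)")
        conn.executemany("INSERT INTO roles VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in role_counts(ROWS):
        print(row)
```

Every Manager row should carry a count of 3 and every Analyst row a count of 2, including the rows dated 02/01.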
If Name may contain NULLs, we need to take that into account, because DENSE_RANK gives NULL a rank of its own whereas COUNT(DISTINCT ...) ignores NULLs:
select
    a.date,
    a.Name,
    a.Role,
    MAX(CASE WHEN a.Name IS NOT NULL THEN a.rnk END) OVER (PARTITION BY a.Role) as Role_Count
from (
    SELECT *,
        -- rank NULL names in their own sub-partition so they do not shift the other ranks
        DENSE_RANK() OVER (PARTITION BY Role, CASE WHEN Name IS NULL THEN 0 ELSE 1 END ORDER BY Name) AS rnk
    FROM table
) a
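The effect of a NULL name can be seen by running both variants side by side. The sketch below again uses Python's sqlite3 module as a stand-in for SQL Server (an assumption; SQLite 3.25 or later is required), with a hypothetical table roles, a hypothetical column dt, and one extra Manager row whose name is NULL added purely for illustration. Both queries partition by role alone.

```python
import sqlite3

# Question data plus one illustrative Manager row with a NULL name.
ROWS = [
    ("01/01", "Sam", "Manager"),
    ("02/01", "Sam", "Manager"),
    ("01/01", "John", "Manager"),
    ("01/01", "Dan", "Manager"),
    ("03/01", None, "Manager"),
    ("01/01", "Bob", "Analyst"),
    ("02/01", "Bob", "Analyst"),
    ("01/01", "Mike", "Analyst"),
]

# Naive variant: the NULL name receives its own rank and inflates the maximum.
NAIVE = """
SELECT DISTINCT a.role, MAX(a.rnk) OVER (PARTITION BY a.role) AS role_count
FROM (
    SELECT role, DENSE_RANK() OVER (PARTITION BY role ORDER BY name) AS rnk
    FROM roles
) a
"""

# NULL-aware variant: NULL names are ranked separately and excluded from MAX.
NULL_AWARE = """
SELECT DISTINCT a.role,
       MAX(CASE WHEN a.name IS NOT NULL THEN a.rnk END)
           OVER (PARTITION BY a.role) AS role_count
FROM (
    SELECT name, role,
           DENSE_RANK() OVER (
               PARTITION BY role, CASE WHEN name IS NULL THEN 0 ELSE 1 END
               ORDER BY name
           ) AS rnk
    FROM roles
) a
"""


def counts_by_role(query, rows):
    """Run the query on an in-memory table and return {role: role_count}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE roles (dt TEXT, name TEXT, role TEXT)")
        conn.executemany("INSERT INTO roles VALUES (?, ?, ?)", rows)
        return dict(conn.execute(query).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print("naive:     ", counts_by_role(NAIVE, ROWS))
    print("null-aware:", counts_by_role(NULL_AWARE, ROWS))
```

With the extra row, the naive variant reports 4 managers, while the NULL-aware variant reports 3, matching what COUNT(DISTINCT Name) would return.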
Unfortunately, SQL Server (and some other databases as well) doesn't support COUNT(DISTINCT) as a window function. Fortunately, there is a simple trick to work around this: add the ascending DENSE_RANK() to the descending DENSE_RANK() and subtract one:
select a.Name, a.Role,
(dense_rank() over (partition by a.Role order by a.Name asc) +
dense_rank() over (partition by a.Role order by a.Name desc) -
1
) as distinct_names_in_role
from table a
group by a.name, a.role
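The trick works because a name with ascending rank r among d distinct names has descending rank d - r + 1, so the two ranks always sum to d + 1. Below is a minimal sketch that checks this on the question's data, using Python's sqlite3 module in place of SQL Server (an assumption; SQLite 3.25 or later is required). The table name roles and column name dt are hypothetical, and the GROUP BY is dropped so that the date column can be kept, which is a variation on the query above rather than the same statement.

```python
import sqlite3

# Sample rows from the question: (date, name, role).
ROWS = [
    ("01/01", "Sam", "Manager"),
    ("02/01", "Sam", "Manager"),
    ("01/01", "John", "Manager"),
    ("01/01", "Dan", "Manager"),
    ("01/01", "Bob", "Analyst"),
    ("02/01", "Bob", "Analyst"),
    ("01/01", "Mike", "Analyst"),
]

# Ascending rank + descending rank - 1 = number of distinct names in the role.
QUERY = """
SELECT dt, name, role,
       DENSE_RANK() OVER (PARTITION BY role ORDER BY name ASC)
     + DENSE_RANK() OVER (PARTITION BY role ORDER BY name DESC)
     - 1 AS distinct_names_in_role
FROM roles
"""


def distinct_names_per_row(rows):
    """Return (dt, name, role, distinct_names_in_role) for every input row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE roles (dt TEXT, name TEXT, role TEXT)")
        conn.executemany("INSERT INTO roles VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in distinct_names_per_row(ROWS):
        print(row)
```

Because DENSE_RANK gives tied names the same rank, duplicate rows such as Sam's two dates do not disturb the result. Note that this variant counts a NULL name as one more distinct value, unlike COUNT(DISTINCT).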