Count distinct values across two columns in SQL

Question

Let's consider this example:

Employee     Function   Start_dept   End_dept
A               dev          10        13
A               dev          11        12
A               test          9         9
A               dev          13        11

What I want to select is each employee, their function, and the number of distinct departments appearing in BOTH the "start" and "end" department columns combined. It should give this result:

Employee     Function  count_distinct_dept
A                 dev          4
A                 test         1

For employee A in the dev function, there are only 4 distinct departments (10, 11, 12 and 13), because values that appear in both columns (start and end) must not be counted twice.

How can I do this? (I'm using MySQL.) Is it possible to do this in one query without any JOIN or UNION, or is it necessary to use one of them? Since I am working with a huge database (more than 3 billion rows), I am not sure a JOIN or UNION query will be optimal...

Answer 1

Use a union all and aggregation:

select Employee, Function, count(distinct dept)
from ((select Employee, Function, Start_dept as dept
       from e
      ) union all
      (select  Employee, Function, End_dept
       from e
      )
     ) e
group by Employee, Function;
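
As a quick sanity check, the query above can be run against the four sample rows from the question. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the function name `count_distinct_depts` and the derived-table alias `u` are made up for illustration. SQLite does not accept parentheses around the individual selects of a compound query, so they are dropped here; the logic is otherwise the same.

```python
import sqlite3

# Sample rows from the question: (Employee, Function, Start_dept, End_dept)
SAMPLE_ROWS = [
    ("A", "dev", 10, 13),
    ("A", "dev", 11, 12),
    ("A", "test", 9, 9),
    ("A", "dev", 13, 11),
]

def count_distinct_depts(rows):
    """Hypothetical helper: load rows into an in-memory table `e`
    and run the union all + count(distinct) query on it."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
    conn.execute(
        "create table e (Employee text, Function text, "
        "Start_dept integer, End_dept integer)"
    )
    conn.executemany("insert into e values (?, ?, ?, ?)", rows)
    query = """
        select Employee, Function, count(distinct dept)
        from (select Employee, Function, Start_dept as dept from e
              union all
              select Employee, Function, End_dept from e) as u
        group by Employee, Function
        order by Employee, Function
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for row in count_distinct_depts(SAMPLE_ROWS):
        print(row)
```

Stacking both columns into a single `dept` column is what lets one `count(distinct ...)` see start and end values together, so a department that appears in both columns is counted once.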

If you want performance, I would suggest starting with two indexes, on (Employee, Function, Start_Dept) and (Employee, Function, End_Dept). Then:

select Employee, Function, count(distinct dept)
from ((select distinct Employee, Function, Start_dept as dept
       from e
      ) union all
      (select distinct Employee, Function, End_dept
       from e
      )
     ) e
group by Employee, Function;

The subqueries should be scanning the indexes rather than the whole table. You will still need to do the final GROUP BY. I am guessing that COUNT(DISTINCT) is a better approach than using UNION (instead of UNION ALL) in the subquery, but you could test that.
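
The two alternatives mentioned here can at least be checked for equivalent results on the sample data. The sketch below again assumes SQLite (via Python's sqlite3 module) in place of MySQL; the index names `idx_e_start` and `idx_e_end`, the alias `u`, and the function `compare_variants` are hypothetical. It says nothing about relative speed: on the real table that would need EXPLAIN and timing in MySQL itself.

```python
import sqlite3

SAMPLE_ROWS = [
    ("A", "dev", 10, 13),
    ("A", "dev", 11, 12),
    ("A", "test", 9, 9),
    ("A", "dev", 13, 11),
]

# Variant 1: distinct subqueries + union all, then count(distinct dept)
COUNT_DISTINCT_SQL = """
    select Employee, Function, count(distinct dept)
    from (select distinct Employee, Function, Start_dept as dept from e
          union all
          select distinct Employee, Function, End_dept from e) as u
    group by Employee, Function
    order by Employee, Function
"""

# Variant 2: union removes duplicate (Employee, Function, dept) rows,
# so a plain count(dept) per group is enough
UNION_SQL = """
    select Employee, Function, count(dept)
    from (select Employee, Function, Start_dept as dept from e
          union
          select Employee, Function, End_dept from e) as u
    group by Employee, Function
    order by Employee, Function
"""

def compare_variants(rows):
    """Hypothetical helper: build table `e` with the two suggested
    indexes and return the results of both query variants."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
    conn.execute(
        "create table e (Employee text, Function text, "
        "Start_dept integer, End_dept integer)"
    )
    conn.executemany("insert into e values (?, ?, ?, ?)", rows)
    # Index names are illustrative only
    conn.execute("create index idx_e_start on e (Employee, Function, Start_dept)")
    conn.execute("create index idx_e_end on e (Employee, Function, End_dept)")
    with_count_distinct = conn.execute(COUNT_DISTINCT_SQL).fetchall()
    with_union = conn.execute(UNION_SQL).fetchall()
    conn.close()
    return with_count_distinct, with_union

if __name__ == "__main__":
    a, b = compare_variants(SAMPLE_ROWS)
    print(a)
    print(b)
```

Both variants count each department once per (Employee, Function) pair; they differ only in where the de-duplication happens, which is the part worth measuring on real data.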
