SQL Server query aggregates incorrectly when there is no WHERE or HAVING clause

Question

Edit: I've tried the first two solutions, but I still have the same problem: the query returns correct results for a single customer when I filter with a WHERE clause, but incorrect results for that same customer without it. How could this be happening? What is going on under the hood that could lead to this?

I am building a query to join and aggregate customer information on a big table, so I started by writing it with a WHERE clause for a single customer, to make sure the logic works before applying it to the whole population of customers.

The tables I'm joining look something like this:

Table A:

| customer | order_id |
-----------------------
| abc      | 1        |
| abc      | 2        |
| xyz      | 3        |
| xyz      | 4        |
| xyz      | 5        |
| xyz      | 6        |
...

Table B:

| order_id | return_date |
--------------------------
| 1        | Mon         |
| 3        | Tues        |
| 5        | Wed         |
...

I need to aggregate these by customer name and essentially count the number of times each customer's orders appear in each table.
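To make the example reproducible, the sample rows above could be loaded as follows. This is only a sketch: the temp-table names #A and #B and the column types are assumptions (return_date is kept as text because the sample shows day names), not the real schema.

```sql
-- Hypothetical temp tables mirroring the sample rows shown above.
CREATE TABLE #A (customer varchar(10) NOT NULL, order_id int NOT NULL);
CREATE TABLE #B (order_id int NOT NULL, return_date varchar(10) NOT NULL);

-- Table A: one row per order.
INSERT INTO #A (customer, order_id) VALUES
    ('abc', 1), ('abc', 2),
    ('xyz', 3), ('xyz', 4), ('xyz', 5), ('xyz', 6);

-- Table B: one row per returned order.
INSERT INTO #B (order_id, return_date) VALUES
    (1, 'Mon'), (3, 'Tues'), (5, 'Wed');
```

With just these rows, the result I'd expect is 2 orders and 1 return for abc, and 4 orders and 2 returns for xyz.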

So the query looks something like this:

SELECT 
  a.customer as customer_name
  ,COUNT(DISTINCT(a.order_id)) as total_orders
  ,COUNT(DISTINCT(B.order_id)) as num_returns
FROM B

RIGHT JOIN (
  SELECT
    customer
    ,order_id
  FROM A
  ) as a

ON B.order_id = a.order_id
WHERE customer = 'xyz'
GROUP BY a.customer

This works perfectly when the WHERE clause is present (it also works with HAVING customer = 'xyz' after the GROUP BY instead). But when I remove the WHERE clause to apply the query to the whole population of customers, the results are completely incorrect. How can I fix this so it works for all customers?

Answer 1

This query should work:

SELECT a.customer as customer_name,
       COUNT(DISTINCT a.order_id) as total_orders,
       COUNT(DISTINCT B.order_id) as num_returns
FROM A LEFT JOIN
     B
     ON B.order_id = a.order_id
WHERE a.customer = 'xyz'
GROUP BY a.customer;

If xyz has no rows in A, then this returns no rows.
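To run this for every customer, the same query can be used with the WHERE filter dropped. A minimal sketch, assuming the table and column names from the question (the aliases and the ORDER BY are my additions):

```sql
-- Same join as above, but without the single-customer filter.
SELECT a.customer AS customer_name,
       COUNT(DISTINCT a.order_id) AS total_orders,  -- every order in A
       COUNT(DISTINCT b.order_id) AS num_returns    -- only orders matched in B; NULLs from unmatched rows are ignored
FROM A AS a
LEFT JOIN B AS b
       ON b.order_id = a.order_id
GROUP BY a.customer
ORDER BY a.customer;
```

On the sample rows in the question this should give abc with 2 orders and 1 return, and xyz with 4 orders and 2 returns. Because both counts are DISTINCT, duplicate rows in B do not inflate either figure.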

Answer 2

I would recommend pre-aggregation on b, and a left join:

select a.customer, count(*) total_orders, coalesce(sum(b.num_returns), 0) num_returns
from a
left join (
    select order_id, count(*) num_returns
    from b
    group by order_id
) b on b.order_id = a.order_id
group by a.customer

The results are consistent, regardless of whether a where clause is used or not. Note that this assumes no duplicate (customer, order_id) in a, as shown in your sample data.
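That assumption is easy to check. Here is a sketch of one way to look for duplicates in a, using the column names from the sample data (the copies alias is just for illustration); an empty result means the assumption holds:

```sql
-- List any (customer, order_id) pair that appears more than once in a.
select customer, order_id, count(*) copies
from a
group by customer, order_id
having count(*) > 1
```

If this returns rows, count(*) in the outer query would overcount total_orders, and sum(b.num_returns) would be inflated the same way, so the duplicates would need to be removed first.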

A lateral join would also do:

select a.customer, count(*) total_orders, sum(b.num_returns) num_returns
from a
cross apply (
    select count(*) num_returns
    from b
    where b.order_id = a.order_id
) b
group by a.customer
