
Count(distinct) and group by issue if the range of items being grouped is significant

I am joining two tables (shipments and returns) and using group by to view totals for certain criteria. The two tables are related via shipment_id. This column is mostly unique, but it contains a few duplicates because each shipment can contain more than one item, and each item has its own row in the table.

I'm trying to count all the distinct shipments grouped by warehouse, seller, and size. count(distinct ...) works great on its own, but it does not report correct totals when used with group by if the range of items being grouped is significant.

The query below returns 7 shipments and 4 returns when the per-group counts are added up. With the small amount of test data I have, the return count is correct, but there are actually 6 distinct shipments, not 7. With this query I'm basically looking at all shipments and joining return information if an item in the shipment has been returned.

select s.warehouse, s.seller, s.size,
       count(distinct s.shipment_id) as total_shipments,
       count(distinct r.shipment_id) as total_returns
from shipments s
left join returns r
       on s.shipment_id = r.shipment_id
group by s.warehouse, s.seller, s.size;

I'm concerned that the report I generate won't be entirely accurate. Is there a workaround for this issue? I've seen similar questions, but none that really apply. I am using MySQL.

I see a potential problem. If a shipment has multiple items and so can end up in duplicate shipment records, it's possible that those records list different warehouses or sellers, or that the size is different. By grouping by those fields, you risk the shipment being counted more than once, since the shipment_id is technically distinct within each group.
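To make the double counting concrete, here is a minimal sketch that reproduces it on made-up data. Everything in it is an assumption for illustration: the four sample rows are hypothetical, the column types are guessed, and an in-memory SQLite database stands in for MySQL (the SQL itself should run unchanged on MySQL). It also includes a diagnostic query that lists the shipment_id values whose rows fall into more than one (warehouse, seller, size) group.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the data below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (shipment_id INTEGER, warehouse TEXT, seller TEXT, size TEXT);
-- Shipment 3 has two item rows that differ in size.
INSERT INTO shipments VALUES
  (1, 'W1', 'A', 'S'),
  (2, 'W1', 'A', 'S'),
  (3, 'W1', 'A', 'S'),
  (3, 'W1', 'A', 'L');
""")

# Distinct shipments per group, as in the question's query.
per_group = conn.execute("""
    SELECT warehouse, seller, size, COUNT(DISTINCT shipment_id)
    FROM shipments
    GROUP BY warehouse, seller, size
    ORDER BY size
""").fetchall()

# Distinct shipments over the whole table.
overall = conn.execute(
    "SELECT COUNT(DISTINCT shipment_id) FROM shipments"
).fetchone()[0]

# Diagnostic: shipments whose rows land in more than one group.
straddlers = conn.execute("""
    SELECT shipment_id
    FROM (SELECT DISTINCT shipment_id, warehouse, seller, size
          FROM shipments) AS d
    GROUP BY shipment_id
    HAVING COUNT(*) > 1
""").fetchall()

group_sum = sum(row[3] for row in per_group)
print("sum of per-group counts:", group_sum)   # shipment 3 is counted in two groups
print("overall distinct count:", overall)
print("shipments in several groups:", straddlers)
```

Each group's count is correct on its own; only the sum over groups overstates the total, by one for every extra group a shipment appears in. Running the diagnostic query against the real table would show whether that explains the 7 versus 6 in the question.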

You could try grouping by s.shipment_id instead of s.warehouse, s.seller, s.size. The problem here is that if the warehouse, seller or size differs, you'll end up missing one row (for that warehouse/seller/size), but the totals will add up.
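One way to sketch that idea while still reporting by warehouse, seller and size is to collapse the shipments table to one row per shipment_id first, and only then join and group. This is an illustration, not a confirmed fix: the sample rows are hypothetical, SQLite again stands in for MySQL, and the use of MIN() to pick a single warehouse, seller and size per shipment is an arbitrary assumption (it takes each column's minimum independently, so the combination may not match any actual item row). Which group a mixed shipment should belong to is a business decision.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (shipment_id INTEGER, warehouse TEXT, seller TEXT, size TEXT);
CREATE TABLE returns (shipment_id INTEGER);
INSERT INTO shipments VALUES
  (1, 'W1', 'A', 'S'),
  (2, 'W1', 'A', 'S'),
  (3, 'W1', 'A', 'S'),
  (3, 'W1', 'A', 'L');
INSERT INTO returns VALUES (3);
""")

rows = conn.execute("""
    SELECT one.warehouse, one.seller, one.size,
           COUNT(DISTINCT one.shipment_id) AS total_shipments,
           COUNT(DISTINCT r.shipment_id)   AS total_returns
    FROM (
        -- One row per shipment; MIN() is an arbitrary tie-breaker (assumption).
        SELECT shipment_id,
               MIN(warehouse) AS warehouse,
               MIN(seller)    AS seller,
               MIN(size)      AS size
        FROM shipments
        GROUP BY shipment_id
    ) AS one
    LEFT JOIN returns r ON r.shipment_id = one.shipment_id
    GROUP BY one.warehouse, one.seller, one.size
    ORDER BY one.size
""").fetchall()

total_shipments = sum(row[3] for row in rows)
for row in rows:
    print(row)
print("sum of per-group shipments:", total_shipments)
```

Because every shipment now belongs to exactly one group, the per-group shipment counts add up to the overall distinct count. The trade-off is the one described above: the ('W1', 'A', 'S') group no longer includes shipment 3, even though one of its items has that size.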
