INNER JOIN on 2 tables returns wrong values
This is my SQL query:
SELECT SUM(amz_event_shipment_items.quantity),
amz_event_shipment_items.seller_sku
FROM amz_event_shipment_items
INNER JOIN amz_event_fees ON amz_event_shipment_items.id = amz_event_fees.shipment_item_id
INNER JOIN amz_shipment_events ON amz_shipment_events.id = amz_event_shipment_items.shipment_event_id
WHERE amz_event_fees.currency = 'USD'
AND amz_shipment_events.event_type <> 'RefundEvent'
AND amz_shipment_events.posted_date BETWEEN '2016-5-1 07:00:00' AND '2016-5-7 06:59:59'
GROUP BY amz_event_shipment_items.seller_sku
But the returned values are far too high... they make no sense to me.
Am I missing something?
Edit
Many shipment_events for each date
Each shipment_event HAS MANY shipment_item / BELONGS TO ONE event
Each shipment_item HAS MANY shipment_fee / BELONGS TO ONE item
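You are multiplying the quantities by the number of fees: the join produces one row per matching fee, so each item's quantity is summed once for every fee row it has. When you only want to check for existence, use an IN or EXISTS clause.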
select
sum(i.quantity),
i.seller_sku
from amz_event_shipment_items i
where exists
(
select *
from amz_event_fees f
where f.currency = 'USD'
and f.shipment_item_id = i.id
)
and exists
(
select *
from amz_shipment_events e
where e.event_type <> 'RefundEvent'
and e.posted_date between '2016-05-01 07:00:00' and '2016-05-07 06:59:59'
and e.id = i.shipment_event_id
)
group by i.seller_sku;
(MySQL is sometimes slow on IN clauses, so I am using EXISTS here, although I prefer IN.)
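To make the multiplication effect concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than MySQL, and the rows are made up for illustration: one item with quantity 2 that carries three USD fee rows. Only the columns the queries touch are modelled.

```python
import sqlite3

# Hypothetical miniature of two of the tables; the data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE amz_event_shipment_items (id INTEGER PRIMARY KEY, seller_sku TEXT, quantity INTEGER);
CREATE TABLE amz_event_fees (id INTEGER PRIMARY KEY, shipment_item_id INTEGER, currency TEXT);

-- one item with quantity 2 ...
INSERT INTO amz_event_shipment_items VALUES (10, 'SKU-A', 2);
-- ... that carries three USD fee rows
INSERT INTO amz_event_fees VALUES (100, 10, 'USD'), (101, 10, 'USD'), (102, 10, 'USD');
""")

# Joining to the fees yields one row per fee, so the quantity is summed three times.
join_sql = """
SELECT i.seller_sku, SUM(i.quantity)
FROM amz_event_shipment_items i
JOIN amz_event_fees f ON f.shipment_item_id = i.id
WHERE f.currency = 'USD'
GROUP BY i.seller_sku
"""

# EXISTS only asks whether a USD fee is present, so each item is counted once.
exists_sql = """
SELECT i.seller_sku, SUM(i.quantity)
FROM amz_event_shipment_items i
WHERE EXISTS (SELECT * FROM amz_event_fees f
              WHERE f.currency = 'USD' AND f.shipment_item_id = i.id)
GROUP BY i.seller_sku
"""

join_result = conn.execute(join_sql).fetchall()
exists_result = conn.execute(exists_sql).fetchall()
print(join_result)    # [('SKU-A', 6)]
print(exists_result)  # [('SKU-A', 2)]
```

The join version reports 6 (2 x 3 fee rows), while the EXISTS version reports the actual quantity of 2.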
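This is not an answer, but an addendum. If I understand correctly, your query returned wrong results but was rather fast, whereas mine (with the EXISTS clauses) returns correct results but is very slow.
So the task of eliminating the duplicates seems to take too much time.
Here are two ideas: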
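First idea: eliminate the duplicates right away
Instead of joining the fee rows directly, we reduce them to one row per item before joining: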
select
sum(i.quantity),
i.seller_sku
from amz_event_shipment_items i
join -- join with only one record per ID to substitute an EXISTS clause
(
select distinct shipment_item_id
from amz_event_fees
where currency = 'USD'
) f on f.shipment_item_id = i.id
where exists
(
select *
from amz_shipment_events e
where e.event_type <> 'RefundEvent'
and e.posted_date between '2016-05-01 07:00:00' and '2016-05-07 06:59:59'
and e.id = i.shipment_event_id
)
group by i.seller_sku;
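Second idea: pre-aggregate the values
Here we try to aggregate as early as possible, so that the intermediate result stays small and we do not have to look up the events table for every single item record.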
select
sum(i.pre_sum_quantity),
i.seller_sku
from
(
select seller_sku, shipment_event_id, sum(quantity) as pre_sum_quantity
from amz_event_shipment_items
where exists
(
select *
from amz_event_fees f
where f.currency = 'USD'
and f.shipment_item_id = amz_event_shipment_items.id
)
group by seller_sku, shipment_event_id
) i
where exists
(
select *
from amz_shipment_events e
where e.event_type <> 'RefundEvent'
and e.posted_date between '2016-05-01 07:00:00' and '2016-05-07 06:59:59'
and e.id = i.shipment_event_id
)
group by i.seller_sku;
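As a sanity check that both ideas return the same sums as the plain EXISTS query, the sketch below runs all three against a small invented data set. It uses Python's sqlite3 module instead of MySQL, so it says nothing about speed; the rows, SKUs and fee counts are assumptions made up for the test. The first idea is written with the fee subquery filtering on its own currency column (no f. prefix inside the derived table) and with the event check in a WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE amz_shipment_events (id INTEGER PRIMARY KEY, event_type TEXT, posted_date TEXT);
CREATE TABLE amz_event_shipment_items (id INTEGER PRIMARY KEY, shipment_event_id INTEGER,
                                       seller_sku TEXT, quantity INTEGER);
CREATE TABLE amz_event_fees (id INTEGER PRIMARY KEY, shipment_item_id INTEGER, currency TEXT);

INSERT INTO amz_shipment_events VALUES
  (1, 'ShipmentEvent', '2016-05-02 10:00:00'),  -- in range, counted
  (2, 'RefundEvent',   '2016-05-03 10:00:00'),  -- excluded by event_type
  (3, 'ShipmentEvent', '2016-06-01 10:00:00');  -- excluded by posted_date
INSERT INTO amz_event_shipment_items VALUES
  (10, 1, 'SKU-A', 2), (11, 1, 'SKU-A', 3), (12, 1, 'SKU-B', 1),
  (13, 2, 'SKU-A', 5), (14, 3, 'SKU-B', 7), (15, 1, 'SKU-B', 4);
INSERT INTO amz_event_fees (shipment_item_id, currency) VALUES
  (10, 'USD'), (10, 'USD'), (11, 'USD'), (12, 'USD'), (12, 'USD'), (12, 'USD'),
  (13, 'USD'), (14, 'USD'), (15, 'EUR');  -- item 15 has no USD fee
""")

# The event check is identical in all three variants.
EVENT_EXISTS = """
exists (select * from amz_shipment_events e
        where e.event_type <> 'RefundEvent'
          and e.posted_date between '2016-05-01 07:00:00' and '2016-05-07 06:59:59'
          and e.id = i.shipment_event_id)
"""

queries = {
    # Plain EXISTS version.
    "exists": f"""
        select sum(i.quantity), i.seller_sku
        from amz_event_shipment_items i
        where exists (select * from amz_event_fees f
                      where f.currency = 'USD' and f.shipment_item_id = i.id)
          and {EVENT_EXISTS}
        group by i.seller_sku order by i.seller_sku""",
    # First idea: join against one row per item ID.
    "distinct_join": f"""
        select sum(i.quantity), i.seller_sku
        from amz_event_shipment_items i
        join (select distinct shipment_item_id
              from amz_event_fees
              where currency = 'USD') f on f.shipment_item_id = i.id
        where {EVENT_EXISTS}
        group by i.seller_sku order by i.seller_sku""",
    # Second idea: sum per (sku, event) first, then filter on events.
    "pre_aggregate": f"""
        select sum(i.pre_sum_quantity), i.seller_sku
        from (select seller_sku, shipment_event_id, sum(quantity) as pre_sum_quantity
              from amz_event_shipment_items
              where exists (select * from amz_event_fees f
                            where f.currency = 'USD'
                              and f.shipment_item_id = amz_event_shipment_items.id)
              group by seller_sku, shipment_event_id) i
        where {EVENT_EXISTS}
        group by i.seller_sku order by i.seller_sku""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)  # each variant: [(5, 'SKU-A'), (1, 'SKU-B')]
```

All three variants agree on this data. Whether either rewrite is actually faster on MySQL can only be judged by timing it against the real tables.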
If there are only a few event types, you could also try to get rid of the <> so that it becomes more likely that an index is used:
where e.event_type in ('EarlyPaymentEvent','LatePaymentEvent')
(In this case it might pay to have an index in which event_type comes before posted_date.)
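Such an index could look like the sketch below. The index name idx_events_type_posted is made up, and the example runs on SQLite through Python's sqlite3 module, so the printed plan is SQLite's and not MySQL's; on MySQL you would run EXPLAIN on the real query to see whether the index is picked up. The CREATE INDEX statement itself uses syntax that MySQL accepts as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE amz_shipment_events (id INTEGER PRIMARY KEY, event_type TEXT, posted_date TEXT);
INSERT INTO amz_shipment_events VALUES
  (1, 'EarlyPaymentEvent', '2016-05-02 10:00:00'),
  (2, 'RefundEvent',       '2016-05-03 10:00:00'),
  (3, 'LatePaymentEvent',  '2016-06-01 10:00:00');

-- Composite index: event_type first, posted_date second (hypothetical name).
CREATE INDEX idx_events_type_posted ON amz_shipment_events (event_type, posted_date);
""")

# Equality-style filter on event_type (IN) plus a range on posted_date.
query = """
SELECT id FROM amz_shipment_events
WHERE event_type IN ('EarlyPaymentEvent', 'LatePaymentEvent')
  AND posted_date BETWEEN '2016-05-01 07:00:00' AND '2016-05-07 06:59:59'
ORDER BY id
"""
matching_ids = [row[0] for row in conn.execute(query)]
print(matching_ids)  # [1]

# Show which access path SQLite chose for this query.
for row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(row)
```

Putting event_type first lets the engine jump to each listed type and then scan only the date range within it, which is the reasoning behind the column order suggested above.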
I must admit that I don't expect this to be much faster than the original EXISTS query, but it is worth a try.
One of your joins is probably returning more records than you expect. I would try doing a select * and then order by sku and eyeball the results.
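A rough sketch of that check, again on SQLite via Python's sqlite3 module and with invented rows: list the joined rows ordered by sku, then count how often each item ID appears. Any item that shows up more than once is being summed more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE amz_event_shipment_items (id INTEGER PRIMARY KEY, seller_sku TEXT, quantity INTEGER);
CREATE TABLE amz_event_fees (id INTEGER PRIMARY KEY, shipment_item_id INTEGER, currency TEXT);
INSERT INTO amz_event_shipment_items VALUES (10, 'SKU-A', 2), (11, 'SKU-B', 1);
INSERT INTO amz_event_fees VALUES (100, 10, 'USD'), (101, 10, 'USD'), (102, 11, 'USD');
""")

# Step 1: look at the raw joined rows, ordered by sku.
joined_rows = conn.execute("""
SELECT i.id, i.seller_sku, i.quantity, f.id AS fee_id
FROM amz_event_shipment_items i
JOIN amz_event_fees f ON f.shipment_item_id = i.id
ORDER BY i.seller_sku, i.id, f.id
""").fetchall()
for row in joined_rows:
    print(row)

# Step 2: let the database point out the items that were duplicated by the join.
duplicated_items = conn.execute("""
SELECT i.id, COUNT(*) AS rows_after_join
FROM amz_event_shipment_items i
JOIN amz_event_fees f ON f.shipment_item_id = i.id
GROUP BY i.id
HAVING COUNT(*) > 1
ORDER BY i.id
""").fetchall()
print(duplicated_items)  # [(10, 2)]
```

Here item 10 appears twice after the join, once per fee row, which is exactly the kind of duplication that inflates the sum.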