I would like to calculate the lead time of our suppliers in BigQuery, defined as the delivery date to the customer (DELIVERY_DATE in table oli, below) minus the order date (ORDER_DATE in table olit, below).
In our e-commerce company, a customer's order may involve one or many suppliers, so we assign one SHIPMENT_NUMBER (table ol, below) per supplier in a given order.
We then calculate the lead time of a supplier as the average lead time of its SHIPMENT_NUMBERs.
For example, assume Supplier A has two orders in total, one in Order X and one in Order Y. If the lead time of Supplier A's part of Order X (SHIPMENT_NUMBER_1) is 10 hours and that of its part of Order Y (SHIPMENT_NUMBER_2) is 30 hours, then Lead Time of Supplier A = (Lead Time of SHIPMENT_NUMBER_1 + Lead Time of SHIPMENT_NUMBER_2) / 2 = (10 + 30) / 2 = 20 hours.
A SHIPMENT_NUMBER is unique to a supplier in a given order, but it may comprise multiple order lines (table ol, below). For example, SHIPMENT_NUMBER_1 may include two order lines: if the lead time of line 1 is 5 hours and that of line 2 is 15 hours, then the lead time of SHIPMENT_NUMBER_1 is (5 + 15) / 2 = 10 hours.
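The two-level definition above (average the order lines within each shipment, then average the shipments) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not of the BigQuery query. The 5 h and 15 h lines come from the example above, while the single 30 h line for SHIPMENT_NUMBER_2 is an assumption, since only its total is given.

```python
from statistics import mean

# Lead time in hours of each order line, keyed by shipment number.
# SHIPMENT_NUMBER_1 uses the 5 h / 15 h lines from the example;
# the single 30 h line for SHIPMENT_NUMBER_2 is an assumption.
line_lead_hours = {
    "SHIPMENT_NUMBER_1": [5, 15],
    "SHIPMENT_NUMBER_2": [30],
}

# Step 1: lead time of a shipment = average over its order lines.
shipment_lead_hours = {
    shipment: mean(hours) for shipment, hours in line_lead_hours.items()
}

# Step 2: lead time of the supplier = average over its shipments.
supplier_lead_hours = mean(list(shipment_lead_hours.values()))

print(shipment_lead_hours)
print(supplier_lead_hours)
```

Each shipment counts exactly once in step 2, regardless of how many order lines it has.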
I can easily calculate the lead time of each SHIPMENT_NUMBER in SQL with the code below:
SELECT
ol.SHIPMENT_NUMBER,
avg(timestamp_diff(oli.DELIVERY_DATE, olit.ORDER_DATE, hour)) ORDERTOCUSTOMER
FROM
ORDERLINE ol
JOIN ORDERLINEITEMTRX olit on olit.order_line_sk = ol.ORDER_LINE_SK
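JOIN ORDERLINEITEM oli ON olit.order_line_item_sk = oli.ORDER_LINE_ITEM_SK
-- SUPPLIER provides s.SUPPLIER_ID for the filter; join key as in the supplier-level query below
JOIN SUPPLIER s ON s.SUPPLIER_SK = olit.SUPPLIER_SK
WHERE
s.SUPPLIER_ID = 'SupplierX'
GROUP BY
ol.SHIPMENT_NUMBER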
The results match my manual check.
However, when I aggregate at the SUPPLIER_ID level with the code below, I get the wrong result. To keep things simple, I worked with only one supplier both above and below. According to my manual check the result should be 45,3619 hours, but BigQuery reports 45,7695 hours.
SELECT
s.SUPPLIER_ID,
AVG(OTC.ORDERTOCUSTOMER)
FROM
ORDERLINE ol
JOIN ORDERLINEITEMTRX olit on olit.order_line_sk = ol.ORDER_LINE_SK
join ORDERLINEITEM oli ON olit.order_line_item_sk = oli.ORDER_LINE_ITEM_SK
RIGHT JOIN SUPPLIER s ON s.SUPPLIER_SK = olit.SUPPLIER_SK
INNER JOIN (
SELECT
ol.SHIPMENT_NUMBER,
avg(timestamp_diff(oli.DELIVERY_DATE, olit.ORDER_DATE, hour)) ORDERTOCUSTOMER
FROM
ORDERLINE ol
JOIN ORDERLINEITEMTRX olit on olit.order_line_sk = ol.ORDER_LINE_SK
join ORDERLINEITEM oli ON olit.order_line_item_sk = oli.ORDER_LINE_ITEM_SK
group by
ol.SHIPMENT_NUMBER
) AS OTC ON ol.SHIPMENT_NUMBER = OTC.SHIPMENT_NUMBER
WHERE s.SUPPLIER_ID = 'SupplierX'
group by s.SUPPLIER_ID
What am I doing wrong? A sample dataset and the expected results are here: https://drive.google.com/file/d/1HdQkdhJxciHeHznTbie4bfzcIkRIHSfu/view?usp=sharing
Each shipment number may occur several times in the original order table, because one shipment number may have one or many order lines, as stated above. The challenge is therefore to find the average lead time of shipment numbers without overcounting, i.e. to average over the unique shipment numbers.
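To make the overcounting concrete, here is a small Python sketch with hypothetical figures (two shipments averaging 10 h and 30 h, with two and one order lines respectively; none of these come from the sample dataset). It assumes that joining per-shipment averages back to the order lines produces one row per line, which is what an equality join on SHIPMENT_NUMBER would do.

```python
from statistics import mean

# Hypothetical per-shipment averages (hours) and order-line counts.
shipment_avg_hours = {"SHIPMENT_NUMBER_1": 10.0, "SHIPMENT_NUMBER_2": 30.0}
lines_per_shipment = {"SHIPMENT_NUMBER_1": 2, "SHIPMENT_NUMBER_2": 1}

# Joining the averages back to the order lines repeats each shipment's
# average once per order line.
joined_rows = [
    shipment_avg_hours[shipment]
    for shipment, line_count in lines_per_shipment.items()
    for _ in range(line_count)
]

# Averaging the joined rows weights each shipment by its line count.
overcounted = mean(joined_rows)

# Averaging one row per shipment gives the intended figure.
per_shipment = mean(list(shipment_avg_hours.values()))

print(joined_rows)
print(overcounted, per_shipment)
```

The two averages differ whenever shipments have unequal numbers of order lines, which would be consistent with the kind of small discrepancy described above.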
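Your second query joins the per-shipment averages back to the order lines, so each shipment most likely appears once per joined row and the outer AVG ends up weighted by the number of lines. To avoid that, I just added supplier_id to your first query and then aggregated its output, which has exactly one row per shipment.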
WITH
shipment_lead_times as
(
SELECT
s.SUPPLIER_ID,
ol.SHIPMENT_NUMBER,
avg(timestamp_diff(oli.DELIVERY_DATE, olit.ORDER_DATE, hour)) ORDERTOCUSTOMER
FROM
ORDERLINE ol
JOIN ORDERLINEITEMTRX olit on olit.order_line_sk = ol.ORDER_LINE_SK
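JOIN ORDERLINEITEM oli ON olit.order_line_item_sk = oli.ORDER_LINE_ITEM_SK
-- SUPPLIER provides s.SUPPLIER_ID; join key as in the question's second query
JOIN SUPPLIER s ON s.SUPPLIER_SK = olit.SUPPLIER_SK
WHERE
s.SUPPLIER_ID = 'SupplierX'
GROUP BY
ol.SHIPMENT_NUMBER,
s.SUPPLIER_ID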
)
select
supplier_id,
count(*) as shipments,
avg(ordertocustomer) as avg_leadtime
from shipment_lead_times
group by supplier_id
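The shape of this query can be checked locally with a rough sketch that uses Python's built-in sqlite3 module rather than BigQuery. The table line_lead and its lead_hours column are hypothetical stand-ins for the joined ORDERLINE / ORDERLINEITEMTRX / ORDERLINEITEM rows with the hour difference already computed, and the three rows are made-up figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per order line with its lead time in hours.
conn.execute(
    "CREATE TABLE line_lead (supplier_id TEXT, shipment_number TEXT, lead_hours REAL)"
)
conn.executemany(
    "INSERT INTO line_lead VALUES (?, ?, ?)",
    [
        ("SupplierX", "SHIPMENT_NUMBER_1", 5.0),
        ("SupplierX", "SHIPMENT_NUMBER_1", 15.0),
        ("SupplierX", "SHIPMENT_NUMBER_2", 30.0),
    ],
)

# Same two-step aggregation: one row per shipment first, then per supplier.
row = conn.execute(
    """
    WITH shipment_lead_times AS (
        SELECT supplier_id, shipment_number, AVG(lead_hours) AS ordertocustomer
        FROM line_lead
        GROUP BY supplier_id, shipment_number
    )
    SELECT supplier_id, COUNT(*) AS shipments, AVG(ordertocustomer) AS avg_leadtime
    FROM shipment_lead_times
    GROUP BY supplier_id
    """
).fetchone()

print(row)
conn.close()
```

The shipments column is a useful sanity check: it should equal the number of distinct shipment numbers for the supplier, not the number of order lines.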