Simplified Query Date Based on Category
I have this data:
date_key customer_id product_variant_final frequency
2022-03-02 1 a 1
2022-04-01 2 b 2
2022-05-02 3 c 2
......
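I want to simplify this query. The logic is: get date_pertama and date_kedua, compute the gap between date_pertama and date_kedua (named selisih there), then categorize by selisih. I tried this query, but maybe it can be simplified.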
SELECT customer_id,
       product_variant_final,
       date_pertama,
       date_kedua,
       selisih,
       -- selisih is zero or negative here (first date minus second date);
       -- BETWEEN needs the lower bound first, so the ranges run from -7 to -1, etc.
       CASE WHEN selisih = 0 THEN "Same Day"
            WHEN selisih BETWEEN -7 AND -1 THEN "1-7 Days"
            WHEN selisih BETWEEN -14 AND -8 THEN "8-14 Days"
            WHEN selisih BETWEEN -21 AND -15 THEN "15-21 Days"
            WHEN selisih < -21 THEN "21+ Days"
       END AS frequency_purchase
FROM (
  SELECT customer_id,
         product_variant_final,
         date_pertama,
         date_kedua,
         DATE_DIFF(date_pertama, date_kedua, DAY) selisih
  FROM (
    -- Pivot the first and second purchase dates into columns
    SELECT customer_id,
           product_variant_final,
           MAX(CASE WHEN frequency = 1 THEN date_key ELSE NULL END) date_pertama,
           MAX(CASE WHEN frequency = 2 THEN date_key ELSE NULL END) date_kedua
    FROM (
      -- Number each customer's purchases of a variant by date
      SELECT date_key,
             customer_id,
             product_variant_final,
             ROW_NUMBER() OVER (PARTITION BY customer_id, product_variant_final ORDER BY date_key) frequency
      FROM final_data_variant
      WHERE variant_type = "variant"
        AND customer_id IN (SELECT customer_id FROM total_purchased_data WHERE total_purchased >= 2)
    )
    GROUP BY 1, 2
  )
)
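Your query feeds inner calculations into outer SELECT statements. It first obtains the first and second date of each customer's orders, then calculates the difference in days between these dates as selisih, and finally categorizes that value as frequency_purchase.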
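Because the difference is computed as DATE_DIFF(date_pertama, date_kedua, DAY), that is, first date minus second date, selisih is zero or negative whenever the second purchase is not earlier than the first. A minimal sketch of the sign convention in BigQuery, using made-up dates chosen only for illustration:

```sql
-- Illustrative dates only; they are not taken from the tables in the question.
SELECT
  -- first date minus second date: negative when the second date is later
  DATE_DIFF(DATE '2022-03-02', DATE '2022-03-09', DAY) AS first_minus_second,  -- -7
  -- swapping the arguments flips the sign
  DATE_DIFF(DATE '2022-03-09', DATE '2022-03-02', DAY) AS second_minus_first;  -- 7
```

This is why the category boundaries are written with negative numbers.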
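These nested SELECT structures can be eliminated by using UDFs written in SQL, because no further aggregation or joining is done here. Each UDF is named after the final computed column. The second UDF returns a struct containing all the needed columns, which is then expanded with (...).*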
# Map the day difference (zero or negative) to a category label
create temp function frequency_purchase(selisih int64) as
(CASE WHEN selisih > 0 THEN "ERROR: selisih must be negative"
      WHEN selisih = 0 THEN "Same Day"
      WHEN selisih >= -7 THEN "1-7 Days"
      WHEN selisih >= -14 THEN "8-14 Days"
      WHEN selisih >= -21 THEN "15-21 Days"
      WHEN selisih < -21 THEN "21+ Days"
      else "ERROR"
 END);

# Build a struct with both dates, their difference and the category
create temp function selisih(dates array<date>) as
(
  struct(dates[safe_offset(0)] as date_pertama,
         dates[safe_offset(1)] as date_kedua,
         DATE_DIFF(dates[safe_offset(0)], dates[safe_offset(1)], DAY) as selisih,
         frequency_purchase(DATE_DIFF(dates[safe_offset(0)], dates[safe_offset(1)], DAY)) AS frequency_purchase
  )
);

# Random sample data standing in for the real table
WITH final_data_variant as
(SELECT date_sub(current_date(), interval a day) as date_key,
        cast(rand()*5 as int64) customer_id,
        1 product_variant_final
 from unnest(generate_array(0,60)) a
)
SELECT customer_id,
       product_variant_final,
       selisih(array_agg(date_key ORDER BY date_key limit 2)).* # Extract struct
FROM (
  SELECT date_key,
         customer_id,
         product_variant_final
  FROM final_data_variant
  #WHERE variant_type = "variant" AND customer_id IN(SELECT customer_id FROM total_purchased_data WHERE total_purchased >= 2)
)
GROUP BY 1,2
The WHERE condition is commented out because I do not have the data tables for it.
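One way to sanity-check the category boundaries is to call the frequency_purchase function on a handful of literal values. The sketch below redefines the function so that it runs on its own in BigQuery; the test values are arbitrary and were picked only to hit each boundary.

```sql
-- Same CASE logic as the frequency_purchase UDF above, repeated so this runs standalone.
CREATE TEMP FUNCTION frequency_purchase(selisih INT64) AS (
  CASE WHEN selisih > 0 THEN "ERROR: selisih must be negative"
       WHEN selisih = 0 THEN "Same Day"
       WHEN selisih >= -7 THEN "1-7 Days"
       WHEN selisih >= -14 THEN "8-14 Days"
       WHEN selisih >= -21 THEN "15-21 Days"
       WHEN selisih < -21 THEN "21+ Days"
       ELSE "ERROR"   -- reached when selisih is NULL, e.g. no second purchase
  END
);

-- Arbitrary test values covering both sides of every boundary.
SELECT selisih,
       frequency_purchase(selisih) AS frequency_purchase
FROM UNNEST([3, 0, -1, -7, -8, -14, -15, -21, -22]) AS selisih
ORDER BY selisih DESC;
-- 3 -> error text, 0 -> Same Day, -1 and -7 -> 1-7 Days, -8 and -14 -> 8-14 Days,
-- -15 and -21 -> 15-21 Days, -22 -> 21+ Days
```

Because the WHEN branches are evaluated in order, each >= test only needs the lower bound of its bucket.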
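Disclaimer: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reproduce them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.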