
Simplified Query Date Based on Category

I have this data:

date_key    customer_id  product_variant_final  frequency
2022-03-02  1            a                       1
2022-04-01  2            b                       2
2022-05-02  3            c                       2
......

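I want to simplify this query. The logic is:

  1. I want to know the frequency (the purchase sequence number per customer and product variant)
  2. Select the MAX date where the frequency is 1 and where it is 2, to get the columns date_pertama and date_kedua
  3. I want to know the gap between date_pertama and date_kedua. That column is named selisih
  4. Categorize selisih based on some conditions

I tried the query below, but maybe it can be simplified.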

SELECT customer_id,
       product_variant_final,
       date_pertama,
       date_kedua,
       selisih, 
       CASE WHEN selisih = 0 THEN "Same Day"
            WHEN selisih BETWEEN -7 AND -1 THEN "1-7 Days"
            WHEN selisih BETWEEN -14 AND -8 THEN "8-14 Days"
            WHEN selisih BETWEEN -21 AND -15 THEN "15-21 Days"
            WHEN selisih < -21 THEN "21+ Days"
       END AS frequency_purchase 
FROM(
SELECT customer_id,
       product_variant_final,
       date_pertama,
       date_kedua,
       DATE_DIFF(date_pertama, date_kedua, DAY) selisih
FROM(
SELECT customer_id,
       product_variant_final,
       MAX(CASE WHEN frequency=1 then date_key ELSE NULL END) date_pertama,
       MAX(CASE WHEN frequency=2 then date_key ELSE NULL END) date_kedua
FROM(
SELECT date_key,
       customer_id,
       product_variant_final,
       ROW_NUMBER() OVER(PARTITION BY customer_id, product_variant_final ORDER BY date_key) frequency
FROM final_data_variant 
WHERE variant_type = "variant" 
      AND customer_id IN(SELECT customer_id FROM total_purchased_data WHERE total_purchased >= 2)
) 
GROUP BY 1,2
))

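Your query uses the results of the inner calculations in the outer SELECT statements. It gets the first and second order dates for each customer and product variant, then computes the difference in days between those dates as selisih, and finally classifies that value as frequency_purchase.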
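As a small illustration of why the category thresholds are negative: DATE_DIFF subtracts its second argument from its first, so passing the earlier date first yields a negative number of days. The literal dates below are made up for this example.

```sql
-- Illustrative literals only; these dates are assumptions, not rows from the real table.
SELECT DATE_DIFF(DATE '2022-03-02', DATE '2022-03-09', DAY) AS selisih;
-- selisih = -7, i.e. a gap of seven days between the first and second purchase
```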

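These nested SELECT structures can be eliminated by using UDFs written in SQL, because no further aggregation or join takes place here. Each UDF is named after the final column it computes. The second UDF returns a struct containing all the needed columns, which is then expanded with (...).*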
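A minimal sketch of the .* struct expansion on its own, in case that part is unfamiliar. The struct values and the aliases t and s are placeholders chosen for this illustration.

```sql
-- t and s are hypothetical names used only for this illustration.
SELECT t.s.*
FROM (
  SELECT STRUCT(DATE '2022-03-02' AS date_pertama,
                DATE '2022-03-09' AS date_kedua,
                -7 AS selisih) AS s
) AS t;
-- Returns one row with three separate columns: date_pertama, date_kedua, selisih
```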

# Classify the day difference; selisih is expected to be zero or negative.
create temp function frequency_purchase(selisih int64) as 
(CASE WHEN selisih > 0 THEN "ERROR: selisih must be negative"
      WHEN selisih = 0 THEN "Same Day"
      WHEN selisih >= -7  THEN "1-7 Days"
      WHEN selisih >= -14 THEN "8-14 Days"
      WHEN selisih >= -21 THEN "15-21 Days"
      WHEN selisih < -21 THEN "21+ Days"
      ELSE "ERROR"  # reached when selisih is NULL
 END);
# Build a struct holding both dates, their difference and its category.
create temp function selisih(dates array<date>) as 
  (
    struct(dates[safe_offset(0)] as date_pertama, dates[safe_offset(1)] as date_kedua,
    DATE_DIFF(dates[safe_offset(0)], dates[safe_offset(1)], DAY) as selisih,
    frequency_purchase( DATE_DIFF(dates[safe_offset(0)], dates[safe_offset(1)], DAY)) AS frequency_purchase 
    )
  );
WITH final_data_variant as 
(SELECT date_sub(current_date(),interval a day) as date_key, cast(rand()*5 as int64)   customer_id,1  product_variant_final
from unnest(generate_array(0,60)) a
)

SELECT customer_id,
       product_variant_final,
       selisih(array_agg(date_key ORDER BY date_key limit 2)  ).* # Extract struct
FROM(
SELECT date_key,
       customer_id,
       product_variant_final  
FROM final_data_variant 
#WHERE variant_type = "variant" AND customer_id IN(SELECT customer_id FROM total_purchased_data WHERE total_purchased >= 2)
) 
GROUP BY 1,2

The WHERE condition is commented out because I do not have the data tables it refers to.
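One consequence of leaving that filter out is worth sketching: a customer with only one purchase has no second array element, so SAFE_OFFSET(1) returns NULL instead of raising an error. The inline sample rows below are assumptions made up for this illustration, not data from the question.

```sql
-- Hypothetical inline sample data; the CTE name and values are assumptions.
WITH sample AS (
  SELECT 1 AS customer_id, DATE '2022-03-02' AS date_key UNION ALL
  SELECT 1, DATE '2022-03-09' UNION ALL
  SELECT 1, DATE '2022-04-01' UNION ALL
  SELECT 2, DATE '2022-05-02'
)
SELECT customer_id,
       dates[SAFE_OFFSET(0)] AS date_pertama,  -- earliest purchase
       dates[SAFE_OFFSET(1)] AS date_kedua     -- second purchase, NULL if there is none
FROM (
  SELECT customer_id,
         -- keep only the two earliest dates per customer
         ARRAY_AGG(date_key ORDER BY date_key LIMIT 2) AS dates
  FROM sample
  GROUP BY customer_id
);
```

Here customer 2 gets a NULL date_kedua, so selisih would be NULL as well and the CASE expression in frequency_purchase would fall through to its ELSE branch. Restoring the original filter on total_purchased >= 2 should keep such customers out of the result.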

