Select distinct customer_id

Question

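I want to compute, at the article_id level, for each store_id:

How many shared article_ids arrived first at store_A and at store_B, respectively.
If the arrival_timestamp of article_id = 2 at store_A is earlier than its arrival_timestamp at store_B (that is, the article reached store_A first), then we count 1 for store_A and 0 for store_B.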

See the example below:

Main table


arrival_timestamp           article_id   store_id

2019-04-01 11:04             2            A
2019-04-01 13:12             2            B
2019-04-01 08:24             4            A
2019-04-01 10:24             4            B
2019-04-10 07:00             7            A
2019-04-10 10:14             7            B
2019-04-23 07:34             9            A
2019-04-23 05:52             9            B

Output table


storeA_count_first_articles     storeB_count_first_articles
3                                1
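To make the counting rule concrete, here is a minimal Python sketch that applies it to the sample rows above. The function name count_first_arrivals is made up for illustration, and the sketch assumes timestamps are strings in the fixed 'YYYY-MM-DD HH:MM' format shown in the table, so that string comparison matches chronological order.

```python
from collections import Counter

# Sample rows from the main table: (arrival_timestamp, article_id, store_id).
ROWS = [
    ("2019-04-01 11:04", 2, "A"),
    ("2019-04-01 13:12", 2, "B"),
    ("2019-04-01 08:24", 4, "A"),
    ("2019-04-01 10:24", 4, "B"),
    ("2019-04-10 07:00", 7, "A"),
    ("2019-04-10 10:14", 7, "B"),
    ("2019-04-23 07:34", 9, "A"),
    ("2019-04-23 05:52", 9, "B"),
]


def count_first_arrivals(rows):
    """Count, per store, the articles that reached that store first."""
    first = {}  # article_id -> (earliest timestamp seen so far, store_id)
    for ts, article_id, store_id in rows:
        # Fixed-width timestamps compare correctly as plain strings.
        if article_id not in first or ts < first[article_id][0]:
            first[article_id] = (ts, store_id)
    return Counter(store_id for _, store_id in first.values())


counts = count_first_arrivals(ROWS)
print(counts["A"], counts["B"])  # 3 1
```

Articles 2, 4 and 7 reach store A first and article 9 reaches store B first, which matches the output table.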

Answer 1

You can use two levels of aggregation:

select
    sum(case when arrival_timestamp_a < arrival_timestamp_b then 1 else 0 end) storeA_count_first_articles,
    sum(case when arrival_timestamp_b < arrival_timestamp_a then 1 else 0 end) storeB_count_first_articles
from (
    select 
        article_id,
        min(case when store_id = 'A' then arrival_timestamp end) arrival_timestamp_a,
        min(case when store_id = 'B' then arrival_timestamp end) arrival_timestamp_b
    from mytable
    group by article_id
) t

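The subquery uses conditional aggregation to compute each article's first arrival timestamp in each store. The outer query then compares the two first-arrival timestamps of each article and produces the final result.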
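The two levels can be inspected separately. The sketch below runs them against the sample data using Python's built-in sqlite3 module as a stand-in for Presto (an assumption made only so the example is runnable; the table name mytable follows the query above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (arrival_timestamp TEXT, article_id INTEGER, store_id TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("2019-04-01 11:04", 2, "A"), ("2019-04-01 13:12", 2, "B"),
        ("2019-04-01 08:24", 4, "A"), ("2019-04-01 10:24", 4, "B"),
        ("2019-04-10 07:00", 7, "A"), ("2019-04-10 10:14", 7, "B"),
        ("2019-04-23 07:34", 9, "A"), ("2019-04-23 05:52", 9, "B"),
    ],
)

# Level 1: one row per article with the first arrival in each store.
INNER = """
    SELECT article_id,
           MIN(CASE WHEN store_id = 'A' THEN arrival_timestamp END) AS arrival_timestamp_a,
           MIN(CASE WHEN store_id = 'B' THEN arrival_timestamp END) AS arrival_timestamp_b
    FROM mytable
    GROUP BY article_id
"""
first_arrivals = conn.execute(INNER + " ORDER BY article_id").fetchall()
for row in first_arrivals:
    print(row)

# Level 2: compare the two timestamps per article and add up the wins.
totals = conn.execute(
    f"""
    SELECT SUM(CASE WHEN arrival_timestamp_a < arrival_timestamp_b THEN 1 ELSE 0 END),
           SUM(CASE WHEN arrival_timestamp_b < arrival_timestamp_a THEN 1 ELSE 0 END)
    FROM ({INNER}) t
    """
).fetchone()
print(totals)  # (3, 1)
```

Note that an article present in only one store gets NULL for the other timestamp, so neither comparison is true and it counts for nobody, which fits the "shared articles" requirement.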

Another option uses row_number(), which avoids the conditional logic and aggregation in the subquery:

select 
    sum(case when store_id = 'A' then 1 else 0 end) storeA_count_first_articles,
    sum(case when store_id = 'B' then 1 else 0 end) storeB_count_first_articles
from (
    select 
        t.*, 
        row_number() over(partition by article_id order by arrival_timestamp) rn
    from mytable t
) t
where rn = 1
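To see what row_number() assigns before the rn = 1 filter, here is a small sketch that again uses Python's sqlite3 module in place of Presto. It assumes the bundled SQLite is version 3.25 or later, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (arrival_timestamp TEXT, article_id INTEGER, store_id TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("2019-04-01 11:04", 2, "A"), ("2019-04-01 13:12", 2, "B"),
        ("2019-04-01 08:24", 4, "A"), ("2019-04-01 10:24", 4, "B"),
        ("2019-04-10 07:00", 7, "A"), ("2019-04-10 10:14", 7, "B"),
        ("2019-04-23 07:34", 9, "A"), ("2019-04-23 05:52", 9, "B"),
    ],
)

# Number each article's arrivals from earliest (rn = 1) to latest.
ranked = conn.execute(
    """
    SELECT article_id, store_id,
           ROW_NUMBER() OVER (PARTITION BY article_id ORDER BY arrival_timestamp) AS rn
    FROM mytable
    ORDER BY article_id, rn
    """
).fetchall()

# Keeping rn = 1 leaves exactly one row per article: its first store.
firsts = [(article_id, store_id) for article_id, store_id, rn in ranked if rn == 1]
print(firsts)  # [(2, 'A'), (4, 'A'), (7, 'A'), (9, 'B')]
```

One caveat: if two stores had exactly the same arrival_timestamp for an article, row_number() would still pick a single winner, and which one is arbitrary unless the ORDER BY includes a tiebreaker.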

Answer 2

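I'm not familiar with Presto, but I believe this should work according to their documentation. This answer is a general solution that does not require naming Store A and Store B specifically in the query.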

SELECT
    q.first_store_id AS store_id,
    COUNT(*) AS count_first_articles
FROM
    (
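        -- PARTITION BY restarts the window for each article; DISTINCT keeps one row per article
        SELECT DISTINCT
            article_id,
            first_value( store_id ) OVER ( PARTITION BY article_id ORDER BY arrival_timestamp ) AS first_store_id
        FROM
            mytable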
    ) AS q
GROUP BY
    first_store_id

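This works for any number of store_id values without having to define each column by hand. Because the result is row-oriented rather than column-oriented, it is also easier to process in application code. If you still want named columns, you can produce them in an outer query or with PIVOT / UNPIVOT (well, apparently Presto does not support PIVOT yet, but you can still do this in application code).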

You'll get a result like this:

store_id        count_first_articles
      A                            3
      B                            1

The magic is in first_value, which is a window function; Presto has a decent set of window functions built in.
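Here is a rough sketch of the row-oriented approach with a third store, run through Python's sqlite3 module instead of Presto (assuming SQLite 3.25+ for window functions). Store C and article 11 are hypothetical rows added only to show that no store name appears in the query; the table name mytable is likewise an assumption. The window is partitioned by article_id so that first_value is evaluated per article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (arrival_timestamp TEXT, article_id INTEGER, store_id TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("2019-04-01 11:04", 2, "A"), ("2019-04-01 13:12", 2, "B"),
        ("2019-04-01 08:24", 4, "A"), ("2019-04-01 10:24", 4, "B"),
        ("2019-04-10 07:00", 7, "A"), ("2019-04-10 10:14", 7, "B"),
        ("2019-04-23 07:34", 9, "A"), ("2019-04-23 05:52", 9, "B"),
        # Hypothetical extra rows: article 11 reaches a third store C first.
        ("2019-04-25 09:00", 11, "C"), ("2019-04-25 09:30", 11, "A"),
    ],
)

per_store = conn.execute(
    """
    SELECT q.first_store_id AS store_id, COUNT(*) AS count_first_articles
    FROM (
        SELECT DISTINCT
            article_id,
            FIRST_VALUE(store_id) OVER (
                PARTITION BY article_id ORDER BY arrival_timestamp
            ) AS first_store_id
        FROM mytable
    ) AS q
    GROUP BY q.first_store_id
    ORDER BY store_id
    """
).fetchall()
print(per_store)  # [('A', 3), ('B', 1), ('C', 1)]
```

The new store shows up as an extra row without any change to the SQL.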

To convert the row-based result into the column-based output of the original example, do the following:

SELECT
    SUM( CASE WHEN q2.store_id = 'A' THEN q2.count_first_articles END ) AS storeA_count_first_articles,
    SUM( CASE WHEN q2.store_id = 'B' THEN q2.count_first_articles END ) AS storeB_count_first_articles
FROM
    (
        SELECT
            q.first_store_id AS store_id,
            COUNT(*) AS count_first_articles
        FROM
            (
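                -- PARTITION BY restarts the window for each article; DISTINCT keeps one row per article
                SELECT DISTINCT
                    article_id,
                    first_value( store_id ) OVER ( PARTITION BY article_id ORDER BY arrival_timestamp ) AS first_store_id
                FROM
                    mytable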
            ) AS q
        GROUP BY
            first_store_id
    ) AS q2

This gives:

storeA_count_first_articles     storeB_count_first_articles
3                                1

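Although this answer looks more complex (well, more nested) than the others on the surface, it is a general solution: the row-oriented query needs no modification when you want to look at more stores than 'A' and 'B'.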
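If you would rather do the pivot in application code, as mentioned earlier in this answer, something along these lines should do. This is only a sketch: pivot_counts is a made-up helper, and the rows are assumed to arrive as (store_id, count_first_articles) tuples from whatever database client you use.

```python
def pivot_counts(rows, suffix="_count_first_articles"):
    """Turn (store_id, count) rows into a single dict of named columns."""
    return {f"store{store_id}{suffix}": count for store_id, count in rows}


# Row-oriented result as returned by the generic query.
rows = [("A", 3), ("B", 1)]

pivoted = pivot_counts(rows)
print(pivoted)
# {'storeA_count_first_articles': 3, 'storeB_count_first_articles': 1}
```

Because the column names are derived from the data, additional stores produce additional keys with no code change.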

Answer 3

You can use two levels of aggregation. One method is:

select sum(case when first_store_id = 'A' then 1 else 0 end) as first_a,
       sum(case when first_store_id = 'B' then 1 else 0 end) as first_b       
from (select distinct article_id,
             first_value(store_id) over (partition by article_id order by arrival_timestamp) as first_store_id
      from t
     ) t;

Note: the inner query uses select distinct for convenience. The outer aggregation does not use group by because you only want one row in the result set.

This can also be written in Presto using min_by() and explicit aggregation:

select sum(case when first_store_id = 'A' then 1 else 0 end) as first_a,
       sum(case when first_store_id = 'B' then 1 else 0 end) as first_b       
from (select article_id, min_by(store_id, arrival_timestamp) as first_store_id
      from t
      group by article_id
     ) t;

Note: both queries assume that you have no other stores. If you do, and you only care about these two, add where store_id in ('A', 'B') to the inner query.
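For readers without a Presto instance at hand, here is a plain-Python sketch of what min_by(store_id, arrival_timestamp) computes per article, and of why the store filter matters. The helper first_store_per_article and the store C row are hypothetical, added only for illustration.

```python
from collections import Counter, defaultdict

ROWS = [
    ("2019-04-01 11:04", 2, "A"), ("2019-04-01 13:12", 2, "B"),
    ("2019-04-01 08:24", 4, "A"), ("2019-04-01 10:24", 4, "B"),
    ("2019-04-10 07:00", 7, "A"), ("2019-04-10 10:14", 7, "B"),
    ("2019-04-23 07:34", 9, "A"), ("2019-04-23 05:52", 9, "B"),
    # Hypothetical extra row: a third store C receives article 9 before A and B.
    ("2019-04-23 05:00", 9, "C"),
]


def first_store_per_article(rows, stores=None):
    """Emulate min_by(store_id, arrival_timestamp) grouped by article_id.

    If `stores` is given, rows from other stores are dropped first,
    like adding where store_id in (...) to the inner query.
    """
    by_article = defaultdict(list)
    for ts, article_id, store_id in rows:
        if stores is None or store_id in stores:
            by_article[article_id].append((ts, store_id))
    # Pick the store_id belonging to the smallest timestamp in each group.
    return {
        article_id: min(pairs, key=lambda pair: pair[0])[1]
        for article_id, pairs in by_article.items()
    }


unfiltered = Counter(first_store_per_article(ROWS).values())
filtered = Counter(first_store_per_article(ROWS, stores={"A", "B"}).values())

print(unfiltered["A"], unfiltered["B"])  # 3 0
print(filtered["A"], filtered["B"])      # 3 1
```

Without the filter, article 9 is credited to store C and store B drops to 0; with the filter, the original 3 and 1 come back.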
