
Select distinct customer_id

I would like to count, for each store_id, at the article_id level:

  • how many shared article_ids arrived first in store_A and in store_B, respectively.

  • If the arrival_timestamp for, e.g., article_id=2 is earlier in store_A than in store_B (i.e. the article arrived first in store_A), then we count 1 for store_A and 0 for store_B.

See the examples below:

Main table


arrival_timestamp           article_id   store_id

2019-04-01 11:04             2            A
2019-04-01 13:12             2            B
2019-04-01 08:24             4            A
2019-04-01 10:24             4            B
2019-04-10 07:00             7            A
2019-04-10 10:14             7            B
2019-04-23 07:34             9            A
2019-04-23 05:52             9            B

Output table


storeA_count_first_articles     storeB_count_first_articles
3                                1

You can use two levels of aggregation:

select
    sum(case when arrival_timestamp_a < arrival_timestamp_b then 1 else 0 end) storeA_count_first_articles,
    sum(case when arrival_timestamp_b < arrival_timestamp_a then 1 else 0 end) storeB_count_first_articles
from (
    select 
        article_id,
        min(case when store_id = 'A' then arrival_timestamp end) arrival_timestamp_a,
        min(case when store_id = 'B' then arrival_timestamp end) arrival_timestamp_b
    from mytable
    group by article_id
) t

The subquery uses conditional aggregation to compute the first arrival timestamp of each article in each store. Then, the outer query compares the two first-arrival timestamps of each article and produces the final result.
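To make the intermediate step concrete, here is a minimal sketch that runs only the inner query against the question's sample rows. The rows are inlined with VALUES so that no table is needed; the inlined relation and the `:00` seconds added to each timestamp are assumptions for illustration.

select
    article_id,
    -- earliest arrival per store; NULL if the article never reached that store
    min(case when store_id = 'A' then arrival_timestamp end) arrival_timestamp_a,
    min(case when store_id = 'B' then arrival_timestamp end) arrival_timestamp_b
from (
    values
        (timestamp '2019-04-01 11:04:00', 2, 'A'),
        (timestamp '2019-04-01 13:12:00', 2, 'B'),
        (timestamp '2019-04-01 08:24:00', 4, 'A'),
        (timestamp '2019-04-01 10:24:00', 4, 'B'),
        (timestamp '2019-04-10 07:00:00', 7, 'A'),
        (timestamp '2019-04-10 10:14:00', 7, 'B'),
        (timestamp '2019-04-23 07:34:00', 9, 'A'),
        (timestamp '2019-04-23 05:52:00', 9, 'B')
) as mytable (arrival_timestamp, article_id, store_id)
group by article_id

This yields one row per article with both timestamps side by side. Only article 9 has arrival_timestamp_b earlier than arrival_timestamp_a, which is why the outer query counts 3 for store A and 1 for store B. Note that an article missing from one store produces a NULL on that side, so neither comparison is true and the article is counted for neither store.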

Another option uses row_number(), which avoids conditional logic and aggregation in the subquery:

select 
    sum(case when store_id = 'A' then 1 else 0 end) storeA_count_first_articles,
    sum(case when store_id = 'B' then 1 else 0 end) storeB_count_first_articles
from (
    select 
        t.*, 
        row_number() over(partition by article_id order by arrival_timestamp) rn
    from mytable t
) t
where rn = 1

I'm not familiar with Presto, but based on its documentation I think this should work. This answer is a general solution that does not need to name Store A and Store B explicitly in the query.

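SELECT
    q.first_store_id AS store_id,
    COUNT(*) AS count_first_articles
FROM
    (
        -- One row per article: the store with the earliest arrival_timestamp.
        -- PARTITION BY restarts the window for each article_id; DISTINCT
        -- collapses the per-row window results to one row per article.
        SELECT DISTINCT
            article_id,
            first_value( store_id ) OVER ( PARTITION BY article_id ORDER BY arrival_timestamp ) AS first_store_id
        FROM
            mytable
    ) AS q
GROUP BY
    q.first_store_id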

This works for any number of store_id values without needing to define each column manually, and because the results are row-oriented instead of column-oriented they are easier to process in application code. If you still want named columns, you can produce them in an outer query or use PIVOT / UNPIVOT (apparently Presto doesn't support PIVOT yet, but you can still do the pivot in application code).

You'll get results like this:

store_id        count_first_articles
      A                            3
      B                            1

The magic is in first_value, which is a window function, and Presto has a decent set of window functions built in.
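As a sketch of what first_value computes before any counting happens, the query below tags every sample row with the store that received that article first. It assumes the window is partitioned by article_id so that each article is evaluated on its own; the inlined VALUES relation named `sample` and the `:00` seconds are hypothetical, added only so the query runs without a table.

SELECT
    article_id,
    store_id,
    arrival_timestamp,
    -- store_id of the earliest row within the same article_id
    first_value( store_id ) OVER ( PARTITION BY article_id ORDER BY arrival_timestamp ) AS first_store_id
FROM
    (
        VALUES
            ( TIMESTAMP '2019-04-01 11:04:00', 2, 'A' ),
            ( TIMESTAMP '2019-04-01 13:12:00', 2, 'B' ),
            ( TIMESTAMP '2019-04-01 08:24:00', 4, 'A' ),
            ( TIMESTAMP '2019-04-01 10:24:00', 4, 'B' ),
            ( TIMESTAMP '2019-04-10 07:00:00', 7, 'A' ),
            ( TIMESTAMP '2019-04-10 10:14:00', 7, 'B' ),
            ( TIMESTAMP '2019-04-23 07:34:00', 9, 'A' ),
            ( TIMESTAMP '2019-04-23 05:52:00', 9, 'B' )
    ) AS sample ( arrival_timestamp, article_id, store_id )
ORDER BY
    article_id,
    arrival_timestamp

Both rows of articles 2, 4 and 7 carry first_store_id = 'A', and both rows of article 9 carry 'B'. Because the value repeats on every row of an article, the rows have to be collapsed to one per article (for example with SELECT DISTINCT on article_id and first_store_id) before counting per store.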

To convert the row-based results into your original column-based example output, do this:

SELECT
    SUM( CASE WHEN q2.store_id = 'A' THEN q2.count_first_articles END ) AS storeA_count_first_articles,
    SUM( CASE WHEN q2.store_id = 'B' THEN q2.count_first_articles END ) AS storeB_count_first_articles
FROM
    (
        -- Row-oriented result: one row per store with its count of "first" articles.
        SELECT
            q.first_store_id AS store_id,
            COUNT(*) AS count_first_articles
        FROM
            (
                -- One row per article: the store with the earliest arrival_timestamp.
                SELECT DISTINCT
                    article_id,
                    first_value( store_id ) OVER ( PARTITION BY article_id ORDER BY arrival_timestamp ) AS first_store_id
                FROM
                    mytable
            ) AS q
        GROUP BY
            q.first_store_id
    ) AS q2

Giving:

storeA_count_first_articles     storeB_count_first_articles
3                                1

While this answer is superficially more complicated (well, more nested) than the other answers, its row-oriented query is a general solution that needs no modification when you want to look at more stores besides 'A' and 'B'.

You can use two levels of aggregation. One method is:

select sum(case when first_store_id = 'A' then 1 else 0 end) as first_a,
       sum(case when first_store_id = 'B' then 1 else 0 end) as first_b       
from (select distinct article_id,
             first_value(store_id) over (partition by article_id order by arrival_timestamp) as first_store_id
      from t
     ) t;

Note: The inner aggregation uses select distinct as a convenience. The outer aggregation doesn't use group by because you want only one row in the result set.

This can also be written in Presto using min_by() and an explicit aggregation:

select sum(case when first_store_id = 'A' then 1 else 0 end) as first_a,
       sum(case when first_store_id = 'B' then 1 else 0 end) as first_b       
from (select article_id, min_by(store_id, arrival_timestamp) as first_store_id
      from t
      group by article_id
     ) t;

Note: Both of these queries assume you do not have other stores. If you do, and you only care about these two, then add where store_id in ('A', 'B') to the inner query.
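As a sketch of that filter applied to the min_by() version, the query below uses the question's sample rows plus one extra row for a hypothetical store 'C' that receives article 2 before anyone else. The 'C' row, the inlined VALUES relation and the `:00` seconds are assumptions for illustration only.

select sum(case when first_store_id = 'A' then 1 else 0 end) as first_a,
       sum(case when first_store_id = 'B' then 1 else 0 end) as first_b
from (select article_id, min_by(store_id, arrival_timestamp) as first_store_id
      from (values
              (timestamp '2019-04-01 11:04:00', 2, 'A'),
              (timestamp '2019-04-01 13:12:00', 2, 'B'),
              (timestamp '2019-04-01 08:24:00', 4, 'A'),
              (timestamp '2019-04-01 10:24:00', 4, 'B'),
              (timestamp '2019-04-10 07:00:00', 7, 'A'),
              (timestamp '2019-04-10 10:14:00', 7, 'B'),
              (timestamp '2019-04-23 07:34:00', 9, 'A'),
              (timestamp '2019-04-23 05:52:00', 9, 'B'),
              -- hypothetical extra store, earliest arrival for article 2
              (timestamp '2019-04-01 07:00:00', 2, 'C')
           ) as t (arrival_timestamp, article_id, store_id)
      -- drop other stores before picking the first arrival per article
      where store_id in ('A', 'B')
      group by article_id
     ) t;

With the filter in place, the 'C' row is ignored and the counts match the question's expected output (3 for A, 1 for B). Without it, article 2 would be attributed to 'C' and store A's count would drop to 2.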
