For each store_id, I would like to count, at article_id level, how many shared article_ids arrived first in store A and in store B respectively.
For example, if the arrival_timestamp of article_id = 2 is earlier in store A than in store B (i.e. the article arrived first in store A), then we count 1 for store A and 0 for store B.
See the example below:
Main table
arrival_timestamp   article_id   store_id
2019-04-01 11:04    2            A
2019-04-01 13:12    2            B
2019-04-01 08:24    4            A
2019-04-01 10:24    4            B
2019-04-10 07:00    7            A
2019-04-10 10:14    7            B
2019-04-23 07:34    9            A
2019-04-23 05:52    9            B
Output table
storeA_count_first_articles   storeB_count_first_articles
3                             1
You can use two levels of aggregation:
select
    sum(case when arrival_timestamp_a < arrival_timestamp_b then 1 else 0 end) storeA_count_first_articles,
    sum(case when arrival_timestamp_b < arrival_timestamp_a then 1 else 0 end) storeB_count_first_articles
from (
    select
        article_id,
        min(case when store_id = 'A' then arrival_timestamp end) arrival_timestamp_a,
        min(case when store_id = 'B' then arrival_timestamp end) arrival_timestamp_b
    from mytable
    group by article_id
) t
The subquery uses conditional aggregation to compute the first arrival timestamp of each article in each store. The outer query then compares the two timestamps for each article and produces the final counts.
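To try this against the sample data from the question, the rows can be supplied inline. The following is a minimal sketch that runs only the inner query. The `VALUES` list and the name `mytable` stand in for the real table, and the `:00` seconds on each timestamp are an assumption added to form complete literals.

```sql
-- Sample rows from the question as an inline table (stand-in for the real table).
WITH mytable AS (
    SELECT *
    FROM (
        VALUES
            (TIMESTAMP '2019-04-01 11:04:00', 2, 'A'),
            (TIMESTAMP '2019-04-01 13:12:00', 2, 'B'),
            (TIMESTAMP '2019-04-01 08:24:00', 4, 'A'),
            (TIMESTAMP '2019-04-01 10:24:00', 4, 'B'),
            (TIMESTAMP '2019-04-10 07:00:00', 7, 'A'),
            (TIMESTAMP '2019-04-10 10:14:00', 7, 'B'),
            (TIMESTAMP '2019-04-23 07:34:00', 9, 'A'),
            (TIMESTAMP '2019-04-23 05:52:00', 9, 'B')
    ) AS v (arrival_timestamp, article_id, store_id)
)
-- Inner query only: one row per article, with each store's first arrival side by side.
SELECT
    article_id,
    min(CASE WHEN store_id = 'A' THEN arrival_timestamp END) AS arrival_timestamp_a,
    min(CASE WHEN store_id = 'B' THEN arrival_timestamp END) AS arrival_timestamp_b
FROM mytable
GROUP BY article_id
ORDER BY article_id
```

For articles 2, 4 and 7 the store A timestamp is the earlier one, and for article 9 the store B timestamp is earlier, which is where the 3 and 1 in the expected output come from. An article that is missing from one store gets NULL in that store's column, so neither comparison in the outer query is true and the article is counted for neither store.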
Another option uses row_number(), which avoids conditional logic and aggregation in the subquery:
select
    sum(case when store_id = 'A' then 1 else 0 end) storeA_count_first_articles,
    sum(case when store_id = 'B' then 1 else 0 end) storeB_count_first_articles
from (
    select
        t.*,
        row_number() over(partition by article_id order by arrival_timestamp) rn
    from mytable t
) t
where rn = 1
I'm not familiar with Presto, but I think this should work based on their documentation. This answer is a general solution without needing to specifically name Store A and Store B in the query.
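SELECT
    q.first_store_id AS store_id,
    COUNT(*) AS count_first_articles
FROM
(
    -- One row per article: the store with the earliest arrival_timestamp.
    -- The window must be partitioned by article_id; DISTINCT then collapses
    -- the repeated rows (a GROUP BY here would reject the ungrouped columns).
    SELECT DISTINCT
        article_id,
        first_value( store_id ) OVER ( PARTITION BY article_id ORDER BY arrival_timestamp ) AS first_store_id
    FROM
        mytable
) AS q
GROUP BY
    q.first_store_id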
This works for any number of store_id values without needing to manually define each column, and because the results are row-oriented instead of column-oriented they're easier to process in application code. If you still want named columns you can do that in an outer query or with PIVOT / UNPIVOT (apparently Presto doesn't support PIVOT yet, but you can still do it in application code).
You'll get results like this:
store_id   count_first_articles
A          3
B          1
The magic is in first_value, which is a window function; Presto has a decent set of window functions built in.
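As a rough illustration of what first_value returns, here is a sketch over two of the question's articles, with the window partitioned by article_id. The inline `VALUES` table, the name `mytable` and the `:00` seconds are assumptions made so the query is self-contained.

```sql
-- Two articles from the question as an inline table (stand-in for the real table).
WITH mytable AS (
    SELECT *
    FROM (
        VALUES
            (TIMESTAMP '2019-04-01 11:04:00', 2, 'A'),
            (TIMESTAMP '2019-04-01 13:12:00', 2, 'B'),
            (TIMESTAMP '2019-04-23 07:34:00', 9, 'A'),
            (TIMESTAMP '2019-04-23 05:52:00', 9, 'B')
    ) AS v (arrival_timestamp, article_id, store_id)
)
-- A window function keeps every input row and adds a computed column to it.
SELECT
    article_id,
    store_id,
    arrival_timestamp,
    first_value(store_id) OVER (
        PARTITION BY article_id        -- restart the window for each article
        ORDER BY arrival_timestamp     -- earliest arrival comes first
    ) AS first_store_id
FROM mytable
ORDER BY article_id, arrival_timestamp
```

Both rows of article 2 carry first_store_id = 'A' and both rows of article 9 carry 'B'. Because the window function does not collapse rows, the result has to be reduced to one row per article before counting, otherwise each article would be counted once per store.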
To convert the row-based results into your original column-based example output, do this:
SELECT
    SUM( CASE WHEN q2.store_id = 'A' THEN q2.count_first_articles END ) AS storeA_count_first_articles,
    SUM( CASE WHEN q2.store_id = 'B' THEN q2.count_first_articles END ) AS storeB_count_first_articles
FROM
(
    SELECT
        q.first_store_id AS store_id,
        COUNT(*) AS count_first_articles
    FROM
    (
        -- One row per article: the store with the earliest arrival_timestamp.
        SELECT DISTINCT
            article_id,
            first_value( store_id ) OVER ( PARTITION BY article_id ORDER BY arrival_timestamp ) AS first_store_id
        FROM
            mytable
    ) AS q
    GROUP BY
        q.first_store_id
) AS q2
Giving:
storeA_count_first_articles storeB_count_first_articles
3 1
While this answer is superficially more complicated (well, more nested) than the other answers, it is a general solution that doesn't need modifications when you want to look at more stores besides 'A' and 'B'.
You can use two levels of aggregation. One method is:
select sum(case when first_store_id = 'A' then 1 else 0 end) as first_a,
       sum(case when first_store_id = 'B' then 1 else 0 end) as first_b
from (select distinct article_id,
             first_value(store_id) over (partition by article_id order by arrival_timestamp) as first_store_id
      from t
     ) t;
Note: The inner aggregation uses select distinct as a convenience. The outer aggregation doesn't use group by because you want only one row in the result set.
This can also be written in Presto using min_by() and an explicit aggregation:
select sum(case when first_store_id = 'A' then 1 else 0 end) as first_a,
       sum(case when first_store_id = 'B' then 1 else 0 end) as first_b
from (select article_id, min_by(store_id, arrival_timestamp) as first_store_id
      from t
      group by article_id
     ) t;
Note: Both these queries assume you do not have other stores. If you do and you only care about these two, then add where store_id in ('A', 'B') to the subqueries.
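A sketch of the min_by() version with that filter in place is shown below. The HAVING clause is an extra assumption, not part of the answer above: it keeps only articles that appear in both stores, on the reading that the question's "shared" articles excludes ones that only ever arrived in a single store. The alias `per_article` is likewise just illustrative.

```sql
SELECT
    sum(CASE WHEN first_store_id = 'A' THEN 1 ELSE 0 END) AS first_a,
    sum(CASE WHEN first_store_id = 'B' THEN 1 ELSE 0 END) AS first_b
FROM (
    SELECT
        article_id,
        -- store_id taken from the row with the smallest arrival_timestamp
        min_by(store_id, arrival_timestamp) AS first_store_id
    FROM t
    WHERE store_id IN ('A', 'B')             -- ignore any other stores
    GROUP BY article_id
    HAVING count(DISTINCT store_id) = 2      -- assumption: article must exist in both stores
) per_article
```

Without the HAVING clause, an article that exists only in store A would still be counted as a "first arrival" for A, which may or may not be what you want.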