简体   繁体   中英

Count unique rows GROUP(ed) BY different columns than used in DISTINCT ON

I'm sure this has been asked over and over, but I can't quite find a simple example that I can fully grok.

I'm trying to deduplicate (do a DISTINCT ON ) by one column and COUNT records GROUPed By columns that differ from the column used to deduplicate, but without introducing subqueries.

Let's say I have a table with the following information:

order_num date region timestamp_updated
001 2021-09-01 Murica 2021-09- 02 T19:00:01Z
001 2021-09-01 Murica 2021-09- 03 T19:00:01Z
002 2021-09-01 Yurop 2021-09-02T19:00:01Z
003 2021-09-01 Yurop 2021-09-03T19:00:01Z
004 2021-09-02 Yurop 2021-09-03T19:00:01Z

I would like to first get unique records with different order_num (keeping the most recently updated) ones AND then count groups or orders by date and region .

  • De-duplicate (gets rid of the oldest order_num='001A' ):

    order_num date region timestamp_updated
    001 2021-09-01 Murica 2021-09- 03 T19:00:01Z
    002 2021-09-01 Yurop 2021-09-02T19:00:01Z
    003 2021-09-01 Yurop 2021-09-03T19:00:01Z
    004 2021-09-02 Yurop 2021-09-03T19:00:01Z
  • Then group by and count:

    date region count
    2021-09-01 Murica 1
    2021-09-01 Yurop 2
    2021-09-02 Yurop 1

I know how to do those two things separately ( distinct on(order_num) + order by timestamp_updated desc ) to deduplicate and then select count(*) + group by date, region ) And even together with subqueries. But I'd like to try to avoid subqueries as much as possible and here's where window functions (seem) to come in handy and I don't know much anything about those.

The closest thing I've been able to get are groups, but they show one record by each order_num . The records are correct, but they are duplicated:

select distinct on (order_num) date, region, count(1)over (
    partition by order_num
)
from orders_table
order by order_num, timestamp_updated desc;

That query ^^ shows:

date region count
2021-09-01 Murica 1 I think this is the first 001
2021-09-01 Murica 1 I think this is the second 001
2021-09-01 Yurop 2 I think this is the first Yurop: 002
2021-09-01 Yurop 2 I think this is the second Yurop: 003
2021-09-02 Yurop 1

You could get the max timestamp_updated per order_num, date, region and then aggregate again to get the counts per date, region using window function

select distinct 
       date, 
       region, 
       count(max(timestamp_updated)) over (partition by date, region) as counts 
from t
group by order_num, date, region;

DEMO

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM