
Grouping/aggregating data ranges across rows in SQL

I have the following table of product ownership with date ranges:

(image: input table)

The DDL for the input table can be found HERE.

Each product can belong to only one group. A customer cannot have two instances of the same product at any point in time.

We can visualise the timeline of the product ownership above as follows:

(image: product ownership timeline)

Now I would like to calculate the number of products owned in each group for each date range, i.e.:

(image: output 1)

Finally, I would like the total number of products owned by the customer and the number of groups these products belong to, again per date range:

(image: output 2)

This is in Oracle but it would be great to have the code in ANSI SQL.

Any hints?

You can get the cumulative number of products by unpivoting the dates into a column to keep track of "in"s and "out"s. Then a cumulative sum gets the number of products.
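The idea can be sketched outside SQL as follows. This is only an illustration under stated assumptions: the rows, the function name product_counts and the variable ranges are hypothetical, and both range ends are assumed to be inclusive, as in the question.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical rows: (customer_id, date_from, date_to), both dates inclusive.
rows = [
    (1, date(2020, 1, 1), date(2020, 1, 31)),
    (1, date(2020, 1, 15), date(2020, 2, 15)),
]

def product_counts(rows):
    """Return (customer_id, date_from, date_to, num_prods) for each range."""
    # Unpivot: +1 on the start date, -1 on the day after the end date.
    deltas = defaultdict(int)
    for customer_id, date_from, date_to in rows:
        deltas[(customer_id, date_from)] += 1
        deltas[(customer_id, date_to + timedelta(days=1))] -= 1

    events = sorted(deltas.items())
    result = []
    running = 0
    for i, ((customer_id, dte), inc) in enumerate(events):
        # Restart the running total for each new customer.
        if i == 0 or events[i - 1][0][0] != customer_id:
            running = 0
        running += inc
        # The range ends the day before the customer's next breakpoint.
        has_next = i + 1 < len(events) and events[i + 1][0][0] == customer_id
        end = events[i + 1][0][1] - timedelta(days=1) if has_next else None
        result.append((customer_id, dte, end, running))
    return result

ranges = product_counts(rows)
for r in ranges:
    print(r)
```

The last range of each customer has no end date and a count of zero; it marks the point after which nothing is owned.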

Getting the number of groups is more challenging. The following uses a subquery:

with dtes as (
      -- one +1 event on each start date, one -1 event on the day after each end date
      select customer_id, date_from as dte, 1 as inc
      from t
      union all
      select customer_id, date_to + 1 as dte, -1 as inc
      from t
     )
select customer_id, dte as date_from,
       lead(dte) over (partition by customer_id order by dte) - 1 as date_to,
       -- running total of the net change per date = products owned from dte onwards
       sum(sum(inc)) over (partition by customer_id order by dte) as num_prods,
       -- distinct groups with at least one product active on dte
       (select count(distinct t2.prd_grp_id)
        from t t2
        where dtes.customer_id = t2.customer_id and
              dtes.dte between t2.date_from and t2.date_to
       ) as num_groups
from dtes
group by customer_id, dte
order by customer_id, dte;

Here is a db<>fiddle.
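A running sum of +1/-1 events counts products, not distinct groups, which is why the query looks up the active groups at each breakpoint instead. A minimal sketch of that lookup, assuming hypothetical rows, inclusive dates, and an illustrative function name group_counts:

```python
from datetime import date, timedelta

# Hypothetical rows: (customer_id, prd_grp_id, date_from, date_to), dates inclusive.
rows = [
    (1, "A", date(2020, 1, 1), date(2020, 1, 31)),
    (1, "A", date(2020, 1, 15), date(2020, 2, 15)),
    (1, "B", date(2020, 2, 1), date(2020, 2, 29)),
]

def group_counts(rows):
    """Return (customer_id, breakpoint_date, num_groups) for each breakpoint."""
    # Breakpoints are every start date and every day after an end date.
    breakpoints = set()
    for customer_id, _, date_from, date_to in rows:
        breakpoints.add((customer_id, date_from))
        breakpoints.add((customer_id, date_to + timedelta(days=1)))

    result = []
    for customer_id, dte in sorted(breakpoints):
        # Same condition as the correlated subquery:
        # same customer and dte between date_from and date_to.
        active = {
            grp
            for cid, grp, d_from, d_to in rows
            if cid == customer_id and d_from <= dte <= d_to
        }
        result.append((customer_id, dte, len(active)))
    return result

groups = group_counts(rows)
for g in groups:
    print(g)
```

Because the set of active rows can only change at a breakpoint, evaluating the condition at the first day of each range is enough to describe the whole range.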

Working solution

OUTPUT #1

with dtes as (
      select customer_id, prd_grp_id, date_from as dte, 1 as inc
      from t
      union all
      select customer_id, prd_grp_id, date_to + 1, -1 as inc
      from t
     ),
grps as (
      select customer_id, prd_grp_id, dte as date_from,
             lead(dte) over (partition by customer_id, prd_grp_id order by dte) - 1 as date_to,
             sum(sum(inc)) over (partition by customer_id, prd_grp_id order by dte) as n_prods
      from dtes
      group by customer_id, prd_grp_id, dte
     )
select * from grps where n_prods > 0;

OUTPUT #2

with dtes as (
      select customer_id, date_from as dte, 1 as inc
      from t
      union all
      select customer_id, date_to + 1, -1 as inc
      from t
     ),
totals as (
      select customer_id, dte as date_from,
             lead(dte) over (partition by customer_id order by dte) - 1 as date_to,
             sum(sum(inc)) over (partition by customer_id order by dte) as num_prods,
             (select count(distinct t2.prd_grp_id)
              from t t2
              where dtes.customer_id = t2.customer_id and
                    dtes.dte between t2.date_from and t2.date_to
             ) as num_groups
      from dtes
      group by customer_id, dte
     )
select * from totals where num_groups > 0 order by customer_id, date_from;

Fiddle is here

Thanks, @Gordon Linoff!
