简体   繁体   中英

Converting PostgreSQL recursive CTE to SQL Server

I'm having trouble adapting some recursive CTE code from PostgreSQL to SQL Server, from the book "Fighting Churn with Data"

This is the working PostgreSQL code:

with recursive
    active_period_params as (
        select interval '30 days'  as allowed_gap,
        '2021-09-30'::date as calc_date
    ),
    active as (
        -- anchor
        select distinct account_id, min(start_date) as start_date    
        from subscription inner join active_period_params 
            on start_date <= calc_date    
            and (end_date > calc_date or end_date is null)
        group by account_id
    
        UNION
        
        -- recursive
        select s.account_id, s.start_date  
        from subscription s 
        cross join active_period_params 
        inner join active e on s.account_id=e.account_id  
            and s.start_date < e.start_date  
            and s.end_date >= (e.start_date-allowed_gap)::date  
    )
select account_id, min(start_date) as start_date
from active
group by account_id

This is my attempt at converting to SQL Server. It gets stuck in a loop. I believe the issue has to do with the UNION ALL required by SQL Server.

with
    active_period_params as (
        select 30 as allowed_gap,
        cast('2021-09-30' as date) as calc_date
    ),
    active as (
        -- anchor
        select distinct account_id, min(start_date) as start_date    
        from subscription inner join active_period_params 
            on start_date <= calc_date    
            and (end_date > calc_date or end_date is null)
        group by account_id
    
        UNION ALL
        
        -- recursive
        select s.account_id, s.start_date  
        from subscription s 
        cross join active_period_params 
        inner join active e on s.account_id=e.account_id  
            and s.start_date < e.start_date  
            and s.end_date >= dateadd(day, -allowed_gap, e.start_date)
    )
select account_id, min(start_date) as start_date
from active
group by account_id

The subscription table is a list of subscriptions belonging to customers. A customer can have multiple subscriptions with overlapping dates or gaps between dates. null end_date means the subscription is currently active and has no defined end_date. Example data for a single customer (account_id = 15) below:

subscription
 ---------------------------------------------------
|  id  |  account_id  |  start_date  |   end_date   |
 ---------------------------------------------------
|   6  |      15      |  01/06/2021  |    null    |
|   5  |      15      |  01/01/2021  |    null    |
|   4  |      15      |  01/06/2020  | 01/02/2021 |
|   3  |      15      |  01/04/2020  | 15/05/2020 |
|   2  |      15      |  01/03/2020  | 15/05/2020 |
|   1  |      15      |  01/06/2019  | 01/01/2020 |

Expected query result (as produced by PostgreSQL code):

 ------------------------------
|  account_id  |  start_date  |
 ------------------------------
|      15      |  01/03/2020  |

Issue: The SQL Server code above gets stuck in a loop and doesn't produce a result.

Description of the PostgreSQL code:

  1. anchor block finds subs that are active as at the calc_date (30/09/2021) (id 5 & 6), and returns the min start_date (01/01/2021)
  2. the recursion block then looks for any earlier subs that existed within the allowed_gap, which is 30 days prior to the min_start date found in 1). id 4 meets this criteria, so the new min start_date is 01/06/2020
  3. recursion repeats and finds two subs within the allowed_gap (01/06/2020 - 30 days). Of these subs (id 2 & 3), the new min start_date is 01/03/2020
  4. recursion fails to find an earlier sub within the allowed_gap (01/03/2020 - 30 days)
  5. query returns a start date of 01/03/2020 for account_id 15

Any help appreciated!

It seems the issue is related to the way SQL Server deals with recursive CTEs.

This is a type of gaps-and-islands problem, and does not actually require recursion.

There are a number of solutions, here is one. Given your requirement, there may be more efficient methods, but this should get you started.

  • Using LAG we identify rows which are within the specified gap of the next row
  • We use a running COUNT to give each consecutive set of rows an ID
  • We group by that ID, and take the minimum start_date , filtering out non-qualifying groups
  • Group again to get the minimum per account
DECLARE @allowed_gap int = 30,
        @calc_date datetime = cast('2021-09-30' as date);

WITH PrevValues AS (
    SELECT *,
      IsStart = CASE WHEN ISNULL(LAG(end_date) OVER (PARTITION BY account_id
                     ORDER BY start_date), '2099-01-01') < DATEADD(day, -@allowed_gap, start_date)
                     THEN 1 END
    FROM subscription
),
Groups AS (
    SELECT *,
      GroupId = COUNT(IsStart) OVER (PARTITION BY account_id
                     ORDER BY start_date ROWS UNBOUNDED PRECEDING)
    FROM PrevValues
),
ByGroup AS (
    SELECT
      account_id,
      GroupId,
      start_date = MIN(start_date)
    FROM Groups
    GROUP BY account_id, GroupId
    HAVING COUNT(CASE WHEN start_date <= @calc_date    
            and (end_date > @calc_date or end_date is null) THEN 1 END) > 0
)
SELECT
  account_id,
  start_date = MIN(start_date)
FROM ByGroup
GROUP BY account_id;

db<>fiddle

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM