I'm having trouble adapting some recursive CTE code from PostgreSQL to SQL Server, from the book "Fighting Churn with Data"
This is the working PostgreSQL code:
with recursive
active_period_params as (
select interval '30 days' as allowed_gap,
'2021-09-30'::date as calc_date
),
active as (
-- anchor
select distinct account_id, min(start_date) as start_date
from subscription inner join active_period_params
on start_date <= calc_date
and (end_date > calc_date or end_date is null)
group by account_id
UNION
-- recursive
select s.account_id, s.start_date
from subscription s
cross join active_period_params
inner join active e on s.account_id=e.account_id
and s.start_date < e.start_date
and s.end_date >= (e.start_date-allowed_gap)::date
)
select account_id, min(start_date) as start_date
from active
group by account_id
This is my attempt at converting to SQL Server. It gets stuck in a loop. I believe the issue has to do with the UNION ALL required by SQL Server.
with
active_period_params as (
select 30 as allowed_gap,
cast('2021-09-30' as date) as calc_date
),
active as (
-- anchor
select distinct account_id, min(start_date) as start_date
from subscription inner join active_period_params
on start_date <= calc_date
and (end_date > calc_date or end_date is null)
group by account_id
UNION ALL
-- recursive
select s.account_id, s.start_date
from subscription s
cross join active_period_params
inner join active e on s.account_id=e.account_id
and s.start_date < e.start_date
and s.end_date >= dateadd(day, -allowed_gap, e.start_date)
)
select account_id, min(start_date) as start_date
from active
group by account_id
The subscription table is a list of subscriptions belonging to customers. A customer can have multiple subscriptions with overlapping dates or gaps between dates. null end_date means the subscription is currently active and has no defined end_date. Example data for a single customer (account_id = 15) below:
subscription
---------------------------------------------------
| id | account_id | start_date | end_date |
---------------------------------------------------
| 6 | 15 | 01/06/2021 | null |
| 5 | 15 | 01/01/2021 | null |
| 4 | 15 | 01/06/2020 | 01/02/2021 |
| 3 | 15 | 01/04/2020 | 15/05/2020 |
| 2 | 15 | 01/03/2020 | 15/05/2020 |
| 1 | 15 | 01/06/2019 | 01/01/2020 |
Expected query result (as produced by PostgreSQL code):
------------------------------
| account_id | start_date |
------------------------------
| 15 | 01/03/2020 |
Issue: The SQL Server code above gets stuck in a loop and doesn't produce a result.
Description of the PostgreSQL code:
Any help appreciated!
It seems the issue is related to the way SQL Server deals with recursive CTEs.
This is a type of gaps-and-islands problem, and does not actually require recursion.
There are a number of solutions, here is one. Given your requirement, there may be more efficient methods, but this should get you started.
LAG
we identify rows which are within the specified gap of the next rowCOUNT
to give each consecutive set of rows an IDstart_date
, filtering out non-qualifying groupsDECLARE @allowed_gap int = 30,
@calc_date datetime = cast('2021-09-30' as date);
WITH PrevValues AS (
SELECT *,
IsStart = CASE WHEN ISNULL(LAG(end_date) OVER (PARTITION BY account_id
ORDER BY start_date), '2099-01-01') < DATEADD(day, -@allowed_gap, start_date)
THEN 1 END
FROM subscription
),
Groups AS (
SELECT *,
GroupId = COUNT(IsStart) OVER (PARTITION BY account_id
ORDER BY start_date ROWS UNBOUNDED PRECEDING)
FROM PrevValues
),
ByGroup AS (
SELECT
account_id,
GroupId,
start_date = MIN(start_date)
FROM Groups
GROUP BY account_id, GroupId
HAVING COUNT(CASE WHEN start_date <= @calc_date
and (end_date > @calc_date or end_date is null) THEN 1 END) > 0
)
SELECT
account_id,
start_date = MIN(start_date)
FROM ByGroup
GROUP BY account_id;
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.