Converting PostgreSQL recursive CTE to SQL Server
I'm having trouble adapting some recursive CTE code from PostgreSQL to SQL Server, from the book "Fighting Churn with Data".
This is the working PostgreSQL code:
with recursive
active_period_params as (
select interval '30 days' as allowed_gap,
'2021-09-30'::date as calc_date
),
active as (
-- anchor
select distinct account_id, min(start_date) as start_date
from subscription inner join active_period_params
on start_date <= calc_date
and (end_date > calc_date or end_date is null)
group by account_id
UNION
-- recursive
select s.account_id, s.start_date
from subscription s
cross join active_period_params
inner join active e on s.account_id=e.account_id
and s.start_date < e.start_date
and s.end_date >= (e.start_date-allowed_gap)::date
)
select account_id, min(start_date) as start_date
from active
group by account_id
This is my attempt at converting it to SQL Server. It gets stuck in a loop. I believe the issue has to do with the UNION ALL required by SQL Server.
with
active_period_params as (
select 30 as allowed_gap,
cast('2021-09-30' as date) as calc_date
),
active as (
-- anchor
select distinct account_id, min(start_date) as start_date
from subscription inner join active_period_params
on start_date <= calc_date
and (end_date > calc_date or end_date is null)
group by account_id
UNION ALL
-- recursive
select s.account_id, s.start_date
from subscription s
cross join active_period_params
inner join active e on s.account_id=e.account_id
and s.start_date < e.start_date
and s.end_date >= dateadd(day, -allowed_gap, e.start_date)
)
select account_id, min(start_date) as start_date
from active
group by account_id
The subscription table is a list of subscriptions belonging to customers. A customer can have multiple subscriptions with overlapping dates or gaps between dates. A null end_date means the subscription is currently active and has no defined end_date. Example data for a single customer (account_id = 15) below:
subscription
---------------------------------------------------
| id | account_id | start_date | end_date |
---------------------------------------------------
| 6 | 15 | 01/06/2021 | null |
| 5 | 15 | 01/01/2021 | null |
| 4 | 15 | 01/06/2020 | 01/02/2021 |
| 3 | 15 | 01/04/2020 | 15/05/2020 |
| 2 | 15 | 01/03/2020 | 15/05/2020 |
| 1 | 15 | 01/06/2019 | 01/01/2020 |
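To make the example reproducible, the sample rows above could be loaded with a script along these lines. The column types are assumptions (the question does not give the table definition), and the dates are read as dd/mm/yyyy and rewritten in ISO format.

```sql
-- Assumed schema: the column types are guesses, not taken from the question.
CREATE TABLE subscription (
    id         int  NOT NULL PRIMARY KEY,
    account_id int  NOT NULL,
    start_date date NOT NULL,
    end_date   date NULL      -- NULL means the subscription is still active
);

-- Sample rows for account_id = 15, dates converted from dd/mm/yyyy.
INSERT INTO subscription (id, account_id, start_date, end_date)
VALUES
    (6, 15, '2021-06-01', NULL),
    (5, 15, '2021-01-01', NULL),
    (4, 15, '2020-06-01', '2021-02-01'),
    (3, 15, '2020-04-01', '2020-05-15'),
    (2, 15, '2020-03-01', '2020-05-15'),
    (1, 15, '2019-06-01', '2020-01-01');
```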
Expected query result (as produced by the PostgreSQL code):
------------------------------
| account_id | start_date |
------------------------------
| 15 | 01/03/2020 |
Issue: The SQL Server code above gets stuck in a loop and doesn't produce a result.
Description of the PostgreSQL code:
Any help appreciated!
It seems the issue is related to the way SQL Server deals with recursive CTEs.
This is a type of gaps-and-islands problem, and does not actually require recursion.
There are a number of solutions; here is one. Given your requirement, there may be more efficient methods, but this should get you started.
Using LAG, we flag the rows that are not within the specified gap of the previous row; each flagged row starts a new group.
Using COUNT, we give each consecutive set of rows an ID.
We then take the minimum start_date of each group, filtering out non-qualifying groups.
DECLARE @allowed_gap int = 30,
@calc_date date = cast('2021-09-30' as date);
WITH PrevValues AS (
SELECT *,
IsStart = CASE WHEN ISNULL(LAG(end_date) OVER (PARTITION BY account_id
ORDER BY start_date), '2099-01-01') < DATEADD(day, -@allowed_gap, start_date)
THEN 1 END
FROM subscription
),
Groups AS (
SELECT *,
GroupId = COUNT(IsStart) OVER (PARTITION BY account_id
ORDER BY start_date ROWS UNBOUNDED PRECEDING)
FROM PrevValues
),
ByGroup AS (
SELECT
account_id,
GroupId,
start_date = MIN(start_date)
FROM Groups
GROUP BY account_id, GroupId
HAVING COUNT(CASE WHEN start_date <= @calc_date
and (end_date > @calc_date or end_date is null) THEN 1 END) > 0
)
SELECT
account_id,
start_date = MIN(start_date)
FROM ByGroup
GROUP BY account_id;
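One caveat with the query above: LAG(end_date) only looks at the immediately preceding row, so an earlier, longer subscription that fully contains a later, shorter one can be missed when checking for a gap. A possible variation, sketched here under assumptions, compares each row against the latest end_date of all earlier rows instead. The names PrevMaxEnd and Flagged, the '9999-12-31' sentinel and the id tie-breaker are illustrative choices, not part of the original answer. For the sample rows in the question it should give the same start_date as the query above.

```sql
-- Sketch only: PrevMaxEnd, Flagged and the '9999-12-31' sentinel are illustrative.
DECLARE @allowed_gap int = 30,
        @calc_date date = '2021-09-30';

WITH PrevValues AS (
    SELECT *,
        -- Latest end_date among all earlier rows of the same account;
        -- an open-ended (NULL) subscription is treated as never ending.
        PrevMaxEnd = MAX(ISNULL(end_date, '9999-12-31')) OVER (
            PARTITION BY account_id
            ORDER BY start_date, id
            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
    FROM subscription
),
Flagged AS (
    SELECT *,
        -- 1 when this row starts more than @allowed_gap days after every
        -- earlier subscription has ended, otherwise NULL.
        IsStart = CASE WHEN PrevMaxEnd < DATEADD(day, -@allowed_gap, start_date)
                       THEN 1 END
    FROM PrevValues
),
Groups AS (
    SELECT *,
        -- Running count of start flags = ID of the current island.
        GroupId = COUNT(IsStart) OVER (
            PARTITION BY account_id
            ORDER BY start_date, id
            ROWS UNBOUNDED PRECEDING)
    FROM Flagged
),
ByGroup AS (
    SELECT
        account_id,
        GroupId,
        start_date = MIN(start_date)
    FROM Groups
    GROUP BY account_id, GroupId
    -- Keep only islands containing a subscription active on @calc_date.
    HAVING COUNT(CASE WHEN start_date <= @calc_date
                       AND (end_date > @calc_date OR end_date IS NULL)
                      THEN 1 END) > 0
)
SELECT
    account_id,
    start_date = MIN(start_date)
FROM ByGroup
GROUP BY account_id;
```

The first row of each account has an empty window, so PrevMaxEnd is NULL there and the row is not flagged, which matches how the query above treats the first row.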