
Converting PostgreSQL recursive CTE to SQL Server

I'm having trouble adapting some recursive CTE code from PostgreSQL to SQL Server. The code comes from the book "Fighting Churn with Data".

This is the working PostgreSQL code:

with recursive
    active_period_params as (
        select interval '30 days'  as allowed_gap,
        '2021-09-30'::date as calc_date
    ),
    active as (
        -- anchor
        select distinct account_id, min(start_date) as start_date    
        from subscription inner join active_period_params 
            on start_date <= calc_date    
            and (end_date > calc_date or end_date is null)
        group by account_id
    
        UNION
        
        -- recursive
        select s.account_id, s.start_date  
        from subscription s 
        cross join active_period_params 
        inner join active e on s.account_id=e.account_id  
            and s.start_date < e.start_date  
            and s.end_date >= (e.start_date-allowed_gap)::date  
    )
select account_id, min(start_date) as start_date
from active
group by account_id

This is my attempt at converting it to SQL Server. It gets stuck in a loop. I believe the issue has to do with the UNION ALL required by SQL Server.

with
    active_period_params as (
        select 30 as allowed_gap,
        cast('2021-09-30' as date) as calc_date
    ),
    active as (
        -- anchor
        select distinct account_id, min(start_date) as start_date    
        from subscription inner join active_period_params 
            on start_date <= calc_date    
            and (end_date > calc_date or end_date is null)
        group by account_id
    
        UNION ALL
        
        -- recursive
        select s.account_id, s.start_date  
        from subscription s 
        cross join active_period_params 
        inner join active e on s.account_id=e.account_id  
            and s.start_date < e.start_date  
            and s.end_date >= dateadd(day, -allowed_gap, e.start_date)
    )
select account_id, min(start_date) as start_date
from active
group by account_id

The subscription table is a list of subscriptions belonging to customers. A customer can have multiple subscriptions with overlapping dates or gaps between dates. A null end_date means the subscription is currently active and has no defined end date. Example data for a single customer (account_id = 15), with dates in dd/mm/yyyy format:

subscription
 ---------------------------------------------------
|  id  |  account_id  |  start_date  |   end_date   |
 ---------------------------------------------------
|   6  |      15      |  01/06/2021  |     null     |
|   5  |      15      |  01/01/2021  |     null     |
|   4  |      15      |  01/06/2020  |  01/02/2021  |
|   3  |      15      |  01/04/2020  |  15/05/2020  |
|   2  |      15      |  01/03/2020  |  15/05/2020  |
|   1  |      15      |  01/06/2019  |  01/01/2020  |
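
For anyone who wants to reproduce the problem, the sample rows can be loaded with a script along these lines. This is only a sketch: the column types are assumptions (the question does not show the table definition), and the dates are written in ISO format, reading the table above as dd/mm/yyyy.

```sql
-- Assumed schema; the real table may use different types or constraints.
CREATE TABLE subscription (
    id         int  NOT NULL PRIMARY KEY,
    account_id int  NOT NULL,
    start_date date NOT NULL,
    end_date   date NULL        -- NULL = subscription still active
);

-- Sample rows for account_id = 15, copied from the table above.
INSERT INTO subscription (id, account_id, start_date, end_date) VALUES
    (6, 15, '2021-06-01', NULL),
    (5, 15, '2021-01-01', NULL),
    (4, 15, '2020-06-01', '2021-02-01'),
    (3, 15, '2020-04-01', '2020-05-15'),
    (2, 15, '2020-03-01', '2020-05-15'),
    (1, 15, '2019-06-01', '2020-01-01');
```

The statements use only syntax common to PostgreSQL and SQL Server, so the same script should work for both queries.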

Expected query result (as produced by the PostgreSQL code):

 ------------------------------
|  account_id  |  start_date  |
 ------------------------------
|      15      |  01/03/2020  |

Issue: The SQL Server code above gets stuck in a loop and doesn't produce a result.

Description of the PostgreSQL code:

  1. The anchor block finds subs that are active as at the calc_date (30/09/2021), here ids 5 and 6, and returns the min start_date (01/01/2021).
  2. The recursion block then looks for any earlier subs that were still running within the allowed_gap, i.e. that ended no more than 30 days before the min start_date found in 1). id 4 meets this criterion, so the new min start_date is 01/06/2020.
  3. The recursion repeats and finds two subs within the allowed_gap (01/06/2020 - 30 days). Of these subs (ids 2 and 3), the new min start_date is 01/03/2020.
  4. The recursion fails to find an earlier sub within the allowed_gap (01/03/2020 - 30 days).
  5. The query returns a start date of 01/03/2020 for account_id 15.
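
To make a single recursion step concrete, step 2 can be run on its own as an ordinary query. This is a sketch in T-SQL; the variable name @current_min is made up for illustration and simply holds the min start_date returned by the anchor in step 1.

```sql
-- Step 2 in isolation; the literal values come from the example above.
DECLARE @allowed_gap int = 30,
        @current_min date = '2021-01-01';  -- hypothetical name: min start_date from the anchor

SELECT id, start_date, end_date
FROM subscription
WHERE account_id = 15
  AND start_date < @current_min                                -- strictly earlier sub
  AND end_date >= DATEADD(day, -@allowed_gap, @current_min);   -- ended within the gap

-- For the sample rows only id 4 qualifies: it starts on 01/06/2020
-- and ends on 01/02/2021, which is after 02/12/2020 (01/01/2021 - 30 days).
```

Repeating the query with @current_min set to 2020-06-01 and then 2020-03-01 corresponds to steps 3 and 4.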

Any help appreciated!

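It seems the issue is related to the way SQL Server handles recursive CTEs: it only allows UNION ALL between the anchor and the recursive member, so duplicate rows are not discarded as they are by UNION in PostgreSQL.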
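
If the recursive form has to be kept, one possible workaround is to make each recursion step return at most one row per account, so that UNION ALL has no duplicates to accumulate. The sketch below is not from the book or the original post. It assumes that SQL Server accepts ROW_NUMBER inside a derived table in the recursive member (aggregates, DISTINCT and GROUP BY are not allowed there), and that keeping only the earliest qualifying subscription at each step is enough, which holds here because any sub reachable from a later qualifying sub is also reachable from the earliest one.

```sql
WITH active_period_params AS (
    SELECT 30 AS allowed_gap,
           CAST('2021-09-30' AS date) AS calc_date
),
active AS (
    -- anchor: earliest start_date among subs active on calc_date
    SELECT s.account_id, MIN(s.start_date) AS start_date
    FROM subscription s
    CROSS JOIN active_period_params p
    WHERE s.start_date <= p.calc_date
      AND (s.end_date > p.calc_date OR s.end_date IS NULL)
    GROUP BY s.account_id

    UNION ALL

    -- recursive: of the earlier subs within the gap, keep only the earliest one
    SELECT x.account_id, x.start_date
    FROM (
        SELECT s.account_id, s.start_date,
               ROW_NUMBER() OVER (PARTITION BY s.account_id
                                  ORDER BY s.start_date) AS rn
        FROM subscription s
        CROSS JOIN active_period_params p
        INNER JOIN active e
            ON s.account_id = e.account_id
           AND s.start_date < e.start_date
           AND s.end_date >= DATEADD(day, -p.allowed_gap, e.start_date)
    ) AS x
    WHERE x.rn = 1
)
SELECT account_id, MIN(start_date) AS start_date
FROM active
GROUP BY account_id;
```

Because start_date strictly decreases at every step, the recursion ends once no earlier sub falls within the gap. Very long chains could still exceed SQL Server's default limit of 100 recursion levels, which can be raised with OPTION (MAXRECURSION n).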

This is a type of gaps-and-islands problem, and does not actually require recursion.

There are a number of solutions; here is one. Given your requirement, there may be more efficient methods, but this should get you started.

  • Using LAG, we flag each row whose start_date falls more than the specified gap after the previous row's end_date; these rows start a new group
  • We use a running COUNT of those flags to give each consecutive set of rows an ID
  • We group by that ID and take the minimum start_date, filtering out groups that contain no subscription active on the calc date
  • We group again to get the minimum per account
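
DECLARE @allowed_gap int = 30,
        @calc_date date = '2021-09-30';

WITH PrevValues AS (
    -- Flag a row as the start of a new group when the previous row's end_date
    -- (in start_date order) is more than @allowed_gap days before its start_date.
    -- A missing or NULL previous end_date is treated as far in the future (no flag).
    SELECT *,
      IsStart = CASE WHEN ISNULL(LAG(end_date) OVER (PARTITION BY account_id
                     ORDER BY start_date), '2099-01-01') < DATEADD(day, -@allowed_gap, start_date)
                     THEN 1 END
    FROM subscription
),
Groups AS (
    -- The running count of flags gives every consecutive run of rows the same GroupId
    SELECT *,
      GroupId = COUNT(IsStart) OVER (PARTITION BY account_id
                     ORDER BY start_date ROWS UNBOUNDED PRECEDING)
    FROM PrevValues
),
ByGroup AS (
    -- Earliest start_date per group, keeping only groups that contain
    -- at least one subscription active on @calc_date
    SELECT
      account_id,
      GroupId,
      start_date = MIN(start_date)
    FROM Groups
    GROUP BY account_id, GroupId
    HAVING COUNT(CASE WHEN start_date <= @calc_date
            AND (end_date > @calc_date OR end_date IS NULL) THEN 1 END) > 0
)
-- Earliest qualifying start_date per account
SELECT
  account_id,
  start_date = MIN(start_date)
FROM ByGroup
GROUP BY account_id;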

db<>fiddle
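
To see how the flags and group IDs line up, the first two CTEs from the answer can be selected from directly. This inspection query is only an illustrative sketch; the CTE bodies are copied unchanged from the answer, and the final SELECT is the only new part.

```sql
DECLARE @allowed_gap int = 30;

WITH PrevValues AS (
    SELECT *,
      IsStart = CASE WHEN ISNULL(LAG(end_date) OVER (PARTITION BY account_id
                     ORDER BY start_date), '2099-01-01') < DATEADD(day, -@allowed_gap, start_date)
                     THEN 1 END
    FROM subscription
),
Groups AS (
    SELECT *,
      GroupId = COUNT(IsStart) OVER (PARTITION BY account_id
                     ORDER BY start_date ROWS UNBOUNDED PRECEDING)
    FROM PrevValues
)
-- Show the intermediate columns for the example account
SELECT id, start_date, end_date, IsStart, GroupId
FROM Groups
WHERE account_id = 15
ORDER BY start_date;

-- For the sample data in the question this should give:
--   id 1           -> IsStart NULL, GroupId 0
--   id 2           -> IsStart 1,    GroupId 1
--   ids 3, 4, 5, 6 -> IsStart NULL, GroupId 1
```

Only the group with GroupId 1 contains subscriptions active on the calc date, and its earliest start_date is 01/03/2020, which matches the expected result.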
