
Finding rows with overlapping ranges

Suppose I have data that looks like this:

    create table tab(id smallint, nums int4range);
    insert into tab values (1, int4range(1,10)), (2, int4range(1,20)), (3, int4range(3,8)), (4, int4range(15,25)), (5, int4range(3,8));

Then select * from tab gives:

 id |  nums
----+---------
  1 | [1,10)
  2 | [1,20)
  3 | [3,8)
  4 | [15,25)
  5 | [3,8)

I want a query that finds the sub-ranges formed by the intersections of these ranges, together with the ids that fall into each sub-range. The output would look something like this:

  nums  | ids
--------+------------
[1,3)   | 1, 2
[3,8)   | 1, 2, 3, 5
[8,10)  | 1, 2
[10,15) | 2
[15,20) | 2, 4
[20,25) | 4

I'm agnostic about the format of the ids column. An array seems logical, but I'm equally happy with separate columns for the first, second, third, ..., nth id in a given range.

I know that no more than five IDs will have overlapping ranges, so a fixed number of columns, padded with nulls as needed, is perfectly fine. I also know that there won't be any ranges without IDs, if that matters.

Thanks for any help you can provide.

Overlapping ranges

If you want the overlapping ranges, i.e. every distinct non-empty intersection of two ranges together with the ids that produced it:

WITH all_intersections
AS
(
SELECT
    t1.id AS id1, 
    t2.id AS id2, 
    t1.nums * /* intersection */ t2.nums AS nums 
FROM
    tab t1 CROSS JOIN tab t2
WHERE
    t1.id <= t2.id  /* Need only 1/2 + diagonal */
),
unique_nums AS
(
SELECT DISTINCT
    nums
FROM
    all_intersections
WHERE 
    nums <> 'empty' 
)
SELECT 
    nums, 
    array(SELECT DISTINCT id1 AS id 
            FROM all_intersections a1 
           WHERE a1.nums = a0.nums
          UNION
          SELECT DISTINCT id2 AS id 
            FROM all_intersections a2 
           WHERE a2.nums = a0.nums
          ORDER BY id
         ) AS ids
FROM
    unique_nums a0 
ORDER BY
    nums ;

That gives the result:

|    nums |     ids |
|---------|---------|
|  [1,10) |     1,2 |
|  [1,20) |       2 |
|   [3,8) | 1,2,3,5 |
| [15,20) |     2,4 |
| [15,25) |       4 |

You can check it at http://sqlfiddle.com/#!15/f83d5/5/0
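To make the pairwise-intersection idea concrete, here is a small Python model of the query above. It is an illustration, not part of the original answer: it assumes half-open integer ranges stored as (lo, hi) tuples, and TAB, intersect and overlapping_ranges are hypothetical names. The t1.id <= t2.id filter becomes the inner loop starting at the current id, and the 'empty' filter becomes intersect returning None.

```python
# Sample data from the question; each range is half-open [lo, hi),
# matching PostgreSQL's canonical int4range form.
TAB = {1: (1, 10), 2: (1, 20), 3: (3, 8), 4: (15, 25), 5: (3, 8)}


def intersect(a, b):
    """Intersection of two half-open ranges (like the * operator), or None if empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None


def overlapping_ranges(tab):
    """Map each distinct non-empty pairwise intersection to its sorted ids."""
    result = {}
    ids = sorted(tab)
    for i, id1 in enumerate(ids):
        # Like t1.id <= t2.id: only the upper triangle plus the diagonal.
        for id2 in ids[i:]:
            nums = intersect(tab[id1], tab[id2])
            if nums is not None:
                result.setdefault(nums, set()).update((id1, id2))
    # Sort by (lower, upper), the same order PostgreSQL uses for ranges.
    return {nums: sorted(s) for nums, s in sorted(result.items())}


for nums, ids in overlapping_ranges(TAB).items():
    print(f"[{nums[0]},{nums[1]})", ids)
```

Running it lists the same five ranges and id sets as the table above.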

Non-overlapping ranges

If you want non-overlapping ranges (as in your example), you can get them with the following CTE:

WITH bounds AS         /* all bounds */
(
SELECT DISTINCT
    lower(nums) AS b
FROM
    tab
UNION
SELECT DISTINCT
    upper(nums) AS b
FROM 
    tab
),
range_bounds AS        /* pairs of consecutive bounds */
(
SELECT
    b, lead(b) OVER (ORDER BY b) AS next_b 
FROM
    bounds
),
ranges AS              /* convert the pairs to ranges */
(
SELECT
    int4range(b, next_b) AS nums
FROM
    range_bounds 
WHERE
    next_b is not null  -- ignore last
)
SELECT                 /* take every range and find the original ranges overlapping it */
    nums, 
    ARRAY
      (SELECT id 
        FROM tab
       WHERE tab.nums && ranges.nums
       ORDER BY id
      ) AS ids
FROM 
    ranges
ORDER BY
    nums ;

The result is:

|    nums |     ids |
|---------|---------|
|   [1,3) |     1,2 |
|   [3,8) | 1,2,3,5 |
|  [8,10) |     1,2 |
| [10,15) |       2 |
| [15,20) |     2,4 |
| [20,25) |       4 |

This matches the expected output in your example.

This assumes:

  • All your ranges have an inclusive lower bound [ and an exclusive upper bound ). It won't produce correct results otherwise.
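For int4range this assumption always holds: PostgreSQL canonicalizes discrete range types to [) form, so int4range(1,10,'[]') is stored and displayed as [1,11). It matters for continuous types such as numrange, where closed bounds survive and are lost when only the bound values are kept. The sketch below models that canonicalization rule for integer ranges in Python; canonical is a hypothetical helper, not a PostgreSQL function, and unbounded ends are not handled.

```python
def canonical(lo, hi, bounds="[)"):
    """Rewrite an integer range with the given bound flags as half-open [lo, hi).

    Models how PostgreSQL canonicalizes discrete ranges such as int4range:
    an exclusive lower bound moves up by one, and so does an inclusive
    upper bound. Returns None for an empty range.
    """
    if bounds[0] == "(":
        lo += 1
    if bounds[1] == "]":
        hi += 1
    return (lo, hi) if lo < hi else None


print(canonical(1, 10, "[]"))  # (1, 11)
print(canonical(3, 8, "(]"))   # (4, 9)
print(canonical(5, 5, "[)"))   # None, i.e. 'empty'
```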

The idea is:

  1. Take all bounds of the ranges (whether lower or upper)
  2. Sort them
  3. Build a range from each pair of consecutive bounds
  4. Find which original ranges each new range overlaps, to collect the ids
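The four steps above can be sketched in Python as follows. This is an illustrative model, not the SQL itself: it assumes half-open integer ranges stored as (lo, hi) tuples, and TAB and split_ranges are hypothetical names. The zip over consecutive bounds plays the role of lead(b) with the last row dropped, and the overlap test plays the role of &&.

```python
# Same sample data as the question: id -> half-open range [lo, hi).
TAB = {1: (1, 10), 2: (1, 20), 3: (3, 8), 4: (15, 25), 5: (3, 8)}


def split_ranges(tab):
    """Return (sub_range, ids) pairs built from consecutive bounds."""
    # Steps 1 and 2: every distinct bound, lower or upper, in sorted order.
    bounds = sorted({b for r in tab.values() for b in r})
    result = []
    # Step 3: pair each bound with the next one (like lead(b)), skipping the last.
    for lo, hi in zip(bounds, bounds[1:]):
        # Step 4: two half-open ranges overlap iff each starts before the other ends.
        ids = sorted(i for i, (a, b) in tab.items() if a < hi and lo < b)
        result.append(((lo, hi), ids))
    return result


for (lo, hi), ids in split_ranges(TAB):
    print(f"[{lo},{hi})", ids)
```

As in the CTE query, a gap between the input ranges would produce a sub-range with an empty id list; the question rules that case out.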

Check it at http://sqlfiddle.com/#!15/f83d5/10/0

NOTE: If you want to avoid the CTEs, the query can be compressed further by pure substitution:

SELECT 
    nums, ARRAY
          (SELECT id 
             FROM tab
            WHERE tab.nums && ranges.nums
            ORDER BY id
           ) AS ids
FROM 
    (SELECT
        int4range(b, next_b) AS nums
    FROM
        (SELECT
            b, lead(b) OVER (ORDER BY b) AS next_b 
        FROM
            (SELECT DISTINCT lower(nums) AS b FROM tab
             UNION
             SELECT DISTINCT upper(nums) AS b FROM tab
            ) AS bounds
        ) AS range_bounds 
    WHERE
        next_b is not null
    ) AS ranges 
ORDER BY
  nums ;

Check it at http://sqlfiddle.com/#!15/f83d5/15/0

SELECT uniquenums.nums, array_agg(id ORDER BY id) ids
FROM (
        SELECT numsgroup, int4range(min(boundary), max(boundary)) nums
        FROM (
                SELECT boundary, row_number() OVER (ORDER BY boundary, seriesvalue) / 2 AS numsgroup
                FROM (
                        SELECT DISTINCT upper(nums) AS boundary FROM tab
                        UNION
                        SELECT DISTINCT lower(nums) AS boundary FROM tab
                ) AS A
                JOIN (
                        SELECT generate_series(1, 2) AS seriesvalue
                ) AS B ON true
        ) AS A
        GROUP BY numsgroup
        HAVING COUNT(*) > 1
) AS uniquenums
JOIN tab ON tab.nums && uniquenums.nums
GROUP BY uniquenums.nums
ORDER BY uniquenums.nums;

How does it work?

  1. Extract all distinct boundaries, regardless of whether they are lower or upper bounds
  2. Duplicate each boundary by joining with a helper table expression that has two rows
  3. Assign each resulting row a group number such that two consecutive boundaries get the same group number
  4. Group by these numbers and build new ranges from the consecutive boundaries
  5. Find the rows in tab whose ranges overlap the ranges just calculated
  6. Aggregate the IDs of the matching rows into an array

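The row_number() / 2 trick in steps 2 to 4 is the least obvious part, so here is a Python model of it. It is a sketch under the same assumptions as the SQL (half-open integer ranges), with hypothetical names TAB, pair_by_row_number and ids_per_range. Duplicating every bound and numbering the rows from 1 gives group numbers 0, 1, 1, 2, 2, ..., so each group except the first and last holds two consecutive bounds, and HAVING COUNT(*) > 1 discards the two singleton groups.

```python
from itertools import groupby

# Sample data from the question: id -> half-open range [lo, hi).
TAB = {1: (1, 10), 2: (1, 20), 3: (3, 8), 4: (15, 25), 5: (3, 8)}


def pair_by_row_number(tab):
    """Group consecutive bounds into ranges with the row_number() / 2 trick."""
    # Step 1: distinct boundaries, lower or upper, sorted.
    bounds = sorted({b for r in tab.values() for b in r})
    # Step 2: duplicate each boundary (the join with generate_series(1, 2)).
    doubled = [b for b in bounds for _ in (1, 2)]
    # Step 3: row_number() starts at 1; integer division by 2 gives the group.
    numbered = [(rn // 2, b) for rn, b in enumerate(doubled, start=1)]
    ranges = []
    # Step 4: GROUP BY numsgroup HAVING COUNT(*) > 1, then min/max per group.
    for _, grp in groupby(numbered, key=lambda t: t[0]):
        members = [b for _, b in grp]
        if len(members) > 1:
            ranges.append((min(members), max(members)))
    return ranges


def ids_per_range(tab):
    """Steps 5 and 6: inner join on overlap, then aggregate the ids per range."""
    out = {}
    for lo, hi in pair_by_row_number(tab):
        ids = sorted(i for i, (a, b) in tab.items() if a < hi and lo < b)
        if ids:  # an inner join yields no row for a range that overlaps nothing
            out[(lo, hi)] = ids
    return out


print(pair_by_row_number(TAB))
print(ids_per_range(TAB))
```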
select rng as nums, array_agg(id order by id) as ids
from (  
    select int4range(n, lead(n) over (order by n)) as rng
    from (  
        select distinct lower(nums) n
        from tab
        union
        select distinct upper(nums) n
        from tab
        ) s
    ) s
join tab on rng && nums
group by 1
order by 1;

  nums   |    ids    
---------+-----------
 [1,3)   | {1,2}
 [3,8)   | {1,2,3,5}
 [8,10)  | {1,2}
 [10,15) | {2}
 [15,20) | {2,4}
 [20,25) | {4}
(6 rows)
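One detail of this query is worth spelling out. For the largest bound (25), lead(n) returns NULL, and int4range(25, NULL) is the unbounded range [25,). Because the largest bound is an upper bound and upper bounds are exclusive, that range overlaps none of the rows in tab, so the inner join drops it and no "next_b is not null" filter is needed. The Python sketch below models this behaviour; it represents a missing upper bound as None, and TAB, overlaps and join_version are hypothetical names.

```python
# Sample data from the question: id -> half-open range [lo, hi).
TAB = {1: (1, 10), 2: (1, 20), 3: (3, 8), 4: (15, 25), 5: (3, 8)}


def overlaps(a, b):
    """Half-open overlap test (like &&); an upper bound of None means unbounded."""
    (alo, ahi), (blo, bhi) = a, b
    return (ahi is None or blo < ahi) and (bhi is None or alo < bhi)


def join_version(tab):
    bounds = sorted({b for r in tab.values() for b in r})
    # int4range(n, lead(n) OVER (ORDER BY n)): the last row gets an unbounded upper end.
    rngs = list(zip(bounds, bounds[1:] + [None]))
    out = {}
    for rng in rngs:
        ids = sorted(i for i, r in tab.items() if overlaps(rng, r))
        if ids:  # inner join: a range with no matching row disappears
            out[rng] = ids
    return out


print(join_version(TAB))
```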
