Finding rows with overlapping ranges
Suppose I have data that looks like this:
create table tab(id smallint, nums int4range);
insert into tab values
    (1, int4range(1,10)),
    (2, int4range(1,20)),
    (3, int4range(3,8)),
    (4, int4range(15,25)),
    (5, int4range(3,8));
So then
select * from tab
gives:
id | nums
----+---------
1 | [1,10)
2 | [1,20)
3 | [3,8)
4 | [15,25)
5 | [3,8)
I want a query that finds the sub-ranges formed by intersecting these ranges, together with the ids whose ranges cover each sub-range. The output would look something like this:
nums | ids
--------+------------
[1,3) | 1, 2
[3,8) | 1, 2, 3, 5
[8,10) | 1, 2
[10,15) | 2
[15,20) | 2, 4
[20,25) | 4
I'm agnostic about the format of the 'ids' column: an array seems logical, but I'm perfectly content with columns for the first, second, third ... nth id in a given range.
I know that there won't be more than five ids with overlapping ranges, so a fixed number of columns with nulls as needed is perfectly fine. I also know that there won't be ranges with no ids, if that matters.
Thanks for any help you can provide.
If you want overlapping ranges:
WITH all_intersections AS
(
    SELECT
        t1.id AS id1,
        t2.id AS id2,
        t1.nums * /* intersection */ t2.nums AS nums
    FROM
        tab t1 CROSS JOIN tab t2
    WHERE
        t1.id <= t2.id /* Need only 1/2 + diagonal */
),
unique_nums AS
(
    SELECT DISTINCT
        nums
    FROM
        all_intersections
    WHERE
        nums <> 'empty'
)
SELECT
    nums,
    array(SELECT DISTINCT id1 AS id
          FROM all_intersections a1
          WHERE a1.nums = a0.nums
          UNION
          SELECT DISTINCT id2 AS id
          FROM all_intersections a2
          WHERE a2.nums = a0.nums
          ORDER BY id
         ) AS ids
FROM
    unique_nums a0
ORDER BY
    nums ;
That gives the result:
| nums | ids |
|---------|---------|
| [1,10) | 1,2 |
| [1,20) | 2 |
| [3,8) | 1,2,3,5 |
| [15,20) | 2,4 |
| [15,25) | 4 |
You can check it at http://sqlfiddle.com/#!15/f83d5/5/0
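The query above relies on the range intersection operator `*` and on the special value 'empty' that it returns for ranges with no common points. As a small illustration (the column aliases are only for readability and are not part of the original answer), the operator can be tried on literal ranges taken from the example data:

```sql
-- Intersection (*) of int4range values taken from the example table.
SELECT int4range(1, 10) * int4range(3, 8)   AS nested,    -- [3,8)
       int4range(1, 20) * int4range(15, 25) AS partial,   -- [15,20)
       int4range(1, 10) * int4range(15, 25) AS disjoint,  -- empty
       -- isempty() is an alternative to comparing with 'empty'
       isempty(int4range(1, 10) * int4range(15, 25)) AS disjoint_is_empty;  -- true
```

This is why the unique_nums step filters on nums <> 'empty': pairs of rows whose ranges do not touch would otherwise contribute an empty range to the result.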
If you want to get non-overlapping ranges (like in your example), this can be done with the following CTE:
WITH bounds AS /* all bounds */
(
    SELECT DISTINCT
        lower(nums) AS b
    FROM
        tab
    UNION
    SELECT DISTINCT
        upper(nums) AS b
    FROM
        tab
),
range_bounds AS /* pairs of consecutive bounds */
(
    SELECT
        b, lead(b) OVER (ORDER BY b) AS next_b
    FROM
        bounds
),
ranges AS /* convert the pairs to ranges */
(
    SELECT
        int4range(b, next_b) AS nums
    FROM
        range_bounds
    WHERE
        next_b is not null -- ignore last
)
SELECT /* take every range and find intersection with originals */
    nums,
    ARRAY
    (SELECT id
     FROM tab
     WHERE tab.nums && ranges.nums
    ) AS ids
FROM
    ranges ;
The result of execution is:
| nums | ids |
|---------|---------|
| [1,3) | 1,2 |
| [3,8) | 1,2,3,5 |
| [8,10) | 1,2 |
| [10,15) | 2 |
| [15,20) | 2,4 |
| [20,25) | 4 |
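Which is the result of your example.
This assumes that all ranges include the lower bound [ and exclude the upper bound ). [It won't produce the right results in other cases.]
The idea is:
1. Collect all distinct lower and upper bounds of the ranges.
2. Pair each bound with the next one in sorted order.
3. Convert each pair of consecutive bounds into a range.
4. For each of these ranges, collect the ids of the original ranges that overlap it.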
Check it at http://sqlfiddle.com/#!15/f83d5/10/0
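To see why the half-open [ ) form matters for the final overlap test, here is a minimal sketch using literal ranges from the example (the column aliases are illustrative only). Because the upper bound is excluded, a sub-range that merely touches an original range at its upper bound is not counted as overlapping it:

```sql
-- Overlap (&&) between an original range and two candidate sub-ranges.
SELECT int4range(1, 10) && int4range(8, 10)  AS inside,    -- true:  [8,10) lies within [1,10)
       int4range(1, 10) && int4range(10, 15) AS adjacent,  -- false: 10 is excluded from [1,10)
       -- int4range is a discrete type, so PostgreSQL stores it in the [ ) form:
       '[1,9]'::int4range                    AS canonical; -- [1,10)
```

As far as I can tell, this means that values of type int4range already satisfy the assumption once stored, as long as they are neither empty nor unbounded; continuous range types such as numrange would need more care.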
NOTE: This can be further compressed if you want to avoid the CTEs, by pure substitution:
SELECT
    nums, ARRAY
    (SELECT id
     FROM tab
     WHERE tab.nums && ranges.nums
    ) AS ids
FROM
    (SELECT
         int4range(b, next_b) AS nums
     FROM
         (SELECT
              b, lead(b) OVER (ORDER BY b) AS next_b
          FROM
              (SELECT DISTINCT lower(nums) AS b FROM tab
               UNION
               SELECT DISTINCT upper(nums) AS b FROM tab
              ) AS bounds
         ) AS range_bounds
     WHERE
         next_b is not null
    ) AS ranges
ORDER BY
    nums ;
Check it at http://sqlfiddle.com/#!15/f83d5/15/0
SELECT uniquenums.nums, array_agg(id) ids
FROM (
    SELECT numsgroup, int4range(min(boundary), max(boundary)) nums
    FROM (
        SELECT boundary, row_number() OVER (ORDER BY boundary, seriesvalue) / 2 AS numsgroup
        FROM (
            SELECT DISTINCT upper(nums) AS boundary FROM tab
            UNION
            SELECT DISTINCT lower(nums) AS boundary FROM tab
        ) AS A
        JOIN (
            SELECT generate_series(1, 2) AS seriesvalue
        ) AS B ON true
    ) AS A
    GROUP BY numsgroup
    HAVING COUNT(*) > 1
) AS uniquenums
JOIN tab ON tab.nums && uniquenums.nums
GROUP BY uniquenums.nums
ORDER BY uniquenums.nums
How does it work?
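The explanation that followed this question is not preserved here, so what follows is only one reading of the query above. Running its innermost part on its own (with an ORDER BY added purely for display) shows how the grouping appears to work:

```sql
-- Innermost part of the query above: every distinct boundary appears twice
-- (seriesvalue 1 and 2), and integer division of the row number by 2
-- shifts the pairing by one row.
SELECT boundary,
       seriesvalue,
       row_number() OVER (ORDER BY boundary, seriesvalue) / 2 AS numsgroup
FROM (
    SELECT DISTINCT upper(nums) AS boundary FROM tab
    UNION
    SELECT DISTINCT lower(nums) AS boundary FROM tab
) AS A
JOIN (
    SELECT generate_series(1, 2) AS seriesvalue
) AS B ON true
ORDER BY boundary, seriesvalue;
-- First rows for the example data:
--   boundary | seriesvalue | numsgroup
--          1 |           1 |         0
--          1 |           2 |         1
--          3 |           1 |         1
--          3 |           2 |         2
--          8 |           1 |         2
```

Each group from 1 to 6 then holds the second copy of one boundary and the first copy of the next, so int4range(min(boundary), max(boundary)) yields [1,3), [3,8), and so on. Groups 0 and 7 hold a single row each (the first copy of the smallest boundary and the second copy of the largest) and are discarded by HAVING COUNT(*) > 1.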
select rng as nums, array_agg(id) as ids
from (
    select int4range(n, lead(n) over (order by n)) as rng
    from (
        select distinct lower(nums) n
        from tab
        union
        select distinct upper(nums) n
        from tab
    ) s
) s
join tab on rng && nums
group by 1
order by 1;
nums | ids
---------+-----------
[1,3) | {1,2}
[3,8) | {1,2,3,5}
[8,10) | {1,2}
[10,15) | {2}
[15,20) | {2,4}
[20,25) | {4}
(6 rows)
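The question also allows a fixed number of id columns instead of an array. A possible variant of the query above, assuming at most five ids per sub-range as stated in the question (the names id1 to id5 and the subquery aliases b, r and agg are made up for this sketch), subscripts the aggregated array:

```sql
-- Same sub-ranges as above, with the ids spread over five nullable columns.
SELECT rng AS nums,
       ids[1] AS id1, ids[2] AS id2, ids[3] AS id3, ids[4] AS id4, ids[5] AS id5
FROM (
    SELECT rng, array_agg(id ORDER BY id) AS ids
    FROM (
        -- consecutive bounds become ranges; the last bound gets an open upper end
        SELECT int4range(n, lead(n) OVER (ORDER BY n)) AS rng
        FROM (
            SELECT lower(nums) AS n FROM tab
            UNION
            SELECT upper(nums) AS n FROM tab
        ) AS b
    ) AS r
    JOIN tab ON r.rng && tab.nums
    GROUP BY rng
) AS agg
ORDER BY nums;
```

Subscripting a PostgreSQL array past its last element returns NULL, so sub-ranges covered by fewer than five ids simply leave the remaining columns null. The trailing range built from the largest bound and a NULL upper bound overlaps none of the original rows in this data and is therefore dropped by the join.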