MySQL 5.6 具有可选深度限制的递归查询

Question

我有两个表模式（MySQL 5.6 所以没有 CTE），大致如下所示：

CREATE TABLE nodes (
  node_id INT PRIMARY KEY,
  name VARCHAR(10)
);

CREATE TABLE edges (
  edge_id INT PRIMARY KEY,
  source INT,
  target INT,
  FOREIGN KEY (source) REFERENCES nodes(node_id),
  FOREIGN KEY (target) REFERENCES nodes(node_id)
);

在我们的设计中，两个节点之间的逻辑边（逻辑上n1 -> n2 ）实际上在数据库中表示为（ n1 -> proxy node -> n2 ）。 我们使用两条边和一个代理节点作为逻辑边的原因是我们可以在边上存储属性。 因此，当客户端查询由一条边连接的两个节点时，查询被转换为查询三个连接的节点。

我写了一个查询来获取固定长度的路径。 例如，“给我所有路径，这些路径以具有某些属性的节点开始，并以具有某些属性的节点结束，路径上正好有 5 条边。” 这是在 SQL 端不使用递归的情况下完成的； 我只是以指定的固定长度以编程方式生成一个长查询。

挑战在于，我们希望支持对可变长度路径的查询。 例如，“给我所有以具有某些属性的节点开始，并以具有某些属性的节点结束的所有路径，路径上的边不少于 3 条，边不超过 10 条。” 这在没有（或什至有）CTE 的情况下可行吗？

编辑：

一些示例数据：

-- Logical nodes are 1, 3, 5, 7, 9, 11. The rest are proxy nodes.
INSERT INTO nodes VALUES
  (1, 'foo'),
  (2, '_proxy_'),
  (3, 'foo'),
  (4, '_proxy_'),
  (5, 'bar'),
  (6, '_proxy_'),
  (7, 'bar'),
  (8, '_proxy_'),
  (9, 'bar'),
  (10, '_proxy_'),
  (11, 'bar');

-- Connects 1 -> 2 -> ... -> 11.
INSERT INTO edges VALUES
  (1, 1, 2),
  (2, 2, 3),
  (3, 3, 4),
  (4, 4, 5),
  (5, 5, 6),
  (6, 6, 7),
  (7, 7, 8),
  (8, 8, 9),
  (9, 9, 10),
  (10, 10, 11);

查询可以是，“选择路径上所有节点的 ID 和名称，使得路径以名为 'foo' 的节点开始，以名为 'bar' 的节点结束，至少有 2 个节点，最多有 4 个节点在路上。” 此类路径包括1 -> 3 -> 5 、 1 -> 3 -> 5 -> 7 、 3 -> 5 、 3 -> 5 -> 7和3 -> 5 -> 7 -> 9 。 所以结果集应该包括节点 1、3、5、7、9 的 ID 和名称。

Answer 1

以下查询以逗号分隔的字符串形式返回所有感兴趣的路径。

with recursive rcte as (
  select e.source, e.target, 1 as depth, concat(e.source) as path
  from nodes n
  join edges e on e.source = n.node_id
  where n.name = 'foo' -- start node name
  union all
  select e.source, e.target, r.depth + 1 as depth, concat_ws(',', r.path, e.source)
  from rcte r
  join edges p on p.source = r.target -- p for proxy
  join edges e on e.source = p.target
  where r.depth < 4 -- max path nodes
) 
select r.path
from rcte r
join nodes n on n.node_id = r.source
where r.depth >= 2 -- min path nodes
  and n.name = 'bar' -- end node name

结果如下所示：

| path    |
| ------- |
| 3,5     |
| 1,3,5   |
| 3,5,7   |
| 1,3,5,7 |
| 3,5,7,9 |

在 DB Fiddle 上查看

您现在可以解析应用程序代码中的字符串并合并/联合数组。 如果您只想要包含的节点 ID，您还可以将外部查询更改为：

select distinct r2.source
from rcte r
join nodes n on n.node_id = r.source
join rcte r2 on find_in_set(r2.source, r.path)
where r.depth >= 2 -- min path nodes
  and n.name = 'bar' -- end node name

结果：

| source |
| ------ |
| 1      |
| 3      |
| 5      |
| 7      |
| 9      |

在 DB Fiddle 上查看

请注意，如果rcte包含太多行，则rcte FIND_IN_SET()上的 JOIN 可能会很慢。 我宁愿在应用程序代码中执行这一步，这在程序语言中应该非常简单。

MySQL 5.6 解决方案

在 MySQL 8.0 和 MariaDB 10.2 之前，无法进行递归。 此外还有许多其他限制，这使得解决方法变得困难。 例如：

存储函数中没有动态查询
无法在单个语句中两次使用临时表
memmory引擎中没有 TEXT 类型

但是 - 可以在两个（临时）表之间移动行的存储过程中模拟 RCTE。 以下过程是这样做的：

delimiter //
create procedure get_path(
  in source_name text,
  in target_name text,
  in min_depth int,
  in max_depth int
)
begin
  create temporary table tmp_sources (id int, depth int, path text) engine=innodb;
  create temporary table tmp_targets like tmp_sources;

  insert into tmp_sources (id, depth, path)
    select n.node_id, 1, n.node_id
    from nodes n
    where n.name = source_name;

  set @depth = 1;
  while @depth < max_depth do
    set @depth = @depth+1;
    insert into tmp_targets(id, depth, path)
      select e.target, @depth, concat_ws(',', t.path, e.target)
      from tmp_sources t
      join edges p on p.source = t.id
      join edges e on e.source = p.target
      where t.depth = @depth - 1;

    insert into tmp_sources (id, depth, path)
      select id, depth, path
      from tmp_targets;

    truncate tmp_targets;
  end while;

  select t.path
    from tmp_sources t
    join nodes n on n.node_id = t.id
    where n.name = target_name
      and t.depth >= min_depth;
end //
delimiter ;

将其用作：

call get_path('foo', 'bar', 2, 4)

结果：

| path    |
| ------- |
| 3,5     |
| 1,3,5   |
| 3,5,7   |
| 1,3,5,7 |
| 3,5,7,9 |

在 DB Fiddle 上查看

这远非最佳。 如果结果有很多或很长的路径，您可能需要在临时表上定义一些索引。 我也不喜欢在 stroed 过程中创建（临时）表的想法。 将其视为“概念证明”。 自行承担使用风险。

Answer 2

我已经用传递闭包表解决了这类问题。 这枚举了通过您的节点的每条直接和间接路径。 您当前拥有的边是长度为 1 的路径。但您还需要长度为 0 的路径（即，一个节点有一条到自身的路径），然后是从一个源节点到最终目标节点的每条路径，对于长度更大的路径比 1。

create table closure (
  source int,
  target int,
  length int,
  is_direct bool,
  primary key (source, target)
);

insert into closure values
  (1, 1, 0, false), (1, 2, 1, true), (1, 3, 2, false), (1, 4, 3, false), (1, 5, 4, false), (1, 6, 5, false), (1, 7, 6, false), (1, 8, 7, false), (1, 9, 8, false), (1, 10, 9, false), (1, 11, 10, false),
  (2, 2, 0, false), (2, 3, 1, true), (2, 4, 2, false), (2, 5, 3, false), (2, 6, 4, false), (2, 7, 5, false), (2, 8, 6, false), (2, 9, 7, false), (2, 10, 8, false), (2, 11, 9, false),
  (3, 3, 0, false), (3, 4, 1, true), (3, 5, 2, false), (3, 6, 3, false), (3, 7, 4, false), (3, 8, 5, false), (3, 9, 6, false), (3, 10, 7, false), (3, 11, 8, false),
  (4, 4, 0, false), (4, 5, 1, true), (4, 6, 2, false), (4, 7, 3, false), (4, 8, 4, false), (4, 9, 5, false), (4, 10, 6, false), (4, 11, 7, false),
  (5, 5, 0, false), (5, 6, 1, true), (5, 7, 2, false), (5, 8, 3, false), (5, 9, 4, false), (5, 10, 5, false), (5, 11, 6, false),
  (6, 6, 0, false), (6, 7, 1, true), (6, 8, 2, false), (6, 9, 3, false), (6, 10, 4, false), (6, 11, 5, false),
  (7, 7, 0, false), (7, 8, 1, true), (7, 9, 2, false), (7, 10, 3, false), (7, 11, 4, false),
  (8, 8, 0, false), (8, 9, 1, true), (8, 10, 2, false), (8, 11, 3, false),
  (9, 9, 0, false), (9, 10, 1, true), (9, 11, 2, true),
  (10, 10, 0, false), (10, 11, 1, true),
  (11, 11, 0, false);

现在我们可以编写您的查询：

选择路径上所有节点的 ID 和名称，使得路径以名为“foo”的节点开始，以名为“bar”的节点结束，路径上至少有 2 个节点，最多有 4 个节点。

我将其转换为长度为 4、6、8 的路径，因为每个路径之间都有一个代理节点，因此在节点之间移动确实需要两跳。

select source.node_id as source_node, target.node_id as target_node, c.length
from nodes as source
join closure as c on source.node_id = c.source
join nodes as target on c.target = target.node_id
where source.name='foo' and target.name = 'bar' and c.length in (4,6,8)

这是结果，实际上还包括节点 11：

+-------------+-------------+--------+
| source_node | target_node | length |
+-------------+-------------+--------+
|           1 |           5 |      4 |
|           1 |           7 |      6 |
|           1 |           9 |      8 |
|           3 |           7 |      4 |
|           3 |           9 |      6 |
|           3 |          11 |      8 |
+-------------+-------------+--------+

保罗·斯皮格尔 (Paul Spiegel) 的评论：

一旦你有了路径的端点，你就可以查询从源开始的所有路径的闭包，结束于一个也有到目标路径的节点。

select source.node_id as source_node, target.node_id as target_node,
  group_concat(i1.target order by i1.target) as interim_nodes
from nodes as source
join closure as c on source.node_id = c.source
join nodes as target on c.target = target.node_id
join closure as i1 on source.node_id = i1.source
join closure as i2 on target.node_id = i2.target and i1.target = i2.source
where source.name='foo' and target.name = 'bar' and c.length in (4,6,8)
group by source.node_id, target.node_id

+-------------+-------------+---------------------+
| source_node | target_node | interim_nodes       |
+-------------+-------------+---------------------+
|           1 |           5 | 1,2,3,4,5           |
|           1 |           7 | 1,2,3,4,5,6,7       |
|           1 |           9 | 1,2,3,4,5,6,7,8,9   |
|           3 |           7 | 3,4,5,6,7           |
|           3 |           9 | 3,4,5,6,7,8,9       |
|           3 |          11 | 3,4,5,6,7,8,9,10,11 |
+-------------+-------------+---------------------+

MySQL 5.6 具有可选深度限制的递归查询

问题描述

2 个解决方案

解决方案1
2 2020-02-01 10:29:02

MySQL 5.6 解决方案

解决方案2
2 2020-02-01 22:37:45

MySQL 5.6 具有可选深度限制的递归查询

问题描述

2 个解决方案

解决方案1 2 2020-02-01 10:29:02

MySQL 5.6 解决方案

解决方案2 2 2020-02-01 22:37:45

解决方案1
2 2020-02-01 10:29:02

解决方案2
2 2020-02-01 22:37:45