
Recursive query with optional depth limit with MySQL 5.6

I have two table schemas (MySQL 5.6, so no CTEs), roughly looking like this:

CREATE TABLE nodes (
  node_id INT PRIMARY KEY,
  name VARCHAR(10)
);

CREATE TABLE edges (
  edge_id INT PRIMARY KEY,
  source INT,
  target INT,
  FOREIGN KEY (source) REFERENCES nodes(node_id),
  FOREIGN KEY (target) REFERENCES nodes(node_id)
);

In our design, a logical edge between two nodes (logically n1 -> n2) is actually represented as (n1 -> proxy node -> n2) in the db. The reason we use two edges and a proxy node for a logical edge is so that we can store properties on the edge. Therefore, when a client queries for two nodes connected by an edge, the query is translated to query three connected nodes instead.

I have written a query to get a path with a fixed length. For example, "give me all the paths that start with a node with some properties, and end with a node with some properties, with exactly 5 edges on the path." This is done without using recursion on the SQL side; I just generate a long query programmatically with the specified fixed length.
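For illustration, a generator of that kind might look roughly like the Python sketch below. It is not the code actually in use: the function name build_fixed_length_query is hypothetical, and the %s placeholders assume a Python MySQL driver that uses that parameter style. Each logical edge expands into two joins on edges (through the proxy node) plus one join on nodes.

```python
def build_fixed_length_query(logical_edges):
    """Build SQL for paths with exactly `logical_edges` logical edges.

    Hypothetical helper: n0..nk are the logical nodes, a{i}/b{i} are the two
    physical edges that pass through the proxy node of logical edge i.
    """
    if logical_edges < 1:
        raise ValueError("a path needs at least one logical edge")

    select_cols = ["n0.node_id AS n0"]
    joins = ["FROM nodes n0"]
    for i in range(1, logical_edges + 1):
        # logical node -> proxy node
        joins.append(f"JOIN edges a{i} ON a{i}.source = n{i - 1}.node_id")
        # proxy node -> next logical node
        joins.append(f"JOIN edges b{i} ON b{i}.source = a{i}.target")
        joins.append(f"JOIN nodes n{i} ON n{i}.node_id = b{i}.target")
        select_cols.append(f"n{i}.node_id AS n{i}")

    # Start and end node names are bound as parameters by the driver.
    where = f"WHERE n0.name = %s AND n{logical_edges}.name = %s"
    return "\n".join(["SELECT " + ", ".join(select_cols), *joins, where])


if __name__ == "__main__":
    print(build_fixed_length_query(2))
```

The generated statement grows by three joins per logical edge, which is why this approach only covers one fixed length per query.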

The challenge is that we want to support querying a variable-length path. For example, "give me all the paths that start with a node with some properties, and end with a node with some properties, with no fewer than 3 edges and no more than 10 edges on the path." Is this feasible without (or even with) a CTE?

EDIT:

Some sample data:

-- Logical nodes are 1, 3, 5, 7, 9, 11. The rest are proxy nodes.
INSERT INTO nodes VALUES
  (1, 'foo'),
  (2, '_proxy_'),
  (3, 'foo'),
  (4, '_proxy_'),
  (5, 'bar'),
  (6, '_proxy_'),
  (7, 'bar'),
  (8, '_proxy_'),
  (9, 'bar'),
  (10, '_proxy_'),
  (11, 'bar');

-- Connects 1 -> 2 -> ... -> 11.
INSERT INTO edges VALUES
  (1, 1, 2),
  (2, 2, 3),
  (3, 3, 4),
  (4, 4, 5),
  (5, 5, 6),
  (6, 6, 7),
  (7, 7, 8),
  (8, 8, 9),
  (9, 9, 10),
  (10, 10, 11);

The query can be, "select the ID and names of all the nodes on a path such that the path starts with a node named 'foo' and ends with a node named 'bar', with at least 2 nodes and at most 4 nodes on the path." Such paths include 1 -> 3 -> 5, 1 -> 3 -> 5 -> 7, 3 -> 5, 3 -> 5 -> 7, and 3 -> 5 -> 7 -> 9. So the result set should include the IDs and names of nodes 1, 3, 5, 7, 9.

The following query returns all paths of interest as comma-separated strings.

with recursive rcte as (
  select e.source, e.target, 1 as depth, concat(e.source) as path
  from nodes n
  join edges e on e.source = n.node_id
  where n.name = 'foo' -- start node name
  union all
  select e.source, e.target, r.depth + 1 as depth, concat_ws(',', r.path, e.source)
  from rcte r
  join edges p on p.source = r.target -- p for proxy
  join edges e on e.source = p.target
  where r.depth < 4 -- max path nodes
) 
select r.path
from rcte r
join nodes n on n.node_id = r.source
where r.depth >= 2 -- min path nodes
  and n.name = 'bar' -- end node name

The result looks like this:

| path    |
| ------- |
| 3,5     |
| 1,3,5   |
| 3,5,7   |
| 1,3,5,7 |
| 3,5,7,9 |

View on DB Fiddle

You can now parse the strings in application code and merge/union the arrays. If you only want the contained node ids, you can also change the outer query to:

select distinct r2.source
from rcte r
join nodes n on n.node_id = r.source
join rcte r2 on find_in_set(r2.source, r.path)
where r.depth >= 2 -- min path nodes
  and n.name = 'bar' -- end node name

Result:

| source |
| ------ |
| 1      |
| 3      |
| 5      |
| 7      |
| 9      |

View on DB Fiddle

Note that a JOIN on FIND_IN_SET() might be slow if rcte contains too many rows. I would rather do this step in application code, which should be quite simple in a procedural language.
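As a minimal sketch of that application-side step, assuming the path strings have already been fetched into a Python list (the function name nodes_on_paths is made up for this example):

```python
def nodes_on_paths(paths):
    """Return the sorted, de-duplicated node ids found in comma-separated paths."""
    node_ids = set()
    for path in paths:
        # Skip empty fragments so an empty string does not raise on int("").
        node_ids.update(int(part) for part in path.split(",") if part)
    return sorted(node_ids)


paths = ["3,5", "1,3,5", "3,5,7", "1,3,5,7", "3,5,7,9"]
print(nodes_on_paths(paths))  # [1, 3, 5, 7, 9]
```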

MySQL 5.6 solution

Prior to MySQL 8.0 and MariaDB 10.2 there was no support for recursive queries. Further, there are many other limitations that make a workaround difficult. For example:

  • No dynamic queries in stored functions
  • No way to use a temporary table twice in a single statement
  • No TEXT type in the MEMORY engine

However, an RCTE can be emulated in a stored procedure that moves rows between two (temporary) tables. The following procedure does that:

delimiter //
create procedure get_path(
  in source_name text,
  in target_name text,
  in min_depth int,
  in max_depth int
)
begin
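  -- Drop leftovers so the procedure can be called more than once per session.
  drop temporary table if exists tmp_sources, tmp_targets;
  create temporary table tmp_sources (id int, depth int, path text) engine=innodb;
  create temporary table tmp_targets like tmp_sources;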

  insert into tmp_sources (id, depth, path)
    select n.node_id, 1, n.node_id
    from nodes n
    where n.name = source_name;

  set @depth = 1;
  while @depth < max_depth do
    set @depth = @depth+1;
    insert into tmp_targets(id, depth, path)
      select e.target, @depth, concat_ws(',', t.path, e.target)
      from tmp_sources t
      join edges p on p.source = t.id
      join edges e on e.source = p.target
      where t.depth = @depth - 1;

    insert into tmp_sources (id, depth, path)
      select id, depth, path
      from tmp_targets;

    truncate tmp_targets;
  end while;

  select t.path
    from tmp_sources t
    join nodes n on n.node_id = t.id
    where n.name = target_name
      and t.depth >= min_depth;
end //
delimiter ;

Use it as:

call get_path('foo', 'bar', 2, 4)

Result:

| path    |
| ------- |
| 3,5     |
| 1,3,5   |
| 3,5,7   |
| 1,3,5,7 |
| 3,5,7,9 |

View on DB Fiddle

This is far from optimal. If the result has many or long paths, you might need to define some indexes on the temporary tables. Also, I don't like the idea of creating (temporary) tables in stored procedures. See it as a "proof of concept". Use it at your own risk.
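If the graph is small enough to load into the application, the same level-by-level expansion could also be done in memory. The following Python sketch mirrors the procedure's loop under that assumption; get_paths and its arguments are names invented for the example, and unlike the procedure it stops early once a level produces no rows.

```python
def get_paths(nodes, edges, source_name, target_name, min_depth, max_depth):
    """In-memory analogue of get_path.

    nodes is {node_id: name}; edges is a list of (source, target) pairs.
    """
    outgoing = {}
    for source, target in edges:
        outgoing.setdefault(source, []).append(target)

    # Depth-1 rows: every start node (the initial insert into tmp_sources).
    frontier = [(nid, [nid]) for nid, name in nodes.items() if name == source_name]
    found = list(frontier)

    depth = 1
    while depth < max_depth and frontier:
        depth += 1
        next_frontier = []  # plays the role of tmp_targets
        for node_id, path in frontier:
            for proxy in outgoing.get(node_id, []):   # edge p: node -> proxy
                for nxt in outgoing.get(proxy, []):   # edge e: proxy -> next node
                    next_frontier.append((nxt, path + [nxt]))
        found.extend(next_frontier)  # copy tmp_targets back into tmp_sources
        frontier = next_frontier

    return [
        ",".join(str(n) for n in path)
        for node_id, path in found
        if nodes[node_id] == target_name and len(path) >= min_depth
    ]


names = ["foo", "_proxy_", "foo", "_proxy_", "bar", "_proxy_",
         "bar", "_proxy_", "bar", "_proxy_", "bar"]
nodes = {i + 1: name for i, name in enumerate(names)}
edges = [(i, i + 1) for i in range(1, 11)]
print(get_paths(nodes, edges, "foo", "bar", 2, 4))
```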

I've solved this sort of problem with a transitive closure table. This enumerates every direct and indirect path through your nodes. The edges you currently have are paths of length 1. But you also need paths of length 0 (i.e., a node has a path to itself), and then every path from one source node to an eventual target node, for paths with length greater than 1.

create table closure (
  source int,
  target int,
  length int,
  is_direct bool,
  primary key (source, target)
);

insert into closure values
  (1, 1, 0, false), (1, 2, 1, true), (1, 3, 2, false), (1, 4, 3, false), (1, 5, 4, false), (1, 6, 5, false), (1, 7, 6, false), (1, 8, 7, false), (1, 9, 8, false), (1, 10, 9, false), (1, 11, 10, false),
  (2, 2, 0, false), (2, 3, 1, true), (2, 4, 2, false), (2, 5, 3, false), (2, 6, 4, false), (2, 7, 5, false), (2, 8, 6, false), (2, 9, 7, false), (2, 10, 8, false), (2, 11, 9, false),
  (3, 3, 0, false), (3, 4, 1, true), (3, 5, 2, false), (3, 6, 3, false), (3, 7, 4, false), (3, 8, 5, false), (3, 9, 6, false), (3, 10, 7, false), (3, 11, 8, false),
  (4, 4, 0, false), (4, 5, 1, true), (4, 6, 2, false), (4, 7, 3, false), (4, 8, 4, false), (4, 9, 5, false), (4, 10, 6, false), (4, 11, 7, false),
  (5, 5, 0, false), (5, 6, 1, true), (5, 7, 2, false), (5, 8, 3, false), (5, 9, 4, false), (5, 10, 5, false), (5, 11, 6, false),
  (6, 6, 0, false), (6, 7, 1, true), (6, 8, 2, false), (6, 9, 3, false), (6, 10, 4, false), (6, 11, 5, false),
  (7, 7, 0, false), (7, 8, 1, true), (7, 9, 2, false), (7, 10, 3, false), (7, 11, 4, false),
  (8, 8, 0, false), (8, 9, 1, true), (8, 10, 2, false), (8, 11, 3, false),
  (9, 9, 0, false), (9, 10, 1, true), (9, 11, 2, false),
  (10, 10, 0, false), (10, 11, 1, true),
  (11, 11, 0, false);
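Writing these rows by hand gets tedious beyond a toy graph. As a rough sketch, not part of the setup above, the rows could be generated from the edge list with a breadth-first search per node. The function name closure_rows is hypothetical, and the sketch assumes that when several paths connect the same pair of nodes only the shortest length is stored, which is all the (source, target) primary key allows.

```python
from collections import deque


def closure_rows(node_ids, edges):
    """Return (source, target, length, is_direct) rows for the closure table.

    Lengths are shortest-path hop counts; is_direct is True only for length 1.
    """
    outgoing = {}
    for source, target in edges:
        outgoing.setdefault(source, []).append(target)

    rows = []
    for start in node_ids:
        dist = {start: 0}  # the length-0 path from a node to itself
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in outgoing.get(current, []):
                if nxt not in dist:  # first visit is the shortest path
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        rows.extend(
            (start, target, length, length == 1)
            for target, length in dist.items()
        )
    return rows


node_ids = range(1, 12)
edges = [(i, i + 1) for i in range(1, 11)]
rows = closure_rows(node_ids, edges)
print(len(rows))
```

The resulting tuples could then be bulk-inserted into closure with whatever driver the application already uses.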

Now we can write your query:

select the ID and names of all the nodes on a path such that the path starts with a node named 'foo' and ends with a node named 'bar', with at least 2 nodes and at most 4 nodes on the path.

I translate this into paths of length 4, 6, or 8 because you have a proxy node in between each pair of nodes, so it really takes two hops to go from one node to the next.

select source.node_id as source_node, target.node_id as target_node, c.length
from nodes as source
join closure as c on source.node_id = c.source
join nodes as target on c.target = target.node_id
where source.name='foo' and target.name = 'bar' and c.length in (4,6,8)

Here's the result, which in fact also includes node 11:

+-------------+-------------+--------+
| source_node | target_node | length |
+-------------+-------------+--------+
|           1 |           5 |      4 |
|           1 |           7 |      6 |
|           1 |           9 |      8 |
|           3 |           7 |      4 |
|           3 |           9 |      6 |
|           3 |          11 |      8 |
+-------------+-------------+--------+

Re comment from Paul Spiegel:

Once you have the endpoints of the path, you can query the closure for all paths that start at the source and end at a node that also has a path to the target.

select source.node_id as source_node, target.node_id as target_node,
  group_concat(i1.target order by i1.target) as interim_nodes
from nodes as source
join closure as c on source.node_id = c.source
join nodes as target on c.target = target.node_id
join closure as i1 on source.node_id = i1.source
join closure as i2 on target.node_id = i2.target and i1.target = i2.source
where source.name='foo' and target.name = 'bar' and c.length in (4,6,8)
group by source.node_id, target.node_id

+-------------+-------------+---------------------+
| source_node | target_node | interim_nodes       |
+-------------+-------------+---------------------+
|           1 |           5 | 1,2,3,4,5           |
|           1 |           7 | 1,2,3,4,5,6,7       |
|           1 |           9 | 1,2,3,4,5,6,7,8,9   |
|           3 |           7 | 3,4,5,6,7           |
|           3 |           9 | 3,4,5,6,7,8,9       |
|           3 |          11 | 3,4,5,6,7,8,9,10,11 |
+-------------+-------------+---------------------+

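Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.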

 