
PostgreSQL recursive self join

My table in Postgres looks like the one below. It stores a chain-like relation between IDs, and I want a query that produces results such as "vc1" -> "rc7" or "vc3" -> "rc7". I will only query on the IDs in the first column, ID1.

ID1     ID2
"vc1"   "vc2"
"vc2"   "vc3"
"vc3"   "vc4"
"vc4"   "rc7"

So I want to supply a "head" id and fetch the tail id (the last one in the chain).

This is a classic use of a simple recursive common table expression (WITH RECURSIVE), available in PostgreSQL 8.4 and later.

Demonstrated here: http://sqlfiddle.com/#!12/78e15/9

Given the sample data as SQL:

CREATE TABLE Table1
    ("ID1" text, "ID2" text)
;

INSERT INTO Table1
    ("ID1", "ID2")
VALUES
    ('vc1', 'vc2'),
    ('vc2', 'vc3'),
    ('vc3', 'vc4'),
    ('vc4', 'rc7')
;

You could write:

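WITH RECURSIVE chain(from_id, to_id) AS (
  -- Starting row: the id to look up goes in to_id
  SELECT NULL::text, 'vc2'::text
  UNION
  -- Follow one link; to_id becomes NULL when no further link exists
  SELECT c.to_id, t."ID2"
  FROM chain c
  LEFT OUTER JOIN Table1 t ON (t."ID1" = c.to_id)
  WHERE c.to_id IS NOT NULL
)
-- The row with a NULL to_id marks the end of the chain
SELECT from_id FROM chain WHERE to_id IS NULL;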

What this does is iteratively walk the chain, adding each link to the chain table as a pair of from- and to-pointers. When it reaches an id that has no matching row in Table1 (no onward 'to' reference), it adds a row with a null 'to' reference for that id. The next iteration sees that the 'to' reference is null and produces zero rows, which ends the iteration.

The outer query then picks out the rows that were identified as the end of the chain by their null to_id.

It takes a bit of effort to get your head around recursive CTEs. The key things to understand are:

  • They start with the output of an initial query, which they repeatedly union with the output of the "recursive part" (the query after the UNION or UNION ALL) until the recursive part adds no rows. That stops iteration.

  • They aren't really recursive, more iterative, though they're good for the sorts of things you might use recursion for.
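
The loop described in the two points above can be seen in the smallest possible case, a counter. This is only an illustrative sketch: the CTE name counter and the limit of 5 are made up for the example and have nothing to do with the question's data.

```sql
-- Illustrative sketch: count from 1 to 5 with a recursive CTE.
WITH RECURSIVE counter(n) AS (
  SELECT 1            -- initial query: seeds the result with one row
  UNION ALL
  SELECT n + 1        -- recursive part: runs against the rows added last time
  FROM counter
  WHERE n < 5         -- when this produces no rows, iteration stops
)
SELECT n FROM counter ORDER BY n;
```

This returns the values 1 through 5. The recursive part produces nothing once n reaches 5, and that empty step is what ends the loop.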

So you're basically building a table in a loop. You can't delete rows or change them, only add new ones, so you generally need an outer query that filters the results to get the result rows you want. You'll often add extra columns containing intermediate data that you use to track the state of the iteration, control stop conditions, etc.
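
As a sketch of what such extra state columns might look like, the query below walks the sample chain from 'vc1' while carrying a depth counter and an array of ids already visited. The names links, depth and path, and the cap of 100 steps, are assumptions made for this example; links is just an inline stand-in for Table1 so the query runs on its own.

```sql
WITH RECURSIVE links("ID1", "ID2") AS (
  -- Inline copy of the sample data (stand-in for Table1)
  VALUES ('vc1', 'vc2'), ('vc2', 'vc3'), ('vc3', 'vc4'), ('vc4', 'rc7')
),
chain(id, depth, path) AS (
  -- Starting row: the head id, depth 0, and a path holding only that id
  SELECT 'vc1'::text, 0, ARRAY['vc1'::text]
  UNION ALL
  -- Follow one link, bump the depth, remember where we have been
  SELECT l."ID2", c.depth + 1, c.path || l."ID2"
  FROM chain c
  JOIN links l ON l."ID1" = c.id
  WHERE l."ID2" <> ALL (c.path)  -- stop condition: never revisit an id
    AND c.depth < 100            -- stop condition: hypothetical safety cap
)
-- The deepest row is the tail of the chain
SELECT id AS tail_id, depth
FROM chain
ORDER BY depth DESC
LIMIT 1;
```

With the sample data this yields rc7 at depth 4. The path check only matters if the data could contain a loop, such as a row pointing back to 'vc1'; without it such data would keep the iteration going until the depth cap.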

It can help to look at the unfiltered result. If I replace the final summary query with a simple SELECT * FROM chain I can see the table that's been generated:

 from_id | to_id 
---------+-------
         | vc2
 vc2     | vc3
 vc3     | vc4
 vc4     | rc7
 rc7     | 
(5 rows)

The first row is the manually added starting-point row, where you specify what you want to look up; in this case that was vc2. Each subsequent row was added by the UNIONed recursive term, which does a LEFT OUTER JOIN on the previous result and returns a new set of rows that pair up the previous to_id (now in the from_id column) with the next to_id. If the LEFT OUTER JOIN doesn't match then the to_id will be null, causing the next invocation to return no rows and end iteration.

One point on cost: in PostgreSQL the recursive term sees only the rows added by the previous iteration (the working table), not everything accumulated so far, so each step here joins a single row against Table1 and the query does not redo earlier work. You therefore don't need a depth field, like the level column in Gordon's approach, just to restrict the join to the most recent row. What can matter for very big data sets is whether Table1 has an appropriate index on "ID1"; without one, every step scans the table.

More can be learned in the PostgreSQL documentation on CTEs.

Here is the SQL using a recursive CTE:

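-- Assumes a table t(id1, id2) holding the links
with recursive tr(id1, id2, level) as (
      -- every direct link is a chain of length 1
      select t.id1, t.id2, 1 as level
      from t
      union all
      -- prepend one link to the head of each chain found in the previous step
      select t.id1, tr.id2, tr.level + 1
      from t join
           tr
           on t.id2 = tr.id1
     )
-- for each id1, keep the row reached by the longest chain (the tail)
select *
from (select tr.*,
             max(level) over (partition by id1) as maxlevel
      from tr
     ) tr
where level = maxlevel;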

Here is the SQLFiddle.
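
To try the level-based query against the question's sample data for a single head id, something like the following might work. It is a sketch, not the answer's exact query: the inline VALUES list stands in for the table t, and the aliases ranked, head_id and tail_id are names chosen for this example.

```sql
WITH RECURSIVE t(id1, id2) AS (
  -- Inline copy of the sample data (stand-in for the table t)
  VALUES ('vc1', 'vc2'), ('vc2', 'vc3'), ('vc3', 'vc4'), ('vc4', 'rc7')
),
tr(id1, id2, level) AS (
  -- every direct link is a chain of length 1
  SELECT t.id1, t.id2, 1
  FROM t
  UNION ALL
  -- prepend one link to each chain found in the previous step
  SELECT t.id1, tr.id2, tr.level + 1
  FROM t
  JOIN tr ON t.id2 = tr.id1
)
SELECT id1 AS head_id, id2 AS tail_id
FROM (
  SELECT tr.*, max(level) OVER (PARTITION BY id1) AS maxlevel
  FROM tr
) ranked
WHERE level = maxlevel   -- the longest chain from each head ends at the tail
  AND id1 = 'vc1';       -- the head id being looked up
```

For the sample data this returns the single row (vc1, rc7). Note that this form builds the chains for every id before the outer filter is applied, so on a large table it does more work than a query seeded with the one head id of interest.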
