Hierarchical table - how to get paths of the items [linked lists in MySQL]

I have a hierarchical table in MySQL: the parent field of each item points to the id field of its parent item. For each item I can get the list of all its parents [regardless of depth] using the query described here. With GROUP_CONCAT I get the full path as a single string:

SELECT GROUP_CONCAT(_id SEPARATOR ' > ') FROM (
SELECT  @r AS _id,
         (
         SELECT  @r := parent
         FROM    t_hierarchy
         WHERE   id = _id
         ) AS parent,
         @l := @l + 1 AS lvl
 FROM    (
         SELECT  @r := 200,
                 @l := 0
         ) vars,
         t_hierarchy h
WHERE    @r <> 0
ORDER BY lvl DESC
) x

I can make this work only if the id of the item is fixed [it's 200 in this case].
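To make clear what the fixed-id query is doing, here is a rough Python sketch of the same walk. It is only an illustration under assumptions: the sample table contents and the name path_for are hypothetical, and roots are assumed to store 0 in parent, as the WHERE @r <> 0 test suggests.

```python
# Hypothetical sample data: id -> parent, with 0 marking a root.
t_hierarchy = {1: 0, 7: 1, 50: 1, 200: 50}


def path_for(table, start_id):
    """Mimic the @r / @l walk of the fixed-id query for one item."""
    r = start_id   # @r := 200
    lvl = 0        # @l := 0
    rows = []
    # The query joins against t_hierarchy h only to get enough rows to
    # iterate over; the row contents themselves are ignored.
    for _ in table:
        if r is None or r == 0:      # WHERE @r <> 0 rejects every later row
            break
        current = r                  # @r AS _id
        r = table.get(current)       # @r := parent of the current item
        lvl += 1                     # @l := @l + 1 AS lvl
        rows.append((current, lvl))
    # ORDER BY lvl DESC puts the root first, then GROUP_CONCAT joins the ids.
    rows.sort(key=lambda row: row[1], reverse=True)
    return " > ".join(str(item_id) for item_id, _ in rows)


print(path_for(t_hierarchy, 200))
```

The walk never needs more iterations than the table has rows, because every ancestor of an item is itself a row of the table.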

I want to do the same for all rows: retrieve the whole table with one additional field (path) that displays the full path. The only solution that comes to mind is to wrap this query in another SELECT, set a temporary variable @id and use it inside the subquery. But it doesn't work: I get NULLs in the path field.

SELECT @id := id, parent, (
    SELECT GROUP_CONCAT(_id SEPARATOR ' > ') FROM (
    SELECT  @r AS _id,
             (
             SELECT  @r := parent
             FROM    t_hierarchy
             WHERE   id = _id
             ) AS parent,
             @l := @l + 1 AS lvl
     FROM    (
             SELECT  @r := @id,
                     @l := 0
             ) vars,
             t_hierarchy h
    WHERE    @r <> 0
    ORDER BY lvl DESC
    ) x
) as path
 FROM t_hierarchy

PS: I know I can store the paths in a separate field and update them when inserting/updating, but I need a solution based on the linked list technique.

UPDATE: I would like to see a solution that does not use recursion or constructs like for and while. The above method for finding paths doesn't use any loops or functions, and I want a solution that follows the same logic. Or, if that's impossible, please try to explain why!

Consider the difference between the following two queries:

SELECT @id := id as id, parent, (
    SELECT concat(id, ': ', @id)
) as path
FROM t_hierarchy;

SELECT @id := id as id, parent, (
    SELECT concat(id, ': ', _id)
    FROM (SELECT @id as _id) as x
) as path
FROM t_hierarchy;

They look nearly identical, but give dramatically different results. On my version of MySQL, _id in the second query is the same for every row in its result set, and equal to the id of the last row. However, that last bit is only true because I executed the two queries in the order given: the first query left @id set to the id of its last row. After SET @id := 1, for example, _id is always equal to the value in the SET statement.

So what's going on here? An EXPLAIN yields a clue:

mysql>     explain SELECT @id := id as id, parent, (
    ->         SELECT concat(id, ': ', _id)
    ->         FROM (SELECT @id as _id) as x
    ->     ) as path
    ->     FROM t_hierarchy;
+----+--------------------+-------------+--------+---------------+------------------+---------+------+------+----------------+
| id | select_type        | table       | type   | possible_keys | key              | key_len | ref  | rows | Extra          |
+----+--------------------+-------------+--------+---------------+------------------+---------+------+------+----------------+
|  1 | PRIMARY            | t_hierarchy | index  | NULL          | hierarchy_parent | 9       | NULL | 1398 | Using index    |
|  2 | DEPENDENT SUBQUERY | <derived3>  | system | NULL          | NULL             | NULL    | NULL |    1 |                |
|  3 | DERIVED            | NULL        | NULL   | NULL          | NULL             | NULL    | NULL | NULL | No tables used |
+----+--------------------+-------------+--------+---------------+------------------+---------+------+------+----------------+
3 rows in set (0.00 sec)

That third row, the DERIVED table with no tables used, indicates to MySQL that it can be calculated exactly once, at any time. The server doesn't notice that the derived table uses a variable defined elsewhere in the query, and has no clue that you want it to be run once per row. You're being bitten by a behavior mentioned in the MySQL documentation on user-defined variables:

As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement; in addition, this order is not guaranteed to be the same between releases of the MySQL Server.
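The effect seen in the second query can be mimicked outside MySQL. The following Python sketch is only an analogy, not a description of the server's internals: it assumes the derived table is materialized once before the outer scan starts, and the names run_outer_select and session are hypothetical.

```python
def run_outer_select(ids, session_vars):
    """Analogy for the second query: the derived table is computed once."""
    # (SELECT @id AS _id) is evaluated a single time, before any outer
    # row is read, so it sees whatever @id held when the statement began.
    derived_id = session_vars.get("id")
    result = []
    for row_id in ids:
        session_vars["id"] = row_id           # @id := id, once per outer row
        result.append((row_id, derived_id))   # _id never changes
    return result


session = {"id": 1}                           # as if SET @id := 1 had been run
first = run_outer_select([10, 20, 30], session)
second = run_outer_select([10, 20, 30], session)
print(first)
print(second)
```

In the first run every row pairs with 1, the value from the simulated SET. In the second run every row pairs with 30, the id of the last row of the previous run, which is the pattern described above.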

In my case, it chooses to calculate that table first, before @id is (re)defined by the outer SELECT. In fact, that's exactly why the original hierarchical data query works: the @r definition is computed by MySQL before anything else in the query, precisely because it's that kind of derived table. Here, however, we need a way to reset @r once per table row, not just once for the whole query. To do that, we need a query that looks like the original one but resets @r by hand.

SELECT  @r := if(
          @c = th1.id,
          if(
            @r is null,
            null,
            (
              SELECT  parent
              FROM    t_hierarchy
              WHERE   id = @r
            )
          ),
          th1.id
        ) AS parent,
        @l := if(@c = th1.id, @l + 1, 0) AS lvl,
        @c := th1.id as _id
FROM    (
        SELECT  @c := 0,
                @r := 0,
                @l := 0
        ) vars
        left join t_hierarchy as th1 on 1
        left join t_hierarchy as th2 on 1
HAVING  parent is not null

This query uses the second t_hierarchy the same way the original query does, to ensure there are enough rows in the result for the parent subquery to loop over. It also adds a row for each _id that includes itself as a parent; without that, any root objects (with NULL in the parent field) would fail to appear in the results at all.
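As a rough aid to reading the query, the following Python sketch simulates its row-by-row evaluation. It rests on assumptions that MySQL does not guarantee: th1 is scanned in id order with every th2 row visited for each th1 row, and the three expressions are evaluated left to right. The sample data and the name simulate_rows are hypothetical; roots store NULL (None) in parent and no item has id 0.

```python
# Hypothetical sample data: id -> parent, with None (NULL) marking a root.
t_hierarchy = {1: None, 2: 1, 3: 2, 4: 1}


def simulate_rows(table):
    """Return the (parent, lvl, _id) rows that survive the HAVING clause."""
    c, r, l = 0, 0, 0                       # @c := 0, @r := 0, @l := 0
    out = []
    for th1_id in sorted(table):            # left join t_hierarchy as th1
        for _ in table:                     # th2 only supplies extra rows
            if c == th1_id:
                # Same item as the previous row: climb one level.
                r = None if r is None else table.get(r)
                l = l + 1
            else:
                # First row for this item: start the walk at the item itself.
                r = th1_id
                l = 0
            c = th1_id                      # @c := th1.id, evaluated last
            if r is not None:               # HAVING parent is not null
                out.append((r, l, th1_id))
    return out


for row in simulate_rows(t_hierarchy):
    print(row)
```

For item 3 the surviving rows are (3, 0, 3), (2, 1, 3) and (1, 2, 3): the item itself at level 0, then each ancestor one level higher. Sorting them by lvl in descending order gives the path from the root down to the item.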

Oddly, running the result through GROUP_CONCAT seems to disrupt ordering. Fortunately, that function has its own ORDER BY clause:

SELECT  _id,
        GROUP_CONCAT(parent ORDER BY lvl desc SEPARATOR ' > ') as path,
        max(lvl) as depth
FROM    (
  SELECT  @r := if(
            @c = th1.id,
            if(
              @r is null,
              null,
              (
                SELECT  parent
                FROM    t_hierarchy
                WHERE   id = @r
              )
            ),
            th1.id
          ) AS parent,
          @l := if(@c = th1.id, @l + 1, 0) AS lvl,
          @c := th1.id as _id
  FROM    (
          SELECT  @c := 0,
                  @r := 0,
                  @l := 0
          ) vars
          left join t_hierarchy as th1 on 1
          left join t_hierarchy as th2 on 1
  HAVING  parent is not null
  ORDER BY th1.id
) as x
GROUP BY _id;

Fair warning: These queries implicitly rely on the @r and @l updates happening before the @c update. That order is not guaranteed by MySQL, and may change with any version of the server.

Define a getPath function (written here in SQL Server T-SQL syntax, not MySQL) and run the following query:

select id, parent, dbo.getPath(id) as path from t_hierarchy 

Defining the getPath function:

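create function dbo.getPath(@id int)
returns varchar(400)
as
begin
    declare @path varchar(400)
    declare @term int
    declare @parent int
    -- Start the path with the item itself.
    set @path = cast(@id as varchar(20))
    set @term = 0
    while (@term <> 1)
    begin
        -- Reset first: a SELECT that matches no row leaves @parent unchanged.
        set @parent = null
        select @parent = parent from t_hierarchy where id = @id
        -- Stop at a root (NULL or 0 parent) or at a self-referencing row.
        if (@parent is null or @parent = 0 or @parent = @id)
            set @term = 1
        else
        begin
            -- Prepend the parent so the root ends up leftmost.
            set @path = cast(@parent as varchar(20)) + ' > ' + @path
            set @id = @parent
        end
    end
    return @path
end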
