Implementing a hierarchical data structure in a database

I know there are two approaches: adjacency list and nested tree. It's said that an adjacency list can become slow to traverse because of the numerous queries it needs, but I don't know of any realistic figures for this. The site I'm making will have in the region of 200 pages. Is traversal to generate (for example) a sitemap going to take longer than about 0.3 seconds?

Running on MySQL (InnoDB) with a LAMP stack.

I'd prefer to implement an adjacency list if possible because of its simpler design.

Thanks.

There are more options than just the two you mention. There are:

  • Adjacency List (the "parent_id" one almost everyone uses)
  • Nested Sets
  • Path Enumeration
  • Closure Table (aka Adjacency Relation)
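
To make the last two options a little more concrete, here is a minimal sketch that derives path-enumeration strings and closure-table rows from plain adjacency data. The sample ids, the `/`-separated path format, and the function names `enumeratePath` and `closureRows` are assumptions made for illustration only; a real schema would store these values in columns or in a separate table.

```php
<?php
// Hypothetical sample data: id => parent_id (null marks the root).
$parents = [1 => null, 2 => 1, 3 => 1, 4 => 2];

// Path enumeration: each row stores the full chain of ancestor ids.
function enumeratePath(array $parents, int $id): string
{
    $path = [];
    for ($cur = $id; $cur !== null; $cur = $parents[$cur]) {
        array_unshift($path, $cur);
    }
    return implode('/', $path) . '/';
}

// Closure table: one [ancestor, descendant, depth] row per pair,
// including the zero-depth pair of each node with itself.
function closureRows(array $parents): array
{
    $rows = [];
    foreach (array_keys($parents) as $id) {
        $depth = 0;
        for ($cur = $id; $cur !== null; $cur = $parents[$cur]) {
            $rows[] = [$cur, $id, $depth++];
        }
    }
    return $rows;
}

$paths = [];
foreach (array_keys($parents) as $id) {
    $paths[$id] = enumeratePath($parents, $id);
}
$closure = closureRows($parents);
```

With either form, "all descendants of node X" becomes a single lookup (a prefix match on the path, or a filter on the ancestor column), at the cost of extra data to maintain on every write.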

See my answer to "What is the most efficient/elegant way to parse a flat table into a tree?"

Or a couple of books:

The article Managing Hierarchical Data in MySQL goes into detail about this.

I would recommend the "nested set" technique, as it allows you to get a node and all of its descendants (up to the whole tree) in one query. Basically reads are cheap but writes are expensive, because inserting or moving a node means renumbering a large part of the tree. But in cases where you have 99% reads, it's totally justifiable.
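
As a rough illustration of why nested-set reads are cheap, the sketch below numbers a small tree with left/right values and then selects a subtree with a single range test. The column names `lft` and `rgt`, the table name `pages` in the SQL comment, and the helpers `numberNestedSet` and `subtreeIds` are assumptions for this example, not part of any particular library.

```php
<?php
// Hypothetical sample data: parent id => list of child ids; 1 is the root.
$children = [1 => [2, 3], 2 => [4], 3 => [], 4 => []];

// Depth-first walk: assign lft on the way down and rgt on the way back up.
function numberNestedSet(array $children, int $id, int &$counter, array &$out): void
{
    $out[$id] = ['lft' => $counter++, 'rgt' => null];
    foreach ($children[$id] as $childId) {
        numberNestedSet($children, $childId, $counter, $out);
    }
    $out[$id]['rgt'] = $counter++;
}

$counter = 1;
$sets = [];
numberNestedSet($children, 1, $counter, $sets);

// A subtree is every row whose lft falls inside the root's interval.
// The equivalent SQL would be roughly:
//   SELECT * FROM pages WHERE lft BETWEEN :lft AND :rgt ORDER BY lft
function subtreeIds(array $sets, int $rootId): array
{
    $root = $sets[$rootId];
    $ids = [];
    foreach ($sets as $id => $row) {
        if ($row['lft'] >= $root['lft'] && $row['lft'] <= $root['rgt']) {
            $ids[] = $id;
        }
    }
    return $ids;
}
```

The write cost shows up in the same numbering: adding a child under node 2 would shift the `lft`/`rgt` values of every row numbered after the insertion point.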

The naive approach to parsing an adjacency list requires a lot of queries, and for large lists may take a significant amount of time to build in memory. For reference, the naive approach I'm referring to could be summarized as: select all items with no parent, then for each item recursively get its children. This approach requires n+1 database queries.
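
To show where the n+1 comes from, here is a small sketch of the naive approach with the database replaced by an in-memory array and a counter. The sample rows, the `title` column, and the functions `fetchChildren` and `buildNaive` are hypothetical stand-ins; in real code `fetchChildren` would run a query such as `SELECT * FROM pages WHERE parent_id <=> ?`. The arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical rows standing in for a `pages` table.
$rows = [
    ['id' => 1, 'parent_id' => null, 'title' => 'Home'],
    ['id' => 2, 'parent_id' => 1, 'title' => 'About'],
    ['id' => 3, 'parent_id' => 1, 'title' => 'Products'],
    ['id' => 4, 'parent_id' => 3, 'title' => 'Widgets'],
];
$queryCount = 0;

// Stand-in for one database query; every call is counted.
function fetchChildren(array $rows, ?int $parentId, int &$queryCount): array
{
    $queryCount++;
    return array_values(array_filter($rows, fn($r) => $r['parent_id'] === $parentId));
}

// Naive recursion: one query for the roots, then one more for every node.
function buildNaive(array $rows, ?int $parentId, int &$queryCount): array
{
    $nodes = fetchChildren($rows, $parentId, $queryCount);
    foreach ($nodes as &$node) {
        $node['children'] = buildNaive($rows, $node['id'], $queryCount);
    }
    unset($node); // drop the dangling reference left by foreach
    return $nodes;
}

$tree = buildNaive($rows, null, $queryCount);
```

For these four rows the counter ends at five: one query for the roots plus one per node, including the leaves that turn out to have no children.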

I've used the following approach to build an adjacency list with 1 query. Select all items from the database. Transfer all items into an array indexed by their key. Traverse the array and assign a reference from the parent object to each of its children. Traverse the array a second time and remove all of the child objects, leaving behind only the root level objects.

Since you mentioned LAMP stack, PHP code to do this is roughly as follows:

<?php
// Assumes $src is the array of items from the database.
$tmp = array();

// Traverse the array and index it by id, ensuring each item has an empty array of children.
foreach ($src as $item) {
  $item['children'] = array();
  $tmp[$item['id']] = $item;
}

// Now traverse the array a second time and link children to their parents.
// Root items (parent_id of 0 or NULL) are skipped, so both checks must hold (&&, not ||).
foreach ($tmp as $id => $item) {
  if ($item['parent_id'] != 0 && $item['parent_id'] !== NULL) {
    $tmp[$item['parent_id']]['children'][$id] = &$tmp[$id];
  }
}

// Finally create an array with just root level items.
$tree = array();
foreach ($tmp as $id => $item) {
  if ($item['parent_id'] == 0 || $item['parent_id'] === NULL) {
    $tree[$id] = $item;
  }
}

// $tree now contains our adjacency list in tree form.
?>

Please note this code is intended to illustrate a technique for building an adjacency list from a single database query. It could probably be optimized for less memory consumption, etc. It also hasn't been tested.
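
Once the tree is in memory, producing the sitemap is a short recursive walk. The sketch below assumes each node also carries a `title` field (not part of the code above) and uses a hypothetical `renderSitemap` function and hand-written sample tree in the same shape as `$tree`.

```php
<?php
// Render nodes of the form ['title' => ..., 'children' => [...]] as indented lines.
function renderSitemap(array $nodes, int $depth = 0): string
{
    $out = '';
    foreach ($nodes as $node) {
        $out .= str_repeat('  ', $depth) . '- ' . $node['title'] . "\n";
        $out .= renderSitemap($node['children'], $depth + 1);
    }
    return $out;
}

// Hypothetical tree, keyed by id like the $tree built from the database rows.
$tree = [
    1 => ['id' => 1, 'parent_id' => null, 'title' => 'Home', 'children' => [
        2 => ['id' => 2, 'parent_id' => 1, 'title' => 'About', 'children' => []],
        3 => ['id' => 3, 'parent_id' => 1, 'title' => 'Products', 'children' => []],
    ]],
];

echo renderSitemap($tree);
```

For a real page you would emit links (escaped with `htmlspecialchars`) instead of plain text, but the traversal stays the same.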

Jim,

The other approach is called "nested set", I think, not "nested tree".

Anyway, a good thing about a site map is that you might know its maximum depth. I think that the problem with the adjacency model is that the corresponding SQL works on one level at a time, so if you have 'n' levels then you need a loop of 'n' SQL statements ... but I think (I'm not sure) that if you know the maximum 'n' in advance then you can code a single SQL statement that covers that fixed number of levels.
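
One way to read "fixed number of levels" is a single query that self-joins the table once per level. The sketch below only builds that SQL string; the table name `pages`, the columns `id`, `parent_id`, and `title`, and the function `buildFixedDepthSql` are assumptions for illustration, and roots are assumed to have a NULL `parent_id`.

```php
<?php
// Build one SELECT that LEFT JOINs the table to itself once per extra level.
// Table and column names here are assumed, not taken from a real schema.
function buildFixedDepthSql(int $maxDepth): string
{
    $select = ['t1.id AS id1', 't1.title AS title1'];
    $joins = [];
    for ($level = 2; $level <= $maxDepth; $level++) {
        $prev = $level - 1;
        $select[] = "t{$level}.id AS id{$level}";
        $select[] = "t{$level}.title AS title{$level}";
        $joins[] = "LEFT JOIN pages t{$level} ON t{$level}.parent_id = t{$prev}.id";
    }
    return 'SELECT ' . implode(', ', $select)
        . ' FROM pages t1 ' . implode(' ', $joins)
        . ($joins ? ' ' : '') . 'WHERE t1.parent_id IS NULL';
}

$sql = buildFixedDepthSql(3);
```

Each result row is then one path from a root down to at most `$maxDepth` levels, with NULLs in the deeper columns where a branch ends early.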

0.3 seconds sounds to me like a very long time to process 200 pages, so that's probably OK.

Also, a site map isn't updated very often, so even if it does take a long time to retrieve from SQL, you can probably cache the retrieved/calculated tree in RAM.

Alternatively, instead of worrying about the SQL to build a tree, you could just store it as simply as possible (as an adjacency list), retrieve it from the database as a simple set of rows, and build the tree in RAM (using loops in your high-level programming language) instead of using loops of SQL statements to build it.

For completeness: Oracle has the START WITH and CONNECT BY operators; see the hierarchical queries documentation.
