Efficient way to store and query tree-like hierarchical data

Please see the image here:

https://picasaweb.google.com/108987384888529766314/CS3217Project#5717590602842112850

So, as you can see from the image, we are trying to store hierarchical data in a database. One publisher has many articles, one article has many comments, and so on. If I use a relational database like SQL Server, I will have a publishers table, an articles table and a comments table. But the comments table will grow very quickly and become very large.

Is there any alternative that allows me to store and query such tree-like data efficiently? How about NoSQL (MongoDB)?

You can use adjacency lists for hierarchical data. They are efficient and easy to implement, and they also work with MySQL. Here is a link: http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
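A minimal sketch of the adjacency list idea applied to the data in the question: every node stores only the id of its parent (`null` for the root), so publishers, articles and comments (including replies to comments) can all live in one table. The row shape (`id`, `parentId`, `type`, `title`) and the sample data are hypothetical; the in-memory array stands in for the rows a MySQL query would return.

```javascript
// Hypothetical rows of a single adjacency-list table: each row points to its parent.
const rows = [
  { id: 1, parentId: null, type: "publisher", title: "Acme Press" },
  { id: 2, parentId: 1, type: "article", title: "Intro to Trees" },
  { id: 3, parentId: 1, type: "article", title: "Graphs 101" },
  { id: 4, parentId: 2, type: "comment", title: "Great post" },
  { id: 5, parentId: 4, type: "comment", title: "Agreed" },
  { id: 6, parentId: 3, type: "comment", title: "Typo in para 2" },
];

// Group rows by parentId once; this plays the role of an index on parent_id.
function buildChildIndex(rows) {
  const index = new Map();
  for (const row of rows) {
    if (!index.has(row.parentId)) index.set(row.parentId, []);
    index.get(row.parentId).push(row);
  }
  return index;
}

// Direct children of a node, like: SELECT * FROM nodes WHERE parent_id = ?
function childrenOf(index, id) {
  return index.get(id) || [];
}

// All descendants of a node, found by repeatedly asking for children.
function descendantsOf(index, id) {
  const result = [];
  const stack = [...childrenOf(index, id)];
  while (stack.length > 0) {
    const node = stack.pop();
    result.push(node);
    stack.push(...childrenOf(index, node.id));
  }
  return result;
}

const index = buildChildIndex(rows);
console.log(childrenOf(index, 1).map((r) => r.title)); // [ 'Intro to Trees', 'Graphs 101' ]
console.log(descendantsOf(index, 2).map((r) => r.id)); // [ 4, 5 ]
```

Inserting a node only means adding one row with the right parent id; nothing else in the table changes.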

Here is a good survey of 8 NoSQL distributed databases and the needs they fill.

Do you anticipate you will write more than you read?
Do you anticipate needing low-latency data access, high concurrency support and high availability?
Do you need dynamic queries?
Do you prefer to define indexes rather than map/reduce functions?
Is versioning important?
Do you anticipate accumulating occasionally changing data on which pre-defined queries are to be run?
Do you anticipate rapidly changing data with a foreseeable database size (one that should fit mostly in memory)?
Do you anticipate graph-style, rich or complex, interconnected data?
Do you anticipate needing random, real-time read/write access to BigTable-like data?

I found this SO post when searching for the same thing. The URL posted by Phpdevpad is a great read for understanding how the Adjacency List Model and the Nested Set Model work and how they compare. The article strongly favors the Nested Set Model and explains many drawbacks of the Adjacency List Model; however, I was greatly concerned about the mass updates the Nested Set Model would cause.
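To see where those mass updates come from, here is a sketch of the Nested Set Model: each node stores `lft`/`rgt` bounds from a depth-first walk, so a whole subtree is a single range filter, but inserting a node shifts the bounds of every node to its right. The sample tree and function names are hypothetical, and the array stands in for a table.

```javascript
// Hypothetical nested-set rows: a node's descendants have lft/rgt strictly inside its own.
const nodes = [
  { name: "Publisher", lft: 1, rgt: 10 },
  { name: "Article A", lft: 2, rgt: 7 },
  { name: "Comment 1", lft: 3, rgt: 4 },
  { name: "Comment 2", lft: 5, rgt: 6 },
  { name: "Article B", lft: 8, rgt: 9 },
];

// Like: SELECT * FROM nodes WHERE lft > parent.lft AND rgt < parent.rgt
// One range condition returns the whole subtree, however deep it is.
function subtree(nodes, name) {
  const parent = nodes.find((n) => n.name === name);
  return nodes.filter((n) => n.lft > parent.lft && n.rgt < parent.rgt);
}

// Insert a new node as the last child of parentName. Every bound >= the
// parent's rgt is shifted by 2, and we count how many rows that touches.
function insertLastChild(nodes, parentName, newName) {
  const parent = nodes.find((n) => n.name === parentName);
  const pos = parent.rgt; // the new node takes [pos, pos + 1]
  let touched = 0;
  const shifted = nodes.map((n) => {
    const moved = { ...n };
    if (moved.lft >= pos) moved.lft += 2;
    if (moved.rgt >= pos) moved.rgt += 2;
    if (moved.lft !== n.lft || moved.rgt !== n.rgt) touched++;
    return moved;
  });
  shifted.push({ name: newName, lft: pos, rgt: pos + 1 });
  return { nodes: shifted, touched };
}

console.log(subtree(nodes, "Article A").map((n) => n.name)); // [ 'Comment 1', 'Comment 2' ]
const result = insertLastChild(nodes, "Comment 1", "Reply");
console.log(result.touched); // 5: every existing row was renumbered
```

In SQL the renumbering is a couple of `UPDATE` statements, but they rewrite every row to the right of the insertion point, which is the cost the concern above is about when new comments arrive constantly.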

The main limitation of adjacency lists outlined in the article is that an additional self-join is required for each level of depth. However, this limitation is easily overcome by using another language (such as PHP) and a recursive function for finding children, as outlined here: http://www.sitepoint.com/hierarchical-data-database/

Snippet from the URL above, using the Adjacency List Model:

<?php
// Table "tree" has columns title and parent; parent holds the title of the
// parent node, and root nodes have an empty parent.
// Assumes $pdo is an open PDO connection, e.g.
// $pdo = new PDO('mysql:host=localhost;dbname=test', $user, $pass);

// $parent is the parent of the children we want to see
// $level is increased when we go deeper into the tree,
//        used to display a nice indented tree
function display_children(PDO $pdo, $parent, $level) {

  // retrieve all children of $parent (prepared statement, no string concatenation)
  $stmt = $pdo->prepare('SELECT title FROM tree WHERE parent = ?');
  $stmt->execute([$parent]);

  // display each child
  foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {

    // indent and display the title of this child
    echo str_repeat('  ', $level) . $row['title'] . "\n";

    // call this function again to display this child's children
    display_children($pdo, $row['title'], $level + 1);
  }
}

// $node is the name of the node we want the path of
function get_path(PDO $pdo, $node) {

  // look up the parent of this node
  $stmt = $pdo->prepare('SELECT parent FROM tree WHERE title = ?');
  $stmt->execute([$node]);
  $row = $stmt->fetch(PDO::FETCH_ASSOC);

  // save the path in this array
  $path = array();

  // only continue if $node exists and isn't the root node
  // (that's the node with no parent)
  if ($row !== false && $row['parent'] != '') {

    // the last part of the path to $node is the name
    // of the parent of $node
    $path[] = $row['parent'];

    // prepend the path to the parent of this node
    $path = array_merge(get_path($pdo, $row['parent']), $path);
  }

  // return the path
  return $path;
}

display_children($pdo, '', 0);
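To make the cost of this recursion visible, here is the same logic sketched in JavaScript. Each call asks only for the direct children of one node, so no chain of self-joins is needed, but one round trip is made per visited node. `queryChildren` and `queryParent` are hypothetical in-memory stand-ins for the two `SELECT` statements, with a counter added; the sample rows are made up.

```javascript
// Hypothetical "tree" table: parent holds the parent's title, "" for the root.
const tree = [
  { title: "Food", parent: "" },
  { title: "Fruit", parent: "Food" },
  { title: "Meat", parent: "Food" },
  { title: "Red", parent: "Fruit" },
  { title: "Cherry", parent: "Red" },
  { title: "Beef", parent: "Meat" },
];

let queryCount = 0;

// Stand-in for: SELECT title FROM tree WHERE parent = ?
function queryChildren(parent) {
  queryCount++;
  return tree.filter((row) => row.parent === parent).map((row) => row.title);
}

// Stand-in for: SELECT parent FROM tree WHERE title = ?
function queryParent(title) {
  queryCount++;
  const row = tree.find((r) => r.title === title);
  return row ? row.parent : "";
}

// Recursively collect an indented listing, one query per visited node.
function displayChildren(parent, level, lines = []) {
  for (const title of queryChildren(parent)) {
    lines.push("  ".repeat(level) + title);
    displayChildren(title, level + 1, lines);
  }
  return lines;
}

// Walk up the parent links to build the path from the root to node.
function getPath(node) {
  const parent = queryParent(node);
  if (parent === "") return [];
  return [...getPath(parent), parent];
}

console.log(displayChildren("", 0).join("\n"));
console.log(queryCount); // 7: one query for the root level plus one per node
console.log(getPath("Cherry")); // [ 'Food', 'Fruit', 'Red' ]
```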

Conclusion

As a result, I am now convinced that the Adjacency List Model will be far easier to use and manage going forward.

Most NoSQL database designs involve a mix of the following techniques:

  • Embedding - nesting of objects and arrays inside a document
  • Linking - references between documents
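As a rough illustration of the two techniques, here is the same article modelled both ways as plain JavaScript objects, the shape MongoDB stores as documents. The field names and data are hypothetical. Embedding returns everything in one read, but an embedded comments array grows with every comment, and a single MongoDB document is limited to 16 MB; linking keeps each comment in its own small document at the cost of a second query.

```javascript
// Embedding: comments nested inside the article document.
const embeddedArticle = {
  _id: "a1",
  title: "Intro to Trees",
  publisher: { name: "Acme Press" }, // embedded sub-document
  comments: [
    { author: "bob", text: "Great post" },
    { author: "eve", text: "Agreed" },
  ],
};

// Linking: comments live in their own collection and reference the article by id.
const articles = [{ _id: "a1", title: "Intro to Trees", publisher: "Acme Press" }];
const comments = [
  { _id: "c1", article_id: "a1", author: "bob", text: "Great post" },
  { _id: "c2", article_id: "a1", author: "eve", text: "Agreed" },
];

// Embedded: the article read already contains its comments.
function commentsEmbedded(article) {
  return article.comments;
}

// Linked: a second lookup filters the comments collection by article_id.
function commentsLinked(articleId) {
  return comments.filter((c) => c.article_id === articleId);
}

console.log(commentsEmbedded(embeddedArticle).length); // 2
console.log(commentsLinked(articles[0]._id).length); // 2
```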

The schema you craft depends on various aspects of your data. One solution to your problem could be the following schema:

db.articles: { _id: ARTICLE_ID, publisher: "publisher name", ... }
db.comments: { _id: COMMENT_ID, article_id: ARTICLE_ID, ... }

Here the publisher is embedded in the article document. We can do this because the publisher name is unlikely to change. It also saves us from having to look up publisher details every time we access an article.

The comments are stored in their own documents, with each comment linking to an article. To find all comments associated with an article, you can run:

db.comments.find({article_id: "My Article ID"})

To speed this up, you can add an index on "article_id":

db.comments.createIndex({article_id: 1})
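Conceptually, the `article_id` index is the difference between scanning every comment and jumping straight to the matching ones. The sketch below imitates that with a `Map` from `article_id` to comments; it is an in-memory analogy only, not how MongoDB implements its indexes, and the data and function names are hypothetical.

```javascript
// Hypothetical comments collection.
const comments = [
  { _id: "c1", article_id: "a1", text: "Great post" },
  { _id: "c2", article_id: "a2", text: "Nice" },
  { _id: "c3", article_id: "a1", text: "Agreed" },
];

// Without an index: every document has to be examined to find the matches.
function findByScan(articleId) {
  let examined = 0;
  const matches = comments.filter((c) => {
    examined++;
    return c.article_id === articleId;
  });
  return { matches, examined };
}

// "Index" built once (the analogue of createIndex), keyed by article_id.
function buildIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc.article_id)) index.set(doc.article_id, []);
    index.get(doc.article_id).push(doc);
  }
  return index;
}

const byArticle = buildIndex(comments);

// With the index: only the matching entries are touched.
function findByIndex(articleId) {
  const matches = byArticle.get(articleId) || [];
  return { matches, examined: matches.length };
}

console.log(findByScan("a1").examined); // 3: every comment was checked
console.log(findByIndex("a1").examined); // 2: only the matching comments
```

The gap between the two counts grows with the size of the collection, which is why the index matters once the comments collection becomes very large.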
