
Why are graph DBs faster than RDBs for graph traversals?


I've read several articles (like this one) stating that graph DBs are inherently faster than RDBs when running graph traversal algorithms because of index-free adjacency. However, I'm having trouble understanding the theoretical justification for it. It seems to me that if you construct hash-indexed adjacency tables, you should reach the same asymptotic complexity.

For example, finding the friends of a person (given the person id) using an RDB with 2 tables, people and friendships:

1) Locating the friends: O(m), where m is the number of friends.
2) For each friend id, locating it in people: O(1)
Total: O(m)

In a graph DB, this should be the same, no?
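The cost estimate above can be made concrete with a minimal sketch in which Python dictionaries stand in for hash-indexed tables. The sample data and the names `friends_index` and `friends_of` are hypothetical and only illustrate the two steps; this is not how any particular RDB implements a hash index.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the two tables.
people = {1: "Alice", 2: "Bob", 3: "Carol", 4: "Dave"}
friendships = [(1, 2), (1, 3), (2, 4)]  # (person_id, friend_id) rows

# Hash index on friendships.person_id: person id -> list of friend ids.
friends_index = defaultdict(list)
for person_id, friend_id in friendships:
    friends_index[person_id].append(friend_id)


def friends_of(person_id):
    """Return the names of a person's friends in O(m) expected time."""
    # Step 1: one hash lookup, then reading m friend ids is O(m).
    friend_ids = friends_index.get(person_id, [])
    # Step 2: one expected O(1) hash lookup in people per friend id.
    return [people[fid] for fid in friend_ids]


print(friends_of(1))
```

For a single hop, both steps are hash lookups, which is the basis of the O(m) total claimed in the question.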

No, queries are executed differently in an RDBMS than in a graph database.

  1. The example you gave is finding the friends of a given person, which is a one-hop query (in graph terms) and is quite easy in both kinds of databases.

    However, if you want to perform an n-hop query (n > 3), in an RDBMS you can use subqueries or joins, and the performance depends on your optimizer.

    Below is an example:

    Assume that we have a table class with fields id (PRIMARY KEY) and name, and a table student with fields id (PRIMARY KEY), name and class_id.

In order to find the name of the class whose id is 2, and the corresponding students, we need to join the two tables class and student:

SELECT c.name AS c_name, s.name AS s_name
  FROM class AS c
    LEFT JOIN student AS s
      ON c.id = s.class_id
  WHERE c.id = 2;

Explain the query:

[figure: the query explained, shown as a table]

The whole student table will be scanned in order to find the rows with class_id = 2.

Of course, we can create an index on the class_id column of student.

[figure: table of the index]

The database now reads the student_class index to get the pointers to the physical rows, and then reads the records themselves, because it is a non-clustered index.
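The effect of the index can be reproduced with a small sketch using Python's standard `sqlite3` module. SQLite is an assumption here (the original does not say which RDBMS produced the plans above), the sample rows are made up, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stop SQLite from building a temporary index on its own, so that the
# un-indexed plan reflects the table as declared.
conn.execute("PRAGMA automatic_index = OFF")
conn.executescript("""
    CREATE TABLE class (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, class_id INTEGER);
    INSERT INTO class VALUES (1, 'Math'), (2, 'Physics');
    INSERT INTO student VALUES (1, 'Ann', 1), (2, 'Ben', 2), (3, 'Cy', 2);
""")

QUERY = """
    SELECT c.name AS c_name, s.name AS s_name
    FROM class AS c
    LEFT JOIN student AS s ON c.id = s.class_id
    WHERE c.id = 2
"""


def query_plan(connection):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + QUERY)]


plan_without_index = query_plan(conn)   # student is typically scanned in full
conn.execute("CREATE INDEX student_class ON student (class_id)")
plan_with_index = query_plan(conn)      # student is looked up via student_class
rows = sorted(conn.execute(QUERY).fetchall())

print(plan_without_index)
print(plan_with_index)
print(rows)
```

Even with the index, each matching student costs an index lookup followed by a fetch of the row itself, which is the overhead the rest of the answer contrasts with a graph traversal.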

In a graph database, data are modeled as nodes and connections.

[figure: graph database model]

To find the name of the class whose id is 2, and the corresponding students, just get the class node and traverse the select connections in the backward direction. This avoids the performance cost of the join's index lookup.
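The backward traversal can be illustrated with a minimal in-memory sketch of index-free adjacency, in which every node keeps direct references to its neighbours. The `Node` class, the `connect` helper and the sample data are hypothetical; real graph databases use their own storage layouts, so this only shows the access pattern.

```python
class Node:
    """A graph node that stores direct references to its neighbours."""

    def __init__(self, label, name):
        self.label = label
        self.name = name
        self.outgoing = {}  # edge type -> list of target nodes
        self.incoming = {}  # edge type -> list of source nodes


def connect(source, edge_type, target):
    """Add a directed edge and record it on both endpoints."""
    source.outgoing.setdefault(edge_type, []).append(target)
    target.incoming.setdefault(edge_type, []).append(source)


math = Node("class", "Math")
physics = Node("class", "Physics")
ann, ben, cy = Node("student", "Ann"), Node("student", "Ben"), Node("student", "Cy")
connect(ann, "select", math)
connect(ben, "select", physics)
connect(cy, "select", physics)


def students_of(class_node):
    """One hop: follow 'select' edges backwards from the class node."""
    return [student.name for student in class_node.incoming.get("select", [])]


def classmates_of(student):
    """Two hops: student -> class -> other students, by following references."""
    return sorted({
        mate.name
        for cls in student.outgoing.get("select", [])
        for mate in cls.incoming.get("select", [])
        if mate is not student
    })


print(students_of(physics))
print(classmates_of(ben))
```

Each extra hop only follows references already stored on the current node, so the cost depends on the number of edges visited rather than on the size of the student table or of an index over it.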

  2. If you want to find the shortest path, or all possible paths, between two points (without knowing in advance how many hops the query needs), this is troublesome in an RDBMS, and the query would be quite long. LDBC has some nice cases using SQL, GQL (Cypher) and SPARQL respectively. Unfortunately, I haven't found the runtime differences among the different languages.

  3. It is difficult to do graph computing such as LPA (Label Propagation Algorithm) or PageRank with an RDBMS, but it is much easier in some (most) graph DBMSs.
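To give a sense of what the last two points involve, here is a rough sketch of both kinds of computation over a plain adjacency dictionary: a breadth-first shortest-path search that needs no hop count in advance, and a basic PageRank power iteration. The graph, the function names, the damping factor of 0.85 and the fixed iteration count are illustrative assumptions; a graph DBMS would normally provide such operations through its query language or built-in algorithms.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    if start == goal:
        return [start]
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour in parent:
                continue  # already reached by a path that is no longer
            parent[neighbour] = node
            if neighbour == goal:
                path = [goal]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration; every edge target must also be a key of graph."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[node] for node in graph if not graph[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in graph}
        for node, targets in graph.items():
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
print(shortest_path(graph, "a", "e"))
print(pagerank(graph))
```

Both functions only ever ask for the neighbours of the current node, which is the operation a graph store is organised around; expressing the same loops in SQL generally requires recursive queries or repeated self-joins.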
