How do ArangoDB Graph Traversal Queries Execute in a Cluster?

The description of SmartGraphs on the ArangoDB website seems to imply that graph traversal queries actually follow edges from machine to machine until the query finishes executing. Is that how it actually works? For example, suppose that you have the following query, which retrieves the 1-hop, 2-hop, and 3-hop friends starting from the person whose key is 12345:

FOR p IN Person
  // document keys are always strings in ArangoDB, so compare against "12345"
  FILTER p._key == "12345"
  FOR friend IN 1..3 OUTBOUND p knows
    RETURN friend
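To make the semantics of `1..3 OUTBOUND` concrete before looking at the cluster side, here is a minimal single-machine sketch in Python. It assumes ArangoDB's default traversal settings (depth-first order, `uniqueEdges: "path"`, `uniqueVertices: "none"`); the edge list and the function name `outbound_traversal` are hypothetical, and the exact emission order in ArangoDB may differ. It shows why the same friend can be returned more than once, which the traversal `OPTIONS` (for example `uniqueVertices`) can change.

```python
from typing import Dict, Iterator, List, Tuple

# A "knows" edge is modelled as a (from_vertex, to_vertex) pair.
Edge = Tuple[str, str]


def outbound_traversal(
    edges: List[Edge], start: str, min_depth: int, max_depth: int
) -> Iterator[str]:
    """Yield the end vertex of every outbound path whose length lies in
    [min_depth, max_depth], exploring depth-first. An edge may appear at
    most once per path, but a vertex may be reached (and yielded) several
    times, mirroring uniqueEdges "path" / uniqueVertices "none"."""
    # Index outgoing edges by their source vertex.
    outgoing: Dict[str, List[int]] = {}
    for idx, (src, _dst) in enumerate(edges):
        outgoing.setdefault(src, []).append(idx)

    def visit(vertex: str, depth: int, used: frozenset) -> Iterator[str]:
        if depth >= min_depth:
            yield vertex
        if depth == max_depth:
            return
        for idx in outgoing.get(vertex, []):
            if idx not in used:  # do not reuse an edge within this path
                yield from visit(edges[idx][1], depth + 1, used | {idx})

    yield from visit(start, 0, frozenset())


if __name__ == "__main__":
    knows = [("12345", "a"), ("12345", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
    print(list(outbound_traversal(knows, "12345", 1, 3)))
    # prints ['a', 'c', 'd', 'b', 'c', 'd']: "c" and "d" come back twice
```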

Can someone please walk me through the lifetime of this query starting from the client and ending with the results on the client?

What actually happens can be a bit different from the diagrams on our website. What we show there is a kind of "worst case" in which the data cannot be sharded perfectly (just to make it a bit more fun). But let's first take a quick step back and describe the different roles within an ArangoDB cluster. If you are already familiar with our cluster terminology and architecture, please skip the next paragraph.

First, there is the Coordinator which, as the name says, coordinates query execution and is also where the final result set is built up before it is sent back to the client. Coordinators are stateless, host a query engine and are where Foxx services live. The actual data is stored in a stateful fashion on the DBServers, but DBServers also have a distributed query engine, which plays a vital role in all of our distributed query processing. The brain of the cluster is the Agency, made up of at least three Agents running the Raft consensus protocol.
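Why "at least three" Agents? Raft only makes progress while a majority of its members can reach each other, so the number of Agents that may fail is the total minus the majority size. The small Python calculation below illustrates that general Raft rule; the function names are hypothetical and it is not ArangoDB code.

```python
def raft_majority(agents: int) -> int:
    """Smallest number of Agents that forms a Raft majority (quorum)."""
    if agents < 1:
        raise ValueError("need at least one agent")
    return agents // 2 + 1


def tolerated_failures(agents: int) -> int:
    """How many Agents can fail while a majority is still reachable."""
    return agents - raft_majority(agents)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} agents: majority {raft_majority(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

One or two Agents tolerate no failure at all, three is the smallest size that survives losing one Agent, and a fourth Agent adds no extra tolerance, which is why odd sizes are the usual choice.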

When you have sharded your graph dataset as a SmartGraph, the following happens when a query is sent to a Coordinator:

- The Coordinator knows which of the data needed for the query resides on which machine and distributes the query to the respective DBServers accordingly.
- Each DBServer has its own query engine, processes the incoming query from the Coordinator locally and then sends its intermediate result back to the Coordinator, where the final result set is put together. This runs in parallel.
- The Coordinator then sends the result back to the client.
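The three steps above follow a scatter-gather pattern, sketched here as a single-process Python model. Everything in it is an illustrative assumption rather than ArangoDB internals: the server names, the hypothetical `region` attribute used for routing (similar in spirit to a SmartGraph's shared sharding attribute), and the thread pool standing in for DBServers that, in a real cluster, receive query parts over the network.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Doc = Dict[str, object]


def run_on_dbserver(local_docs: List[Doc], predicate: Callable[[Doc], bool]) -> List[Doc]:
    # Runs "inside" one DBServer: it only touches locally stored documents.
    return [doc for doc in local_docs if predicate(doc)]


def run_on_coordinator(
    data: Dict[str, List[Doc]],       # hypothetical: DBServer name -> its documents
    region_to_server: Dict[str, str],  # hypothetical shard metadata
    regions: List[str],
    predicate: Callable[[Doc], bool],
) -> List[Doc]:
    # 1. Use shard metadata to contact only the DBServers holding relevant data.
    targets = sorted({region_to_server[r] for r in regions})
    # 2. Scatter: run the sub-query on those DBServers in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        partials = list(pool.map(lambda s: run_on_dbserver(data[s], predicate), targets))
    # 3. Gather: merge the intermediate results; this is what goes to the client.
    merged: List[Doc] = []
    for part in partials:
        merged.extend(part)
    return merged


if __name__ == "__main__":
    data = {
        "DBServer1": [{"_key": "1", "region": "eu"}, {"_key": "2", "region": "eu"}],
        "DBServer2": [{"_key": "3", "region": "us"}],
        "DBServer3": [{"_key": "4", "region": "asia"}],
    }
    routing = {"eu": "DBServer1", "us": "DBServer2", "asia": "DBServer3"}
    # DBServer3 is never contacted because no requested region lives there.
    print(run_on_coordinator(data, routing, ["eu", "us"], lambda d: d["_key"] != "2"))
    # prints [{'_key': '1', 'region': 'eu'}, {'_key': '3', 'region': 'us'}]
```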

If you have a perfectly shardable graph (e.g. a hierarchy whose branches are the shards; use cases could be a bill of materials or network analytics), you can achieve performance close to that of a single instance, because queries can be sent to the right DBServers and no network hops between servers are required. If you have a much more "unstructured" graph, such as a social network where connections can occur between any two vertices, sharding becomes an optimization problem and, depending on the query, network hops between servers are more likely to occur. This latter case is what the diagrams on our website show. In this case, the SmartGraph feature can reduce the number of network hops to a minimum, but cannot eliminate them completely.
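A rough way to see the difference between the two cases is to count edges whose endpoints are stored on different DBServers, since following such an edge during a traversal needs a network hop. The Python sketch below is a simplification under stated assumptions: vertex keys carry a hypothetical group prefix such as `A/`, `place_by_group` stands in for SmartGraph-style placement of related vertices on the same server, and `place_by_checksum` is a deterministic stand-in for placement by key hash. Neither is ArangoDB's actual sharding function.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]


def cross_server_edges(edges: List[Edge], placement: Callable[[str], int]) -> int:
    """Count edges whose two endpoints live on different DBServers."""
    return sum(1 for a, b in edges if placement(a) != placement(b))


def place_by_group(vertex: str, servers: Dict[str, int]) -> int:
    # Hypothetical "smart" placement: same group prefix -> same DBServer.
    return servers[vertex.split("/")[0]]


def place_by_checksum(vertex: str, num_servers: int) -> int:
    # Deterministic stand-in for hashing the key; ignores graph structure.
    return sum(map(ord, vertex)) % num_servers


if __name__ == "__main__":
    # A hierarchy: two branches that never connect to each other.
    hierarchy = [("A/root", "A/1"), ("A/1", "A/2"), ("B/root", "B/1"), ("B/1", "B/2")]
    # A "social" graph: the same edges plus links across branches.
    social = hierarchy + [("A/2", "B/1"), ("B/2", "A/root")]

    smart = lambda v: place_by_group(v, {"A": 0, "B": 1})
    naive = lambda v: place_by_checksum(v, 2)

    print(cross_server_edges(hierarchy, smart))  # 0: every hop stays local
    print(cross_server_edges(social, smart))     # 2: cross-branch links remain
    print(cross_server_edges(hierarchy, naive))  # 4: structure-blind placement
```

With structure-aware placement the hierarchy needs no network hops at all, while the social graph keeps a remainder that no placement of these two groups can remove, matching the distinction drawn above.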

Hope this helped a bit.