Efficient path traversal using AQL on a large bipartite graph in ArangoDB

I have a large bipartite graph (300M+ nodes) stored in ArangoDB using two document collections and an edge collection. I'm trying to do an efficient traversal using AQL that starts from a node of one type with a particular label and finds all other connected nodes of the same type with the same label. The traversal could find anywhere between 2 and 150K nodes, though on average it will be around 10-20 nodes. It is important that a) I specify a large default max traversal depth (e.g. 0..50) to ensure I find everything, but that b) AQL prunes paths so that most of the time it never reaches this max depth.

I have a query that gets the right results, but it does not appear to prune the paths: it gets slower as I increase the max depth, even though the results do not change.

Here is the problem in miniature:

var cir = db._create("circles");
var dia = db._create("diamonds");
var owns = db._createEdgeCollection("owns");

var A = cir.save({_key: "A", color:'blue'});
var B = cir.save({_key: "B", color:'blue'});
var C = cir.save({_key: "C", color:'blue'});
var D = cir.save({_key: "D", color:'yellow'});
var E = cir.save({_key: "E", color:'yellow'});
var F = cir.save({_key: "F", color:'yellow'});
var G = cir.save({_key: "G", color:'red'});
var H = cir.save({_key: "H", color:'red'});

var d1 = dia.save({_key: "1"})._id;
var d2 = dia.save({_key: "2"})._id;
var d3 = dia.save({_key: "3"})._id;
var d4 = dia.save({_key: "4"})._id;
var d5 = dia.save({_key: "5"})._id;
var d6 = dia.save({_key: "6"})._id;

owns.save(A, d2, {});
owns.save(A, d5, {});
owns.save(A, d4, {});
owns.save(B, d4, {});
owns.save(C, d5, {});
owns.save(C, d6, {});
owns.save(D, d1, {});
owns.save(D, d2, {});
owns.save(E, d1, {});
owns.save(E, d3, {});
owns.save(F, d3, {});
owns.save(F, d4, {});
owns.save(G, d6, {});
owns.save(H, d6, {});
owns.save(H, d2, {});

Starting at the node circles/A, I want to find all connected vertices, stopping only when I encounter a circle that is not blue.

The following AQL does what I want:

FOR v, e, p IN 0..5 ANY "circles/A" owns 
    FILTER p.vertices[* filter has(CURRENT, 'color')].color ALL == 'blue'
    return v._id

But the FILTER clause does not cause any pruning to occur. At least, as I said above, in my large database, increasing the max depth makes the query very slow without changing the results.

So how do I ensure that filtering on the paths causes the traversal to prune them? The docs are a little thin on this. I can only find examples where exact path positions are used (p.vertices[1], for example).

As far as I know, there is only one pattern that the optimizer can currently recognize and turn into path pruning instead of post-filtering: a plain filter on the path variable in combination with the ALL operator.

The inline filter you added may prevent this optimization from being applied. I don't see why you added it in the first place. A vertex without a color attribute has an implicit value of null, which is not equal to 'blue', so the inline filter should be unnecessary.

Does this query produce the same results, but run faster as you increase the traversal depth?

FOR v, e, p IN 0..5 ANY "circles/A" owns 
    FILTER p.vertices[*].color ALL == 'blue'
    return v._id

There is an open feature request for an explicit way to prune paths. Feel free to add your use case.
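
To make the difference between pruning and post-filtering concrete, here is a minimal sketch in plain JavaScript that walks the miniature graph from the question outside of ArangoDB. It is not ArangoDB's traversal code; `traverse`, `pathIsBlue`, and the `visited` counter are hypothetical names introduced for illustration. It assumes the question's predicate (vertices without a color are ignored, every colored vertex on the path must be blue), undirected edges as with ANY, and each edge used at most once per path. Results are collected as distinct vertex keys rather than one row per path.

```javascript
// Toy graph from the question: circles carry a color, diamonds ("1".."6") do not.
const color = { A: "blue", B: "blue", C: "blue", D: "yellow",
                E: "yellow", F: "yellow", G: "red", H: "red" };
const edges = [
  ["A", "2"], ["A", "5"], ["A", "4"], ["B", "4"], ["C", "5"],
  ["C", "6"], ["D", "1"], ["D", "2"], ["E", "1"], ["E", "3"],
  ["F", "3"], ["F", "4"], ["G", "6"], ["H", "6"], ["H", "2"],
];

// Undirected adjacency list; the edge index lets a path avoid reusing an edge.
const adj = {};
edges.forEach(([from, to], i) => {
  adj[from] = adj[from] || [];
  adj[to] = adj[to] || [];
  adj[from].push({ next: to, edge: i });
  adj[to].push({ next: from, edge: i });
});

// The question's predicate: every vertex that has a color must be blue.
const pathIsBlue = (path) =>
  path.every((v) => !(v in color) || color[v] === "blue");

// Depth-first enumeration of paths up to maxDepth edges.
// prune = false: every path is expanded, the predicate only filters output.
// prune = true: a path that fails the predicate is not expanded any further.
function traverse(start, maxDepth, prune) {
  const results = new Set();
  let visited = 0; // number of paths the walk had to look at

  const walk = (path, usedEdges) => {
    visited++;
    const ok = pathIsBlue(path);
    if (ok) results.add(path[path.length - 1]);
    if (path.length - 1 >= maxDepth) return; // depth limit reached
    if (prune && !ok) return;                // cut this branch early
    for (const { next, edge } of adj[path[path.length - 1]] || []) {
      if (usedEdges.has(edge)) continue;
      usedEdges.add(edge);
      walk([...path, next], usedEdges);
      usedEdges.delete(edge);
    }
  };

  walk([start], new Set());
  return { results: [...results].sort(), visited };
}

const postFiltered = traverse("A", 5, false);
const pruned = traverse("A", 5, true);
console.log(postFiltered.results, postFiltered.visited);
console.log(pruned.results, pruned.visited);
```

Both modes return the same set of vertices, but in this sketch only the post-filtering walk looks at more paths as the maximum depth grows, which matches the slowdown described in the question. The pruned walk stops at the first non-blue circle on each branch, so raising the depth limit beyond what the blue subgraph needs does not add work.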
