ArangoDB: GRAPH_EDGES command very slow (more than 20 sec) on small collections

I'm evaluating ArangoDB and I see that the GRAPH_EDGES and GRAPH_VERTICES commands are very slow on small collections (300 vertices).

I have 3 collections:

TactiveService (300 vertices) --> TusesCommand (300 edges) --> Tcommand (1 vertex)

Using GRAPH_EDGES, this query takes 24 seconds:

FOR service IN TactiveService
   LET usesCommand = (
      return FIRST(GRAPH_EDGES("topvision", {}, { edgeExamples : [{_from: service._id}], edgeCollectionRestriction : "TusesCommand", includeData:true, maxDepth : 1 }))
   )
   LET command = DOCUMENT(usesCommand[0]._to)
RETURN { service : service, usesCommand: usesCommand[0], command:command} 

For the same result, this query takes 0.020 seconds:

FOR service IN TactiveService
   LET usesCommand = (
      FOR usesCommand IN TusesCommand
         FILTER usesCommand._from == service._id
         RETURN usesCommand
   )
   LET command = DOCUMENT(usesCommand[0]._to)
RETURN { service : service, usesCommand: usesCommand[0], command:command} 

GRAPH_EDGES is unusable for me inside a FOR statement (GRAPH_VERTICES has the same problem).

Any ideas about the cause of this slowness are welcome.

We are well aware that GRAPH_EDGES is not well suited to being used like this in a query.

We therefore introduced AQL pattern matching traversals, which should perform significantly better.

You could formulate your query like this, replacing the GRAPH_EDGES call with a traversal:

FOR service IN TactiveService
   LET usesCommand = (
      FOR v, e IN 1..1 OUTBOUND service "TusesCommand"
         FILTER e._from == service._id
         RETURN e
   )
   LET command = DOCUMENT(usesCommand[0]._to)
RETURN { service : service, usesCommand: usesCommand[0], command:command}

Please note that the specified filter is implicitly true because we queried for OUTBOUND edges starting from service, so e._from will always be equal to service._id.

Instead of specifying GRAPH "topvision" and later limiting the edge collections to take into account in the traversal, we use an anonymous graph query that only takes the edge collection TusesCommand into account, as you did.

So, simplifying it a little more, the query could look like this:

FOR service IN TactiveService
   LET usesCommand = (
      FOR v, e IN 1..1 OUTBOUND service "TusesCommand"
         RETURN {v: v, e: e}
   )
RETURN { service : service, usesCommand: usesCommand}

This may return more vertices than your query, but it will only fetch them once; so the result set may be bigger, but the number of index lookups is reduced because the DOCUMENT calls have been removed from the query.

As you already noticed and formulated with your second query, if your actual problem works better with a classic join, ArangoDB gives you the freedom to work with your data like that.

Edit: Michael is right for sure, the direction has to be OUTBOUND.

If for some reason you do not want to upgrade to 2.8 as @dothebart suggests, you can also fix the old query. Original:

FOR service IN TactiveService
   LET usesCommand = (
      return FIRST(GRAPH_EDGES("topvision", {}, { edgeExamples : [{_from: service._id}], edgeCollectionRestriction : "TusesCommand", includeData:true, maxDepth : 1 }))
   )
   LET command = DOCUMENT(usesCommand[0]._to)
RETURN { service : service, usesCommand: usesCommand[0], command:command} 

The slow part of the query is finding the starting point. The API of GRAPH_EDGES uses the second parameter as the start example, and {} matches all start points. So it first computes all outbound edges for all vertices (this is expensive, because for every vertex in the start collection we collect the edges of every vertex in the start collection). Then it post-filters all the edges it found with the example you gave, which removes almost all of them again.

If you replace the start example with the _id of the start vertex, it will only collect the edges for this specific vertex. You are also only interested in edges in one direction (OUTBOUND), so you can pass that in the options as well (so that only edges with _from == service._id are fetched by GRAPH_EDGES in the first place).

FOR service IN TactiveService
   LET usesCommand = (
      RETURN FIRST(GRAPH_EDGES("topvision", service._id, { edgeCollectionRestriction : "TusesCommand", includeData:true, maxDepth : 1, direction: 'outbound' }))
   )
   LET command = DOCUMENT(usesCommand[0]._to)
RETURN { service : service, usesCommand: usesCommand[0], command:command}

However, I still expect @dothebart's version to be faster in 2.8, and I would also recommend switching to the newest version.
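To illustrate why passing service._id as the start example helps, here is a minimal Python sketch of the cost difference described above. It is only a toy model built on assumptions: the data, the out_index dictionary and all function names are hypothetical, and it does not reproduce how ArangoDB actually implements GRAPH_EDGES. It simply counts one "lookup" per start vertex whose edges are collected.

```python
from collections import defaultdict

# Hypothetical toy data mirroring the question: 300 services, each with one
# TusesCommand edge pointing at a single Tcommand vertex.
services = [f"TactiveService/{i}" for i in range(300)]
command = "Tcommand/0"
edges = [{"_from": s, "_to": command} for s in services]

# Stand-in for an edge index: outbound edges grouped by their _from vertex.
out_index = defaultdict(list)
for edge in edges:
    out_index[edge["_from"]].append(edge)


def edges_with_empty_start_example(service_id, vertices):
    """Model of the original call: start example {} plus an edge example."""
    lookups = 0
    collected = []
    for vertex in vertices:  # {} matches every start vertex
        lookups += 1
        collected.extend(out_index.get(vertex, []))
    # Post-filter: almost everything collected above is thrown away again.
    matching = [e for e in collected if e["_from"] == service_id]
    return matching, lookups


def edges_with_start_vertex(service_id):
    """Model of the fixed call: the start vertex is given directly."""
    return out_index.get(service_id, []), 1


def total_lookups():
    """Total lookups over the outer FOR loop, for both variants."""
    vertices = services + [command]
    slow = sum(edges_with_empty_start_example(s, vertices)[1] for s in services)
    fast = sum(edges_with_start_vertex(s)[1] for s in services)
    return slow, fast
```

In this model, total_lookups() gives 90300 lookups for the original form (300 services times 301 start vertices) and 300 for the fixed one, while both return the same edge for each service. The numbers show the shape of the cost, not real timings.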
