
AQL to validate path to node

We're required to have some AQL that validates a specific path to an entity. The current solution performs very poorly because it needs to scan whole collections.

E.g. here we have 3 entity 'types': a, b, c (though they are all in a single collection) and specific edge collections between them, and we want to establish whether or not there is a connection between _key "123" and _key "234" that goes exactly through a -> b -> c.

FOR a IN entities
  FILTER a._key == "123"
  FOR b IN 1..1 OUTBOUND a edges_a_to_b
    FOR c IN 1..1 INBOUND b edges_c_to_b
      FILTER c._key == "234"
      ...

This can fan out very quickly!

We have another solution, where we use SHORTEST_PATH and specify the appropriate direction and edge collections, which is much faster (more than 100 times). But we worry that this approach does not quite satisfy our general case: the order of the edges is not enforced, and we may have to go through the same edge collection more than once, which we cannot do with that syntax.
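For reference, a query of that shape might look like the sketch below. It assumes the vertex collection is named entities (as in the query above), so the two vertices have the document IDs "entities/123" and "entities/234". Note that nothing in it constrains the order in which the edge collections are used, which is exactly the concern described above.

```aql
// Sketch only: default direction OUTBOUND, overridden to INBOUND for edges_c_to_b.
// The result is the shortest path using any mix of the listed edge collections.
FOR v, e IN OUTBOUND SHORTEST_PATH "entities/123" TO "entities/234"
  edges_a_to_b, INBOUND edges_c_to_b
  RETURN v._key
```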

Is there another way, possibly involving paths in the traversal?

Thanks! Dan.

If I understand correctly, you always know the exact path that is required between your two vertices.

So to take your example a -> b -> c, a valid result will have: path.vertices == [a, b, c]. So we can use this path to filter on it, which only works if you use a single traversal step instead of multiple ones. So what we try to do is the following pattern:

FOR c,e, path IN <pathlength> <direction> <start> <edge-collections>
  FILTER path.vertices[0] == a // This needs to be formulated correctly
  FILTER path.vertices[1] == b // This needs to be formulated correctly
  FILTER path.vertices[2] == c // This needs to be formulated correctly
  LIMIT 1 // We only need exactly one path, so LIMIT 1 is enough
  [...]

So with this hint it is possible to write the query in the following way:

FOR a IN entities
  FILTER a._key == "123"
  FOR c, e, path IN 2 OUTBOUND a edges_a_to_b, INBOUND edges_c_to_b
    FILTER path.vertices[1].type == "b" /* or whatever identifies b; the "type" attribute is only an example */
    FILTER path.vertices[2]._key == "234"
    LIMIT 1 /* This will stop as soon as the first match is found, so very important! */
    /* [...] */

This will allow the optimizer to apply the filter conditions as early as possible, and will (almost) use the same algorithm as the shortest path implementation. The trick is to use one traversal instead of multiple ones to save internal overhead and allow for better optimization.
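If the order of the edge collections also has to be enforced, one option is to filter on path.edges per depth in the same single traversal. The following is only a sketch using the collection names from the question; IS_SAME_COLLECTION is a built-in AQL function, but whether the optimizer can apply these particular filters during the traversal rather than afterwards is something I would verify with explain on your data.

```aql
FOR a IN entities
  FILTER a._key == "123"
  FOR c, e, path IN 2 OUTBOUND a edges_a_to_b, INBOUND edges_c_to_b
    // First hop (a -> b) must come from edges_a_to_b
    FILTER IS_SAME_COLLECTION("edges_a_to_b", path.edges[0])
    // Second hop (b <- c) must come from edges_c_to_b
    FILTER IS_SAME_COLLECTION("edges_c_to_b", path.edges[1])
    FILTER path.vertices[2]._key == "234"
    LIMIT 1
    RETURN path
```

The same idea should extend to paths that pass through one edge collection more than once: list the collection once, raise the depth, and add one path.edges[n] filter per hop.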

Also take into account that it might be better to search in the opposite direction:

E.g. instead of a -> b -> c, check for c <- b <- a, which might be faster. This depends on the number of edges per node. I assume a doctor has many surgeries, but a single patient most likely has only a small number of surgeries, so it is better to start at the patient and check backwards instead of starting at the doctor and checking forwards.
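A reversed version of the query could look roughly like this. It is a sketch that assumes the edge collection names from the question (edges_a_to_b pointing from a to b, edges_c_to_b pointing from c to b) and, as before, a hypothetical type attribute to identify b:

```aql
FOR c IN entities
  FILTER c._key == "234"
  // Start at c: follow edges_c_to_b forwards to b, then edges_a_to_b backwards to a
  FOR a, e, path IN 2 OUTBOUND c edges_c_to_b, INBOUND edges_a_to_b
    FILTER path.vertices[1].type == "b" // "type" is only an example attribute
    FILTER path.vertices[2]._key == "123"
    LIMIT 1
    RETURN path
```

Which direction wins depends on your data, so it is worth running both variants and comparing.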

Please let me know if this helps already; otherwise we can talk about more details and see if we can find some further optimizations.

Disclaimer: I am part of the Core-Dev team at ArangoDB.

