ArangoDB AQL: effectively traversing from a start vertex to an end vertex and finding the connections between them

I'm very new to graph concepts and ArangoDB. I plan to use both in a project related to communication analysis. I have set up the data in ArangoDB with one document collection named object and one edge collection named object_routing.

In my object collection, the data structure is as follows:

{
  "img": "assets/img/default_message.png",
  "label": "some label",
  "obj_id": "45a92a7344ee4f758841b5466c010ed9",
  "type": "message"
}
...
{
  "img": "assets/img/default_person.png",
  "label": "some label",
  "obj_id": "45a92a7344ee4f758841b5466c01111",
  "type": "user"
}

In my object_routing collection, the data structure is as follows:

{
  "message_id": "no_data",
  "source": "45a92a7344ee4f758841b5466c010ed9",
  "target": "45a92a7344ee4f758841b5466c01111",
  "type": "has_contacted"
}

with _from: object/45a92a7344ee4f758841b5466c010ed9 and _to: object/45a92a7344ee4f758841b5466c01111

The object collection holds about 23k documents and object_routing holds about 127k.

My question is: how can I effectively traverse from a start vertex to an end vertex, so that I get all the connected vertices between them, together with their edges, their children, and so on, until there is nothing left to traverse?

I'm afraid my question is not clear enough and my understanding of graph concepts may be heading in the wrong direction, so please bear with me.

Note: the BFS algorithm is not an option because it is not what I need. If possible, I would like to get the longest path. My current ArangoDB version is 3.1.7, running on a cluster with 1 coordinator and 3 DB servers.

It is worth trying a few queries to get a feel for how AQL traversals work, but maybe start with this example, based on the AQL Traversal documentation page:

FOR v, e, p IN 1..10 OUTBOUND 'object/45a92a7344ee4f758841b5466c010ed9' GRAPH 'insert_my_graph_name'
  LET last_vertex_in_path = LAST(p.vertices)
  FILTER last_vertex_in_path.obj_id == '45a92a7344ee4f758841b5466c01111'
  RETURN p

This sample query follows outbound edges in your graph called insert_my_graph_name, starting from the vertex with an _id of object/45a92a7344ee4f758841b5466c010ed9 and going between 1 and 10 edges deep.

The traversal then exposes three variables for every path found:

  • v is the vertex the current path ends at (a single vertex document)
  • e is the edge that was followed to reach v (a single edge document)
  • p is the whole path that was found

A path consists of vertices connected to each other by edges.

If you want to explore the variables, try this version of the query:

FOR v, e, p IN 1..10 OUTBOUND 'object/45a92a7344ee4f758841b5466c010ed9' GRAPH 'insert_my_graph_name'
  RETURN {
    vertex: v,
    edge: e,
    path: p
  }

What is nice is that AQL returns this information as JSON, in objects and arrays.

When a path is returned, it is a document with two attributes, edges and vertices, where the edges attribute is an array of the edge documents the path went along, and the vertices attribute is an array of vertex documents.

The interesting thing about the vertices array is that the order of its elements matters. The first document in the vertices array is the starting vertex, and the last document is the ending vertex.

So in the example query above, which is set up as an OUTBOUND query, your starting vertex will always be the FIRST element of the array stored at p.vertices, and the end of the path will always be the LAST element of that array.

It doesn't matter how many vertices are traversed in your path; that rule still holds.

If your query were an INBOUND traversal, the logic stays the same: FIRST(p.vertices) is still the starting vertex of the path, which has the same _id as the one you specified in your query, and LAST(p.vertices) is the vertex where the path ends. Only the direction in which the edges are followed changes.
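As a rough sketch of that rule, the query below runs the same traversal in the INBOUND direction, starting from the user vertex in the question, and returns both ends of each path. The attribute names start_id, end_id, and hops are made up for illustration, and the graph name is still a placeholder.

```aql
// Follow edges against their direction, starting at the user vertex.
FOR v, e, p IN 1..10 INBOUND 'object/45a92a7344ee4f758841b5466c01111' GRAPH 'insert_my_graph_name'
  RETURN {
    start_id: FIRST(p.vertices)._id,  // the vertex given in the query
    end_id: LAST(p.vertices)._id,     // the vertex this path ends at (same document as v)
    hops: LENGTH(p.edges)             // number of edges in the path
  }
```

Every row should show the same start_id, while end_id changes from path to path.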

So, back to your use case: if you want to keep only the OUTBOUND paths that lead from your starting vertex to a specific vertex, you can add the LET last_vertex_in_path = LAST(p.vertices) declaration to get a reference to the last vertex in each path.

Then you can add a FILTER that references this variable and filter on any attribute of that terminating vertex. You could filter on last_vertex_in_path._id, last_vertex_in_path.obj_id, or any other attribute of that final vertex document.
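Since v is the vertex each path ends at, the same filter can also be written directly against v. A minimal variant of the first query, filtering on _id instead of obj_id (the IDs are the ones from the question, the graph name is a placeholder):

```aql
FOR v, e, p IN 1..10 OUTBOUND 'object/45a92a7344ee4f758841b5466c010ed9' GRAPH 'insert_my_graph_name'
  // v is the same document as LAST(p.vertices)
  FILTER v._id == 'object/45a92a7344ee4f758841b5466c01111'
  RETURN p
```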

Play with it and practice a bit. Once you see that a graph traversal query only gives you these three key variables, v, e, and p, and that they aren't anything special (v and e are ordinary documents, and p is an object holding arrays of vertices and edges), you can do some pretty powerful filtering.

You can put filters on the attributes of any vertex, any edge, or any position in the path, which allows quite flexible filtering and aggregation of the results the traversal produces.
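As one possible illustration, and as a sketch rather than a tuned solution, the query below keeps only paths whose edges all have type has_contacted, requires the path to end at the target vertex, and returns the path with the most edges. Note that "longest" here only means longest within the 1..10 depth range used in the example; enumerating every path on a graph with 127k edges may be expensive, so the depth bound is worth choosing carefully.

```aql
FOR v, e, p IN 1..10 OUTBOUND 'object/45a92a7344ee4f758841b5466c010ed9' GRAPH 'insert_my_graph_name'
  // every edge on the path must be a has_contacted edge
  FILTER p.edges[*].type ALL == 'has_contacted'
  // the path must end at the target vertex
  FILTER v._id == 'object/45a92a7344ee4f758841b5466c01111'
  // longest path first, measured in number of edges
  SORT LENGTH(p.edges) DESC
  LIMIT 1
  RETURN p
```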

Also have a look at the traversal options; they can be useful.
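For instance, the options can be passed in an OPTIONS clause right after the graph name. The sketch below keeps the traversal depth-first (bfs: false, which I believe is the default) and asks that no vertex or edge is visited twice within a single path, which avoids running around cycles. Treat the specific values as assumptions to check against the documentation for your version.

```aql
FOR v, e, p IN 1..10 OUTBOUND 'object/45a92a7344ee4f758841b5466c010ed9' GRAPH 'insert_my_graph_name'
  OPTIONS { bfs: false, uniqueVertices: 'path', uniqueEdges: 'path' }
  RETURN p
```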

To get started, just make sure you have your documents and edges loaded, and that you've created a graph containing those document and edge collections.

And yes, you can have many document and edge collections in a single graph, and even share document and edge collections across multiple graphs if that suits your use case.
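If you would rather experiment before defining a named graph, a traversal can also be given the edge collection directly. This is a sketch using the collection names from the question; the leading WITH object line declares the vertex collection up front, which, as far as I know, is needed for this kind of traversal on a cluster.

```aql
WITH object
FOR v, e, p IN 1..10 OUTBOUND 'object/45a92a7344ee4f758841b5466c010ed9' object_routing
  RETURN p
```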

Have fun!
