Difference between ArangoDB graph traversal results in arangosh vs. the HTTP API

I'm using ArangoDB v3.0.7 with Python 3.4.

I'm running a graph traversal both in the arango shell (arangosh) and through the traverser endpoint 'POST /_api/traversal' (called from Python via the python-arango library). Both calls use the same parameters/config and the same user-defined expander and visitor functions.

While the traversal in the shell works as expected, the traversal via the HTTP API call does not return all the vertices that should be visited according to the expander function.

On the Python/HTTP side, I'm executing:

query_result = dl_graph.traverse(
        start_vertex=start_vertex,
        strategy="dfs",
        max_depth=8,
        order="preorder",
        vertex_uniqueness="global",
        edge_uniqueness="global",
        filter_func=None,
        expander_func=expander,  # string containing function body, see def below
        visitor_func=visitor  # string containing function body, see def below
    )
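
For reference, the request that such a call presumably results in can be sketched as below. This is an assumption about how the keyword arguments map onto the documented body fields of 'POST /_api/traversal', not something captured from python-arango. The graph name comes from the shell config further down; the start vertex id and the two function bodies are placeholders.

```python
import json


def build_traversal_body(start_vertex, graph_name, expander_body, visitor_body):
    """Assemble a JSON-serializable body for POST /_api/traversal.

    Field names follow the ArangoDB 3.0 HTTP traversal documentation;
    the mapping from the python-arango keyword arguments is assumed.
    """
    return {
        "startVertex": start_vertex,
        "graphName": graph_name,
        "strategy": "depthfirst",   # strategy="dfs" in the call above
        "order": "preorder",
        "maxDepth": 8,
        "uniqueness": {"vertices": "global", "edges": "global"},
        "expander": expander_body,  # body of the JavaScript function, as a string
        "visitor": visitor_body,    # body of the JavaScript function, as a string
    }


body = build_traversal_body(
    "Account/12345",                          # placeholder start vertex id
    "kraken",
    "return [];",                             # placeholder expander body
    "result.visited.vertices.push(vertex);",  # placeholder visitor body
)
print(json.dumps(body, indent=2, sort_keys=True))
```

Printing the body this way gives something that can be checked field by field against what is actually sent over the wire.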

When running under the arango shell, the config is:

{
  "datasource" : {
    "graph" : [ Graph kraken EdgeDefinitions: [
      "Owns: [Account] -> [Account, Building]",
      "Services: [BSObject] -> [BSObject, SpatialObject]",
      "Employs: [Account] -> [Account, User]",
      "WorksIn: [Project, Role, Team, TeamMember, User] -> [Building, Project, SpatialO...",
      "IsIn: [SpatialObject] -> [Building, SpatialObject]",
      "AccessControl: [Project, Role, Team, TeamMember, User] -> [BSObject, Building, C...",
      "Instantiation: [BSObject] -> [Class]",
      "Supply: [BSObject] -> [BSObject, SpatialObject]",
      "HasRole: [TeamMember, User] -> [Role]",
      "Contracts: [Account] -> [Account, Contract, Service]",
      "can_read: [Team] -> [message_exchange]",
      "can_write: [Team] -> [message_exchange]",
      "Context: [message_exchange] -> [BSObject, Building, SpatialObject]",
      "has_authority_over: [Team] -> [Team]",
      "Provides: [Contract, ContractSchedule, Role] -> [Service]",
      "To: [Service] -> [Team]",
      "Under: [Role] -> [Contract, ContractSchedule]",
      "Leases: [Account] -> [Building, SpatialObject]",
      "Partof: [ContractSchedule] -> [Contract]",
      "Manages: [testA] -> [Account]"
    ] VertexCollections: [
      "Contract",
      "ContractSchedule",
      "Group",
      "Service",
      "test1",
      "testdialog",
      "GraphViews",
      "GraphViewPositions",
      "testBuilding",
      "test5",
      "test6",
      "test7",
      "test10",
      "test11",
      "test123",
      "test134567",
      "testA"
    ] ],
    "getVertexId" : function (vertex) { ... },
    "getPeerVertex" : function (edge, vertex) { ... },
    "getInVertex" : function (edge) { ... },
    "getOutVertex" : function (edge) { ... },
    "getEdgeId" : function (edge) { ... },
    "getEdgeFrom" : function (edge) { ... },
    "getEdgeTo" : function (edge) { ... },
    "getLabel" : function (edge) { ... },
    "getAllEdges" : function (vertex) { ... },
    "getInEdges" : function (vertex) { ... },
    "getOutEdges" : function (vertex) { ... }
  },
  "order" : 0,
  "itemOrder" : 0,
  "strategy" : 1,
  "uniqueness" : {
    "vertices" : 2,
    "edges" : 2
  },
  "visitor" : function (config, result, vertex, path, connected) { ... },
  "filter" : function maxDepthFilter (config, vertex, path) { ... },
  "expander" : function (config, vertex, path) { ... },
  "maxIterations" : 10000000,
  "minDepth" : 0,
  "maxDepth" : 8,
  "buildVertices" : true,
  "messages" : [
    "in edge, gedID:525147be-ebec-11e5-9e42-80193436fdca",
    "in edge, gedID:cfa4fa2e-ef0e-11e5-8377-80193436fdca",
    "in edge, gedID:cfa4fa2e-ef0e-11e5-8377-80193436fdca",
    "in edge, gedID:cfa4fa2e-ef0e-11e5-8377-80193436fdca",
    "in edge, gedID:525147be-ebec-11e5-9e42-80193436fdca",
    "in edge, gedID:cfa4fa2e-ef0e-11e5-8377-80193436fdca",
    "in edge, gedID:cfa4fa2e-ef0e-11e5-8377-80193436fdca",
    "in edge, gedID:cfa4fa2e-ef0e-11e5-8377-80193436fdca",
    "in edge, gedID:525147be-ebec-11e5-9e42-80193436fdca",
    "in edge, gedID:cfa4fa2e-ef0e-11e5-8377-80193436fdca",
    "in edge, gedID:cfa4fa2e-ef0e-11e5-8377-80193436fdca",
    "in edge, gedID:cfa4fa2e-ef0e-11e5-8377-80193436fdca",
    "in edge, gedID:525147be-ebec-11e5-9e42-80193436fdca",
    "in edge, gedID:cfa4fa2e-ef0e-11e5-8377-80193436fdca",
    "in edge, gedID:cfa4fa2e-ef0e-11e5-8377-80193436fdca",
    "in edge, gedID:cfa4fa2e-ef0e-11e5-8377-80193436fdca",
    "in edge, gedID:525147be-ebec-11e5-9e42-80193436fdca",
    "in edge, gedID:cfa4fa2e-ef0e-11e5-8377-80193436fdca",
    "in edge, gedID:cfa4fa2e-ef0e-11e5-8377-80193436fdca",
    "in edge, gedID:cfa4fa2e-ef0e-11e5-8377-80193436fdca"
  ]
}

Note that the 'messages' array at the end was added by the expander function below. It was my attempt to get some diagnostic output back to the Python client when running through the HTTP API, but this technique also worked only under the shell. I set up the messages array because I couldn't find out where the 'require("internal").print(...)' output goes when using the HTTP API.

The expander function is the following. The key part is that the function will traverse the graph until it gets to a vertex with the gedID of '30dd37a6-ec18-11e5-81b7-80193436fdca'. When the visitor function receives this vertex, it creates a list of ids from the path to that vertex.

function (config, vertex, path) {
if (!config.messages) config.messages = [];

var connections = [ ];
if (vertex.gedID == '67a59b76-efc3-11e5-9c6f-80193436fdca' 
 || vertex.gedID == '2df65d4a-ef3f-11e5-b2f8-80193436fdca' 
 || vertex.gedID == 'eb529db8-453e-11e6-b430-80193436fdca'
 || vertex.gedID == 'd38d7148-5ccd-11e6-b863-80193436fdca'
 || vertex.gedID == 'e6dc9d00-5ccd-11e6-9362-80193436fdca')  // follow out edges for these vertices only
{
  config.datasource.getOutEdges(vertex).forEach(function (e) {
  require("internal").print("vertex, name:" + vertex.name);
  require("internal").print("out edge, gedID:" + e.gedID);
  var toVertex = require("internal").db._document(e._to)

  if (toVertex.gedID != 'c1da8df6-2731-11e6-9aba-80193436fdca'         //     Block traversal through this vertex
   && (e.gedID != '525147be-ebec-11e5-9e42-80193436fdca' || e.Update ==  'afd34490-ef35-11e5-b18e-80193436fdca')      // not equal to edges of this type, and update has the specified value
   && !(vertex.gedID == '67a59b76-efc3-11e5-9c6f-80193436fdca' && toVertex.gedID == '67a59b76-efc3-11e5-9c6f-80193436fdca'))   // ignore connections between vertices of this type
    connections.push({vertex: toVertex, edge: e});
  });
}
else
{
  config.datasource.getInEdges(vertex).forEach(function (e) {
    require("internal").print("vertex, name:" + vertex.name);
    require("internal").print("in edge, gedID:" + e.gedID);
    config.messages.push("in edge, gedID:" + e.gedID);

    if (vertex.gedID != "30dd37a6-ec18-11e5-81b7-80193436fdca"   //  Terminating vertex
     && e.gedID != '1605c270-269e-11e6-9b64-80193436fdca'    // Avoid traversal in this direction
     && e.gedID != 'a50f19d8-efdc-11e5-a5bf-80193436fdca'   // Avoid traversal in this direction
     && e.gedID != '8e327408-5e45-11e6-8c9a-80193436fdca')  // Avoid traversal in this direction
      connections.push({ vertex: require("internal").db._document(e._from),  edge: e});
  });
  require("internal").print("messages:" + config.messages);
}
return connections;
}

The visitor function is:

function (config, result, vertex, path, connected) {    
  if (! result || ! result.visited) {
    return;
  }

  require("internal").print("messages:" + config.messages);

  if (!result.visited.pathVertices) {result.visited.pathVertices = []}

  result.visited.messages = config.messages;

  if (result.visited.vertices) {
    result.visited.vertices.push(vertex);
  }

  if (result.visited.paths) {
    if (vertex.gedID == "30dd37a6-ec18-11e5-81b7-80193436fdca")
    {
      for (var e = 0; e < path.edges.length; e++)
      {
        result.visited.paths.push(path.edges[e]._id)
      }
      for (var v = 0; v < path.vertices.length; v++)
      {
        result.visited.pathVertices.push(path.vertices[v]._id)
      }
    }
  }
}

The intention is that when the vertex with the id '30dd37a6-ec18-11e5-81b7-80193436fdca' is found, the pathVertices array is populated with the IDs.

When running under the arango shell, the final vertex ('30dd37a6-ec18-11e5-81b7-80193436fdca') is passed to the visitor function and the pathVertices array is populated. When running via the HTTP API, however, the pathVertices array comes back empty.

In addition, the 'result.visited.vertices' array contains the final vertex (30dd37a6-ec18-11e5-81b7-80193436fdca) when running under the shell, but this vertex is missing when running through the HTTP API.

There are probably better ways to do this, and I'd welcome input on that. The key question, though, is why the traversal behaves differently in the arango shell and through the HTTP API, and how I can get the missing vertex and the pathVertices array returned to the Python code.

As you probably already know, arangosh as well as the Python drivers talk HTTP to ArangoDB.

So you can compare what each of them sends to invoke the traversal, for example by capturing the traffic with Wireshark.

Comparing the two should let you narrow the question down to a much smaller set of information about what actually differs.
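
Once the traffic is captured, the comparison itself can be done with a few lines of Python. The following is a minimal sketch, assuming each capture yields a JSON request body; the helper name `diff_bodies` and the two sample bodies are hypothetical and only illustrate the idea.

```python
import json


def diff_bodies(shell_body, http_body):
    """Return {key: (shell_value, http_value)} for each top-level key that differs."""
    differences = {}
    for key in sorted(set(shell_body) | set(http_body)):
        left = shell_body.get(key, "<missing>")
        right = http_body.get(key, "<missing>")
        if left != right:
            differences[key] = (left, right)
    return differences


# Hypothetical captures; paste the real request bodies here instead.
shell_capture = json.loads('{"maxDepth": 8, "strategy": "depthfirst", "order": "preorder"}')
http_capture = json.loads('{"maxDepth": 8, "strategy": "depthfirst"}')

for key, (left, right) in diff_bodies(shell_capture, http_capture).items():
    # str.format keeps this compatible with Python 3.4
    print("{}: shell={!r} http={!r}".format(key, left, right))
```

Only top-level keys are compared, which is usually enough to spot a missing or differently encoded option; nested values such as the uniqueness settings show up as a single differing entry.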
