简体   繁体   中英

Gremlin query to get affected nodes from the Cosmos graph

We have a Cosmos graph database like below. ie. A,B,C,... are nodes/vertices and edges are as shown by the arrows.

在此处输入图像描述

Each node/vertex represents a value in SQL table. Process and the requirement are as follows.

  1. User modifies node A value in SQL table
  2. Gremlin query passes A into the Graph
  3. Graph returns the following vertices in the below listed order
  4. C# app calculates the values of D,K,M,P nodes in the order and updates SQL table
  • D = A+B+C
  • K = F+E+D
  • M = J+K
  • P = L+M+N+O

I tried the following query and it has over 3000 RUs which is very costly.

g.V("A").emit().repeat(__.in('depends')).until(__.inE().count().is(0))

We need some help to optimise the query. thanks

UPDATE ===========

OK, we can rebuild the graph in a single partition to reduce the RUs but we have a scenario where multiple nodes are affected, highlighted in red in the below picture, on the way up starting from A.

Can someone help with a query to get the results in A, D, K, O, M, P order please? Logic to the query is all the child nodes should be listed before their parents

g.addV('ddn').property('pk', 'pk').property(id, 'A').property('formula', 'A').
addV('ddn').property('pk', 'pk').property(id, 'B').property('formula', 'B').
addV('ddn').property('pk', 'pk').property(id, 'C').property('formula', 'C').
addV('ddn').property('pk', 'pk').property(id, 'D').property('formula', 'A+B+C').property('requires', "'A','B','C'").
addV('ddn').property('pk', 'pk').property(id, 'E').property('formula', 'E').
addV('ddn').property('pk', 'pk').property(id, 'F').property('formula', 'E').
addV('ddn').property('pk', 'pk').property(id, 'G').property('formula', 'H+I').property('requires', "'H','I'").
addV('ddn').property('pk', 'pk').property(id, 'H').property('formula', 'H').
addV('ddn').property('pk', 'pk').property(id, 'I').property('formula', 'I').
addV('ddn').property('pk', 'pk').property(id, 'J').property('formula', 'F+G').property('requires', "'F','G'").
addV('ddn').property('pk', 'pk').property(id, 'K').property('formula', 'D+E+F').property('requires', "'D','E','F'").
addV('ddn').property('pk', 'pk').property(id, 'L').property('formula', 'L').
addV('ddn').property('pk', 'pk').property(id, 'M').property('formula', 'J+K').
addV('ddn').property('pk', 'pk').property(id, 'N').property('formula', 'N').
addV('ddn').property('pk', 'pk').property(id, 'O').property('formula', 'A+K').property('requires', "'A','K'").
addV('ddn').property('pk', 'pk').property(id, 'P').property('formula', 'L+M+N+O').property('requires', "'L','M','N','O'").
V('D').addE('needs').to(V('A')).
V('D').addE('needs').to(V('B')).
V('D').addE('needs').to(V('C')).
V('G').addE('needs').to(V('H')).
V('G').addE('needs').to(V('I')).
V('K').addE('needs').to(V('D')).
V('K').addE('needs').to(V('E')).
V('K').addE('needs').to(V('F')).
V('J').addE('needs').to(V('F')).
V('J').addE('needs').to(V('G')).
V('O').addE('needs').to(V('A')).
V('O').addE('needs').to(V('K')).
V('M').addE('needs').to(V('J')).
V('M').addE('needs').to(V('K')).
V('P').addE('needs').to(V('L')).
V('P').addE('needs').to(V('M')).
V('P').addE('needs').to(V('N')).
V('P').addE('needs').to(V('O'))

在此处输入图像描述

I think the answer boils down to being able to sort the vertices traversed by their path length.

gremlin> g.V("A").
......1>   emit().repeat(__.in('needs')).path().
......2>   group().
......3>     by(tail(local)).
......4>     by(count(local).fold()).
......5>   order(local).
......6>     by(select(values).tail(local)).
......7>   select(keys)
==>[v[A],v[D],v[K],v[M],v[O],v[P]]

I group() by the last element in the path() and transform each path in the group to its length with count(local) . That allows me to order() the results by the longest path for each vertex.

Note that I don't think you need until(__.inE().count().is(0)) because you're just traversing to path exhaustion in either case. Also, take care with __.inE().count().is(0) as you end up counting all the edges just to detect a count of zero. Most graphs should optimize that to just until(inE()) , but it's always best to be explicit in my opinion. That said, you need to be sure of your data structures when using repeat() - it takes just one edge of bad data to send your traversal into an infinity of traversing. Consider some kind of upper bound to your repeat() that makes sense for your data so that the loop will terminate at some point.

Here is an alternative which actually might be better since it doesn't bother to hold all the counts in the Map after the group() :

gremlin> g.V("A").
......1>   emit().repeat(__.in('needs')).path().
......2>   group().
......3>     by(tail(local)).
......4>     by(count(local).order(local).tail(local)).
......5>   order(local).
......6>     by(values).
......7>   select(keys)
==>[v[A],v[D],v[K],v[M],v[O],v[P]]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM