
Value Aggregation along Traversal Gremlin Cosmos DB

I need to perform a query of the following form: I have a tree structure with costs at the leaf nodes, and I need a single query that gives me the aggregated cost under every node, starting from the root. [Image of the example graph]

For example, in the above graph, I would expect output from my query like:

{ 1: 6, 2: 4, 3: 2, 4: 1, 5: 1, 6: 2, 7: 1, 8: 1}

I was looking into using the 'sack' step from the Gremlin API for this, but Cosmos DB doesn't seem to support sacks currently. I also tried storing a pseudo-property called "aggregated-cost" and working my way up from the leaf nodes, but I couldn't figure out how to store a dynamic value at each node as a property that is local to only that node. Is this kind of query possible given these constraints?

When asking questions about Gremlin, it is always best to include a short Gremlin script of sample data:

g.addV().property('id',1).as('1').
  addV().property('id',2).as('2').
  addV().property('id',3).as('3').
  addV().property('id',4).property('cost',1).as('4').
  addV().property('id',5).property('cost',1).as('5').
  addV().property('id',6).property('cost',2).as('6').
  addV().property('id',7).property('cost',1).as('7').
  addV().property('id',8).property('cost',1).as('8').
  addE('link').from('1').to('2').
  addE('link').from('1').to('3').
  addE('link').from('2').to('4').
  addE('link').from('2').to('5').
  addE('link').from('2').to('6').
  addE('link').from('3').to('7').
  addE('link').from('3').to('8').iterate()

With the steps available in CosmosDB I think that the closest you might be able to get is this:

gremlin> g.V().
......1>   group().
......2>     by('id').
......3>     by(emit(has('cost')).
......4>        repeat(out()).
......5>        values('cost').
......6>        fold())
==>[1:[1,1,2,1,1],2:[1,1,2],3:[1,1],4:[1],5:[1],6:[2],7:[1],8:[1]]

The group() step produces the Map structure you wanted. For each vertex you group on, repeat() traverses out until it reaches the leaf vertices. Note that emit(has('cost')) ensures that only vertices carrying the "cost" property (the leaves) contribute to the result. Because emit() is placed before repeat(), the starting vertex itself is also tested, which is why each leaf appears in the map with its own cost.

The reason I say this is about as close as you can get with CosmosDB is that I don't see sum() among the steps CosmosDB lists as supported. If it were, then:

gremlin> g.V().
......1>   group().
......2>     by('id').
......3>     by(emit(has('cost')).
......4>        repeat(out()).
......5>        values('cost').
......6>        sum())
==>[1:6,2:4,3:2,4:1,5:1,6:2,7:1,8:1]

I guess you will have to do that final computation on the returned result yourself.
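That final computation can be done client-side in a few lines. The following is only a sketch in Python, assuming the result of the fold() query has already been deserialized into a plain dict by whatever Gremlin client you use; the names `grouped` and `sum_costs` are hypothetical, and the data is copied from the console output above.

```python
# Hypothetical client-side post-processing. `grouped` mirrors the console
# output of the fold() query above and is assumed to be a plain dict
# already deserialized by the Gremlin client in use.
grouped = {
    1: [1, 1, 2, 1, 1],
    2: [1, 1, 2],
    3: [1, 1],
    4: [1],
    5: [1],
    6: [2],
    7: [1],
    8: [1],
}


def sum_costs(grouped_costs):
    """Collapse each vertex's list of leaf costs into a single total."""
    return {vertex_id: sum(costs) for vertex_id, costs in grouped_costs.items()}


totals = sum_costs(grouped)
print(totals)  # {1: 6, 2: 4, 3: 2, 4: 1, 5: 1, 6: 2, 7: 1, 8: 1}
```

Depending on the client and on how the ids were stored, the keys may come back as strings rather than integers; the summation itself is unaffected.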

For others (or when CosmosDB supports sack() in the future) you can do:

gremlin> g.V().has('cost').
......1>   sack(assign).
......2>     by('cost').
......3>   emit().
......4>     repeat(__.in('link')).
......5>   group().
......6>     by('id').
......7>     by(sack().sum())
==>[1:6,2:4,3:2,4:1,5:1,6:2,7:1,8:1]

Courtesy of the Gremlin Guru.
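To see why the sack() version yields the totals, it may help to mimic it outside the graph. The sketch below is plain Python, not Gremlin, and only illustrates the idea: each leaf carries its cost upward through its ancestors, and every vertex it visits (itself included) adds that cost to its total. The `parent` and `cost` dicts are hypothetical stand-ins for the sample graph's 'link' edges and 'cost' properties.

```python
# In-memory illustration only: plain Python, not Gremlin.
# `parent` maps child -> parent (the 'link' edges reversed);
# `cost` holds the 'cost' property of each leaf in the sample graph.
parent = {2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 3}
cost = {4: 1, 5: 1, 6: 2, 7: 1, 8: 1}


def aggregate_up(parent, cost):
    """Add each leaf's cost to the leaf itself and to every ancestor."""
    totals = {}
    for leaf, leaf_cost in cost.items():
        node = leaf
        while node is not None:  # roughly emit().repeat(__.in('link'))
            totals[node] = totals.get(node, 0) + leaf_cost
            node = parent.get(node)  # None once the root has been visited
    return dict(sorted(totals.items()))


aggregated = aggregate_up(parent, cost)
print(aggregated)  # {1: 6, 2: 4, 3: 2, 4: 1, 5: 1, 6: 2, 7: 1, 8: 1}
```

The value carried up the tree plays the role of the sack here, and the per-vertex accumulation corresponds to group().by('id').by(sack().sum()).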
