The query starts at the vertex 'me'. I want to find all A-vertices that are connected to both my B-vertex and at least one of my C-vertices.
A person (like me) is always connected to exactly one B-vertex, but to several C-vertices. The C-vertices connected to me are in turn connected to possibly hundreds of A-vertices, whereas my B-vertex is usually connected to fewer than 50 A-vertices.
So there are far more CA edges than BA edges.
I developed two traversals to find all A-vertices connected to my B-vertex and C-vertices.
To make them easier to understand, let's call the A-vertices connected to me via the B-vertex 'Ab' and the A-vertices connected to me via a C-vertex 'Ac'. The two sets overlap, and that intersection is exactly what I'm after.
The first traversal computes the intersection of the Ac- and Ab-vertices. First it collects all Ab and labels them with 'as()', then it collects all Ac and keeps only those equal to an Ab-vertex.
g.V('me').out('mb').out('ba').as('Ab')
.V('me').out('mc').out('ca').as('Ac')
.where(eq('Ab'))
Variation with aggregate():
g.V('me').out('mb').out('ba').aggregate('Ab')
.V('me').out('mc').out('ca')
.where(within('Ab')).dedup()
The second uses a filter. It first collects all Ab-vertices (the smaller of the two sets) and then uses filter() to keep only those Ab-vertices that are also connected to me via a C-vertex.
g.V('me').out('mb').out('ba')
.filter(
__.in('ca').in('mc').hasId('me')
)
In my estimation, the second should be more efficient, because it traverses a smaller section of the graph.
Am I right in this assumption? Is there a more efficient approach?
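One way to check the assumption empirically, rather than by estimate, is TinkerPop's profile() step, which reports per-step traverser counts and timings. The sketch below simply appends it to the two traversals from above and is meant to be run as two separate queries. How detailed the output is depends on the graph provider. On Neptune, the engine's own Gremlin explain and profile endpoints may be more informative.

```groovy
// Traversal 1: intersection via step labels, with metrics appended
g.V('me').out('mb').out('ba').as('Ab')
.V('me').out('mc').out('ca').as('Ac')
.where(eq('Ab'))
.profile()

// Traversal 2: filter on the smaller Ab set, with metrics appended
g.V('me').out('mb').out('ba')
.filter(
__.in('ca').in('mc').hasId('me')
)
.profile()
```

Comparing the traverser counts of the out('ca') step in the first query with those of the in('ca') step in the second should show how much of the graph each variant actually touches.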
My second problem relates to the sack operator. I wish to sort the resulting set of A-vertices by the strength of the C-Path.
The first query is capable of doing that.
g.withSack(1.0f)
.V('me').out('mb').out('ba').as('Ab')
.V('me')
.outE('mc').has('weight').sack(mult).by('weight')
.inV().hasLabel('C')
.outE('ca').has('weight').sack(mult).by('weight')
.inV().hasLabel('A')
.as('Ac')
.where(eq('Ab'))
.group().by().by(sack().sum())
.unfold()
.order().by(values, desc)
Is there a way to get the me-Ac sack value in the second query as well? My only guess would be to turn the second query around: first find all Ac-vertices, note the sack values, then remove those that are not part of Ab. But this would traverse a huge part of the graph. As I said above, there are several hundred Ac-vertices but fewer than fifty Ab-vertices.
Data:
g.addV('person').property(id, 'me')
.addV('A').property(id, 'a1')
.addV('A').property(id, 'a2')
.addV('A').property(id, 'a3')
.addV('A').property(id, 'a4')
.addV('B').property(id, 'b')
.addV('C').property(id, 'c1')
.addV('C').property(id, 'c2')
.addE('mc').property(id, 'mc1').property('weight', 0.5).from(V('me')).to(V('c1'))
.addE('mc').property(id, 'mc2').property('weight', 0.6).from(V('me')).to(V('c2'))
.addE('mb').property(id, 'mb').from(V('me')).to(V('b'))
.addE('ba').property(id, 'ba1').from(V('b')).to(V('a2'))
.addE('ba').property(id, 'ba2').from(V('b')).to(V('a3'))
.addE('ba').property(id, 'ba3').from(V('b')).to(V('a4'))
.addE('ca').property(id, 'ca1').property('weight', 0.5).from(V('c1')).to(V('a1'))
.addE('ca').property(id, 'ca2').property('weight', 0.7).from(V('c2')).to(V('a2'))
.addE('ca').property(id, 'ca3').property('weight', 0.4).from(V('c2')).to(V('a3'))
(my code runs on Neptune with gremlin: {'version': 'tinkerpop-3.4.11'})
There is a third option, which does not retrieve all Ac-vertices and, depending on the TinkerPop implementation, can parallelize the retrieval of B and C:
g.V('me').out('mc').as('C')
.V('me').out('mb').out('ba')
.where(__.in('ca').where(eq('C')))
For the sack multiplication you can traverse back very fast from A to 'me', because the vertices are already in the cache of the graph system.
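A minimal sketch of that back-traversal, assuming the sample data from the question and that every 'mc' and 'ca' edge on the way carries a 'weight' property. The step label 'a' is a name chosen here for illustration, and the // comments are reader annotations that may need to be removed before submitting the query.

```groovy
// Start from the small Ab set, then walk each Ab-vertex back to 'me'
// through its C-vertices, multiplying the edge weights into the sack.
g.withSack(1.0f)
.V('me').out('mb').out('ba').as('a')          // Ab-vertices (the small set)
.inE('ca').sack(mult).by('weight').outV()     // A <- C: multiply by the ca weight
.inE('mc').sack(mult).by('weight').outV()     // C <- person: multiply by the mc weight
.hasId('me')                                  // keep only paths that end at 'me'
.group().by(select('a')).by(sack().sum())     // sum the path strengths per A-vertex
.unfold()
.order().by(values, desc)
```

On the sample data, a2 should rank ahead of a3 (0.6 × 0.7 against 0.6 × 0.4), and a4 drops out because it has no incoming 'ca' edge. Only the 'ca' edges of the Ab-vertices are touched, so the several hundred Ac-vertices are never expanded from the C side.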