In gremlin, how to find and sort all triads of connected vertices that a given vertex belongs to?

I am making a social app where users can be friends.

For a given user A , I want to find all of the triads of users such that A -isFriends-> B AND B -isFriends-> C AND C -isFriends-> A .

My current approach is as follows:

g.V(A).repeat(__.out('isFriends')).times(3).path().by(id).toList()

and then outside of gremlin, I filter out all of the Path objects where the first object is not the same as the last object. I'd prefer to have gremlin do this filtering for me, but I'm unsure how to filter based on the output of path() .

I have tried cyclicPath() , but this simply returns a flat list of Vertex objects, which I do not understand. From this I would expect similar output to path() but with only the paths where the first and last vertices are the same included. Let me know if this expectation is incorrect.

I would also like to then sort these paths based on the results of a sub-traversal (how many mutual friends the three vertices have), but I am unsure how to run a traversal starting on the vertices included in the output of path() , without starting a new gremlin query.

I am using the javascript-gremlin driver (v3.4.4) and am making queries against AWS Neptune, where lambdas are not available.

Please let me know if my approach or understanding is off.

I created a sample graph for your question.

You are trying to find a simple cyclic path. I think you can achieve that with:

g.V().hasLabel('A').as('a')
.both('isFriends').as('b')
.both('isFriends').as('c')
.where(both('isFriends').hasLabel('A'))
.select('a', 'b', 'c')

Note that triads are symmetric, so each of them is returned twice (once as A, B, C and once as A, C, B).

Below is a query to answer both your questions. It is optimized to run with only 2 hops:

g.V().hasLabel("A").as("A").both("isFriends").as("B").aggregate("friends")
.out("isFriends").as("C").where(within("friends"))
.project("triad","mutual")
  .by(select("A","B","C").by(label()))
  .by(select("B","C").select(values).unfold()
    .both("isFriends").where(within("friends"))
    .groupCount().by().unfold()
    .where(select(values).is(eq(2)))
    .select(keys).label().fold())
.order().by(select("mutual").count(local), desc)

Some explanation:

  • Find A's friends and store them as 'friends'.
  • Then find their friends that are also within 'friends', and are therefore friends with A (out is used this time to avoid returning each triad twice).
  • Use project to make the result more verbose.
  • Select A, B and C to get the triad.
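
To make the two-hop idea above easier to follow, here is a rough in-memory sketch in plain JavaScript. It does not talk to Neptune or use the Gremlin driver; the `outEdges` graph and the helper names `both` and `findTriads` are made up for illustration, and it assumes there is at most one isFriends edge between any pair of users.

```javascript
// Hypothetical sample data: user -> users reached by an outgoing 'isFriends' edge.
const outEdges = {
  A: ["B", "C", "D", "E"],
  B: ["C", "D"],
  C: ["D"],
  D: [],
  E: ["B", "C"],
};

// Rough equivalent of both('isFriends'): neighbours over edges in either direction.
function both(graph, v) {
  const result = new Set(graph[v] || []);
  for (const [src, targets] of Object.entries(graph)) {
    if (targets.includes(v)) result.add(src);
  }
  return result;
}

// Mirrors: both().as('B').aggregate('friends').out().as('C').where(within('friends'))
function findTriads(graph, a) {
  const friends = both(graph, a); // hop 1: A's friends, kept as 'friends'
  const triads = [];
  for (const b of friends) {
    // hop 2: follow outgoing edges only, so each triad is emitted once
    for (const c of graph[b] || []) {
      if (friends.has(c)) triads.push([a, b, c]);
    }
  }
  return triads;
}

console.log(findTriads(outEdges, "A"));
```

Because the second hop only follows outgoing edges, the pair B, C shows up once rather than as both (B, C) and (C, B), as long as the single-edge assumption holds.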

Now comes the fun part, finding the triad's mutual friends:

  • Start from B and C and find their friends who are also A's friends.
  • Group-count those friends and keep only those with a count of 2 (friends with both B and C).
  • Finally, sort the results by the number of mutual friends.
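
The counting and sorting steps can be sketched the same way in plain JavaScript. Again this is only an illustration on made-up data, not driver code: `outEdges`, `both` and `rankTriads` are hypothetical names, and the sketch assumes at most one isFriends edge per pair of users (with parallel edges the Gremlin groupCount could behave differently).

```javascript
// Hypothetical sample data: user -> users reached by an outgoing 'isFriends' edge.
const outEdges = {
  A: ["B", "C", "D", "E"],
  B: ["C", "D"],
  C: ["D"],
  D: [],
  E: ["B", "C"],
};

// Rough equivalent of both('isFriends'): neighbours over edges in either direction.
function both(graph, v) {
  const result = new Set(graph[v] || []);
  for (const [src, targets] of Object.entries(graph)) {
    if (targets.includes(v)) result.add(src);
  }
  return result;
}

function rankTriads(graph, a) {
  const friends = both(graph, a);
  const results = [];
  for (const b of friends) {
    for (const c of graph[b] || []) {
      if (!friends.has(c)) continue;
      // groupCount(): for each friend of A, count how many of {b, c} it is adjacent to.
      const counts = new Map();
      for (const member of [b, c]) {
        for (const n of both(graph, member)) {
          if (friends.has(n)) counts.set(n, (counts.get(n) || 0) + 1);
        }
      }
      // where(select(values).is(eq(2))): keep friends adjacent to both b and c.
      const mutual = [...counts].filter(([, n]) => n === 2).map(([v]) => v);
      results.push({ triad: [a, b, c], mutual });
    }
  }
  // order().by(select('mutual').count(local), desc)
  results.sort((x, y) => y.mutual.length - x.mutual.length);
  return results;
}

const ranked = rankTriads(outEdges, "A");
console.log(ranked);
```

In this sample the triad A, B, C sorts first because D and E are friends with all three of its members, while every other triad has a single mutual friend.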

Instead of the actual list of mutual friends, you can keep only their count by replacing the last two lines:

    .select(keys).label().fold())
.order().by(select("mutual").count(local), desc)

with:

    .count())
.order().by(select("mutual"), desc)

Finally, if you want only the triads (still sorted) you can remove the project:

g.V().hasLabel("A").as("A").both("isFriends").as("B").aggregate("friends")
.out("isFriends").as("C").where(within("friends"))
.order().by(
    select("B","C").select(values).unfold()
    .both("isFriends").where(within("friends"))
    .groupCount().by().unfold().where(select(values).is(eq(2)))    
    .count(), desc)
.select("A","B","C").by(label()).select(values)
