
Gremlin query to find the entire sub-graph that a specific node is connected in any way to

I am brand new to Gremlin and am using gremlin-python to traverse my graph. The graph is made up of many clusters or sub-graphs which are intra-connected, and not inter-connected with any other cluster in the graph.

A simple example of this is a graph with 5 nodes and 3 edges:

  • Customer_1 is connected to CreditCard_A with 1_HasCreditCard_A edge
  • Customer_2 is connected to CreditCard_B with 2_HasCreditCard_B edge
  • Customer_3 is connected to CreditCard_A with 3_HasCreditCard_A edge

I want a query that will return a sub-graph object of all nodes and edges connected (in or out) to the queried node. I can then store this sub-graph as a variable and then run different traversals on it to calculate different things.

This query would need to be recursive as these clusters could be made up of nodes which are many (inward or outward) hops away from each other. There are also many different types of nodes and edges, and they all must be returned.

For example:

  • If I specified Customer_1 in the query, the resulting sub-graph would contain Customer_1, Customer_3, CreditCard_A, 1_HasCreditCard_A, and 3_HasCreditCard_A.
  • If I specified Customer_2, the returned sub-graph would consist of Customer_2, CreditCard_B, and 2_HasCreditCard_B.
  • If I queried Customer_3, the exact same sub-graph object as returned from the Customer_1 query would be returned.
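To pin down the expected result, here is a minimal plain-Python sketch (not Gremlin) of the behaviour described above: an undirected breadth-first search over the five-node example that collects every reachable vertex and edge. The `EDGES` list and the `connected_subgraph` helper are illustrative names, not part of gremlin-python.

```python
from collections import deque

# The example graph as (edge_name, out_vertex, in_vertex) triples.
EDGES = [
    ("1_HasCreditCard_A", "Customer_1", "CreditCard_A"),
    ("2_HasCreditCard_B", "Customer_2", "CreditCard_B"),
    ("3_HasCreditCard_A", "Customer_3", "CreditCard_A"),
]


def connected_subgraph(edges, start):
    """Return (vertices, edge_names) reachable from `start`, ignoring direction."""
    # Index every edge under both of its endpoints so that inward and
    # outward hops are treated the same way.
    incident = {}
    for name, source, target in edges:
        incident.setdefault(source, []).append((name, target))
        incident.setdefault(target, []).append((name, source))

    vertices = {start}
    edge_names = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for name, other in incident.get(current, []):
            edge_names.add(name)
            if other not in vertices:
                vertices.add(other)
                queue.append(other)
    return vertices, edge_names


vertices, edge_names = connected_subgraph(EDGES, "Customer_1")
print(sorted(vertices))
print(sorted(edge_names))
```

Starting from Customer_1 or Customer_3 yields the same cluster, while Customer_2 yields only its own.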

I have used both Neo4J with Cypher and Dgraph with GraphQL and found this task quite easy in those two languages, but I am struggling a bit more with understanding Gremlin.

EDIT:

From this question, the selected answer should achieve what I want, but without specifying the edge type, by changing .both('created') to just .both().

However, the loop syntax .loop{true}{true} is of course invalid in Python. Is this loop function available in gremlin-python? I cannot find anything.

EDIT 2:

I have tried this and it seems to be working as expected, I think.

g.V(node_id).repeat(bothE().otherV().simplePath()).emit()

Is this a valid solution to what I am looking for? Is it also possible to include the queried node in this result?

Regarding the second edit, this looks like a valid solution that returns all the vertices connected to the starting vertex. Some small fixes:

  • You can replace bothE().otherV() with both().
  • If you also want the starting vertex, move the emit() step before the repeat() step.
  • I would add a dedup() step to remove duplicate vertices (there can be more than one path to a vertex).

g.V(node_id).emit().repeat(both().simplePath()).dedup()

Example: https://gremlify.com/jngpuy3dwg9
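To see why the position of emit and the dedup step matter, the following is a rough plain-Python model of what the suggested query enumerates, assuming it emits the last vertex of every simple (vertex-unique) path that starts at the queried vertex. It is not how Gremlin is implemented, and the four-vertex cycle and the helper names are hypothetical.

```python
def simple_path_endpoints(adjacency, start, emit_start=True):
    """Collect the last vertex of every simple path that begins at `start`.

    emit_start=True models emit() placed before repeat(), which also emits
    the starting vertex; emit_start=False models emit() placed after it.
    """
    emitted = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        if len(path) > 1 or emit_start:
            emitted.append(path[-1])
        for neighbour in adjacency[path[-1]]:
            # Models simplePath(): drop any walk that revisits a vertex.
            if neighbour not in path:
                stack.append(path + [neighbour])
    return emitted


def dedup(items):
    """Remove duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


# Hypothetical undirected cycle a - b - d - c - a.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c"],
}

raw = simple_path_endpoints(adjacency, "a")
print(len(raw), sorted(dedup(raw)))
```

In this cycle each of b, c, and d is reached along two different simple paths, so the raw result contains repeats until they are deduplicated. With `emit_start=False` the starting vertex never appears, because a simple path cannot return to it.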

