
Selecting all connected nodes in a sub-graph with a specific starting point to display in an R visualization

I have a simple Neo4j database that I use for social network analysis. The database consists of user nodes and other nodes that users may have in common, such as a phone or an address. There is only one relationship type, [:HAS]. Users are never linked directly: a path from one user to another always passes through at least one shared node in between.

Our objective is to store this data in a graph and deploy an R Shiny app where we enter a user id and see the full network of connected users. To do this, we need to pull all nodes and relationships of the connected sub-graph into an edges data frame.

We have had some success with the following Cypher query. However, it only pulls in nodes up to 5 degrees of connection away. It also fails for any highly connected node, freezing our Neo4j instance in the process. Is there a more efficient way to transform the graph data into an edges data frame?

edges_query=paste('MATCH (c0:user {userID:',as.character(cust_id),'})-[]->(l1) 
               OPTIONAL MATCH (l1)<-[]-(c1)
               where id(c1) <> id(c0)
               OPTIONAL MATCH (c1)-[]->(l2)
               where id(l2) <> id(l1)
               OPTIONAL MATCH (l2)<-[]-(c2)
               where id(c2) <> id(c0)
               OPTIONAL MATCH (c2)-[]->(l3)
               where id(l3) <> id(l2)
               OPTIONAL MATCH (l3)<-[]-(c3)
               where id(c3) <> id(c2)
               OPTIONAL MATCH (c3)-[]->(l4)
               where id(l4) <> id(l3)
               OPTIONAL MATCH (l4)<-[]-(c4)
               where id(c4) <> id(c3)
               OPTIONAL MATCH (c4)-[]->(l5)
               where id(l5) <> id(l4)
               OPTIONAL MATCH (l5)<-[]-(c5)
               where id(c5) <> id(c4)


               return 
               ID(c0) as c0_node_id
               , c0.userID as c0_user_id
               , ID(l1) as l1_node_id
               , LABELS(l1) as l1_node_type
               , ID(c1) as c1_node_id
               , c1.userID as c1_user_id
               , id(l2) as l2_node_id
               , labels(l2) as l2_node_type
               , ID(c2) as c2_node_id
               , c2.userID as c2_user_id
               , id(l3) as l3_node_id
               , labels(l3) as l3_node_type
               , ID(c3) as c3_node_id
               , c3.userID as c3_user_id
               , id(l4) as l4_node_id
               , labels(l4) as l4_node_type
               , ID(c4) as c4_node_id
               , c4.userID as c4_user_id
               , id(l5) as l5_node_id
               , labels(l5) as l5_node_type
               , ID(c5) as c5_node_id
               , c5.userID as c5_user_id
               ',sep='')

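You should be using Cypher's variable-length path matching syntax. The syntax is [:REL_TYPE*min..max], e.g. [:HAS*..5], where min defaults to 1 when omitted. Raising the upper bound lets the match reach past five hops, at the cost of a more expensive query.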
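The forms this syntax can take are sketched below in base R. The helper has_pattern is hypothetical and not part of RNeo4j. It interpolates the hop bounds into the query text because, as far as I know, Cypher does not accept parameters for path-length bounds, so the sketch validates them as whole numbers first.

```r
# Hypothetical helper: build a variable-length :HAS relationship pattern.
# A NULL bound leaves that side of the range open.
has_pattern = function(min_hops = NULL, max_hops = NULL) {
  bounds = c(min_hops, max_hops)
  # The bounds become part of the query text, so accept only whole,
  # non-negative numbers.
  stopifnot(all(bounds >= 0), all(bounds == round(bounds)))
  if (is.null(min_hops) && is.null(max_hops)) return("[:HAS*]")
  lo = if (is.null(min_hops)) "" else as.character(as.integer(min_hops))
  hi = if (is.null(max_hops)) "" else as.character(as.integer(max_hops))
  sprintf("[:HAS*%s..%s]", lo, hi)
}

up_to_five = has_pattern(max_hops = 5)   # 1 to 5 hops
two_to_four = has_pattern(2, 4)          # 2 to 4 hops
at_least_two = has_pattern(min_hops = 2) # 2 or more hops
unbounded = has_pattern()                # any number of hops

# The user id stays a named parameter; only the hop range is interpolated.
path_query = paste0(
  "MATCH p = (:User {userID: {cust_id}})-", up_to_five, "-(:User) RETURN p"
)
```

An unbounded pattern such as [:HAS*] can be very expensive around highly connected nodes, so an explicit upper bound is usually the safer starting point.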

You should also be using parameters instead of building the query string yourself. Rather than using paste to embed cust_id, put a named parameter in the query and pass its value as an argument to the cypher function, e.g.

cypher(graph, "MATCH (n:User {userID: {cust_id} }) RETURN n.userID", cust_id=12345)

Here is how you'd do this end to end on an example graph.

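library(RNeo4j)
library(visNetwork)

# Build a node table from the edge list and draw the network.
vis = function(edges) {
  nodes = data.frame(id=unique(c(edges$from, edges$to)))
  nodes$label = nodes$id
  visNetwork(nodes, edges)
}

# Connect to the local Neo4j instance.
graph = startGraph("http://localhost:7474/db/data")

# Find every path of up to five HAS hops between the starting user and
# another user, keep only the User nodes on each path, and return each
# pair of consecutive users as one edge.
query = "
MATCH p = (:User {userID: {cust_id}})-[:HAS*..5]-(:User)
WITH [x IN nodes(p) WHERE x:User] AS users
UNWIND range(1, size(users) - 1) AS idx
WITH users[idx - 1] AS from, users[idx] AS to
RETURN DISTINCT from.userID AS from, to.userID AS to;
"

# Run the query with cust_id bound as a parameter, then plot the result.
edges = cypher(graph, query, cust_id="Tom Cruise")
vis(edges)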

I edited the movie graph that ships with Neo4j to fit your model. The above code gives me the following in RStudio:

[Image: visNetwork rendering of the resulting user network]
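For readers less familiar with Cypher, the UNWIND range(...) step in the query above pairs each user on a path with the next user on that path, and RETURN DISTINCT drops repeated pairs. The base R sketch below mimics that logic on made-up paths. The helper path_to_edges and the sample data are illustrative assumptions, not part of the original answer.

```r
# Hypothetical helper: turn one path (user ids in path order) into
# from/to pairs of consecutive users, like the UNWIND range(...) step.
path_to_edges = function(users) {
  if (length(users) < 2) {
    # A path with fewer than two users contributes no edges.
    return(data.frame(from = character(0), to = character(0),
                      stringsAsFactors = FALSE))
  }
  idx = seq_len(length(users) - 1)
  data.frame(from = users[idx], to = users[idx + 1],
             stringsAsFactors = FALSE)
}

# Made-up paths; each vector holds only the User nodes of one path.
paths = list(c("A", "B"), c("A", "B", "C"))

# Stack the pairs from all paths and drop duplicates (RETURN DISTINCT).
edges = unique(do.call(rbind, lapply(paths, path_to_edges)))
rownames(edges) = NULL
```

Both paths contribute the pair A to B, so it appears once in the final edge list alongside B to C.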

You can then easily use the network in a Shiny app with renderVisNetwork.
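A minimal sketch of such an app follows, assuming the shiny and visNetwork packages are installed. The names edges_to_nodes, make_app, fetch_edges and the input ids are hypothetical. The app takes the edge-fetching function as an argument, so the Neo4j call from the example above can be plugged in without changing the app itself.

```r
# Hypothetical helper: derive a node table from an edges data frame,
# mirroring the vis() function in the example above.
edges_to_nodes = function(edges) {
  ids = unique(c(edges$from, edges$to))
  data.frame(id = ids, label = ids, stringsAsFactors = FALSE)
}

# Build a Shiny app around any function that maps a user id to an edges
# data frame with `from` and `to` columns. Package-qualified names are
# used so shiny and visNetwork are only needed when the app is built.
make_app = function(fetch_edges) {
  ui = shiny::fluidPage(
    shiny::textInput("cust_id", "User ID"),
    shiny::actionButton("go", "Show network"),
    visNetwork::visNetworkOutput("network")
  )
  server = function(input, output, session) {
    # Query only when the button is pressed and an id has been entered.
    edges = shiny::eventReactive(input$go, {
      shiny::req(input$cust_id)
      fetch_edges(input$cust_id)
    })
    output$network = visNetwork::renderVisNetwork({
      e = edges()
      # Draw nothing if the query returned no edges.
      shiny::req(!is.null(e), nrow(e) > 0)
      visNetwork::visNetwork(edges_to_nodes(e), e)
    })
  }
  shiny::shinyApp(ui, server)
}

# Usage (needs RNeo4j plus the graph and query objects defined earlier):
# fetch = function(id) cypher(graph, query, cust_id = id)
# shiny::runApp(make_app(fetch))
```

Passing the fetch function in keeps the database code out of the UI, which also makes it possible to try the app with a static data frame before connecting it to Neo4j.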

