
Cypher: narrow a search to nodes with the same property

My original question is here: cypher-how-get-relation-between-every-two-node-and-the-distance-from-start-node. In more detail: there are 2 million companies, and each one must belong to exactly one leading company (its group), so every node has the properties groupId and companyId. In addition, companies in different groups may have relationships with each other. QUESTION: given a groupId and the leading company's id, return all relationships within that group and, for every node in the group, its shortest distance to the leading company.

The queries in that answer have serious performance problems, especially the shortestPath one, so my question is: can the search scope be narrowed when using shortestPath, so that only nodes with the same property are searched?

Or is there another way to solve the original question?

Sorry, since I am in mainland China I cannot reach console.neo4j.com (even with a VPN), so I am putting my sample here:

create (a :COMPANY {companyId:"a",groupId:"ag"}),
       (b:COMPANY  {companyId:"b",groupId:"ag"}),
       (c:COMPANY  {companyId:"c",groupId:"ag"}),
       (d:COMPANY {companyId:"d",groupId:"ag"}),
       (e:COMPANY  {companyId:"e",groupId:"eg"})
create (a)-[:INVESTMENT]->(b),
       (b)-[:INVESTMENT]->(c),
       (c)-[:INVESTMENT]->(d),
       (a)-[:INVESTMENT]->(c),
       (d)-[:INVESTMENT]->(b),
       (c)-[:INVESTMENT]->(e) 
return *

Here nodes a, b, c and d are in the same group and a is the leading company; e is in another group but has a relationship with c. I want to get the node-to-node relationships within the ag group, for example ab, ac, bc, cd, db, and the shortest distance from a to each group member, for example dist.a=0, dist.b=1, dist.c=1, dist.d=2.
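
To make the expected distances concrete, here is a minimal plain-Python sketch (not Cypher, and not part of the original question) that runs a breadth-first search over the sample data and never leaves the group. The names group_of, investments and group_distances are made up for illustration, and it assumes INVESTMENT relationships are followed only in their stored direction.

```python
from collections import deque

# Sample data copied from the CREATE statement in the question.
group_of = {"a": "ag", "b": "ag", "c": "ag", "d": "ag", "e": "eg"}
investments = [("a", "b"), ("b", "c"), ("c", "d"),
               ("a", "c"), ("d", "b"), ("c", "e")]


def group_distances(lead, group_id):
    """Breadth-first search from lead that only visits nodes in group_id."""
    dist = {lead: 0}
    queue = deque([lead])
    while queue:
        current = queue.popleft()
        for src, dst in investments:
            # Follow only outgoing edges whose target stays inside the group.
            if src == current and group_of[dst] == group_id and dst not in dist:
                dist[dst] = dist[current] + 1
                queue.append(dst)
    return dist


print(group_distances("a", "ag"))
```

Node e is never visited because it belongs to group eg, which is the kind of narrowing the question asks for.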

I don't think this can be solved with pure Cypher. You can try the APOC library: add a temporary property to the relationships in the group and apply the Dijkstra algorithm.

Input params:

{
  "groupId": "ag",
  "leadingCompany": "a"
}

Query:

// Search for a leading company
MATCH (lc:COMPANY {companyId: $leadingCompany, groupId: $groupId})
WITH lc, 
     apoc.create.uuid() as tmpProp // Temporary property name

// Find all relationships within the group
// and set the temporary property on each of them
MATCH (c1:COMPANY {groupId: $groupId})-[r:INVESTMENT]->(c2:COMPANY {groupId: $groupId})
CALL apoc.create.setRelProperty(r, tmpProp, 1) yield rel
WITH lc, tmpProp, 
     count(r) as tmp

// For each node in the group, find the shortest path between it and the leading company
MATCH (c:COMPANY {groupId: $groupId})
CALL apoc.algo.dijkstraWithDefaultWeight(lc, c, 'INVESTMENT', tmpProp, 2000000) yield path
WITH tmpProp, c, 
     min(length(path)) as distanceToLeading

// Find the group's relationships again and delete the temporary property
MATCH (c)-[r:INVESTMENT]->(:COMPANY {groupId: $groupId})
CALL apoc.create.removeRelProperties(r, [tmpProp]) yield rel
RETURN c as groupNode, distanceToLeading, 
       collect(r) as groupRelations
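
The weighting idea behind the temporary property can be tried outside Neo4j. The following is a rough plain-Python sketch, not APOC's implementation: relationships inside the group cost 1, and every other relationship falls back to the same default weight of 2000000 that the query passes in. The names weighted_costs and DEFAULT_WEIGHT are hypothetical, and following edges only in their stored direction is an assumption.

```python
import heapq

# Sample data copied from the CREATE statement in the question.
group_of = {"a": "ag", "b": "ag", "c": "ag", "d": "ag", "e": "eg"}
investments = [("a", "b"), ("b", "c"), ("c", "d"),
               ("a", "c"), ("d", "b"), ("c", "e")]
DEFAULT_WEIGHT = 2_000_000  # weight used when the temporary property is absent


def weighted_costs(lead, group_id):
    """Dijkstra where in-group edges cost 1 and all other edges cost DEFAULT_WEIGHT."""
    def weight(src, dst):
        in_group = group_of[src] == group_id and group_of[dst] == group_id
        return 1 if in_group else DEFAULT_WEIGHT

    cost = {lead: 0}
    heap = [(0, lead)]
    while heap:
        current_cost, node = heapq.heappop(heap)
        if current_cost > cost[node]:
            continue  # stale heap entry, a cheaper route was already found
        for src, dst in investments:
            if src != node:
                continue
            new_cost = current_cost + weight(src, dst)
            if new_cost < cost.get(dst, float("inf")):
                cost[dst] = new_cost
                heapq.heappush(heap, (new_cost, dst))
    return cost


print(weighted_costs("a", "ag"))
```

With unit weights inside the group, the cost of each group member equals its hop count, while a node that can only be reached over an out-of-group relationship (e here) ends up with a cost above the default weight and is easy to tell apart.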

APOC Procedures can help out here, as some of the path expander procedures can be used to find the shortest distance to each node in the group, and there's also a cover() procedure that will find all relationships between a group of nodes.

You'll want to make sure you have an index on :COMPANY(groupId) and :COMPANY(companyId) first.

MATCH (c:COMPANY {groupId: $groupId})
WITH collect(c) as companies
WITH companies, [c in companies | id(c)] as companyIds,
     [c in companies WHERE NOT (c)<-[:INVESTMENT]-(:COMPANY {groupId: $groupId})][0] as lead
// for the above, if you already know the lead companyId, just MATCH to the lead instead of this filter
CALL apoc.algo.cover(companyIds) YIELD rel
WITH companies, lead, collect(rel {start:startNode(rel).companyId, end:endNode(rel).companyId}) as relationships
UNWIND companies as company
MATCH path = shortestPath((lead)-[:INVESTMENT*]->(company))
WHERE all(node in nodes(path) WHERE node in companies)
RETURN relationships, collect(company {.companyId, distance:length(path)}) as distance
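
As a rough illustration of what the cover step contributes, this plain-Python sketch keeps only the relationships whose start and end nodes are both in a given node set. It is an illustration on the question's sample data, not APOC's code, and the function name cover is simply borrowed for readability.

```python
# Sample relationships copied from the CREATE statement in the question.
investments = [("a", "b"), ("b", "c"), ("c", "d"),
               ("a", "c"), ("d", "b"), ("c", "e")]


def cover(node_ids, relationships):
    """Return the relationships whose two endpoints are both in node_ids."""
    members = set(node_ids)
    return [(src, dst) for src, dst in relationships
            if src in members and dst in members]


print(cover(["a", "b", "c", "d"], investments))
```

For the group ag this keeps ab, bc, cd, ac and db, and drops ce because e is outside the set.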

This query will get you the desired output:

 match p=((c:COMPANY{companyId:'a'})-[i:INVESTMENT*0..99]->(l:COMPANY)) 
    where l.groupId=c.groupId 
    with c,i,l,nodes(p) as path  order by c.companyId
    with c,l,collect(distinct l.companyId) as Companies,min(size(path))-1 as Dist
    match pp=shortestpath((cc:COMPANY{companyId:'a'})-[ii:INVESTMENT*0..99]->(ll:COMPANY)) 
    where ll.companyId in Companies
    with c,Companies,Dist,reduce(s='',x in nodes(pp)|s + x.companyId ) as CompanyPath     
return c.companyId,Companies,Dist,CompanyPath order by Dist

Note that it does not require advance knowledge of the groupId. If a lead company can be in two groups, you would need to add the groupId to the initial WHERE clause.
