We are new to Neo4j (and excited!), and I'm trying to apply Cypher to our problem. We have a query that matches paths, but we need to remove any path that involves nodes or edges that were also traversed on another path originating from the same node or edge. Here's a test case:
CREATE (a1:A {name: 'a1'})-[ab1:AB {name: 'ab1'}]->(b1:B {name: 'b1'}),
(a1)-[ab2:AB {name: 'ab2'}]->(b2:B {name: 'b2'}),
(a2:A {name: 'a2'})-[ab3:AB {name: 'ab3'}]->(b1),
(a2)-[ab5:AB {name: 'ab5'}]->(b3:B {name: 'b3'}),
(a3:A {name: 'a3'})-[ab4:AB {name: 'ab4'}]->(b2),
(a3)-[ab6:AB {name: 'ab6'}]->(b3),
(a4:A {name: 'a4'})-[ab7:AB {name: 'ab7'}]->(b3);
Formatted for readability:
a1-[ab1]->b1
a1-[ab2]->b2
a2-[ab3]->b1
a2-[ab5]->b3
a3-[ab4]->b2
a3-[ab6]->b3
a4-[ab7]->b3
We want to find these paths: [A, AB, B, AB, A, AB, B, AB, A] (four steps). (Note: we don't care about edge directionality.) Here's my first try (our terminology: i_id = 'initial' and t_id = 'terminal').
MATCH p = (i_id:A)-[ab1:AB]-(b1:B)-[ab2:AB]-(a1:A)-[ab3:AB]-(b2:B)-[ab4:AB]-(t_id:A)
RETURN i_id.name, ab1.name, b1.name, ab2.name, a1.name, ab3.name, b2.name, ab4.name, t_id.name
ORDER BY i_id.name;
The result is reasonable, given Cypher's relationship uniqueness rule (within a single matched pattern, a relationship is never traversed more than once):
+-------------------------------------------------------------------------------------------------+
| i_id.name | ab1.name | b1.name | ab2.name | a1.name | ab3.name | b2.name | ab4.name | t_id.name |
+-------------------------------------------------------------------------------------------------+
| "a1" | "ab1" | "b1" | "ab3" | "a2" | "ab5" | "b3" | "ab6" | "a3" |
| "a1" | "ab1" | "b1" | "ab3" | "a2" | "ab5" | "b3" | "ab7" | "a4" |
| "a1" | "ab2" | "b2" | "ab4" | "a3" | "ab6" | "b3" | "ab5" | "a2" |
| "a1" | "ab2" | "b2" | "ab4" | "a3" | "ab6" | "b3" | "ab7" | "a4" |
| "a2" | "ab3" | "b1" | "ab1" | "a1" | "ab2" | "b2" | "ab4" | "a3" |
| "a2" | "ab5" | "b3" | "ab6" | "a3" | "ab4" | "b2" | "ab2" | "a1" |
| "a3" | "ab4" | "b2" | "ab2" | "a1" | "ab1" | "b1" | "ab3" | "a2" |
| "a3" | "ab6" | "b3" | "ab5" | "a2" | "ab3" | "b1" | "ab1" | "a1" |
| "a4" | "ab7" | "b3" | "ab5" | "a2" | "ab3" | "b1" | "ab1" | "a1" |
| "a4" | "ab7" | "b3" | "ab6" | "a3" | "ab4" | "b2" | "ab2" | "a1" |
+-------------------------------------------------------------------------------------------------+
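To see why exactly these ten rows come back, here is a minimal Python sketch that models the test graph in memory and enumerates every four-step walk that never reuses a relationship. It is only an illustration of the uniqueness rule, not how Neo4j evaluates the pattern; the names EDGES, neighbours and trails are made up for this example. Because the graph only connects A nodes to B nodes, the label sequence in the pattern is satisfied automatically.

```python
# Undirected model of the test graph: relationship name -> (A node, B node).
EDGES = {
    "ab1": ("a1", "b1"),
    "ab2": ("a1", "b2"),
    "ab3": ("a2", "b1"),
    "ab4": ("a3", "b2"),
    "ab5": ("a2", "b3"),
    "ab6": ("a3", "b3"),
    "ab7": ("a4", "b3"),
}


def neighbours(node):
    """Yield (relationship, other_node) pairs, ignoring direction."""
    for rel, (a, b) in EDGES.items():
        if node == a:
            yield rel, b
        elif node == b:
            yield rel, a


def trails(start, steps=4):
    """Return every walk of `steps` relationships from `start` that never
    reuses a relationship, as tuples of relationship names."""
    found = []

    def extend(node, used):
        if len(used) == steps:
            found.append(tuple(used))
            return
        for rel, nxt in neighbours(node):
            if rel not in used:  # relationship uniqueness
                extend(nxt, used + [rel])

    extend(start, [])
    return found


all_trails = {a: trails(a) for a in ("a1", "a2", "a3", "a4")}
for start, paths in all_trails.items():
    print(start, len(paths))
```

The per-node counts (4, 2, 2, 2) add up to the ten rows in the table.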
However, we want additional filtering. Consider WHERE i_id.name = 'a2':
+-------------------------------------------------------------------------------------------------+
| i_id.name | ab1.name | b1.name | ab2.name | a1.name | ab3.name | b2.name | ab4.name | t_id.name |
+-------------------------------------------------------------------------------------------------+
| "a2" | "ab3" | "b1" | "ab1" | "a1" | "ab2" | "b2" | "ab4" | "a3" |
| "a2" | "ab5" | "b3" | "ab6" | "a3" | "ab4" | "b2" | "ab2" | "a1" |
+-------------------------------------------------------------------------------------------------+
Notice how the first path contains ab4.name = "ab4", which is also found on the second path as ab3.name. Conversely, "ab2" is found on the second path as ab4.name and on the first path as ab3.name. In our application we want these two to 'cancel out' so that the query returns no matches for a2.
So finally, my question: how would you approach doing this in Cypher? Multiple queries are OK as long as they execute quickly :-) I'm brand new to Cypher, but some of the things I thought might be useful are (straw-clutching, here :-)
We'd like to do as much in Cypher as possible, but if the answer is "You can't do that," then we'll pull the above candidate results into memory and finish processing there. Thanks very much!
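For reference, the in-memory fallback could look roughly like the Python sketch below. It assumes the candidate rows for one initial node have already been fetched and reduced to tuples of relationship names, and it assumes that "cancel out" means dropping every path that shares at least one relationship with another path in the same group. The function name drop_overlapping is hypothetical, and node overlap is not checked here.

```python
def drop_overlapping(paths):
    """Keep only the paths that share no relationship with any other path.

    `paths` is a list of tuples of relationship names, all starting from
    the same initial node.
    """
    kept = []
    for i, path in enumerate(paths):
        # Collect every relationship used by the *other* paths.
        others = set()
        for j, other in enumerate(paths):
            if j != i:
                others.update(other)
        if others.isdisjoint(path):
            kept.append(path)
    return kept


# The two candidate rows for i_id.name = 'a2' from the table above.
a2_rows = [
    ("ab3", "ab1", "ab2", "ab4"),
    ("ab5", "ab6", "ab4", "ab2"),
]
print(drop_overlapping(a2_rows))  # [] because ab2 and ab4 are on both rows
```

The pairwise comparison is quadratic in the number of paths per initial node, which should be fine for small groups; for large groups a single pass that counts relationship occurrences would scale better.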
So I've worked up a solution using your second suggestion, which is to add properties to the relationships to indicate if they are on two or more pathways.
First, create a traversed property on each AB relationship and set it to 0:
MATCH ()-[ab:AB]-()
SET ab.traversed = 0
Now I'm going to use a2 as the starting node for an example. This query finds all pathways that are four steps long from a2 to another node with label A. It then sets the traversed property of each relationship to the number of times that relationship was encountered on those pathways.
MATCH p = (a2:A {name:'a2'})-[:AB*4]-(:A)
UNWIND RELATIONSHIPS(p) AS r
WITH r, COUNT(*) AS times_traversed
SET r.traversed = times_traversed
RETURN r.name, r.traversed
ORDER BY r.name
In the output, ab2 and ab4 have a traversed value of 2 because, as you explain in your example, they are on both pathways. The other relationships on the two pathways (ab1, ab3, ab5 and ab6) have a value of 1.
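The counts can be double-checked outside the database with a short Python sketch. This is a tally derived from the two a2 rows in the question, not a copy of the query's actual output; the variable names are made up for the example.

```python
from collections import Counter

# The two four-step pathways from a2, as relationship names in path order.
a2_paths = [
    ("ab3", "ab1", "ab2", "ab4"),
    ("ab5", "ab6", "ab4", "ab2"),
]

# Equivalent of UNWIND RELATIONSHIPS(p) AS r ... COUNT(*) AS times_traversed.
traversed = Counter(rel for path in a2_paths for rel in path)

for name in sorted(traversed):
    print(name, traversed[name])
# ab1 1
# ab2 2
# ab3 1
# ab4 2
# ab5 1
# ab6 1
```

ab7 does not appear because it is on neither pathway, so in the database its traversed property keeps the initial value of 0.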
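With these properties set on each relationship, you can filter the pathways down to only those whose traversed properties sum to the path length, which is 4 in your case. Every relationship on a matched pathway has a traversed value of at least 1, so the sum equals the length only when no relationship is shared with another pathway.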
MATCH p = (a2:A {name:'a2'})-[:AB*4]-(:A)
WHERE REDUCE(traversal = 0, r IN RELATIONSHIPS(p) | traversal + r.traversed) = LENGTH(p)
RETURN p
This returns no paths, since the sum of the traversed properties is 6 for both pathways, and not the required 4.
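The arithmetic behind that filter can be replayed in Python as a rough sketch. The traversed values below are hard-coded on the assumption that the counting query was just run for a2, and functools.reduce stands in for Cypher's REDUCE; none of the names here come from Neo4j.

```python
from functools import reduce

# traversed values assumed to be in place after the counting query for a2.
traversed = {"ab1": 1, "ab2": 2, "ab3": 1, "ab4": 2, "ab5": 1, "ab6": 1, "ab7": 0}

a2_paths = [
    ("ab3", "ab1", "ab2", "ab4"),
    ("ab5", "ab6", "ab4", "ab2"),
]


def traversal_sum(path):
    """Mirror of REDUCE(traversal = 0, r IN RELATIONSHIPS(p) | traversal + r.traversed)."""
    return reduce(lambda total, rel: total + traversed[rel], path, 0)


for path in a2_paths:
    print(path, traversal_sum(path))  # both pathways sum to 6

# Equivalent of the WHERE clause: keep a path only if the sum equals its length.
kept = [p for p in a2_paths if traversal_sum(p) == len(p)]
print(kept)  # []
```

Note that the check is only meaningful right after the counts have been set for the same starting node; counts left over from a different starting node would give misleading sums.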
But like I said, this is super inelegant and there is probably a better way to do this.