
How do I write queries to compare nodes and edges on different paths?

We are new to Neo4j (and excited!), and I'm trying to apply Cypher to our problem. We have a query that matches paths, but it also needs to remove any path that involves a node or edge traversed on another path originating from the same node or edge. Here's a test case:

CREATE (a1:A {name: 'a1'})-[ab1:AB {name: 'ab1'}]->(b1:B {name: 'b1'}),
       (a1)-[ab2:AB {name: 'ab2'}]->(b2:B {name: 'b2'}),
       (a2:A {name: 'a2'})-[ab3:AB {name: 'ab3'}]->(b1),
       (a2)-[ab5:AB {name: 'ab5'}]->(b3:B {name: 'b3'}),
       (a3:A {name: 'a3'})-[ab4:AB {name: 'ab4'}]->(b2),
       (a3)-[ab6:AB {name: 'ab6'}]->(b3),
       (a4:A {name: 'a4'})-[ab7:AB {name: 'ab7'}]->(b3);

Formatted for readability:

a1-[ab1]->b1
a1-[ab2]->b2
a2-[ab3]->b1
a2-[ab5]->b3
a3-[ab4]->b2
a3-[ab6]->b3
a4-[ab7]->b3

We want to find paths of the form [A, AB, B, AB, A, AB, B, AB, A], i.e. four steps. (Note: we don't care about edge direction.) Here's my first try (our terminology: i_id = 'initial' and t_id = 'terminal'):

MATCH p = (i_id:A)-[ab1:AB]-(b1:B)-[ab2:AB]-(a1:A)-[ab3:AB]-(b2:B)-[ab4:AB]-(t_id:A)
RETURN i_id.name, ab1.name, b1.name, ab2.name, a1.name, ab3.name, b2.name, ab4.name, t_id.name
ORDER BY i_id.name;

The result is reasonable, given Cypher's uniqueness rule for MATCH (within a single path a relationship is never traversed twice, though nodes may repeat):

+-------------------------------------------------------------------------------------------------+
| i_id.name | ab1.name | b1.name | ab2.name | a1.name | ab3.name | b2.name | ab4.name | t_id.name |
+-------------------------------------------------------------------------------------------------+
| "a1"      | "ab1"    | "b1"    | "ab3"    | "a2"    | "ab5"    | "b3"    | "ab6"    | "a3"      |
| "a1"      | "ab1"    | "b1"    | "ab3"    | "a2"    | "ab5"    | "b3"    | "ab7"    | "a4"      |
| "a1"      | "ab2"    | "b2"    | "ab4"    | "a3"    | "ab6"    | "b3"    | "ab5"    | "a2"      |
| "a1"      | "ab2"    | "b2"    | "ab4"    | "a3"    | "ab6"    | "b3"    | "ab7"    | "a4"      |
| "a2"      | "ab3"    | "b1"    | "ab1"    | "a1"    | "ab2"    | "b2"    | "ab4"    | "a3"      |
| "a2"      | "ab5"    | "b3"    | "ab6"    | "a3"    | "ab4"    | "b2"    | "ab2"    | "a1"      |
| "a3"      | "ab4"    | "b2"    | "ab2"    | "a1"    | "ab1"    | "b1"    | "ab3"    | "a2"      |
| "a3"      | "ab6"    | "b3"    | "ab5"    | "a2"    | "ab3"    | "b1"    | "ab1"    | "a1"      |
| "a4"      | "ab7"    | "b3"    | "ab5"    | "a2"    | "ab3"    | "b1"    | "ab1"    | "a1"      |
| "a4"      | "ab7"    | "b3"    | "ab6"    | "a3"    | "ab4"    | "b2"    | "ab2"    | "a1"      |
+-------------------------------------------------------------------------------------------------+
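To see where these ten rows come from, here is a minimal Python sketch (not Neo4j code) that enumerates the same undirected four-step A-B-A-B-A walks over the test data. It assumes Cypher's default MATCH behaviour of never reusing a relationship within one path while allowing nodes to repeat. The EDGES dictionary and the four_hop_paths helper are illustrative names, not part of any API.

```python
# Test graph from the CREATE statement: relationship name -> (A node, B node).
EDGES = {
    "ab1": ("a1", "b1"),
    "ab2": ("a1", "b2"),
    "ab3": ("a2", "b1"),
    "ab4": ("a3", "b2"),
    "ab5": ("a2", "b3"),
    "ab6": ("a3", "b3"),
    "ab7": ("a4", "b3"),
}


def neighbours(node):
    """Yield (relationship, other_node) pairs, ignoring edge direction."""
    for rel, (a, b) in EDGES.items():
        if node == a:
            yield rel, b
        elif node == b:
            yield rel, a


def four_hop_paths(start):
    """Return every four-step row from `start`, laid out like the RETURN columns."""
    rows = []

    def walk(row, used):
        if len(used) == 4:
            rows.append(tuple(row))
            return
        for rel, nxt in neighbours(row[-1]):
            # Assumption: mirrors Cypher's relationship uniqueness within one path.
            if rel not in used:
                walk(row + [rel, nxt], used | {rel})

    walk([start], frozenset())
    return rows


all_rows = [r for s in ("a1", "a2", "a3", "a4") for r in four_hop_paths(s)]
print(len(all_rows))  # 10
for row in four_hop_paths("a2"):
    print(row)
```

Every AB relationship joins an A node to a B node, so the walk alternates labels by itself and needs no explicit label check.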

However, we want additional filtering. Consider WHERE i_id.name = 'a2':

+-------------------------------------------------------------------------------------------------+
| i_id.name | ab1.name | b1.name | ab2.name | a1.name | ab3.name | b2.name | ab4.name | t_id.name |
+-------------------------------------------------------------------------------------------------+
| "a2"      | "ab3"    | "b1"    | "ab1"    | "a1"    | "ab2"    | "b2"    | "ab4"    | "a3"      |
| "a2"      | "ab5"    | "b3"    | "ab6"    | "a3"    | "ab4"    | "b2"    | "ab2"    | "a1"      |
+-------------------------------------------------------------------------------------------------+

Notice that the relationship "ab4" appears on the first path in the ab4 column and on the second path in the ab3 column. Conversely, "ab2" appears on the second path in the ab4 column and on the first path in the ab3 column. In our application we want these two paths to 'cancel out', so that the query returns no matches for a2.
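The overlap can be checked mechanically. This Python sketch takes the two a2 rows exactly as they appear in the table (columns in RETURN order) and intersects their relationships. It also intersects the nodes after the shared start, because the requirement mentions nodes as well as edges. The helper names are hypothetical.

```python
# The two rows returned for WHERE i_id.name = 'a2', in column order:
# i_id, ab1, b1, ab2, a1, ab3, b2, ab4, t_id
row1 = ("a2", "ab3", "b1", "ab1", "a1", "ab2", "b2", "ab4", "a3")
row2 = ("a2", "ab5", "b3", "ab6", "a3", "ab4", "b2", "ab2", "a1")


def relationships(row):
    """Relationship names sit at the odd positions of a row."""
    return set(row[1::2])


def nodes_after_start(row):
    """Node names sit at the even positions; index 0 (the shared start) is skipped."""
    return set(row[2::2])


shared_rels = relationships(row1) & relationships(row2)
shared_nodes = nodes_after_start(row1) & nodes_after_start(row2)
print(sorted(shared_rels))   # ['ab2', 'ab4']
print(sorted(shared_nodes))  # ['a1', 'a3', 'b2']
```

On this data the node check is stricter than the edge check. The two a2 paths share three nodes beyond the start, but only two relationships.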

So, finally, my question: how would you approach this in Cypher? Multiple queries are OK as long as they execute quickly :-) I'm brand new to Cypher, but here are some things I thought might be useful (I'm clutching at straws here :-):

  • comparing paths as collections (something like WHERE ab4.name NOT IN ...?)
  • labeling/adding properties to items to indicate the i_id and path they're on?
  • FOREACH?
  • UNWIND?
  • GROUP BY?

We'd like to do as much in Cypher as possible, but if the answer is "You can't do that," then we'll pull the above candidate results into memory and finish processing there. Thanks very much!
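If the post-processing does end up in memory, one possible reading of the rule is this: group the candidate rows by i_id, then drop every row that shares a relationship with another row in the same group. The following Python sketch implements that reading under stated assumptions. cancel_shared is a hypothetical helper, each row is assumed to be a 9-element tuple in the RETURN column order, and only relationships are checked (a node check could be added the same way).

```python
from collections import Counter, defaultdict


def cancel_shared(rows):
    """Keep only rows whose relationships appear on no other row with the same i_id.

    Assumed row layout: (i_id, ab1, b1, ab2, a1, ab3, b2, ab4, t_id).
    """
    by_start = defaultdict(list)
    for row in rows:
        by_start[row[0]].append(row)

    kept = []
    for group in by_start.values():
        # How many rows in this group use each relationship (odd positions).
        counts = Counter(rel for row in group for rel in row[1::2])
        kept.extend(
            row for row in group if all(counts[rel] == 1 for rel in row[1::2])
        )
    return kept


a2_rows = [
    ("a2", "ab3", "b1", "ab1", "a1", "ab2", "b2", "ab4", "a3"),
    ("a2", "ab5", "b3", "ab6", "a3", "ab4", "b2", "ab2", "a1"),
]
print(cancel_shared(a2_rows))  # []
```

Applied to all ten rows of the first result table, this reading removes every row, because each start node's paths share at least one relationship. For a4, for example, both paths begin with ab7.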

Answer:

I've worked up a solution using your second suggestion: add properties to the relationships to indicate whether they lie on two or more paths.

First, create a traversed property on each AB relationship and set it to 0:

// Directed pattern, so each relationship is matched once rather than twice
MATCH ()-[ab:AB]->()
SET ab.traversed = 0

Now I'll use a2 as the starting node for an example. This query finds all four-step paths from a2 to a node labeled A. It then sets each relationship's traversed property to the number of those paths the relationship appears on.

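// All four-step AB paths from a2 to an A node (direction ignored)
MATCH p = (a2:A {name:'a2'})-[:AB*4]-(:A)
// One row per (path, relationship) pair
UNWIND RELATIONSHIPS(p) AS r
// Grouping by r counts how many of those paths use each relationship
WITH r, COUNT(*) AS times_traversed
SET r.traversed = times_traversed
RETURN r.name, r.traversed
ORDER BY r.name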

And we get the following output:

[Screenshot of the query output: r.name and r.traversed for each relationship]

As you explained in your example, ab2 and ab4 are on both paths, so their traversed property is 2.
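The traversed values can be reproduced outside the database. This Python sketch illustrates the counting step; it is not the Cypher engine. It takes the relationship sequences of a2's two paths from the question's table and counts each relationship once per path. That is what grouping by r after UNWIND RELATIONSHIPS(p) amounts to, given that no relationship repeats within a path.

```python
from collections import Counter

# Relationships on the two paths matched by (a2:A {name:'a2'})-[:AB*4]-(:A),
# copied from the question's result table.
a2_paths = [
    ["ab3", "ab1", "ab2", "ab4"],
    ["ab5", "ab6", "ab4", "ab2"],
]

# Equivalent of UNWIND RELATIONSHIPS(p) AS r WITH r, COUNT(*):
# one count per (path, relationship) pair, grouped by relationship.
traversed = Counter(r for path in a2_paths for r in path)

for name in sorted(traversed):
    print(name, traversed[name])
# ab1 1
# ab2 2
# ab3 1
# ab4 2
# ab5 1
# ab6 1
```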

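With these properties set on each relationship, you can keep only the paths whose traversed values sum to the path length, which is 4 in your case. Every relationship on a matched path has a traversed value of at least 1, so the sum equals the length only if none of the path's relationships appears on another path.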

MATCH p = (a2:A {name:'a2'})-[:AB*4]-(:A)
WHERE REDUCE(traversal = 0, r IN RELATIONSHIPS(p) | traversal + r.traversed) = LENGTH(p)
RETURN p

This returns no paths. On both paths the traversed values sum to 6 (1 + 1 + 2 + 2) rather than the required 4.
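The filtering arithmetic can be checked the same way. This sketch rebuilds the counts for the two a2 paths and applies the same test as the REDUCE predicate. Because every count is at least 1, "sum equals length" is equivalent to "every count is 1". In Cypher, that might also be written as ALL(r IN RELATIONSHIPS(p) WHERE r.traversed = 1), but treat that variant as untested.

```python
from collections import Counter

# Relationship sequences of a2's two paths (from the question's table).
a2_paths = [
    ["ab3", "ab1", "ab2", "ab4"],
    ["ab5", "ab6", "ab4", "ab2"],
]
traversed = Counter(r for path in a2_paths for r in path)

# Same test as REDUCE(traversal = 0, r IN RELATIONSHIPS(p) | traversal + r.traversed) = LENGTH(p)
sums = [sum(traversed[r] for r in path) for path in a2_paths]
print(sums)  # [6, 6]

kept = [path for path in a2_paths if sum(traversed[r] for r in path) == len(path)]
print(kept)  # []
```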

But this is quite inelegant, and there is probably a better way to do it.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM