How do I write a query that compares nodes and edges on different paths?

Question

We are new to Neo4j (and excited!) and I'm trying to apply Cypher to our problem. We have a query that matches paths, but it needs to remove any path that involves nodes or edges traversed on any other path originating from the same node or edge. Here's a test case:

CREATE (a1:A {name: 'a1'})-[ab1:AB {name: 'ab1'}]->(b1:B {name: 'b1'}),
       (a1)-[ab2:AB {name: 'ab2'}]->(b2:B {name: 'b2'}),
       (a2:A {name: 'a2'})-[ab3:AB {name: 'ab3'}]->(b1),
       (a2)-[ab5:AB {name: 'ab5'}]->(b3:B {name: 'b3'}),
       (a3:A {name: 'a3'})-[ab4:AB {name: 'ab4'}]->(b2),
       (a3)-[ab6:AB {name: 'ab6'}]->(b3),
       (a4:A {name: 'a4'})-[ab7:AB {name: 'ab7'}]->(b3);

Formatted for readability:

a1-[ab1]->b1
a1-[ab2]->b2
a2-[ab3]->b1
a2-[ab5]->b3
a3-[ab4]->b2
a3-[ab6]->b3
a4-[ab7]->b3

We want to find these paths: [A, AB, B, AB, A, AB, B, AB, A] (four steps). (Note: we don't care about edge directionality.) Here's my first try (our terminology: i_id = 'initial' and t_id = 'terminal').

MATCH p = (i_id:A)-[ab1:AB]-(b1:B)-[ab2:AB]-(a1:A)-[ab3:AB]-(b2:B)-[ab4:AB]-(t_id:A)
RETURN i_id.name, ab1.name, b1.name, ab2.name, a1.name, ab3.name, b2.name, ab4.name, t_id.name
ORDER BY i_id.name;

The result is reasonable, given Cypher's Uniqueness feature:

+-------------------------------------------------------------------------------------------------+
| i_id.name | ab1.name | b1.name | ab2.name | a1.name | ab3.name | b2.name | ab4.name | t_id.name |
+-------------------------------------------------------------------------------------------------+
| "a1"      | "ab1"    | "b1"    | "ab3"    | "a2"    | "ab5"    | "b3"    | "ab6"    | "a3"      |
| "a1"      | "ab1"    | "b1"    | "ab3"    | "a2"    | "ab5"    | "b3"    | "ab7"    | "a4"      |
| "a1"      | "ab2"    | "b2"    | "ab4"    | "a3"    | "ab6"    | "b3"    | "ab5"    | "a2"      |
| "a1"      | "ab2"    | "b2"    | "ab4"    | "a3"    | "ab6"    | "b3"    | "ab7"    | "a4"      |
| "a2"      | "ab3"    | "b1"    | "ab1"    | "a1"    | "ab2"    | "b2"    | "ab4"    | "a3"      |
| "a2"      | "ab5"    | "b3"    | "ab6"    | "a3"    | "ab4"    | "b2"    | "ab2"    | "a1"      |
| "a3"      | "ab4"    | "b2"    | "ab2"    | "a1"    | "ab1"    | "b1"    | "ab3"    | "a2"      |
| "a3"      | "ab6"    | "b3"    | "ab5"    | "a2"    | "ab3"    | "b1"    | "ab1"    | "a1"      |
| "a4"      | "ab7"    | "b3"    | "ab5"    | "a2"    | "ab3"    | "b1"    | "ab1"    | "a1"      |
| "a4"      | "ab7"    | "b3"    | "ab6"    | "a3"    | "ab4"    | "b2"    | "ab2"    | "a1"      |
+-------------------------------------------------------------------------------------------------+

However, we want additional filtering. Consider WHERE i_id.name = 'a2':

+-------------------------------------------------------------------------------------------------+
| i_id.name | ab1.name | b1.name | ab2.name | a1.name | ab3.name | b2.name | ab4.name | t_id.name |
+-------------------------------------------------------------------------------------------------+
| "a2"      | "ab3"    | "b1"    | "ab1"    | "a1"    | "ab2"    | "b2"    | "ab4"    | "a3"      |
| "a2"      | "ab5"    | "b3"    | "ab6"    | "a3"    | "ab4"    | "b2"    | "ab2"    | "a1"      |
+-------------------------------------------------------------------------------------------------+

Notice how the first path contains ab4.name = "ab4", which is also found on the second path as ab3.name. Conversely, "ab2" is found on the second path as ab4.name and on the first path as ab3.name. In our application we want these two paths to 'cancel out' so that the query returns no matches for a2.

So finally, my question: How would you approach doing this in Cypher? Multiple queries are OK as long as they execute quickly :-) I'm brand new to Cypher, but some of the things I thought might be useful are (straw-clutching, here :-)

comparing paths as collections (something like WHERE ab4.name NOT IN ...?)
labeling/adding properties to items to indicate the i_id and path they're located at?
FOR EACH?
UNWIND?
GROUP BY?

We'd like to do as much in Cypher as possible, but if the answer is "You can't do that," then we'll pull the above candidate results into memory and finish processing there. Thanks very much!

Answer 1

So I've worked up a solution using your second suggestion, which is to add properties to the relationships to indicate whether they are on two or more pathways.

First, create a traversed property on each AB relationship and set it to 0:

MATCH ()-[ab:AB]-()
SET ab.traversed = 0

Now I'm going to use a2 as the starting node for an example. This query finds all four-step pathways from a2 to another node with label A. The traversed property of each relationship is set to the number of times that relationship was encountered across those pathways.

MATCH p = (a2:A {name:'a2'})-[:AB*4]-(:A)
UNWIND RELATIONSHIPS(p) AS r
WITH r, COUNT(*) AS times_traversed
SET r.traversed = times_traversed
RETURN r.name, r.traversed
ORDER BY r.name

The query's output lists one row per relationship found on those pathways (ab1 through ab6), each with its traversed count.

As you explain in your example, ab2 and ab4 are on both pathways, so their traversed property is 2.
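Those counts can be double-checked without a database. Here is a minimal Python sketch, not part of the Cypher solution, that rebuilds the test graph from the question in memory and mimics the counting query. The names EDGES, paths_from and traversal_counts are made up for this illustration, and the walk assumes the same rule as the MATCH pattern: edges are followed in either direction and no relationship is reused within a path.

```python
from collections import Counter

# Test graph from the question: relationship name -> (A node, B node).
EDGES = {
    "ab1": ("a1", "b1"), "ab2": ("a1", "b2"), "ab3": ("a2", "b1"),
    "ab4": ("a3", "b2"), "ab5": ("a2", "b3"), "ab6": ("a3", "b3"),
    "ab7": ("a4", "b3"),
}

def paths_from(start, steps=4):
    """List every path of `steps` relationships from `start`.

    Edges are walked in either direction and a relationship is never
    reused within one path. Because the graph alternates A and B nodes,
    a four-step path that starts on an A node also ends on one.
    """
    def walk(node, used):
        if len(used) == steps:
            yield tuple(used)
            return
        for name, (a, b) in EDGES.items():
            if name not in used and node in (a, b):
                nxt = b if node == a else a
                yield from walk(nxt, used + [name])
    return list(walk(start, []))

def traversal_counts(start):
    """Count how often each relationship appears across all paths from `start`."""
    return Counter(rel for path in paths_from(start) for rel in path)

counts = traversal_counts("a2")
for name in sorted(counts):
    print(name, counts[name])
# prints:
# ab1 1
# ab2 2
# ab3 1
# ab4 2
# ab5 1
# ab6 1
```

ab7 does not show up because the only way to reach it from a2 dead-ends at a4 after two steps.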

With these properties set on each relationship, you can keep only the pathways whose traversed properties sum to the path length, which is 4 in your case. A sum equal to the length means every relationship on that pathway was traversed exactly once.

MATCH p = (a2:A {name:'a2'})-[:AB*4]-(:A)
WHERE REDUCE(traversal = 0, r IN RELATIONSHIPS(p) | traversal + r.traversed) = LENGTH(p)
RETURN p

This returns no paths, since the sum of the traversed properties is 6 for both pathways (1 + 1 + 2 + 2), not the required 4.
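If you do end up finishing the work in memory, as the question mentions as a fallback, the same sum-equals-length test is easy to apply there. This is only a rough Python sketch under the same assumptions as the Cypher above (undirected traversal, no relationship reused within a path). The helper names paths_from and unshared_paths and the CHAIN graph are hypothetical and exist just for this example.

```python
from collections import Counter

# Test graph from the question: relationship name -> (A node, B node).
QUESTION_GRAPH = {
    "ab1": ("a1", "b1"), "ab2": ("a1", "b2"), "ab3": ("a2", "b1"),
    "ab4": ("a3", "b2"), "ab5": ("a2", "b3"), "ab6": ("a3", "b3"),
    "ab7": ("a4", "b3"),
}

def paths_from(edges, start, steps=4):
    """List every `steps`-relationship path from `start`, ignoring direction."""
    def walk(node, used):
        if len(used) == steps:
            yield tuple(used)
            return
        for name, (a, b) in edges.items():
            if name not in used and node in (a, b):
                yield from walk(b if node == a else a, used + [name])
    return list(walk(start, []))

def unshared_paths(edges, start, steps=4):
    """Keep only paths whose relationships appear on no other path from `start`."""
    paths = paths_from(edges, start, steps)
    counts = Counter(rel for path in paths for rel in path)
    # Same check as the REDUCE filter: the counts along a path sum to
    # its length only when every relationship on it was used once.
    return [p for p in paths if sum(counts[r] for r in p) == len(p)]

for start in ("a1", "a2", "a3", "a4"):
    print(start, unshared_paths(QUESTION_GRAPH, start))
# prints:
# a1 []
# a2 []
# a3 []
# a4 []

# A hypothetical chain with a single four-step path, to show a survivor.
CHAIN = {"r1": ("x1", "y1"), "r2": ("x2", "y1"),
         "r3": ("x2", "y2"), "r4": ("x3", "y2")}
print(unshared_paths(CHAIN, "x1"))
# prints: [('r1', 'r2', 'r3', 'r4')]
```

Like the Cypher version, this only compares relationships. Cancelling on shared nodes as well would need a second counter over the nodes along each path.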

But like I said, this is super inelegant and there is probably a better way to do this.
