How to find distinct nodes in a Neo4j/Cypher query

I'm trying to do some pattern matching in Neo4j/Cypher and I came across this issue:

There are two types of graphs I want to search for:

  1. Star graphs: a graph with one center node and multiple outgoing relationships.

  2. n-length line graphs: a line graph of length n in which no node is repeated (my graph contains some bidirectional edges and cycles).

So the main problem is that when I do something such as:

  1. MATCH (a)-->(b), (a)-->(c), (a)-->(d)
  2. MATCH (a)-->(b)-->(c)-->(d)

Cypher doesn't guarantee (at least when I tried it) that a, b, c, and d are all different nodes. For small graphs, this can easily be fixed with

WHERE not(a=b) AND not(a=c) AND ...

But I'm trying to match graphs of size 10+, so checking equality between every pair of nodes isn't a viable option. AFAIK, RETURN DISTINCT doesn't work either, since it doesn't check equality among variables, only across different rows. Is there any simple way to write the query so that the differently named nodes are distinct?

Old question, but look to the APOC Path Expander procedures for how to address these kinds of use cases, as you can change the traversal uniqueness behavior for expansion (the same way you can when using the traversal API, which these procedures use).

Cypher implicitly uses RELATIONSHIP_PATH uniqueness, meaning that within each returned path a relationship must be unique: it cannot be used more than once in a single path.

While this is good for queries where you need all possible paths, it's not a good fit for queries where you want distinct nodes, a subgraph, or to prevent repeated nodes in a path.
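For comparison, here is a sketch of how the two patterns from the question could be checked for distinct nodes in plain Cypher without writing out every pairwise comparison. It relies on the built-in all() and single() list predicates; it filters rows after they are matched rather than changing how Cypher traverses, so on a large graph it may do much more work than the APOC approach.

```cypher
// Star: put the variables in a list and require that each node
// appears exactly once in that list.
MATCH (a)-->(b), (a)-->(c), (a)-->(d)
WHERE all(x IN [a, b, c, d] WHERE single(y IN [a, b, c, d] WHERE y = x))
RETURN a, b, c, d;

// Line: bind the path and apply the same check to nodes(p).
MATCH p = (a)-->(b)-->(c)-->(d)
WHERE all(x IN nodes(p) WHERE single(y IN nodes(p) WHERE y = x))
RETURN p;
```

The predicate stays the same size no matter how many nodes the pattern has; only the list (or the path) grows.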

For an n-length path, let's say depth 6 with only outgoing relationships of any type, we can change the uniqueness to NODE_PATH, where a node must be unique per path (no repeats within a path):

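MATCH (n)
WHERE id(n) = 12345
// relationshipFilter '>' limits expansion to outgoing relationships of any type
CALL apoc.path.expandConfig(n, {relationshipFilter:'>', maxLevel:6, uniqueness:'NODE_PATH'}) YIELD path
RETURN path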
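Note that maxLevel on its own returns paths of every length up to 6. If only paths of exactly length 6 are wanted, a variant along these lines should work, assuming the minLevel config option is set alongside maxLevel (the node id 12345 is the same placeholder used above):

```cypher
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.expandConfig(n, {
  relationshipFilter: '>',   // outgoing relationships of any type
  minLevel: 6,               // discard paths shorter than 6 hops
  maxLevel: 6,               // stop expanding after 6 hops
  uniqueness: 'NODE_PATH'    // no node may repeat within a path
}) YIELD path
RETURN path
```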

If you want all reachable nodes up to a certain depth (or at any depth, by omitting maxLevel), you can use NODE_GLOBAL uniqueness, or instead just use apoc.path.subgraphNodes():

MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.subgraphNodes(n, {maxLevel:6}) YIELD node
RETURN node
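The NODE_GLOBAL route mentioned above can be sketched with apoc.path.expandConfig() as well. Since each node is reached by at most one path under NODE_GLOBAL, taking the last node of every returned path should give roughly the same set of nodes as apoc.path.subgraphNodes(); treat the equivalence as an assumption and verify it on your own data.

```cypher
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.expandConfig(n, {maxLevel:6, uniqueness:'NODE_GLOBAL'}) YIELD path
// one path per reached node, so the end of each path is a distinct node
RETURN last(nodes(path)) AS node
```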

NODE_GLOBAL uniqueness means that a node must be unique across all paths: it will only be visited once, and there will only be one path to a node from a given start node. This keeps the number of paths that need to be evaluated down significantly, but because of this behavior not all relationships will be traversed: any relationship that expands to an already-visited node is skipped.

You will not get relationships back with this procedure (you can use apoc.path.spanningTree() for that, although as previously mentioned not all relationships will be included, since only a single path to each node is captured, not all possible paths to each node). If you want all nodes up to a max level and all possible relationships between those nodes, then use apoc.path.subgraphAll():

MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.subgraphAll(n, {maxLevel:6}) YIELD nodes, relationships
RETURN nodes, relationships
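The apoc.path.spanningTree() alternative mentioned earlier is called the same way and yields paths instead of node and relationship lists. A minimal sketch, using the same placeholder node id:

```cypher
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.spanningTree(n, {maxLevel:6}) YIELD path
// each reached node appears at the end of exactly one path,
// so relationships that lead to an already-visited node are absent
RETURN path
```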

Richer options exist for label and relationship filtering, or for filtering (whitelist, blacklist, end node, terminator node) based on lists of pre-matched nodes.
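As an illustration of the label and relationship filters, the sketch below follows only outgoing relationships of one type and never expands through nodes carrying a given label. The relationship type KNOWS and the label Blocked are hypothetical names chosen for the example, not something from the question's data.

```cypher
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.expandConfig(n, {
  relationshipFilter: 'KNOWS>',  // hypothetical type; '>' suffix means outgoing only
  labelFilter: '-Blocked',       // hypothetical label; '-' prefix excludes such nodes
  maxLevel: 6,
  uniqueness: 'NODE_PATH'
}) YIELD path
RETURN path
```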

We also support repeating sequences of relationships or node labels.

If you need filtering by node or relationship properties during expansion, then this won't be a good option, as that feature is not yet supported.
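One possible workaround, sketched here under the assumption that filtering after the fact is acceptable, is to let the procedure expand and then discard paths in Cypher. The property name active is hypothetical. The procedure still expands through nodes that fail the check, so this can be costly on large graphs, and it is only safe with NODE_PATH uniqueness: under NODE_GLOBAL the single path kept for a node might be the one that gets filtered out.

```cypher
MATCH (n)
WHERE id(n) = 12345
CALL apoc.path.expandConfig(n, {maxLevel:6, uniqueness:'NODE_PATH'}) YIELD path
// keep only paths in which every node has active = true (hypothetical property)
WITH path
WHERE all(x IN nodes(path) WHERE x.active = true)
RETURN path
```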
