
Neo4j Design: Property vs “Node & Relationship”

Original post: 2013-03-18 09:06:45. Tags: graph, nosql, neo4j, graph-databases

I have a node type with a string property that will very often have the same value. For example, millions of nodes with only 5 possible values for that string. I will be doing searches by that property.

My question is which is better in terms of performance and memory:
a) Implement it as a node property and have lots of duplicates (and search using WHERE).
b) Implement it as 5 additional nodes, where all original nodes reference one of them (and search using an additional MATCH).
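To make the two options concrete, here is a minimal sketch of both layouts as a toy in-memory model in Python. It is not Neo4j and says nothing about Neo4j's actual storage or query planner; the `status` property, the `HAS_STATUS` relationship type, the value names and the Cypher in the comments (Neo4j 2.0-style syntax) are all assumptions made for illustration.

```python
# Hypothetical data: 20 nodes, each holding one of five possible string values.
VALUES = ["A", "B", "C", "D", "E"]
nodes = {node_id: {"status": VALUES[node_id % 5]} for node_id in range(20)}


def find_by_property(nodes, value):
    """Option a: the value is duplicated on every node.

    Without an index, every node's property has to be inspected.
    Roughly: MATCH (n) WHERE n.status = 'C' RETURN n
    """
    return [node_id for node_id, props in nodes.items() if props["status"] == value]


def build_value_nodes(nodes):
    """Option b: one extra node per value, holding links to its members."""
    value_nodes = {}
    for node_id, props in nodes.items():
        value_nodes.setdefault(props["status"], []).append(node_id)
    return value_nodes


def find_by_relationship(value_nodes, value):
    """Option b lookup: start at the value node and follow its relationships.

    Roughly: MATCH (v {name: 'C'})<-[:HAS_STATUS]-(n) RETURN n
    """
    return list(value_nodes.get(value, []))
```

Both lookups return the same nodes; the difference in this toy model is that option a touches every node, while option b touches only the members of one value node.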

3 Answers

Without knowing further details it's hard to give a general-purpose answer.

From a performance perspective it's better to limit the search as early as possible. It is even more beneficial if you do not have to look into properties during a traversal.

Given that, I assume it's better to move the lookup property into a separate node and use the value as the relationship type.

Use labels; this blog post is a good intro to this new Neo4j 2.0 feature:

Labels and Schema Indexes in Neo4j

I've thought about this problem a little as well. In my case, I had to represent state:

STARTED
IN_PROGRESS
SUBMITTED
COMPLETED

Overall, the Node + Relationship approach looks more appealing: only a single relationship reference needs to be maintained for each node rather than a property string, and you don't need to scan an additional index that has to be maintained on the property (intuitively, both memory and performance would favor this approach).

Another advantage is that it easily supports a node being linked to multiple "special nodes". If you foresee a situation where this should be possible in your model, this is better than having to use a property array (and searching using "in").
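The difference between the property-array model and the multiple-relationship model can be sketched with a toy in-memory structure in Python. This is only an illustration, not Neo4j itself; the `states` property name, the sample nodes and the Cypher in the comment are assumptions.

```python
# Hypothetical nodes that may be in several states at once.
nodes = {
    1: {"states": ["STARTED"]},
    2: {"states": ["STARTED", "SUBMITTED"]},
    3: {"states": ["COMPLETED"]},
}


def find_in_array(nodes, value):
    """Property-array model: test membership on every node's array.

    Roughly: MATCH (n) WHERE 'SUBMITTED' IN n.states RETURN n
    """
    return [node_id for node_id, props in nodes.items() if value in props["states"]]


def link(special_nodes, node_id, value):
    """Relationship model: add one relationship from a node to a special node."""
    special_nodes.setdefault(value, []).append(node_id)


def find_linked(special_nodes, value):
    """Relationship model lookup: read the members of one special node."""
    return list(special_nodes.get(value, []))


# Build the relationship model from the same data; a node with two states
# simply gets two relationships.
special_nodes = {}
for node_id, props in nodes.items():
    for state in props["states"]:
        link(special_nodes, node_id, state)
```

In this sketch, linking a node to a second special node is just one more `link` call, whereas the array model has to rewrite the property and check every array on search.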

In practice I found that the problem then became how to access these special nodes each time. Either you maintain some sort of constants reference holding the node IDs of these special nodes, so that you can jump right to them in your START statement (this is what we do), or you need to search against a property of the special node each time (name, perhaps) and then traverse its relationships. This doesn't make for the prettiest of Cypher queries.
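The two ways of reaching a special node can be contrasted in a small Python sketch. It is a toy model under stated assumptions, not the answerer's actual code: the node IDs (101 to 104), the `STATE_NODE_IDS` table and the `name` property are hypothetical, and the Cypher in the comments is only approximate.

```python
# Hypothetical special nodes, keyed by internal node ID.
special_nodes = {
    101: {"name": "STARTED"},
    102: {"name": "IN_PROGRESS"},
    103: {"name": "SUBMITTED"},
    104: {"name": "COMPLETED"},
}

# Option 1: a constants reference kept in application code.
# Roughly: START s=node(103) MATCH s<-[:HAS_STATE]-n RETURN n
STATE_NODE_IDS = {
    "STARTED": 101,
    "IN_PROGRESS": 102,
    "SUBMITTED": 103,
    "COMPLETED": 104,
}


def special_node_by_constant(name):
    """Jump straight to the special node via the constants table."""
    return STATE_NODE_IDS[name]


def special_node_by_name(special_nodes, name):
    """Option 2: search the special nodes by their name property each time."""
    for node_id, props in special_nodes.items():
        if props["name"] == name:
            return node_id
    return None
```

The constants table avoids the search but has to be kept in sync with the database by hand; the name search needs no extra bookkeeping but adds a lookup step to every query.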