
Are mixed type indexes acceptable in Neo4j?

I have a data set that includes a number of nodes, all labeled claim, which can have various properties (named P1, P2, etc., through P2000). Currently, each claim node can have only one of these properties, and each property has a value whose type can differ (i.e. P1 may be a string, P2 a float, P3 an integer, etc.). I also need to be able to look up the nodes by any property (e.g. "find all nodes with P3 equal to 42").

I have modeled it as nodes that have a property named value and a label matching the P property. I then define a schema index on label claim and property value. A lookup then looks something like:

MATCH (n:P569:claim) WHERE n.value = 42 RETURN n

My first question is: is it OK to have such an index? Are mixed type indexes allowed?

The second question: the lookup above works (though I'm not sure whether it uses the index or not), but this one doesn't. Note that the label order is switched:

neo4j-sh (?)$ MATCH (n:claim:P569) WHERE n.value>0 RETURN n; 
IncomparableValuesException: Don't know how to compare that. Left: "113" (String); Right: 0 (Long)

P569 properties are all numeric, but other P properties hold string values, one of which is "113". Somehow, even though I said the node should have both the claim and P569 labels, the "113" value is still included in the comparison, even though its node has no P569 label:

neo4j-sh (?)$ MATCH (n:claim) WHERE n.value ="113" RETURN LABELS(n);
+-------------------+
| LABELS(n)         |
+-------------------+
| ["claim","P1036"] |
| ["claim","P902"]  |
+-------------------+

What is wrong here? Why does it work with one label order but not the other? Can this data model be improved?

Let me at least try to side-step your question: there's another way you could model this that would resolve at least some of your problems.

You're encoding the property name as a label. Perhaps you want to do that to speed up looking up the subset of nodes where that property applies; still, it seems like you're causing a lot of difficulty by shoe-horning incomparable data values into the same property named "value".

What if, in addition to using these labels, each property was named the same as its label? I.e.:

CREATE (n:P569:claim { P569: 42});

You still get your label lookups, but by segregating the property names, you can guarantee that the query planner will never accidentally compare incomparable values in the way it builds an execution plan. Your query for this node would then be:

MATCH (n:P569:claim) WHERE n.P569 > 5 AND n.P569 < 40 RETURN n;

Note that if you know the right label to use, then you're guaranteed to know the right property name to use. By using properties of different names, if you're storing your data in such a way that P569 values are always integers, you can't end up with the incomparable situation you have now. (I think that's happening because of the particular way Cypher is executing that query.)
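To see how Cypher actually executes either form of the query, and whether the index is used at all, the query can be prefixed with EXPLAIN or PROFILE. This is only a sketch: EXPLAIN assumes Neo4j 2.2 or later, and the exact plan output depends on the version and on which planner is in use.

```cypher
// Show the execution plan without running the query (assumes Neo4j 2.2+).
EXPLAIN MATCH (n:claim:P569) WHERE n.value > 0 RETURN n;

// Run the query and show the plan together with per-operator row counts.
PROFILE MATCH (n:P569:claim) WHERE n.value = 42 RETURN n;
```

In the plan, an operator such as NodeIndexSeek indicates the index was used, while NodeByLabelScan indicates a label scan followed by filtering.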

A possible downside here is that if you have to index all of those properties, it could be a lot of indexes, but it still might be something to consider.
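As a rough sketch of what that would involve, each label/property pair needs its own index statement. The P1036 property name below simply extends the naming scheme suggested above and is an assumption, and the syntax shown is the legacy form from Neo4j 2.x/3.x.

```cypher
// One schema index per label/property pair (legacy Neo4j 2.x/3.x syntax).
CREATE INDEX ON :P569(P569);
CREATE INDEX ON :P1036(P1036);

// Newer versions (Neo4j 4.x and later) use this form instead:
// CREATE INDEX FOR (n:P569) ON (n.P569);
```

With up to 2000 distinct properties, these statements would most likely be generated by a script rather than written by hand.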

I think it makes sense to take a step back and think about what you actually want to achieve, why you have those 2000 properties in the first place, and how you could model them differently in a graph.

Also make sure to just leave off properties you don't need and use coalesce() to provide a default.
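For instance, a minimal sketch of the coalesce() approach, assuming the per-property naming from the other answer and an arbitrary default of 0:

```cypher
// n.P569 is null on nodes that lack the property;
// coalesce() returns its first non-null argument, here the default 0.
MATCH (n:claim)
RETURN labels(n) AS labels, coalesce(n.P569, 0) AS p569;
```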


 