Are mixed type indexes acceptable in Neo4j?

I have a data set consisting of a number of nodes, all labeled claim, which can have various properties (named P1, P2, etc., through P2000). Currently, each claim node can have only one of these properties, and each property has a value, which can be of a different type (i.e. P1 may be a string, P2 a float, P3 an integer, etc.). I also need to be able to look up the nodes by any property (e.g. "find all nodes with P3 equal to 42").

I have modeled it as nodes having a property value and a label corresponding to the P property. I then define a schema index on label claim and property value. A lookup then looks something like:

MATCH (n:P569:claim) WHERE n.value = 42 RETURN n

My first question is - is this OK to have such index? Are mixed type indexes allowed?

The second question is that the lookup above works (though I'm not sure whether it uses index or not), but this doesn't - note the label order is switched:

neo4j-sh (?)$ MATCH (n:claim:P569) WHERE n.value>0 RETURN n; 
IncomparableValuesException: Don't know how to compare that. Left: "113" (String); Right: 0 (Long)

P569 values are all numeric, but other P-properties have string values, one of which is "113". Somehow, even though I said the node must have both the claim and P569 labels, the "113" value is still included in the comparison, even though its node has no P569 label:

neo4j-sh (?)$ MATCH (n:claim) WHERE n.value ="113" RETURN LABELS(n);
+-------------------+
| LABELS(n)         |
+-------------------+
| ["claim","P1036"] |
| ["claim","P902"]  |
+-------------------+

What is wrong here? Why does it work with one label order but not the other? Can this data model be improved?

Let me at least try to side-step your question: there's another way you could model this that would resolve at least some of your problems.

You're encoding the property name as a label. Perhaps you want to do that to speed up looking up a subset of nodes where that property applies; still it seems like you're causing a lot of difficulty by shoe-horning incomparable data values all into the same property named "value".

What if, in addition to using these labels, each property was named the same as its label? I.e.:

CREATE (n:P569:claim { P569: 42});

You still get your label lookups, but by segregating the property names, you can guarantee that the query planner will never accidentally compare incomparable values in the way it builds an execution plan. Your query for this node would then be:

MATCH (n:P569:claim) WHERE n.P569 > 5 AND n.P569 < 40 RETURN n;

Note that if you know the right label to use, then you're guaranteed to know the right property name to use. With differently named properties, as long as your data is stored such that P569 values are always integers, you can't end up with the incomparable-values situation you have now. (I think that's happening because of the particular way Cypher is executing that query.)
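One way to check that guess is to ask Neo4j for the execution plan of both label orders from the question. This is only a sketch: it assumes a Neo4j version that supports EXPLAIN (2.2 or later, as far as I know), and the operator names you see will depend on your version.

```cypher
// EXPLAIN returns the plan without running the query,
// so the failing label order can be inspected without the exception.
EXPLAIN MATCH (n:P569:claim) WHERE n.value > 0 RETURN n;
EXPLAIN MATCH (n:claim:P569) WHERE n.value > 0 RETURN n;

// Assumption: the index from the question exists on :claim(value).
// A hint asks the planner to start from that index explicitly.
EXPLAIN MATCH (n:claim:P569)
USING INDEX n:claim(value)
WHERE n.value = 42
RETURN n;
```

If one plan starts from a scan or index lookup on claim and only filters on P569 afterwards, the comparison would be applied to values from other P-labels first, which would be consistent with the error in the question.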

A possible downside here is that if you have to index all of those properties, it could be a lot of indexes, but still might be something to consider.
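As a rough sketch of what that would involve, each property gets its own index. The statements below use the older index syntax that matches the neo4j-shell era of the question; the index name claim_P569 in the commented alternative is hypothetical, and which form applies depends on your Neo4j version.

```cypher
// One index per property name, all on the shared claim label.
CREATE INDEX ON :claim(P569);
CREATE INDEX ON :claim(P1036);
CREATE INDEX ON :claim(P902);

// Newer Neo4j versions (4.x and later) use a different form, for example:
// CREATE INDEX claim_P569 FOR (n:claim) ON (n.P569);
```

With up to 2000 properties you would probably generate these statements from a script rather than write them by hand, and only for the properties you actually query.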

I think it makes sense to take a step back and think about what you actually want to achieve: why do you have those 2000 properties in the first place, and how could you model them differently in a graph?

Also make sure to just leave off properties you don't need and use coalesce() to provide the default.
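A minimal sketch of the coalesce() idea, assuming the property-per-name model with a P569 property; the default of 0 is just an illustrative choice.

```cypher
// Nodes without a P569 property return the default instead of null.
MATCH (n:claim)
RETURN coalesce(n.P569, 0) AS p569;
```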
