
Performance in Neo4j

I have a database with 2,217,731 nodes and 3,127,475 relationships, where the nodes are different pieces of equipment and the relationships between them have types such as "CONNECTED_TO", "IS_INSIDE", etc.

I am trying to traverse the graph to find specific nodes. In Cypher it would look like this:

    MATCH (n:Equipment)<-[IS_INSIDE*]-()<-[CONNECTED_TO*]-(m:Cable) where n.name = "name" RETURN m

I am using the Java Core API, which as far as I know should be the fastest way to query Neo4j and should take seconds; however, the traversal runs for tens of minutes.

I am using neo4j-2.0.0 and Java version "1.7.0_45", with a maximum Java heap size of 7 GB.

Neo4j properties:

    Map<String, String> config = new HashMap<>();

    config.put( "neostore.nodestore.db.mapped_memory", "1800M" );
    config.put( "neostore.relationshipstore.db.mapped_memory", "3G" );
    config.put( "neostore.propertystore.db.mapped_memory", "100M" );
    config.put( "neostore.propertystore.db.strings.mapped_memory", "150M" );
    config.put( "neostore.propertystore.db.arrays.mapped_memory", "10M" );

    inserter = BatchInserters.inserter("target/graphDb", config);

I am new to Neo4j and do not know how to tune it to achieve better performance.

If you have to traverse the whole graph, this will be slow. If this is a common query, consider creating an index on Equipment.name, which is possible as of the Neo4j 2.0.0 milestone. Neo4j will then just look up matching names in the index (basically a hashtable) and check for the pattern around the matching nodes, which will be very fast. See http://blog.neo4j.org/2013/12/neo4j-20-ga-graphs-for-everyone.html
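To illustrate the difference between scanning and looking up, here is a minimal sketch in plain Java. It takes the "hashtable" analogy at face value and is not the Neo4j API: the class `NameIndexSketch`, its methods, and the sample names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration with plain Java collections (hypothetical names, not the Neo4j API).
class NameIndexSketch {
    // Full scan: compares the name of every node, so the cost grows with the node count.
    static List<Integer> scan(String[] namesByNodeId, String wanted) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < namesByNodeId.length; id++) {
            if (namesByNodeId[id].equals(wanted)) {
                hits.add(id);
            }
        }
        return hits;
    }

    // Index build: done once, maps each name to the ids of the nodes carrying it.
    static Map<String, List<Integer>> buildIndex(String[] namesByNodeId) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int id = 0; id < namesByNodeId.length; id++) {
            List<Integer> ids = index.get(namesByNodeId[id]);
            if (ids == null) {
                ids = new ArrayList<>();
                index.put(namesByNodeId[id], ids);
            }
            ids.add(id);
        }
        return index;
    }

    public static void main(String[] args) {
        String[] names = {"rack-1", "switch-1", "rack-2", "switch-1"};
        Map<String, List<Integer>> index = buildIndex(names);
        System.out.println(scan(names, "switch-1"));   // touches all four entries
        System.out.println(index.get("switch-1"));     // single hash lookup
    }
}
```

Both calls find the same node ids; the difference is that the scan inspects every node on each query, while the index pays that cost once when it is built.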

Please create an index on the name property of the Equipment nodes:

    CREATE INDEX ON :Equipment(name)

Then please try the following optimized query:

    MATCH (n:Equipment { name: "name" }),
          (n)<-[IS_INSIDE*]-(x),
          (x)<-[CONNECTED_TO*]-(m:Cable)
    RETURN m

Note that this match is equivalent to the one you specified, but it breaks the pattern up into triples, which causes the Neo4j query execution plan to first match the n:Equipment node on the name property instead of performing a graph-global match operation. Starting from that reduced set of n:Equipment nodes, the following match clauses can scan the variable-length IS_INSIDE and CONNECTED_TO patterns more efficiently.
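To make the "anchor first, then expand" idea concrete, here is a minimal sketch in plain Java. It is an in-memory toy, not the Neo4j API: the class `AnchoredTraversalSketch`, its methods, and the sample names (rack-1, switch-1, cable-a, cable-b) are all hypothetical, and IS_INSIDE and CONNECTED_TO are read as relationship types, as the question describes them. It computes plain reachability and does not reproduce every detail of Cypher's path matching.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy in-memory graph (hypothetical names, not the Neo4j API).
class AnchoredTraversalSketch {
    // "Index": equipment name -> node id, so the start node is found without a scan.
    final Map<String, Integer> equipmentByName = new HashMap<>();
    // Incoming edges per relationship type: target node id -> source node ids.
    final Map<String, Map<Integer, List<Integer>>> incoming = new HashMap<>();
    final Map<Integer, String> labels = new HashMap<>();

    void addNode(int id, String label, String name) {
        labels.put(id, label);
        if ("Equipment".equals(label)) {
            equipmentByName.put(name, id);
        }
    }

    // Records source -[type]-> target, stored by target so incoming edges are cheap to follow.
    void addEdge(int source, String type, int target) {
        Map<Integer, List<Integer>> byTarget = incoming.get(type);
        if (byTarget == null) {
            byTarget = new HashMap<>();
            incoming.put(type, byTarget);
        }
        List<Integer> sources = byTarget.get(target);
        if (sources == null) {
            sources = new ArrayList<>();
            byTarget.put(target, sources);
        }
        sources.add(source);
    }

    // Nodes reachable from any start node over one or more incoming edges of the given type.
    Set<Integer> expandIncoming(Set<Integer> starts, String type) {
        Map<Integer, List<Integer>> byTarget = incoming.get(type);
        Set<Integer> reached = new LinkedHashSet<>();
        Deque<Integer> queue = new ArrayDeque<>(starts);
        while (!queue.isEmpty()) {
            Integer current = queue.poll();
            List<Integer> sources = byTarget == null ? null : byTarget.get(current);
            if (sources == null) {
                continue;
            }
            for (Integer source : sources) {
                if (reached.add(source)) {
                    queue.add(source); // first visit: keep expanding from here
                }
            }
        }
        return reached;
    }

    // Anchor on the named Equipment node, then expand IS_INSIDE and CONNECTED_TO locally.
    Set<Integer> cablesFor(String equipmentName) {
        Set<Integer> cables = new LinkedHashSet<>();
        Integer start = equipmentByName.get(equipmentName); // index lookup, no graph scan
        if (start == null) {
            return cables;
        }
        Set<Integer> inside = expandIncoming(Collections.singleton(start), "IS_INSIDE");
        for (Integer id : expandIncoming(inside, "CONNECTED_TO")) {
            if ("Cable".equals(labels.get(id))) {
                cables.add(id);
            }
        }
        return cables;
    }

    // Four-node sample graph used by main.
    static AnchoredTraversalSketch sample() {
        AnchoredTraversalSketch g = new AnchoredTraversalSketch();
        g.addNode(1, "Equipment", "rack-1");
        g.addNode(2, "Equipment", "switch-1");
        g.addNode(3, "Cable", "cable-a");
        g.addNode(4, "Cable", "cable-b");
        g.addEdge(2, "IS_INSIDE", 1);     // switch-1 is inside rack-1
        g.addEdge(3, "CONNECTED_TO", 2);  // cable-a is connected to switch-1
        g.addEdge(4, "CONNECTED_TO", 3);  // cable-b is connected to cable-a
        return g;
    }

    public static void main(String[] args) {
        System.out.println(sample().cablesFor("rack-1"));
    }
}
```

On the sample graph, `cablesFor("rack-1")` returns the two cable nodes, 3 and 4. The work done is proportional to the neighborhood of the start node rather than to the size of the whole graph, which is the effect the rewritten query aims for.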

The first thing to realize is that in a graph database, performance depends largely on the kind of model you have built and on the cardinality of the various node types (Equipment and Cable in your case). Running your queries with PROFILE and EXPLAIN often gives useful insight into their performance in terms of the number of database hits and the time taken. Choosing the query with the lower number of DB hits is advantageous.

With that, let us first look at the query you are using:

    MATCH (n:Equipment)<-[IS_INSIDE*]-()<-[CONNECTED_TO*]-(m:Cable)
    where n.name = "name"
    RETURN m

A couple of pointers on this:

1) When you try to find Equipment nodes that are inside a particular Equipment node, you do not specify the label of the inner nodes. Try using:

    MATCH (n:Equipment)<-[IS_INSIDE*]-(:Equipment)

instead of:

    MATCH (n:Equipment)<-[IS_INSIDE*]-()

As written, the unlabeled pattern considers both Equipment and Cable nodes as candidates for the nodes inside the Equipment named "name". With the alternative I mentioned, the candidates are restricted to Equipment nodes, assuming that only Equipment can be inside other Equipment.

2) As others mentioned, building indexes on the Equipment and Cable nodes will help. Build an index on the Equipment.name property. You can also build indexes on your Cable properties, which may further improve performance.

Can you also share how many Equipment nodes and how many Cable nodes there are? I am also assuming that you have already ensured that the Equipment nodes and Cable nodes are distinct. It is okay to have more relationships, but the model can often benefit from having fewer nodes to match.
