Cypher Neo4j - Query using the "IN" clause on a collection is very slow

Question

Hi, I'm trying to import some data from CSV files into Neo4j 2.3.1. I've already imported some nodes of type :Author and :Article.

The Author node is composed of properties like:

key -> String
principal_name -> String
alias -> Collection of String
........

I've also added indexes on principal_name, alias and key.

The problem comes when I try to import the relationships between nodes of type Article and Author.

The CSV has this type of structure:

articleKey,authorName

As a naive solution, I've tried to create the relationships using a query like this one:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///myPath.csv" AS line
MATCH (art:Article{key: line.key1})
MATCH (auth:Author) WHERE line.key2 IN (auth.alias)
CREATE UNIQUE (auth)-[:AUTHOR_OF]->(art);

The query is painfully slow because of the second MATCH, as I discovered using the profiler. It takes 10-12 seconds to create each relationship because I have many Authors in the db (around 1000000).

So I'm looking for a way to execute a query like the following to get faster execution (this is only an example to illustrate the structure that I want to obtain):

MATCH (auth:Author{principal_name: line.key2})
IF auth null THEN
  MATCH (auth:Author) WHERE line.key2 IN (auth.alias)
END

Is there a way to do that with Cypher?

Answer 1

If you change your model so that each of an Author node's names (both the principal name and all the aliases) is stored in a separate Name node, like this:

(auth:Author)-[:HAS_NAME]->(name:Name {name: 'Fred McGillicutty'})
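A minimal sketch of how the existing data might be moved to this model, assuming the names currently live in the principal_name and alias properties described in the question. This is an illustration rather than a tested migration; with around a million Author nodes it would probably have to be run in batches instead of one transaction.

```cypher
// Assumption: every Author has principal_name (String) and, optionally,
// alias (collection of String), as described in the question.
MATCH (auth:Author)
// Combine the principal name and the aliases into one collection.
WITH auth, [auth.principal_name] + coalesce(auth.alias, []) AS names
UNWIND names AS n
// Skip missing names, since MERGE cannot match on a null property.
WITH auth, n WHERE n IS NOT NULL
// One Name node per (author, name) pair; MERGE avoids duplicates on re-runs.
MERGE (auth)-[:HAS_NAME]->(:Name {name: n});
```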

Then the query would simply be:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///myPath.csv" AS line
MATCH
  (art:Article { key: line.key1 }),
  (auth:Author)-[:HAS_NAME]->(name:Name { name:line.key2 })
CREATE (auth)-[:AUTHOR_OF]->(art);

If you create indexes on :Article(key) and :Name(name), this query should be very efficient.
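For reference, the two indexes could be created as sketched here, using the Neo4j 2.3 schema index syntax. The PROFILE lookup at the end is only a hypothetical check, reusing the sample name from above, to see whether the planner picks an index seek instead of a label scan.

```cypher
// Run each statement once, before starting the import.
CREATE INDEX ON :Article(key);
CREATE INDEX ON :Name(name);

// Hypothetical check: the plan should show an index seek on :Name(name)
// once the index is online.
PROFILE MATCH (name:Name {name: 'Fred McGillicutty'}) RETURN name;
```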

Answer 2

If many authors have aliases and you expect to query on these aliases, you should model them as nodes. I think this will speed up the queries that create relationships and allow more flexible queries involving aliases.

(:Alias)<-[:HAS]-(:Author)-[:AUTHOR_OF]->(:Article)

Add indexes on all nodes. If possible, use uniqueness constraints.
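A possible set of statements for this, in Neo4j 2.3 syntax. It assumes that key uniquely identifies an Article and an Author, which the question does not confirm, and that names may repeat. If a plain index already exists on the same label and property, it most likely has to be dropped before the constraint can be created.

```cypher
// Assumption: key is unique per Article and per Author.
// A uniqueness constraint also provides an index on the property.
CREATE CONSTRAINT ON (art:Article) ASSERT art.key IS UNIQUE;
CREATE CONSTRAINT ON (auth:Author) ASSERT auth.key IS UNIQUE;

// Names are not assumed to be unique, so plain indexes are used for them.
CREATE INDEX ON :Author(principal_name);
CREATE INDEX ON :Alias(principal_name);
```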

You can now query for Alias and Author nodes to add relationships:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///myPath.csv" AS line
MATCH (art:Article {key: line.key1})
// get the Author directly or by alias
MATCH (alias:Alias)<-[:HAS]-(auth:Author)
WHERE alias.principal_name = line.key2 OR auth.principal_name = line.key2
CREATE (auth)-[:AUTHOR_OF]->(art)

With indexes, the lookups should be pretty fast.
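One caveat: the pattern above only matches Authors that have at least one Alias node, and an OR across two different nodes may keep the planner from using an index seek. A hedged alternative is to run the import in two passes, one per lookup, so that each MATCH is a single equality predicate. This sketch reuses the labels and property names from the query above and uses MERGE so that an author found in both passes is linked only once. Like the OR version, it checks both the principal name and the aliases rather than falling back to aliases only when no principal name matches.

```cypher
// Pass 1: link authors whose principal name matches.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///myPath.csv" AS line
MATCH (art:Article {key: line.key1})
MATCH (auth:Author {principal_name: line.key2})
MERGE (auth)-[:AUTHOR_OF]->(art);

// Pass 2: link authors that have a matching alias.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///myPath.csv" AS line
MATCH (art:Article {key: line.key1})
MATCH (alias:Alias {principal_name: line.key2})<-[:HAS]-(auth:Author)
MERGE (auth)-[:AUTHOR_OF]->(art);
```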
