Neo4j Cypher query is too slow

Question

The following query takes between 1.5 and 9 seconds, depending on {keywords}:

MATCH (pr:Property)
WHERE (pr.name IN {keywords})
WITH pr
MATCH (pr)<--(it:Item)
MATCH (it)-->(pr2)<-[:CAT]-(ca)
RETURN DISTINCT pr2 AS prop, count(DISTINCT it) AS sum, ca.name AS rType
LIMIT 10

Each Item is connected to 100 Properties.

Sample profile from the server:

neo4j-sh (?)$ profile match (pr:Property)
WHERE (pr.name in ["GREEN","SHORT","PLAIN","SHORT-SLEEVE"])
with pr
MaTCH (pr) <--(it:Item)
MaTCH (it)-->(pr2)<-[:CAT]-(ca)
return distinct pr2 as prop,count(distinct it) as sum , ca.name as rType
limit 40;
40 rows

ColumnFilter(symKeys=["prop", "rType", "  INTERNAL_AGGREGATE58d28d0e-5727-4850-81ef-7298d63d7be8"], returnItemNames=["prop", "sum", "rType"], _rows=40, _db_hits=0)
Slice(limit="Literal(40)", _rows=40, _db_hits=0)
  EagerAggregation(keys=["Cached(prop of type Node)", "Cached(rType of type Any)"], aggregates=["(  INTERNAL_AGGREGATE58d28d0e-5727-4850-81ef-7298d63d7be8,Distinct(Count(it),it))"], _rows=40, _db_hits=0)
    Extract(symKeys=["it", "ca", "  UNNAMED122", "pr", "pr2", "  UNNAMED130", "  UNNAMED99"], exprKeys=["prop", "rType"], _rows=645685, _db_hits=645685)
      SimplePatternMatcher(g="(it)-['  UNNAMED122']-(pr2),(ca)-['  UNNAMED130']-(pr2)", _rows=645685, _db_hits=0)
        Filter(pred="hasLabel(it:Item(0))", _rows=6258, _db_hits=0)
          SimplePatternMatcher(g="(it)-['  UNNAMED99']-(pr)", _rows=6258, _db_hits=0)
            Filter(pred="any(-_-INNER-_- in Collection(List(Literal(GREEN), Literal(SHORT), Literal(PLAIN), Literal(SHORT-SLEEVE))) where Property(pr,name(1)) == -_-INNER-_-)", _rows=4, _db_hits=1210)
              NodeByLabel(identifier="pr", _db_hits=0, _rows=304, label="Property", identifiers=["pr"], producer="NodeByLabel")

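Neo4j version: 2.0.1

Heap size: 3.2 GB max (usage does not even come close to it...)

Database disk usage: 270 MB

Number of nodes: 4368

Number of relationships: 395693

Machine: AWS EC2 c3.large. However, I tried running it on a machine 4 times faster and the results were the same.

Watching JConsole, I can see the heap go from 50 MB to 70 MB and then get cleaned up by the GC.

Is there any way to make it faster? This performance is too slow for me...

Edit: As suggested, I tried combining the MATCH clauses, but it is slower, as seen in the profile:

neo4j-sh (?)$ profile match (pr:Property) WHERE (pr.name in ["GREEN","SHORT","PLAIN","SHORT-SLEEVE"]) with pr MaTCH (pr) <--(it:Item)-->(pr2)<-[:CAT]-(ca) return distinct pr2 as prop,count(distinct it) as sum , ca.name as rType limit 40;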

ColumnFilter(symKeys=["prop", "rType", "  INTERNAL_AGGREGATEa6eaa53b-5cf4-4823-9e4d-0d1d66120d51"], returnItemNames=["prop", "sum", "rType"], _rows=40, _db_hits=0)
Slice(limit="Literal(40)", _rows=40, _db_hits=0)
  EagerAggregation(keys=["Cached(prop of type Node)", "Cached(rType of type Any)"], aggregates=["(  INTERNAL_AGGREGATEa6eaa53b-5cf4-4823-9e4d-0d1d66120d51,Distinct(Count(it),it))"], _rows=40, _db_hits=0)
    Extract(symKeys=["  UNNAMED111", "it", "ca", "  UNNAMED119", "pr", "pr2", "  UNNAMED99"], exprKeys=["prop", "rType"], _rows=639427, _db_hits=639427)
      Filter(pred="(hasLabel(it:Item(0)) AND hasLabel(it:Item(0)))", _rows=639427, _db_hits=0)
        SimplePatternMatcher(g="(ca)-['  UNNAMED119']-(pr2),(it)-['  UNNAMED99']-(pr),(it)-['  UNNAMED111']-(pr2)", _rows=639427, _db_hits=0)
          Filter(pred="any(-_-INNER-_- in Collection(List(Literal(GREEN), Literal(SHORT), Literal(PLAIN), Literal(SHORT-SLEEVE))) where Property(pr,name(1)) == -_-INNER-_-)", _rows=4, _db_hits=1210)
            NodeByLabel(identifier="pr", _db_hits=0, _rows=304, label="Property", identifiers=["pr"], producer="NodeByLabel")

Answer 1

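First, make sure there is an index on the name property of the Property label. As far as I know, indexes are not currently used with the IN predicate, but this should be fixed in a future release, and performance should improve once it is.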

CREATE INDEX ON :Property(name)

You can shorten the query as follows:

MATCH (pr:Property)
WHERE (pr.name IN {keywords})
MATCH (pr)<--(it:Item)-->(pr2)<-[:CAT]-(ca)
RETURN DISTINCT pr2 AS prop, count(DISTINCT it) AS sum, ca.name AS rType
LIMIT 10

Answer 2

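There are two workarounds you can use until IN is fixed to use indexes:

UNION

Split it into two queries.

The first one does an index lookup per keyword and takes the UNION of all of them, like: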

MATCH (pr:Property {name:{keyword1}}) RETURN id(pr)
UNION ALL
MATCH (pr:Property {name:{keyword2}}) RETURN id(pr)
...

and so on.

Then, in the second query, do:

MATCH (pr) WHERE id(pr) IN {ids}
MATCH (pr)<--(it:Item)
MATCH (it)-->(pr2)<-[:CAT]-(ca)
RETURN DISTINCT pr2 AS prop, count(DISTINCT it) AS sum, ca.name AS rType
LIMIT 10
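The first query of the UNION workaround has to be generated per keyword list, so a small helper can do it. The following is only a sketch in Python: `build_union_lookup` and its `prop` argument are hypothetical names, the `{param}` placeholders follow the Neo4j 2.x syntax used in this thread, and the `AS id` alias is added just to give the column a stable name. Sending the query to the server is left to whatever client you already use.

```python
def build_union_lookup(keywords, prop="name"):
    """Return (query, params): one exact-match lookup per keyword, joined by UNION ALL.

    `prop` is interpolated into the query text, so it must be a trusted
    property key (assumed here to be the indexed property), never user input.
    """
    if not keywords:
        raise ValueError("need at least one keyword")
    clauses = []
    params = {}
    for i, kw in enumerate(keywords, start=1):
        key = "keyword%d" % i
        # {keywordN} is a query parameter; the value travels in `params`.
        clauses.append(
            "MATCH (pr:Property {%s: {%s}}) RETURN id(pr) AS id" % (prop, key)
        )
        params[key] = kw
    return "\nUNION ALL\n".join(clauses), params


query, params = build_union_lookup(["GREEN", "SHORT"])
print(query)
print(params)
```

The ids returned by this first query are then collected into a list and passed as the `ids` parameter of the second query.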

Legacy index

Create a node_auto_index for "name" and then use Lucene query syntax for the initial lookup.

START pr=node:node_auto_index('name:("GREEN" "SHORT" "PLAIN" "SHORT-SLEEVE")')
MATCH (pr)<--(it:Item)
MATCH (it)-->(pr2)<-[:CAT]-(ca)
RETURN DISTINCT pr2 AS prop, count(DISTINCT it) AS sum, ca.name AS rType
LIMIT 10
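If the keyword list varies, the Lucene lookup string for the legacy index can be assembled the same way. This is a minimal sketch in Python: `build_lucene_lookup` is a hypothetical helper, and `field` must be whichever property the auto index actually covers. Each keyword is wrapped in double quotes so that values such as SHORT-SLEEVE are treated as one phrase.

```python
def build_lucene_lookup(field, keywords):
    """Return a Lucene query string of the form field:("A" "B" ...)."""
    if not keywords:
        raise ValueError("need at least one keyword")
    quoted = []
    for kw in keywords:
        # Escape backslashes first, then double quotes, so the keyword
        # cannot terminate its own quoted phrase.
        escaped = kw.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"%s"' % escaped)
    return "%s:(%s)" % (field, " ".join(quoted))


lookup = build_lucene_lookup("name", ["GREEN", "SHORT", "PLAIN", "SHORT-SLEEVE"])
print(lookup)
```

The resulting string is what goes inside the parentheses of `node:node_auto_index(...)` in the START clause.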

Answer 1: 2014-03-18 08:05:39
Answer 2: 2014-03-19 09:33:49