neo4j cypher query too slow

The following query takes between 1.5 and 9 seconds, depending on {keywords}:

MATCH (pr:Property)
WHERE pr.name IN {keywords}
WITH pr
MATCH (pr)<--(it:Item)
MATCH (it)-->(pr2)<-[:CAT]-(ca)
RETURN DISTINCT pr2 AS prop, count(DISTINCT it) AS sum, ca.name AS rType
LIMIT 10

Each Item is connected to 100 Properties.

Sample profile on the server:

neo4j-sh (?)$ profile match (pr:Property)
WHERE (pr.name in ["GREEN","SHORT","PLAIN","SHORT-SLEEVE"])
with pr
MaTCH (pr) <--(it:Item)
MaTCH (it)-->(pr2)<-[:CAT]-(ca)
return distinct pr2 as prop,count(distinct it) as sum , ca.name as rType
limit 40;
+------------------------------------------------------------------------------------------40 rows

ColumnFilter(symKeys=["prop", "rType", "  INTERNAL_AGGREGATE58d28d0e-5727-4850-81ef-7298d63d7be8"], returnItemNames=["prop", "sum", "rType"], _rows=40, _db_hits=0)
Slice(limit="Literal(40)", _rows=40, _db_hits=0)
  EagerAggregation(keys=["Cached(prop of type Node)", "Cached(rType of type Any)"], aggregates=["(  INTERNAL_AGGREGATE58d28d0e-5727-4850-81ef-7298d63d7be8,Distinct(Count(it),it))"], _rows=40, _db_hits=0)
    Extract(symKeys=["it", "ca", "  UNNAMED122", "pr", "pr2", "  UNNAMED130", "  UNNAMED99"], exprKeys=["prop", "rType"], _rows=645685, _db_hits=645685)
      SimplePatternMatcher(g="(it)-['  UNNAMED122']-(pr2),(ca)-['  UNNAMED130']-(pr2)", _rows=645685, _db_hits=0)
        Filter(pred="hasLabel(it:Item(0))", _rows=6258, _db_hits=0)
          SimplePatternMatcher(g="(it)-['  UNNAMED99']-(pr)", _rows=6258, _db_hits=0)
            Filter(pred="any(-_-INNER-_- in Collection(List(Literal(GREEN), Literal(SHORT), Literal(PLAIN), Literal(SHORT-SLEEVE))) where Property(pr,name(1)) == -_-INNER-_-)", _rows=4, _db_hits=1210)
              NodeByLabel(identifier="pr", _db_hits=0, _rows=304, label="Property", identifiers=["pr"], producer="NodeByLabel")

Neo4j version: 2.0.1

Heap size: 3.2 GB max (usage never comes close to that)

Database disk usage: 270 MB

Number of nodes: 4368

Number of relationships: 395693

Computer: AWS EC2 c3.large. I also tried running it on a machine four times faster, and the results were the same.

In JConsole I can see the heap grow from 50 MB to 70 MB before it is cleaned up by GC.

Is there any way to make it faster? This performance is way too slow for me.

EDIT: As suggested, I tried combining the matches, but it is slower, as you can see in the profile:

neo4j-sh (?)$ profile match (pr:Property) WHERE (pr.name in ["GREEN","SHORT","PLAIN","SHORT-SLEEVE"]) with pr MaTCH (pr) <--(it:Item)-->(pr2)<-[:CAT]-(ca) return distinct pr2 as prop,count(distinct it) as sum , ca.name as rType limit 40;

ColumnFilter(symKeys=["prop", "rType", "  INTERNAL_AGGREGATEa6eaa53b-5cf4-4823-9e4d-0d1d66120d51"], returnItemNames=["prop", "sum", "rType"], _rows=40, _db_hits=0)
Slice(limit="Literal(40)", _rows=40, _db_hits=0)
  EagerAggregation(keys=["Cached(prop of type Node)", "Cached(rType of type Any)"], aggregates=["(  INTERNAL_AGGREGATEa6eaa53b-5cf4-4823-9e4d-0d1d66120d51,Distinct(Count(it),it))"], _rows=40, _db_hits=0)
    Extract(symKeys=["  UNNAMED111", "it", "ca", "  UNNAMED119", "pr", "pr2", "  UNNAMED99"], exprKeys=["prop", "rType"], _rows=639427, _db_hits=639427)
      Filter(pred="(hasLabel(it:Item(0)) AND hasLabel(it:Item(0)))", _rows=639427, _db_hits=0)
        SimplePatternMatcher(g="(ca)-['  UNNAMED119']-(pr2),(it)-['  UNNAMED99']-(pr),(it)-['  UNNAMED111']-(pr2)", _rows=639427, _db_hits=0)
          Filter(pred="any(-_-INNER-_- in Collection(List(Literal(GREEN), Literal(SHORT), Literal(PLAIN), Literal(SHORT-SLEEVE))) where Property(pr,name(1)) == -_-INNER-_-)", _rows=4, _db_hits=1210)
            NodeByLabel(identifier="pr", _db_hits=0, _rows=304, label="Property", identifiers=["pr"], producer="NodeByLabel")

First of all, make sure that the name property on the Property label is indexed. As far as I know, indexes aren't used with an IN predicate yet, but this should be resolved in a future version, so performance should improve then.

CREATE INDEX ON :Property(name)

You can also shorten the query by dropping the WITH and combining the two MATCH patterns:

MATCH (pr:Property)
WHERE (pr.name in {keywords})
MATCH (pr)<--(it:Item)-->(pr2)<-[:CAT]-(ca)
RETURN distinct pr2 as prop,count(distinct it) as sum , ca.name as rType
LIMIT 10

There are two things you can do as a workaround until index support for IN is fixed:

UNION

Split it up into two queries.

The first one does an index lookup per keyword and unions the results, like:

MATCH (pr:Property {keyword: {keyword1}}) RETURN id(pr)
UNION ALL
MATCH (pr:Property {keyword: {keyword2}}) RETURN id(pr)
...

and so on, with one branch per keyword.

Then, in the second query, pass the collected ids in as a parameter:

MATCH (pr) WHERE id(pr) IN {ids}
MATCH (pr)<--(it:Item)
MATCH (it)-->(pr2)<-[:CAT]-(ca)
RETURN DISTINCT pr2 AS prop, count(DISTINCT it) AS sum, ca.name AS rType
LIMIT 10
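
Because the first query needs one branch per keyword, it has to be generated on the client for each request. The following is only a sketch of how that could look in Python. The function name build_union_lookup and the keywordN parameter names are made up for illustration, and the property defaults to keyword as written in this answer (the question's data stores the value in name, so pass prop="name" there). It only builds the query text and the parameter map; how you send them to the server depends on your client.

```python
def build_union_lookup(keywords, prop="keyword"):
    """Return (query, params) with one exact-match lookup branch per keyword.

    Hypothetical helper: names and defaults are assumptions, not part of Neo4j.
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    branches = []
    params = {}
    for i, kw in enumerate(keywords, start=1):
        name = "keyword%d" % i
        # Uses the {param} placeholder syntax seen in the queries above (Neo4j 2.x).
        branches.append(
            "MATCH (pr:Property {%s: {%s}}) RETURN id(pr)" % (prop, name)
        )
        # The keyword itself travels as a parameter, never inside the query text.
        params[name] = kw
    return "\nUNION ALL\n".join(branches), params


query, params = build_union_lookup(["GREEN", "SHORT", "PLAIN", "SHORT-SLEEVE"])
print(query)
print(params)
```

The ids returned by that query are then collected into a list and passed as {ids} to the second query. Keeping the keywords in parameters means the query text only changes with the number of keywords, not with their values.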

Legacy Index

Create a node_auto_index for "keyword" and then use Lucene query syntax to do your initial lookup.

START pr=node:node_auto_index('keyword:("GREEN" "SHORT" "PLAIN" "SHORT-SLEEVE")')
MATCH (pr)<--(it:Item)
MATCH (it)-->(pr2)<-[:CAT]-(ca)
RETURN DISTINCT pr2 AS prop, count(DISTINCT it) AS sum, ca.name AS rType
LIMIT 10
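
The Lucene string in the START clause also depends on the keyword list, so it would normally be built on the client. Here is a minimal Python sketch, assuming the keywords arrive as a plain list. The helper name lucene_any_of is hypothetical. It relies on Lucene treating space-separated phrases as alternatives (its default OR operator), which is what the query above assumes, and on backslash and double quote being the only characters that need escaping inside a quoted phrase.

```python
def lucene_any_of(field, keywords):
    """Build field:("A" "B" ...) matching any of the given phrases.

    Hypothetical helper; assumes the index uses Lucene's default OR operator.
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    quoted = []
    for kw in keywords:
        # Assumption: inside a quoted phrase, only backslash and double quote
        # have to be escaped. Backslashes are escaped first so the escapes
        # added for quotes are not doubled.
        escaped = kw.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"%s"' % escaped)
    return "%s:(%s)" % (field, " ".join(quoted))


print(lucene_any_of("keyword", ["GREEN", "SHORT", "PLAIN", "SHORT-SLEEVE"]))
```

For the four keywords above this produces the same string that is written inline in the START clause.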
