How to optimize a Cypher query with multiple OPTIONAL MATCH clauses
I have the following Cypher query with multiple OPTIONAL MATCH clauses, which can no longer run on my machine (Cartesian product):
match (document:Document)-[*..2]-(relateddocument:Document)
optional match (document)-[:HAS_CATEGORY]->(c:Category)<-[:HAS_CATEGORY]-(relateddocument)
optional match (document)-[:HAS_KEYWORD]->(k:Keyword)<-[:HAS_KEYWORD]-(relateddocument)
optional match (document)-[:HAS_AUTHOR]->(a:Author)<-[:HAS_AUTHOR]-(relateddocument)
with document, relateddocument, collect(c)+collect(k)+collect(a) as similarity
where id(document) = 85182 return relateddocument, similarity order by similarity desc limit 5
Could you please give me a hint on how to optimize this query?
As the other answers indicate, you need to put the WHERE clause as close as possible to the corresponding MATCH, to minimize the number of rows generated by the MATCH.
In addition, you can eliminate the Cartesian products caused by all the back-to-back OPTIONAL MATCH clauses by using COLLECT after each one to convert its N rows to 1 row. Each OPTIONAL MATCH multiplies the rows coming into it by its own number of matches, so a pair of documents sharing 3 categories, 4 keywords, and 2 authors produces 3 × 4 × 2 = 24 rows before the final aggregation, and COLLECT(c) alone would already hold each category 8 times. (The last WITH would come right before the RETURN, so it can be "merged" into the RETURN.)
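To see why the row count explodes, here is a minimal Python model of the row semantics described above. It is not Neo4j code: optional_match_rows and collect are hypothetical helpers that mimic how each OPTIONAL MATCH multiplies the incoming rows (a clause with no match still yields one row, with null for its variable) and how collect() skips nulls. The category, keyword, and author names are made up.

```python
from itertools import product


def optional_match_rows(*match_lists):
    """Rows produced by back-to-back OPTIONAL MATCH clauses for one document pair.

    Each clause multiplies the incoming rows by its number of matches;
    a clause with no match still yields one row, with None standing in for null.
    """
    options = [matches if matches else [None] for matches in match_lists]
    return list(product(*options))


def collect(values):
    """Like Cypher's collect(): gather the values, skipping nulls."""
    return [v for v in values if v is not None]


# Hypothetical nodes shared by one (document, relateddocument) pair.
categories = ["c1", "c2", "c3"]
keywords = ["k1", "k2", "k3", "k4"]
authors = ["a1", "a2"]

# Chained: three OPTIONAL MATCH clauses in a row, then a single aggregation.
rows = optional_match_rows(categories, keywords, authors)
chained = (collect(r[0] for r in rows)
           + collect(r[1] for r in rows)
           + collect(r[2] for r in rows))

# Staged: COLLECT after each OPTIONAL MATCH, so every clause starts from one row.
staged = collect(categories) + collect(keywords) + collect(authors)

print(len(rows), len(chained), len(staged))  # 24 72 9
```

Both versions contain the same distinct nodes, but the chained one carries 24 rows and 72 list entries where the staged one needs 9, and the duplicates would also distort any ranking by list size.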
Also, your ORDER BY similarity DESC clause does not make any sense (and will probably cause an error), since similarity is a list. You probably meant to use SIZE(similarity) instead of similarity there.
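As a rough illustration of what ORDER BY SIZE(similarity) DESC LIMIT 5 computes, this Python sketch ranks related documents by how many shared nodes each one has. The helper top_related and the sample ids are hypothetical. Python's sort keeps ties in input order, whereas Cypher does not guarantee any particular order among ties, so documents with equal counts may come back in a different order.

```python
def top_related(similarities, limit=5):
    """Mimic ORDER BY SIZE(similarity) DESC LIMIT limit.

    similarities maps a (hypothetical) related-document id to its list of shared nodes.
    """
    ranked = sorted(similarities.items(), key=lambda item: len(item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:limit]]


similarities = {
    101: ["c1", "k1", "k2"],
    102: ["a1"],
    103: ["c1", "c2", "k1", "a1"],
    104: [],
    105: ["k3", "k4"],
    106: ["c2", "a2"],
}

print(top_related(similarities))  # [103, 101, 105, 106, 102]
```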
This should be faster:
MATCH (document:Document)-[:HAS_CATEGORY|HAS_KEYWORD|HAS_AUTHOR*..2]-(relateddocument:Document)
WHERE ID(document) = 85182
WITH DISTINCT document, relateddocument
OPTIONAL MATCH (document)-[:HAS_CATEGORY]->(c:Category)<-[:HAS_CATEGORY]-(relateddocument)
WITH document, relateddocument, COLLECT(c) AS cs
OPTIONAL MATCH (document)-[:HAS_KEYWORD]->(k:Keyword)<-[:HAS_KEYWORD]-(relateddocument)
WITH document, relateddocument, cs, COLLECT(k) AS ks
OPTIONAL MATCH (document)-[:HAS_AUTHOR]->(a:Author)<-[:HAS_AUTHOR]-(relateddocument)
RETURN relateddocument, cs + ks + COLLECT(a) AS similarity
ORDER BY SIZE(similarity) DESC
LIMIT 5;
Notice that the first MATCH also uses [:HAS_CATEGORY|HAS_KEYWORD|HAS_AUTHOR*..2] to filter the relationship types, in case your documents have a lot of relationships of other types. That could further reduce the number of rows generated by the first MATCH, which would reduce the amount of work done by the entire query. The WITH DISTINCT right after it collapses the several paths that can connect the same pair of documents into a single row; without it, COLLECT(c) would gather each shared category once per path.
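The effect of the relationship-type filter can be sketched with a small Python model of the variable-length expansion. Everything here is an assumption for illustration: the graph is made up, direction is ignored as in an undirected pattern, and each relationship is used at most once per path, as Cypher does within one MATCH. The function expand returns one row per path from the start document to another Document node; CITES and VIEWED are hypothetical relationship types standing in for "other" relationships.

```python
from collections import namedtuple

Rel = namedtuple("Rel", "start type end")


def expand(rels, labels, start, max_hops=2, allowed_types=None):
    """One row per path (start)-[*1..max_hops]-(x:Document), ignoring direction.

    As in Cypher, a relationship is used at most once within a single path.
    allowed_types=None means "any type", like [*..2]; otherwise it acts like
    [:TYPE_A|TYPE_B*..2].
    """
    usable = [r for r in rels if allowed_types is None or r.type in allowed_types]
    rows = []

    def walk(node, used, depth):
        if depth > 0 and labels[node] == "Document":
            rows.append((start, node))
        if depth == max_hops:
            return
        for i, r in enumerate(usable):
            if i in used:
                continue
            if r.start == node:
                walk(r.end, used | {i}, depth + 1)
            elif r.end == node:
                walk(r.start, used | {i}, depth + 1)

    walk(start, frozenset(), 0)
    return rows


# Hypothetical graph; CITES and VIEWED stand in for "other" relationship types.
labels = {"d1": "Document", "d2": "Document", "d3": "Document", "d4": "Document",
          "d5": "Document", "c1": "Category", "c2": "Category", "k1": "Keyword",
          "u1": "User"}
rels = [Rel("d1", "HAS_CATEGORY", "c1"), Rel("d2", "HAS_CATEGORY", "c1"),
        Rel("d1", "HAS_CATEGORY", "c2"), Rel("d2", "HAS_CATEGORY", "c2"),
        Rel("d1", "HAS_KEYWORD", "k1"), Rel("d2", "HAS_KEYWORD", "k1"),
        Rel("d3", "HAS_KEYWORD", "k1"), Rel("d1", "CITES", "d4"),
        Rel("d4", "CITES", "d5"), Rel("u1", "VIEWED", "d1"), Rel("u1", "VIEWED", "d2")]

all_rows = expand(rels, labels, "d1")
filtered_rows = expand(rels, labels, "d1",
                       allowed_types={"HAS_CATEGORY", "HAS_KEYWORD", "HAS_AUTHOR"})

print(len(all_rows), len(set(all_rows)))            # 7 4
print(len(filtered_rows), len(set(filtered_rows)))  # 4 2
```

Restricting the types drops the rows that reach documents only through unrelated relationships (7 rows down to 4 in this toy graph). The 4 remaining rows still cover only 2 distinct document pairs, because d2 is reachable through three different shared nodes.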
One way to improve it immediately is to move the WHERE id(document) = 85182 directly under the MATCH clause. That should make a major difference, which you can confirm if you PROFILE the query.
Regards, Tom
The main problem is that match (document:Document)-[*..2]-(relateddocument:Document) is effectively a Cartesian product: it pairs every document with every document up to 2 links away. The WITH between the MATCH and the id filter tells Cypher not to apply the filter until AFTER it has done all that work. By moving the WHERE id(...) to before the WITH, Cypher knows it is safe to restrict (document:Document) to just id 85182, and thus avoids the n^2 match of essentially all documents against all documents.
MATCH (document:Document)-[*..2]-(relateddocument:Document)
WHERE id(document) = 85182
WITH DISTINCT document, relateddocument
OPTIONAL MATCH (document)-[:HAS_CATEGORY]->(c:Category)<-[:HAS_CATEGORY]-(relateddocument)
OPTIONAL MATCH (document)-[:HAS_KEYWORD]->(k:Keyword)<-[:HAS_KEYWORD]-(relateddocument)
OPTIONAL MATCH (document)-[:HAS_AUTHOR]->(a:Author)<-[:HAS_AUTHOR]-(relateddocument)
WITH relateddocument, collect(DISTINCT c) + collect(DISTINCT k) + collect(DISTINCT a) AS similarity
RETURN relateddocument, similarity
ORDER BY size(similarity) DESC
LIMIT 5