How to optimize Cypher-Query with multiple OPTIONAL MATCH

I have the following Cypher query with multiple OPTIONAL MATCH clauses. It can no longer be run on my machine because of a Cartesian product:

match (document:Document)-[*..2]-(relateddocument:Document)
optional match (document)-[:HAS_CATEGORY]->(c:Category)<-[:HAS_CATEGORY]-(relateddocument)
optional match (document)-[:HAS_KEYWORD]->(k:Keyword)<-[:HAS_KEYWORD]-(relateddocument)
optional match (document)-[:HAS_AUTHOR]->(a:Author)<-[:HAS_AUTHOR]-(relateddocument)
with document, relateddocument, collect(c)+collect(k)+collect(a) as similarity
where id(document) = 85182 return relateddocument, similarity order by similarity desc limit 5

Could you please give me a hint on how I could optimize this query?

As the other answers indicate, you need to put the WHERE clause as close as possible to the corresponding MATCH, to minimize the number of rows generated by the MATCH.

In addition, you can eliminate the Cartesian products caused by all the back-to-back OPTIONAL MATCH clauses by using COLLECT to convert the N rows from each OPTIONAL MATCH into 1 row before the next one runs. (The last WITH would be right before the RETURN, and so could be "merged" into the RETURN.)
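To make the row arithmetic concrete, here is a minimal Python sketch that simulates the row counts for a single (document, relateddocument) pair. It is not Cypher and does not touch a database; the node names and counts are made up for illustration, and the function names are hypothetical.

```python
from itertools import product

# Hypothetical shared nodes for ONE (document, relateddocument) pair.
shared_categories = ["c1", "c2", "c3"]
shared_keywords = ["k1", "k2", "k3", "k4"]
shared_authors = ["a1", "a2"]


def back_to_back(cs, ks, authors):
    """Chained OPTIONAL MATCH clauses: every (c, k, a) combination is a row."""
    # An OPTIONAL MATCH that finds nothing still yields one row, holding null.
    rows = list(product(cs or [None], ks or [None], authors or [None]))
    # COLLECT runs over all rows and skips nulls, so values repeat.
    similarity = [v for column in zip(*rows) for v in column if v is not None]
    return len(rows), len(similarity)


def collect_between(cs, ks, authors):
    """COLLECT after each OPTIONAL MATCH: the pair is back to one row each time."""
    peak_rows = max(len(cs) or 1, len(ks) or 1, len(authors) or 1)
    similarity = cs + ks + authors
    return peak_rows, len(similarity)


print(back_to_back(shared_categories, shared_keywords, shared_authors))
print(collect_between(shared_categories, shared_keywords, shared_authors))
```

With these made-up counts the chained version produces 3 * 4 * 2 = 24 rows and a 72-entry list full of repeats, while collecting between the clauses peaks at 4 rows and yields 9 entries. The repeats also mean that the size of the original similarity list overstates how much two documents have in common.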

Also, your ORDER BY similarity DESC clause does not make much sense (and may cause an error, depending on the version), since similarity is a collection. You probably meant to order by SIZE(similarity) instead of similarity there.

This should be faster:

MATCH (document:Document)-[:HAS_CATEGORY|:HAS_KEYWORD|:HAS_AUTHOR*..2]-(relateddocument:Document)
WHERE ID(document) = 85182
OPTIONAL MATCH (document)-[:HAS_CATEGORY]->(c:Category)<-[:HAS_CATEGORY]-(relateddocument)
WITH document, relateddocument, COLLECT(c) AS cs
OPTIONAL MATCH (document)-[:HAS_KEYWORD]->(k:Keyword)<-[:HAS_KEYWORD]-(relateddocument)
WITH document, relateddocument, cs, COLLECT(k) AS ks
OPTIONAL MATCH (document)-[:HAS_AUTHOR]->(a:Author)<-[:HAS_AUTHOR]-(relateddocument)
RETURN relateddocument, cs+ks+collect(a) as similarity
ORDER BY SIZE(similarity) DESC
LIMIT 5;

Notice that the first MATCH also uses [:HAS_CATEGORY|:HAS_KEYWORD|:HAS_AUTHOR*..2] to filter the relationship types, in case your documents have a lot of relationships of other types. That could further reduce the number of rows generated by the first MATCH, which would reduce the amount of work done by the entire query.

One way to immediately improve it is to move the WHERE id(document) = 85182 straight under the MATCH statement. That should make a major difference, which you can see if you PROFILE the query.

Regards, Tom

The main problem is that match (document:Document)-[*..2]-(relateddocument:Document) pairs every document with every document up to 2 links away, which is effectively a Cartesian product, and the WITH between the match and the id filter tells Cypher not to apply the filter until AFTER it has done all that work. By moving the WHERE id(...) to before the WITH, Cypher will know it is safe to limit (document:Document) to just id 85182, and thus avoid the n^2 match of basically all documents to all documents.

MATCH (document:Document)-[*..2]-(relateddocument:Document)
WHERE id(document) = 85182
WITH document, relateddocument
OPTIONAL MATCH (document)-[:HAS_CATEGORY]->(c:Category)<-[:HAS_CATEGORY]-(relateddocument)
OPTIONAL MATCH (document)-[:HAS_KEYWORD]->(k:Keyword)<-[:HAS_KEYWORD]-(relateddocument)
OPTIONAL MATCH (document)-[:HAS_AUTHOR]->(a:Author)<-[:HAS_AUTHOR]-(relateddocument)
WITH relateddocument, collect(c)+collect(k)+collect(a) as similarity
RETURN relateddocument, similarity 
order by similarity desc 
limit 5
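
The difference between filtering after and before the expansion can be sketched with a toy graph in Python. This is only an illustration of the row counts, not of how the Cypher planner works; the graph, the ids, and the function names are all hypothetical, and "related" here simply means sharing at least one category, keyword, or author node.

```python
# Hypothetical toy graph: document id -> attached category/keyword/author nodes.
doc_attrs = {
    1: {"c1", "k1"},
    2: {"c1", "a1"},
    3: {"k1", "a1"},
    4: {"c2"},
    5: {"c2", "k1"},
}


def related(doc_id):
    """Documents two hops away, i.e. sharing at least one attached node."""
    return [other for other, attrs in doc_attrs.items()
            if other != doc_id and attrs & doc_attrs[doc_id]]


def filter_late(target):
    # Expand from EVERY document, then throw away rows for other documents.
    rows = [(d, r) for d in doc_attrs for r in related(d)]
    kept = [(d, r) for d, r in rows if d == target]
    return len(rows), kept


def filter_early(target):
    # Pick the one document first, then expand only from it.
    rows = [(target, r) for r in related(target)]
    return len(rows), rows


print(filter_late(1))
print(filter_early(1))
```

Both functions keep the same three pairs for document 1, but the late filter builds 12 intermediate rows to get there while the early filter builds only 3. On a real graph the gap grows with the number of documents.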
