Neo4j Cypher query runs very slowly

Question

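I am using Neo4j to store my application's data. The image below depicts the graph structure.

Each circle is a node and each arrow depicts a relationship, with the relationship type written on it. The diagram also defines whether the relationship between two nodes is many-to-many, one-to-many, or one-to-one.

What I want to retrieve from the graph:

I want to list all positions of a company. Each position has an array of users (candidates), and each user has an array of feedback, as shown below.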

position --->
    candidate1
        interview round name (Telephonic)
            question1 answer1 and answer given by user1
            question1 answer1 and answer given by user2
        interview round name (HR Round)
            question1 answer1 and answer given by user1
            question1 answer1 and answer given by user2
    candidate2
        interview round name (Telephonic)
            question1 answer1 and answer given by user1
            question1 answer1 and answer given by user2
        interview round name (HR Round)
            question1 answer1 and answer given by user1
            question1 answer1 and answer given by user2
    . . .

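Many candidates will not have gone through any interview; for those candidates the rounds should be empty.

Below is the query I am using to retrieve the data I need.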

MATCH (comp:Company {dId: "155dyv1wgT"})<-[:`POSITION_COMPANY`]-(pos: Position {status: 'OPEN'})-[:`POSITION_WORKFLOW`]->(:WorkFlow)-[:`WORKFLOW_CANDIDATE-WORKFLOW`]->(cw : CandidateWorkFlow)-[:`CANDIDATE-WORKFLOW_COMPANY-CANDIDATE`]->(cc : CompanyCandidate)

where ((not (has(cc.isSpam) or has(cc.isTrash))) OR (cc.isSpam=false and cc.isTrash=false)) and pos.positionType IN ['PUBLIC','DISCRETE'] with distinct comp, {dId: pos.dId, title: pos.title} as pos, cw, cc

OPTIONAL MATCH (cw)-[:`CANDIDATE_WORKFLOW_INTERVIEW`]->(inwrkflw: InterviewWorkFlow)-[:`INTERVIEW_ROUND`]->(intrnd: InterviewRound)-[:`INTERVIEW_ROUND_FEEDBACK`]->(ffform: FeedbackForm)-[:`FEEDBACK_QUESTION`]-(ffq: Question) 

OPTIONAL MATCH (inwrkflw)-[:`INTERVIEW_WORKFLOW_FEEDBACK`]-(ff:Feedback)

OPTIONAL MATCH (iwr : User)-[:`FEEDBACK_BY`]->(ff)-[:`FEEDBACK_ANSWER`]->(answer:Answer)-[:`QUESTION_ANSWER`]->(ffq) 

with collect({answer : answer.value, rating: answer.rating, question : ffq.qText, givenBy : iwr.fullName, type: ffq.questionType, givenOn: answer.lastModifiedDate}) as rnds, cc, pos, intrnd

with filter(rnd IN rnds WHERE rnd.type = 'COMMENTS') as comments, filter(rnd IN rnds WHERE rnd.type = 'LINEAR_GENERIC') as ratings, cc, pos, intrnd

with distinct collect({roundName: intrnd.name, ratings: ratings, comments: comments}) as rounds, cc, pos

return collect({cc: cc, rounds: rounds}) as data, pos.dId as posId, pos.title as posTitle

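dId is unique on every node.

The problem with this query is that it runs fine on a small data set, say 1,000 candidates across 10 positions. On a large data set, however, it takes a very long time to return a result. I waited 5 minutes in the Neo4j console and still got no response.

The application will not have just 1,000 candidates. It will have at least 100,000 candidates, and at most I can assume 1 million per company.

I have tried various ways to optimize this query but could not get a response.

The response SLA is 20 seconds.

My questions are:

How can I optimize this query to get the result I want?
What is wrong with the current query?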

Answer 1

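First, I upgraded your database to 2.3.3.

Your model consists of two subgraphs, which describe:

the meta-information about the hiring process
the concrete feedback/answers for one candidate

Both are very large for your company:

235,600 paths from the company through the interview process to the questions
83,937 paths from the company to the concrete answers to the questions

Each takes about 1 second to compute.

If you just query both together, the numbers multiply and you end up with about 20 billion paths.

That takes practically forever to compute.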
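The arithmetic behind that estimate can be checked with a short sketch. It uses only the two path counts quoted above and assumes, as the answer describes, that matching both patterns at once multiplies the row counts. The variable names are illustrative, not anything Neo4j exposes.

```python
# Path counts quoted in the answer for the two subgraphs.
meta_paths = 235_600    # company -> interview process -> questions
answer_paths = 83_937   # company -> concrete answers to the questions

# Matching both patterns together pairs every row of one side
# with every row of the other before any filter is applied.
combined = meta_paths * answer_paths

# Querying the two subgraphs one after the other only costs the sum.
separate = meta_paths + answer_paths

print(f"combined: {combined:,} rows")
print(f"separate: {separate:,} rows")
```

The product is a little under 20 billion rows, while the sum is only a few hundred thousand, which is why keeping the two subgraphs apart matters so much here.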

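My solution was to query one subgraph first, set it aside (in an aggregation), and then query the second subgraph.

First, I combined the two by matching the answers to the questions (which turns the step into an ExpandInto operation):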

WHERE (answer)-[:QUESTION_ANSWER]->(ffq)

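That got the query to finish in about 15 seconds.

Then I expanded the concrete (answer) subgraph one step further to the question, and joined the two subgraphs on the question itself (ffq = ffq2).

That brought the total execution time down to 1.6 seconds.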
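To see why aggregating one subgraph before touching the other helps, here is a toy in-memory analogy in Python. It is only a sketch under stated assumptions: the rows, the workflow ids (`wf1`, `wf2`) and the function names are hypothetical, and this is not how Neo4j's planner actually executes the query. It only mirrors the shape of the approach: collect the questions per interview workflow, then unwind that collection for each answer row and keep the pairs where the question matches.

```python
from collections import defaultdict

# Hypothetical toy rows, not taken from the asker's data.
# Meta subgraph rows: (workflow, round, question)
question_rows = [
    ("wf1", "Telephonic", "q1"),
    ("wf1", "HR Round", "q2"),
    ("wf2", "Telephonic", "q3"),
]
# Answer subgraph rows: (workflow, user, answer, question)
answer_rows = [
    ("wf1", "user1", "a1", "q1"),
    ("wf1", "user2", "a2", "q1"),
    ("wf2", "user1", "a3", "q3"),
]

def naive(question_rows, answer_rows):
    """Pair every question row with every answer row, then filter."""
    examined, result = 0, []
    for wf, rnd, q in question_rows:
        for wf2, user, ans, q2 in answer_rows:
            examined += 1
            if wf == wf2 and q == q2:
                result.append((wf, rnd, q, user, ans))
    return result, examined

def collect_then_unwind(question_rows, answer_rows):
    """Collect questions per workflow, then unwind them per answer row."""
    collected = defaultdict(list)            # plays the role of collect(...)
    for wf, rnd, q in question_rows:
        collected[wf].append((rnd, q))
    examined, result = 0, []
    for wf, user, ans, q2 in answer_rows:
        for rnd, q in collected[wf]:         # plays the role of UNWIND
            examined += 1
            if q == q2:                      # plays the role of ffq2 = ffq
                result.append((wf, rnd, q, user, ans))
    return result, examined

naive_result, naive_examined = naive(question_rows, answer_rows)
fast_result, fast_examined = collect_then_unwind(question_rows, answer_rows)
print(naive_examined, fast_examined)
```

Both functions return the same matches, but the second one only compares an answer against the questions of its own workflow, so the number of pairs examined grows with the size of each workflow rather than with the product of the two full row sets.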

Here is the final query:

MATCH (comp:Company {dId: "155dyv1wgT"})<-[:`POSITION_COMPANY`]-(pos: Position {status: 'OPEN'})-[:`POSITION_WORKFLOW`]->(:WorkFlow)-[:`WORKFLOW_CANDIDATE-WORKFLOW`]->(cw : CandidateWorkFlow)-[:`CANDIDATE-WORKFLOW_COMPANY-CANDIDATE`]->(cc : CompanyCandidate)

where ((not (has(cc.isSpam) or has(cc.isTrash))) OR (cc.isSpam=false and cc.isTrash=false)) and pos.positionType IN ['PUBLIC','DISCRETE']

with distinct comp, {dId: pos.dId, title: pos.title} as pos, cw, cc

MATCH (cw)-[:`CANDIDATE_WORKFLOW_INTERVIEW`]->(inwrkflw)

MATCH (inwrkflw)-[:`INTERVIEW_ROUND`]->(intrnd)-[:`INTERVIEW_ROUND_FEEDBACK`]->(ffform)-[:`FEEDBACK_QUESTION`]-(ffq) 

WITH comp,pos, cw, cc,inwrkflw, collect({round:intrnd,form:ffform,question:ffq}) as workflow_questions

MATCH (inwrkflw)-[:`INTERVIEW_WORKFLOW_FEEDBACK`]-(ff:Feedback)
MATCH (iwr : User)-[:`FEEDBACK_BY`]->(ff)-[:`FEEDBACK_ANSWER`]->(answer:Answer)-[:`QUESTION_ANSWER`]->(ffq2) 

UNWIND workflow_questions as wq

WITH comp,pos, cw, cc,inwrkflw, iwr,ff,answer, wq.round as intrnd, wq.form as ffform, wq.question as ffq

WHERE ffq2 = ffq

with collect({answer : answer.value, rating: answer.rating, question : ffq.qText, givenBy : iwr.fullName, type: ffq.questionType, givenOn: answer.lastModifiedDate}) as rnds, cc, pos, intrnd

with filter(rnd IN rnds WHERE rnd.type = 'COMMENTS') as comments, filter(rnd IN rnds WHERE rnd.type = 'LINEAR_GENERIC') as ratings, cc, pos, intrnd

with collect({roundName: intrnd.name, ratings: ratings, comments: comments}) as rounds, cc, pos

return collect({cc: cc, rounds: rounds}) as data, pos.dId as posId, pos.title as posTitle;

Solution 1
0 2016-06-04 21:50:55
