neo4j cypher query running very slow
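I am using neo4j to store my application data. The image below depicts the graph structure.
Each circle is a node and each arrow depicts a relationship, with the relationship type written on it. The diagram also defines whether the relationship between nodes is many to many, one to many, or one to one.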
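What I want to retrieve from the graph:
I want to list all positions for the company. Each position has a list of users (candidates), and each user has a list of feedback, as shown below.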
position --->
    candidate1
        interview round name (Telephonic)
            question1 answer1 and answer given by user1
            question1 answer1 and answer given by user2
        interview round name (HR Round)
            question1 answer1 and answer given by user1
            question1 answer1 and answer given by user2
    candidate2
        interview round name (Telephonic)
            question1 answer1 and answer given by user1
            question1 answer1 and answer given by user2
        interview round name (HR Round)
            question1 answer1 and answer given by user1
            question1 answer1 and answer given by user2
    . . .
Many candidates will not have an interview round. For those candidates, the rounds should be empty.
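The target shape above can be sketched outside the database as a plain grouping step. The following Python is only an illustration under assumed names: the flat row layout, the sample values, and the `nest` helper are hypothetical and are not part of the application or of neo4j. It shows how a candidate without any interview round still appears in the result, with an empty set of rounds.

```python
# Hypothetical flat rows: (position, candidate, round_name, question, answer, given_by).
# A candidate who has had no interview carries None for the round columns.
rows = [
    ("Engineer", "candidate1", "Telephonic", "question1", "answer1", "user1"),
    ("Engineer", "candidate1", "Telephonic", "question1", "answer1", "user2"),
    ("Engineer", "candidate1", "HR Round", "question1", "answer1", "user1"),
    ("Engineer", "candidate2", None, None, None, None),
]


def nest(rows):
    """Group flat rows into position -> candidate -> round -> list of answers."""
    result = {}
    for position, candidate, round_name, question, answer, given_by in rows:
        # Register the candidate even if there is nothing to add below.
        rounds = result.setdefault(position, {}).setdefault(candidate, {})
        if round_name is None:
            continue  # no interview yet: the candidate keeps an empty dict of rounds
        rounds.setdefault(round_name, []).append(
            {"question": question, "answer": answer, "givenBy": given_by}
        )
    return result


data = nest(rows)
```

Here `data["Engineer"]["candidate2"]` is an empty dict, which corresponds to the "rounds should be empty" requirement.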
Below is the query I am using to retrieve the data I need.
MATCH (comp:Company {dId: "155dyv1wgT"})<-[:`POSITION_COMPANY`]-(pos: Position {status: 'OPEN'})-[:`POSITION_WORKFLOW`]->(:WorkFlow)-[:`WORKFLOW_CANDIDATE-WORKFLOW`]->(cw : CandidateWorkFlow)-[:`CANDIDATE-WORKFLOW_COMPANY-CANDIDATE`]->(cc : CompanyCandidate)
where ((not (has(cc.isSpam) or has(cc.isTrash))) OR (cc.isSpam=false and cc.isTrash=false)) and pos.positionType IN ['PUBLIC','DISCRETE']
with distinct comp, {dId: pos.dId, title: pos.title} as pos, cw, cc
OPTIONAL MATCH (cw)-[:`CANDIDATE_WORKFLOW_INTERVIEW`]->(inwrkflw: InterviewWorkFlow)-[:`INTERVIEW_ROUND`]->(intrnd: InterviewRound)-[:`INTERVIEW_ROUND_FEEDBACK`]->(ffform: FeedbackForm)-[:`FEEDBACK_QUESTION`]-(ffq: Question)
OPTIONAL MATCH (inwrkflw)-[:`INTERVIEW_WORKFLOW_FEEDBACK`]-(ff:Feedback)
OPTIONAL MATCH (iwr : User)-[:`FEEDBACK_BY`]->(ff)-[:`FEEDBACK_ANSWER`]->(answer:Answer)-[:`QUESTION_ANSWER`]->(ffq)
with collect({answer : answer.value, rating: answer.rating, question : ffq.qText, givenBy : iwr.fullName, type: ffq.questionType, givenOn: answer.lastModifiedDate}) as rnds, cc, pos, intrnd
with filter(rnd IN rnds WHERE rnd.type = 'COMMENTS') as comments, filter(rnd IN rnds WHERE rnd.type = 'LINEAR_GENERIC') as ratings, cc, pos, intrnd
with distinct collect({roundName: intrnd.name, ratings: ratings, comments: comments}) as rounds, cc, pos
return collect({cc: cc, rounds: rounds}) as data, pos.dId as posId, pos.title as posTitle
dId is unique on every node.
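The problem with this query is that it runs fine for a small data set, say 1000 candidates across 10 positions. For a large data set, however, it takes a very long time to return results. I even waited 5 minutes in the neo4j console and got no response.
The application will not have only 1000 candidates. The number of candidates will be at least 100000, and I can assume a maximum of 1 million per company.
I have tried various ways to optimize this query but could not get a response.
The response SLA should be within 20 seconds.
My question is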
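First, I upgraded your database to 2.3.3.
Your model consists of two subgraphs. Both are very large for your company, and each takes about 1 second to compute.
If you just query them together, the path counts multiply and you end up with 20 billion paths. That takes forever to compute.
My solution was to query one subgraph first, put it aside (in an aggregation), and then query the second subgraph.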
First, I combined them by matching the answers to the questions (which turns it into an ExpandInto operation):
WHERE (answer)-[:QUESTION_ANSWER]->(ffq)
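That made the query finish in about 15 seconds.
Then I expanded the concrete (answer) subgraph one step further to the question, and joined the two subgraphs on the question itself (ffq = ffq2).
That brought the total execution time down to 1.6 seconds.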
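To see why aggregating one subgraph before querying the other helps, here is a toy Python illustration. It is a sketch under stated assumptions, not a model of how neo4j actually plans or executes the query: the data, the tuple layouts, and both function names are hypothetical. It counts how many pairs are compared when every path of one subgraph is checked against every path of the other, versus when the first subgraph is collected per interview workflow and each answer is joined only against that workflow's questions.

```python
from collections import defaultdict

# Hypothetical toy data; the real subgraphs are far larger.
# Subgraph 1: (workflow, round, question) paths.
question_paths = [
    (w, f"round{q % 2}", f"q{w}-{q}") for w in range(50) for q in range(4)
]
# Subgraph 2: (workflow, user, answer, question) paths.
answer_paths = [
    (w, f"user{u}", f"a{w}-{q}-{u}", f"q{w}-{q}")
    for w in range(50) for q in range(4) for u in range(2)
]


def cross_product_join(question_paths, answer_paths):
    """Compare every path of one subgraph with every path of the other."""
    compared, matches = 0, []
    for w1, rnd, q1 in question_paths:
        for w2, user, answer, q2 in answer_paths:
            compared += 1
            if w1 == w2 and q1 == q2:
                matches.append((w1, rnd, q1, user, answer))
    return compared, matches


def aggregate_then_join(question_paths, answer_paths):
    """Collect subgraph 1 per workflow first, then join each answer on its question."""
    by_workflow = defaultdict(list)  # plays the role of collect(...) per inwrkflw
    for w, rnd, q in question_paths:
        by_workflow[w].append((rnd, q))
    compared, matches = 0, []
    for w, user, answer, q2 in answer_paths:
        for rnd, q in by_workflow[w]:  # plays the role of UNWIND workflow_questions
            compared += 1
            if q == q2:  # plays the role of WHERE ffq2 = ffq
                matches.append((w, rnd, q, user, answer))
    return compared, matches


naive_compared, naive_matches = cross_product_join(question_paths, answer_paths)
fast_compared, fast_matches = aggregate_then_join(question_paths, answer_paths)
```

With this toy data both functions find the same 400 matches, but the first compares 80000 pairs and the second only 1600. The gap widens as the two subgraphs grow.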
This is the final query:
MATCH (comp:Company {dId: "155dyv1wgT"})<-[:`POSITION_COMPANY`]-(pos: Position {status: 'OPEN'})-[:`POSITION_WORKFLOW`]->(:WorkFlow)-[:`WORKFLOW_CANDIDATE-WORKFLOW`]->(cw : CandidateWorkFlow)-[:`CANDIDATE-WORKFLOW_COMPANY-CANDIDATE`]->(cc : CompanyCandidate)
where ((not (has(cc.isSpam) or has(cc.isTrash))) OR (cc.isSpam=false and cc.isTrash=false)) and pos.positionType IN ['PUBLIC','DISCRETE']
with distinct comp, {dId: pos.dId, title: pos.title} as pos, cw, cc
MATCH (cw)-[:`CANDIDATE_WORKFLOW_INTERVIEW`]->(inwrkflw)
MATCH (inwrkflw)-[:`INTERVIEW_ROUND`]->(intrnd)-[:`INTERVIEW_ROUND_FEEDBACK`]->(ffform)-[:`FEEDBACK_QUESTION`]-(ffq)
WITH comp,pos, cw, cc,inwrkflw, collect({round:intrnd,form:ffform,question:ffq}) as workflow_questions
MATCH (inwrkflw)-[:`INTERVIEW_WORKFLOW_FEEDBACK`]-(ff:Feedback)
MATCH (iwr : User)-[:`FEEDBACK_BY`]->(ff)-[:`FEEDBACK_ANSWER`]->(answer:Answer)-[:`QUESTION_ANSWER`]->(ffq2)
UNWIND workflow_questions as wq
WITH comp,pos, cw, cc,inwrkflw, iwr,ff,answer, wq.round as intrnd, wq.form as ffform, wq.question as ffq
WHERE ffq2 = ffq
with collect({answer : answer.value, rating: answer.rating, question : ffq.qText, givenBy : iwr.fullName, type: ffq.questionType, givenOn: answer.lastModifiedDate}) as rnds, cc, pos, intrnd
with filter(rnd IN rnds WHERE rnd.type = 'COMMENTS') as comments, filter(rnd IN rnds WHERE rnd.type = 'LINEAR_GENERIC') as ratings, cc, pos, intrnd
with collect({roundName: intrnd.name, ratings: ratings, comments: comments}) as rounds, cc, pos
return collect({cc: cc, rounds: rounds}) as data, pos.dId as posId, pos.title as posTitle;