Cypher/Neo4j: How to match nodes that have a relationship to all related nodes
I'm trying to find the number of users who have all the skills required to qualify for an occupation. Users can have many skills, and I want to return the count of qualified users per occupation.
Here's my current query:
MATCH (:User)-[:has_skill]->(:Skill)<-[:requires]-(o:Occupation)
WITH DISTINCT o
MATCH (o)
WITH o, SIZE((o)-[:requires]->()) AS occupation_skill_count
MATCH (o)-[:requires]->(:Skill)<-[hs:has_skill]-(u:User)
WITH o, u, occupation_skill_count, count(hs) AS user_skill_count
WHERE occupation_skill_count = user_skill_count
WITH o.title as occupation_title, count(u) as users_count
RETURN occupation_title, users_count
However, I'm concerned that my query is not efficient, since it times out (there are over 60,000 occupations, 10,000 users, and 2,500 skills). I want to know if there's a better way to write this query.
My approach in writing this query is,
This seems to work in the staging environment, where there are far fewer records. However, it just times out in prod because there is too much data. Is there a better way to write this?
For performance issues, it helps to show the PROFILE plan of the query. If you could expand all elements of the plan and paste it into your description, that could help identify where the query can be improved.
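As a minimal sketch of what that looks like, you prefix the query with the PROFILE keyword. The query below is just the one from the question with PROFILE added in front. Keep in mind that PROFILE actually executes the query, so on the production data it may time out as before; running it in staging, or using EXPLAIN (which only produces the plan without executing anything), may be more practical.

```cypher
// PROFILE runs the query and reports rows and db hits per operator.
// Swap PROFILE for EXPLAIN to see the plan without executing the query.
PROFILE
MATCH (:User)-[:has_skill]->(:Skill)<-[:requires]-(o:Occupation)
WITH DISTINCT o
MATCH (o)
WITH o, SIZE((o)-[:requires]->()) AS occupation_skill_count
MATCH (o)-[:requires]->(:Skill)<-[hs:has_skill]-(u:User)
WITH o, u, occupation_skill_count, count(hs) AS user_skill_count
WHERE occupation_skill_count = user_skill_count
WITH o.title as occupation_title, count(u) as users_count
RETURN occupation_title, users_count
```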
Since you're performing this for all occupations, it's a good candidate for batching. However, since batching won't be able to return the counts (it's used for write operations), we can instead use it to write the counts to the :Occupation nodes, so we can query for these numbers quickly once we're done computing them. At that point it's up to you whether to keep the calculated properties (maybe with a timestamp of when they were calculated), or simply report on them and remove the properties immediately.
You'll need APOC Procedures for performing the batching operation. apoc.periodic.iterate() will be the procedure of choice (you can adjust the batchSize to whatever works best for you). I'll add comments inline.
CALL apoc.periodic.iterate(
// iterate in batches for all :Occupations
"MATCH (o:Occupation) RETURN o",
// for each occupation, get all skills in ascending order of skilled users
"MATCH (o)-[:requires]->(s:Skill)
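WITH o, s, size((s)<-[:has_skill]-()) as skilledUserCount
// within a WITH clause, ORDER BY has to come before WHERE
ORDER BY skilledUserCount ASC
// note: this filter ignores required skills that no user has at all
WHERE skilledUserCount <> 0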
WITH o, collect(s) as skills
WITH o, head(skills) as first, tail(skills) as skills
// get users with all the required skills
// because of ordering, we start with the smallest set of skilled users
MATCH (first)<-[:has_skill]-(u)
WHERE ALL(skill in skills WHERE (skill)<-[:has_skill]-(u))
// now set this count of users with all skills to the occupation
WITH o, count(u) as skilledUsers
SET o.skilledUsers = skilledUsers
// uncomment next line to keep a timestamp of when this was last updated
// SET o.skilledUsersUpdated = timestamp()
",
{batchSize:1000, parallel:true, iterateList:true}) YIELD batches, total
RETURN batches, total
Once this finishes, all occupations should have their number of skilled users for easy querying:
MATCH (o:Occupation)
RETURN o.title as occupation_title, o.skilledUsers as users_count
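Two follow-ups you may want, sketched here under the assumption that the property names match the ones written by the batch above (skilledUsers, and skilledUsersUpdated if you uncommented the timestamp line). Occupations for which the batched query produced no rows never reach the SET, so their skilledUsers will be null; coalesce() can report those as 0. And if you decided to only report on the numbers rather than keep them, a REMOVE clears the properties afterwards.

```cypher
// Report, treating occupations that never had the property set as 0 users
MATCH (o:Occupation)
RETURN o.title AS occupation_title, coalesce(o.skilledUsers, 0) AS users_count
ORDER BY users_count DESC;

// Optional cleanup once the numbers have been reported.
// Removing a property that is not present on a node is a no-op.
MATCH (o:Occupation)
WHERE o.skilledUsers IS NOT NULL
REMOVE o.skilledUsers, o.skilledUsersUpdated;
```

The trailing semicolons are there so the two statements can be pasted together into Neo4j Browser or cypher-shell; run them one at a time if your client only accepts single statements.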