
Cypher/Neo4j: How to match nodes that have a relationship to all related nodes

I'm trying to find the number of users who have all the skills required to qualify for an occupation. Users can have many skills, and I want to return all the qualified users per occupation.

Here's my current query:

  MATCH (:User)-[:has_skill]->(:Skill)<-[:requires]-(o:Occupation)
  WITH DISTINCT o
  MATCH (o)
  WITH o, SIZE((o)-[:requires]->()) AS occupation_skill_count
  MATCH (o)-[:requires]->(:Skill)<-[hs:has_skill]-(u:User)
  WITH o, u, occupation_skill_count, count(hs) AS user_skill_count
  WHERE occupation_skill_count = user_skill_count
  WITH o.title as occupation_title, count(u) as users_count
  RETURN occupation_title, users_count

However, I'm concerned that my query is not efficient, since it times out (there are over 60,000 occupations, 10,000 users, and 2,500 skills). I want to know if there's a better way to write this query.

My approach in writing this query is:

  1. Match all the occupations that are connected to a user through a skill.
  2. Count the number of required skills for each of those occupations.
  3. Match all the users that are connected to those occupations through a skill, where the number of skills the user shares with the occupation equals the number of skills the occupation requires.

This seems to work in the staging environment, where there are far fewer records. However, it times out in production because there is too much data. Is there a better way to write this?

For performance issues, it helps to show the PROFILE plan of the query. If you could expand all elements of the plan and paste it into your description, that could help identify where the query can be improved.
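As a rough illustration of what that means in practice, a plan is obtained by prefixing the query with PROFILE. The sketch below applies it to the query from the question, otherwise unchanged. PROFILE executes the query, so on the production data it may hit the same timeout; prefixing with EXPLAIN instead shows the planned operators without running anything, though without actual row counts.

```cypher
// PROFILE runs the query and reports rows and db hits per operator.
// Swap PROFILE for EXPLAIN to see the plan without executing it.
PROFILE
MATCH (:User)-[:has_skill]->(:Skill)<-[:requires]-(o:Occupation)
WITH DISTINCT o
MATCH (o)
WITH o, SIZE((o)-[:requires]->()) AS occupation_skill_count
MATCH (o)-[:requires]->(:Skill)<-[hs:has_skill]-(u:User)
WITH o, u, occupation_skill_count, count(hs) AS user_skill_count
WHERE occupation_skill_count = user_skill_count
WITH o.title as occupation_title, count(u) as users_count
RETURN occupation_title, users_count
```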

Since you're performing this for all occupations, it's a good candidate for batching. However, since batching won't be able to return the counts (it's used for write operations), we can instead use it to write the counts to the :Occupation nodes, so we can query for these numbers quickly once they have been computed. At that point it's up to you whether to keep the calculated properties (maybe with a timestamp of when they were calculated), or simply report on them and remove the properties immediately.

You'll need APOC Procedures to perform the batching operation. apoc.periodic.iterate() will be the procedure of choice (you can adjust the batchSize to whatever works best for you). I'll add comments inline.

CALL apoc.periodic.iterate(
 // iterate in batches over all :Occupation nodes
 "MATCH (o:Occupation) RETURN o",
 // for each occupation, get all skills in ascending order of skilled users
 "MATCH (o)-[:requires]->(s:Skill)
 WITH o, s, size((s)<-[:has_skill]-()) as skilledUserCount
 // WITH takes ORDER BY before WHERE
 ORDER BY skilledUserCount ASC
 // skills that no user has are skipped here
 WHERE skilledUserCount <> 0
 WITH o, collect(s) as skills
 WITH o, head(skills) as first, tail(skills) as skills
 // get users with all the required skills
 // because of ordering, we start with the smallest set of skilled users
 MATCH (first)<-[:has_skill]-(u)
 WHERE ALL(skill in skills WHERE (skill)<-[:has_skill]-(u))
 // now set this count of users with all skills on the occupation
 WITH o, count(u) as skilledUsers
 SET o.skilledUsers = skilledUsers
 // uncomment next line to keep a timestamp of when this was last updated
 // SET o.skilledUsersUpdated = timestamp()
 ",
 {batchSize:1000, parallel:true, iterateList:true}) YIELD batches, total
 RETURN batches, total
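Before running the batch over every occupation, it may be worth checking the inner logic on a single occupation as a read-only query. The following is only a sketch: it mirrors the inner statement but returns the count instead of writing it, the title 'Software Engineer' is a hypothetical value to be replaced with one from your data, and it assumes a Neo4j version that still accepts size() over a pattern, as the batch statement does.

```cypher
// 'Software Engineer' is a placeholder title, not taken from the question
MATCH (o:Occupation {title: 'Software Engineer'})
MATCH (o)-[:requires]->(s:Skill)
WITH o, s, size((s)<-[:has_skill]-()) AS skilledUserCount
ORDER BY skilledUserCount ASC
WHERE skilledUserCount <> 0
WITH o, collect(s) AS skills
// the rarest skill seeds the candidate users; the rest are checked per user
WITH o, head(skills) AS first, tail(skills) AS skills
MATCH (first)<-[:has_skill]-(u)
WHERE ALL(skill IN skills WHERE (skill)<-[:has_skill]-(u))
RETURN o.title AS occupation_title, count(u) AS users_count
```

If no user holds all the skills, the final MATCH produces no rows and the query returns nothing rather than a count of 0.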

Once the batch job finishes, all occupations should have their number of skilled users stored for easy querying:

MATCH (o:Occupation)
RETURN o.title as occupation_title, o.skilledUsers as users_count
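If the stored counts are only needed for a one-off report, the reporting and cleanup could look something like the sketch below, run as two separate statements. It assumes that an occupation for which the batch statement matched no users is left without the skilledUsers property, so coalesce() is used to report those as 0. The sort order is an addition for readability, not something the question asked for.

```cypher
// Report: a missing property is treated as zero qualified users
MATCH (o:Occupation)
RETURN o.title AS occupation_title, coalesce(o.skilledUsers, 0) AS users_count
ORDER BY users_count DESC;

// Cleanup: drop the computed properties once the report has been taken
MATCH (o:Occupation)
REMOVE o.skilledUsers, o.skilledUsersUpdated;
```

Removing a property that is not present on a node is a no-op, so the cleanup is safe even if the timestamp line was never enabled.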
