Computing similarity between all nodes in Neo4j - getting different values for a node pair
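I have two types of nodes in my database, USER and MEDIA,
and one type of relationship, LIKES.
The relationship between the two node types looks like this: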
(:USER)-[:LIKES]->(:MEDIA)
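I am trying to compute the similarity (Jaccard similarity) between all USER nodes, based on the number of media shared by each pair of nodes,
and then store that similarity as an ISSIMILAR relationship. The ISSIMILAR relationship has a property called "similarity" that holds the similarity between the two nodes.
Here is my query: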
Match(u:User)
WITH COLLECT(u) as users
UNWIND users as user
MATCH(user:User{id:user.id})-[:LIKES]->(common_media:Media)<-[:LIKES]-(other:User)
WITH user,other,count(common_media) AS intersection, COLLECT(common_media.name) as i
MATCH(user)-[:LIKES]->(user_media:Media)
WITH user,other,intersection,i, COLLECT(user_media.name) AS s1
MATCH(other)-[:LIKES]->(other_media:Media)
WITH user,other,intersection,i,s1, COLLECT(other_media.name) AS s2
WITH user,other,intersection,s1,s2
WITH user,other,intersection,s1+filter(x IN s2 WHERE NOT x IN s1) AS union, s1,s2
WITH ((1.0*intersection)/SIZE(union)) as jaccard,user,other
MERGE(user)-[:ISSIMILAR{similarity:jaccard}]-(other)
When I run this query, I have two problems:
Here is a visualization of the problem:
MATCH(user:User)-[r]-(o:User) return o,user,r limit 4
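Thanks in advance.
Two similarity relationships appear because the query does not exclude the similarity relationship already built for the same pair: every pair of users is matched twice, once as (user, other) and once as (other, user). You can avoid this by processing each pair only once: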
...
UNWIND users as user
UNWIND users as other
WITH user, other WHERE ID(user) > ID(other)
MATCH(user)-[:LIKES]->(common_media:Media)<-[:LIKES]-(other)
...
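To see whether a graph already contains such duplicates, a diagnostic along the following lines may help. It is only a sketch: it assumes the `User` label and the `id` property from the question, and the aliases (`rels`, `similarities`, `user_a`, `user_b`) are made up for illustration.

```cypher
// List user pairs that are linked by more than one ISSIMILAR relationship.
// The ID(a) < ID(b) filter makes the undirected match report each pair once.
MATCH (a:User)-[r:ISSIMILAR]-(b:User)
WHERE ID(a) < ID(b)
WITH a, b, COUNT(r) AS rels, COLLECT(r.similarity) AS similarities
WHERE rels > 1
RETURN a.id AS user_a, b.id AS user_b, rels, similarities
```

Relationships left over from earlier runs are not removed by restricting the pairs, so they may need to be deleted before recomputing, for example with `MATCH (:User)-[r:ISSIMILAR]->(:User) DELETE r`.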
The final query can be made clearer:
MATCH (u:User) WITH COLLECT(u) AS users
UNWIND users AS user
UNWIND users AS other
MATCH (user)-[:LIKES]->(common_media:Media)<-[:LIKES]-(other) WHERE ID(other) > ID(user)
WITH user, other, COLLECT(common_media) AS intersection
MATCH (user)-[:LIKES]->(user_media:Media)
WITH user, other, intersection,
COLLECT(user_media) AS s1
MATCH (other)-[:LIKES]->(other_media:Media)
WITH user,other,intersection, s1,
COLLECT(other_media) AS s2
WITH user, other,
(1.0 * SIZE(intersection)) / (SIZE(s1) + SIZE(s2) - SIZE(intersection)) AS jaccard
MERGE (user)-[:ISSIMILAR {similarity: jaccard}]->(other)
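One caveat when the property sits inside the MERGE pattern: MERGE matches on the whole pattern, including `similarity`, so re-running the query after the likes have changed would add a second ISSIMILAR relationship for any pair whose score changed, rather than updating the existing one. A possible variant is sketched below. It is not the query from the answer above: it assumes that a single ISSIMILAR relationship per pair is wanted, it drops the COLLECT/UNWIND step in favour of a direct pattern match, and it counts distinct media so that duplicate LIKES relationships, if any exist, do not inflate the sizes. The names `m`, `um`, `om`, `n1` and `n2` are illustrative.

```cypher
// Sketch: visit each unordered pair once, then upsert one ISSIMILAR relationship.
MATCH (user:User)-[:LIKES]->(m:Media)<-[:LIKES]-(other:User)
WHERE ID(user) < ID(other)
WITH user, other, COUNT(DISTINCT m) AS intersection
MATCH (user)-[:LIKES]->(um:Media)
WITH user, other, intersection, COUNT(DISTINCT um) AS n1
MATCH (other)-[:LIKES]->(om:Media)
WITH user, other, intersection, n1, COUNT(DISTINCT om) AS n2
// |A ∪ B| = |A| + |B| - |A ∩ B|
WITH user, other, (1.0 * intersection) / (n1 + n2 - intersection) AS jaccard
// MERGE on the relationship type only, so a re-run updates the stored value
MERGE (user)-[r:ISSIMILAR]->(other)
SET r.similarity = jaccard
```

Because the first MATCH requires at least one shared media node, the denominator is never zero. Pairs with nothing in common get no relationship at all, which matches the behaviour of the query above.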