Calculating similarity between nodes in Neo4j

Question

I have the following table, which gives the frequency with which each Originator performs each task (please see the attached image).

Task frequency for each Originator

I represented the above table in Neo4j with the relationship Originator -[Frequency]-> Task.

Now I need to compute a similarity (e.g. Jaccard similarity) between two users using Cypher queries only. I would like to know whether this is possible, or whether the schema definition would have to be changed altogether.

Thanks in advance.

Answer 1

This is more a starting point than an answer! If we start by ignoring the value of the frequency, then I think you can try something like:

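MATCH (u1:Originator {name: 'John'}), (u2:Originator {name: 'Sue'})
// Count the tasks that both users perform.
OPTIONAL MATCH (u1)-[:FREQUENCY]->(t:Task)<-[:FREQUENCY]-(u2)
WITH u1, u2, COUNT(DISTINCT t) AS intersection
// Collect the first user's tasks.
OPTIONAL MATCH (u1)-[:FREQUENCY]->(t1:Task)
WITH u1, u2, intersection, COLLECT(DISTINCT t1) AS t1s
// Count the second user's tasks that the first user does not have.
OPTIONAL MATCH (u2)-[:FREQUENCY]->(t2:Task)
WHERE NOT t2 IN t1s
WITH u1, u2, intersection, t1s, COUNT(DISTINCT t2) AS onlyU2
// Union size = first user's tasks + tasks only the second user has.
WITH u1, u2, intersection, SIZE(t1s) + onlyU2 AS unionSize
RETURN u1, u2, toFloat(intersection) / unionSize AS js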

This is definitely untested, and there are probably efficiencies to be found by somehow not matching the tasks repeatedly.

The query finds the tasks that the two users have in common and stores the number of common tasks in the variable intersection. It then matches each user's tasks individually (optionally) and uses them to calculate the union (COLLECT returns an empty list when there are no matches). There could be a divide-by-zero issue to work around in the final RETURN statement.
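To make the set arithmetic concrete, here is a minimal Python sketch of the same intersection-over-union calculation, with the empty-union case guarded. It is not Cypher and does not touch the database; it only checks the arithmetic. The function name `jaccard` and the task lists are made up for illustration and are not taken from the question's table.

```python
def jaccard(tasks_a, tasks_b):
    """Jaccard similarity of two collections of task names."""
    a, b = set(tasks_a), set(tasks_b)
    union = a | b
    if not union:
        # Neither user has any task: avoid dividing by zero.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical data for illustration only.
john = ["Act A", "Act B", "Act C"]
sue = ["Act B", "Act C", "Act D"]
print(jaccard(john, sue))  # 2 shared tasks out of 4 distinct tasks -> 0.5
```

Returning 0.0 for two users with no tasks is a choice made here; depending on the use case, treating that case as undefined may be more appropriate.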

How frequency should affect the result is hard to say. I wonder if you would be better served by swapping :Frequency for :Completed and creating a new relationship for every task completed (i.e. 6 relationships between 'John' and 'Act A'). This would work well for the intersection but would still have interesting implications for the union.
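One possible reading of the "one relationship per completed task" idea is a multiset (weighted) Jaccard: the intersection counts the smaller frequency per task and the union counts the larger one. This is a swapped-in technique, not something the answer specifies, and the Python sketch below only checks the arithmetic outside the database. The function name `multiset_jaccard` and all frequencies other than John's 6 for 'Act A' are hypothetical.

```python
from collections import Counter


def multiset_jaccard(freq_a, freq_b):
    """Weighted Jaccard: sum of per-task minima over sum of per-task maxima."""
    a, b = Counter(freq_a), Counter(freq_b)
    tasks = set(a) | set(b)
    # A Counter returns 0 for a task the user never performed.
    shared = sum(min(a[t], b[t]) for t in tasks)
    total = sum(max(a[t], b[t]) for t in tasks)
    return shared / total if total else 0.0


# Hypothetical frequencies, apart from John's 6 completions of 'Act A'.
john = {"Act A": 6, "Act B": 2}
sue = {"Act A": 3, "Act C": 1}
print(multiset_jaccard(john, sue))  # shared = 3, total = 9
```

With every frequency equal to 1 this reduces to the plain Jaccard similarity of the two task sets, which is why it can serve as a frequency-aware variant of the query above.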

Answer 2

The link below solved my problem. I just had to take every link into consideration.

http://neo4j.com/docs/stable/cypher-cookbook-similarity-calc.html

2 answers

Answer 1
1 Accepted 2014-11-05 17:20:52

Answer 2
1 2014-11-07 10:24:51
