Compute Similarity between nodes in Neo4j

I have the following table, which gives the frequency with which each Originator performs each task (please see the attached image).

Task frequency for each Originator

I represented the table above in Neo4j using the relationship Originator -[Frequency]-> Task.

Now I need to compute a similarity (e.g. Jaccard similarity) between two users using Cypher queries only. I would like to know whether this is possible, or whether the schema would have to be changed altogether.

Thanks in advance.

This is more a starting point than an answer! If we start by ignoring the value of the frequency, then I think you can try something like:

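// Jaccard similarity on task sets, ignoring the frequency values
MATCH (u1:Originator {name: 'John'}), (u2:Originator {name: 'Sue'})
// Tasks that both users perform
OPTIONAL MATCH (u1)-[:FREQUENCY]->(t:Task)<-[:FREQUENCY]-(u2)
WITH u1, u2, COUNT(DISTINCT t) AS intersection
// All tasks of the first user
OPTIONAL MATCH (u1)-[:FREQUENCY]->(t:Task)
WITH u1, u2, intersection, COLLECT(DISTINCT t) AS t1s
// Tasks of the second user that the first user does not have
OPTIONAL MATCH (u2)-[:FREQUENCY]->(t:Task)
WHERE NOT t IN t1s
WITH u1, u2, intersection, t1s, COLLECT(DISTINCT t) AS t2Only
WITH u1, u2, intersection, size(t1s) + size(t2Only) AS unionSize
// Guard against division by zero; toFloat avoids integer division
RETURN u1, u2,
       CASE WHEN unionSize = 0 THEN 0.0
            ELSE toFloat(intersection) / unionSize END AS js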

This is definitely untested and there are probably efficiencies to be found by somehow not repeatedly matching the tasks.

The query first finds the tasks that the two users have in common and stores their count in the variable intersection. It then (optionally) matches each user's tasks individually and uses them to build the union (COLLECT returns an empty list when there are no matches). The final RETURN may need a guard against division by zero, which occurs when neither user has any tasks.
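To sanity-check what such a query should return, the same steps can be sketched outside the database. The snippet below is plain Python rather than Cypher, and the task lists are invented for illustration (they are not taken from the table in the question). It only mirrors the intersection, union and zero-guard logic described above.

```python
def jaccard(tasks_a, tasks_b):
    """Jaccard similarity of two task collections, ignoring frequency."""
    a, b = set(tasks_a), set(tasks_b)
    union = a | b
    if not union:
        # Neither user has any task: avoid dividing by zero.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical data, not taken from the question's table.
john = ["Act A", "Act B", "Act C"]
sue = ["Act B", "Act C", "Act D"]

print(jaccard(john, sue))  # 2 shared tasks out of 4 distinct ones -> 0.5
print(jaccard([], []))     # empty union handled explicitly -> 0.0
```

Running the Cypher query on a small graph with the same tasks and comparing the js value against this number is a cheap way to catch counting mistakes.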

How frequency should affect the result is hard to say. You might be better served by replacing :FREQUENCY with :COMPLETED and creating one relationship for every completed task (i.e. 6 relationships between 'John' and 'Act A'). That would work well for the intersection, but it would still have interesting implications for the union.
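One possible way to let the frequencies count, offered here only as an assumption and not something the question prescribes, is the weighted (multiset) Jaccard similarity: for each task take the smaller of the two frequencies for the intersection and the larger for the union, then divide the sums. The sketch below is plain Python with made-up frequencies, apart from the 6 for 'John' and 'Act A' mentioned above.

```python
def weighted_jaccard(freq_a, freq_b):
    """Weighted Jaccard: sum of per-task minima over sum of per-task maxima."""
    tasks = set(freq_a) | set(freq_b)
    # A task a user never performed counts as frequency 0.
    min_sum = sum(min(freq_a.get(t, 0), freq_b.get(t, 0)) for t in tasks)
    max_sum = sum(max(freq_a.get(t, 0), freq_b.get(t, 0)) for t in tasks)
    return min_sum / max_sum if max_sum else 0.0


# Hypothetical frequencies for illustration only.
john = {"Act A": 6, "Act B": 2}
sue = {"Act A": 3, "Act C": 1}

# Minima sum to 3 (only Act A is shared), maxima sum to 6 + 2 + 1 = 9.
print(weighted_jaccard(john, sue))
```

With every frequency equal to 1 this reduces to the ordinary set-based Jaccard similarity, so it is consistent with the query that ignores frequency.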

The link below solved my problem. I just had to take every link into consideration.

http://neo4j.com/docs/stable/cypher-cookbook-similarity-calc.html
