Neo4j merge nodes by relationship

I have a large dataset of person records and used an algorithm to find many duplicates. I marked these duplicates in Neo4j with a relationship, for example: (p:Person)-[:similar]->(d:Person)

For testing purposes, I created virtual nodes by combining all nodes connected by the similar relationship:

CALL algo.unionFind.stream('Person', 'similar', {})
YIELD nodeId, setId
WITH setId AS idd, collect(algo.getNodeById(nodeId)) AS nodis
WHERE size(nodis) > 1
CALL apoc.nodes.collapse(nodis,{properties:'combine'}) YIELD from, rel
RETURN idd, from, rel

The problem is that each result row compares and stores only two nodes. Example:

ID: 5, Peter Smith ID: 4635, Peter Smit

ID: 4635, Peter Smit ID: 765, Peter Smith

ID: 5, Peter Smith ID: 765, Peter Smith

I want to refactor the graph and merge each group of duplicates (a forest) into one node, but only one node is merged. How can I merge all forests that exist due to the 'similar' relationship?
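To illustrate why the three pairwise rows above belong to one group, here is a minimal Python sketch of the union-find idea applied to the example IDs. It is only an illustration of the grouping concept, not the implementation behind algo.unionFind; the helper name group_similar is hypothetical.

```python
from collections import defaultdict


def group_similar(pairs):
    """Group node ids that are connected by 'similar' pairs (hypothetical helper)."""
    parent = {}

    def find(x):
        # Register unseen ids as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two sets

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Order the groups by their smallest id so the result is deterministic.
    return sorted(groups.values(), key=min)


# The three pairs from the example rows above.
pairs = [(5, 4635), (4635, 765), (5, 765)]
groups = group_similar(pairs)
```

With these pairs, all three IDs land in a single set, which is the group that should end up as one merged node.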

UPDATE:

I found a partial solution. The following code merges all similar persons and combines each property into a list. That seems fine to me, except that the IDs are now in a list too, but that is not the topic of this question.

CALL algo.unionFind.stream('Person', 'similar', {})
YIELD nodeId,setId
WITH setId AS idd, collect(algo.getNodeById(nodeId)) AS nodis
CALL apoc.refactor.mergeNodes(nodis, {properties:'combine', mergeRels: true}) YIELD node
RETURN node
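As a rough picture of what combining properties into lists means for the example persons, here is a small Python sketch. It is not APOC's actual logic: the property names id and name, the helper combine_properties, and the choice to keep repeated values only once are all assumptions made for illustration.

```python
def combine_properties(nodes):
    """Merge property dicts; keys with several distinct values become lists.

    Illustration only, not APOC's actual merge logic.
    """
    merged = {}
    for props in nodes:
        for key, value in props.items():
            values = merged.setdefault(key, [])
            if value not in values:  # assumption: repeated values are kept once
                values.append(value)
    # Unwrap single-element lists so unambiguous properties stay scalar.
    return {k: v[0] if len(v) == 1 else v for k, v in merged.items()}


# Property names "id" and "name" are assumed; the values come from the example.
people = [
    {"id": 5, "name": "Peter Smith"},
    {"id": 4635, "name": "Peter Smit"},
    {"id": 765, "name": "Peter Smith"},
]
merged = combine_properties(people)
```

In this sketch the id values always differ between nodes, so they necessarily end up as a list, which matches the side effect described above.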

How about using a unique constraint? I faced the same problem with MERGE.

https://neo4j.com/docs/cypher-manual/current/schema/constraints/

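Example (this is the older syntax; Neo4j 4.4 and later use CREATE CONSTRAINT FOR (book:Book) REQUIRE book.isbn IS UNIQUE):

CREATE CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE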
