I have a large dataset of person records and used an algorithm to find many duplicates. I marked these duplicates in Neo4j with a relationship, for example: (p:Person)-[:similar]->(d:Person)
For testing purposes, I created virtual nodes by combining all nodes connected by the similar relationship:
CALL algo.unionFind.stream('Person', 'similar', {})
YIELD nodeId, setId
WITH setId AS idd, collect(algo.getNodeById(nodeId)) AS nodis
WHERE size(nodis) > 1
CALL apoc.nodes.collapse(nodis,{properties:'combine'}) YIELD from, rel
RETURN idd, from, rel
Here I ran into a problem: only two nodes at a time were compared and stored in the result data. Example:
ID: 5, Peter Smith ID: 4635, Peter Smit
ID: 4635, Peter Smit ID: 765, Peter Smith
ID: 5, Peter Smith ID: 765, Peter Smith
I want to refactor the graph and merge each group of duplicates (a forest) into a single node, but only one node is merged. How can I merge all the forests that exist because of the 'similar' relationship?
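To see why the three pairs in the example belong to one group, it may help to look at what a union-find pass does with them. The following is a minimal Python sketch of the idea, not the Neo4j implementation; the function name `connected_sets` is hypothetical, and the hard-coded pairs are simply the example IDs from the question.

```python
from collections import defaultdict


def connected_sets(pairs):
    """Group node ids into sets connected by pairwise 'similar' links (union-find)."""
    parent = {}

    def find(x):
        # Register unseen ids as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two sets

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


# The three pairwise matches from the example, as (id, id) tuples.
similar_pairs = [(5, 4635), (4635, 765), (5, 765)]
print(connected_sets(similar_pairs))
```

With these pairs the result is a single set containing 5, 4635 and 765, which is the group that should end up as one merged node.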
I found a partial solution. The following code merged all similar persons, combining all properties into lists. That seems fine to me, except that the IDs are now in a list too, but that isn't the topic of this question.
CALL algo.unionFind.stream('Person', 'similar', {})
YIELD nodeId,setId
WITH setId AS idd, collect(algo.getNodeById(nodeId)) AS nodis
CALL apoc.refactor.mergeNodes(nodis, {properties:'combine', mergeRels: true}) YIELD node
RETURN node
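The 'combine' outcome described above, where properties (including the id) end up as lists, can be pictured with a small Python sketch. It only illustrates the idea and is not APOC's implementation; the helper `combine_properties`, the property names `id` and `name`, and the choice to keep a single value when all nodes agree are assumptions made for this example.

```python
def combine_properties(nodes):
    """Merge property dicts; differing values for a key are collected into a list."""
    collected = {}
    for props in nodes:
        for key, value in props.items():
            values = collected.setdefault(key, [])
            if value not in values:  # keep each distinct value once, in first-seen order
                values.append(value)
    # Assumption: a key with one distinct value stays a scalar, otherwise a list.
    return {k: (v[0] if len(v) == 1 else v) for k, v in collected.items()}


group = [
    {"id": 5, "name": "Peter Smith"},
    {"id": 4635, "name": "Peter Smit"},
    {"id": 765, "name": "Peter Smith"},
]
print(combine_properties(group))
# {'id': [5, 4635, 765], 'name': ['Peter Smith', 'Peter Smit']}
```

If a single canonical id is needed, one option is to choose it before merging (for example, the smallest id in the group) rather than relying on the combined list.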
How about using a unique constraint? I ran into the same problems with MERGE.
https://neo4j.com/docs/cypher-manual/current/schema/constraints/
Example:
CREATE CONSTRAINT ON (book:Book) ASSERT book.isbn IS UNIQUE