
Create at most one relationship between each newly created node and an existing node, based on a property

My application receives a stream of data that I need to persist in a graph DB. I first create nodes in Neo4j in batches (of 1000), and right after that I try to find a matching node in the existing data to link each new node to.

MATCH (new:EVENT) where new.uniqueId in [NEWLY CREATED NODES UNIQUE ID]
MATCH (existing:EVENT) where new.myprop = existing.myprop and new.uniqueId <> existing.uniqueId
CREATE (new)-[:LINKED]->(existing)
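The batching described in the question can be sketched in plain Python. This is only a model under assumptions not stated in the question: each incoming event is taken to be a dict with `uniqueId` and `myprop` keys, and the helper names `batches` and `new_ids` are hypothetical. It cuts a stream into batches of 1000 and, for each batch, produces the ID list that fills the `[NEWLY CREATED NODES UNIQUE ID]` placeholder (normally sent as a query parameter rather than pasted into the query text).

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

BATCH_SIZE = 1000  # batch size used in the question

# Hypothetical event shape: {"uniqueId": ..., "myprop": ...}
Event = Dict[str, object]


def batches(stream: Iterable[Event], size: int = BATCH_SIZE) -> Iterator[List[Event]]:
    """Yield consecutive lists of at most `size` events from the stream."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def new_ids(batch: List[Event]) -> List[object]:
    """uniqueId values of a freshly written batch.

    These are the values the linking query needs in place of
    [NEWLY CREATED NODES UNIQUE ID].
    """
    return [event["uniqueId"] for event in batch]


if __name__ == "__main__":
    # Hypothetical stream of 2500 events.
    stream = ({"uniqueId": f"e{i}", "myprop": i % 7} for i in range(2500))
    for batch in batches(stream):
        ids = new_ids(batch)
        print(len(ids), ids[0], ids[-1])
```

Using a generator keeps memory bounded to one batch at a time, which matters for a continuous stream.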

My problem is that if a new node has more than one matching existing node, I want to create a relationship with just one of them. My current query above creates a relationship with every matching node.

Is there an efficient way to do this? The number of existing nodes could be huge, i.e. approx. 300M.

Note: I have indexes on the myprop and uniqueId properties.
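To make the requirement concrete, here is a plain-Python model of the intended result (not Neo4j code). A dict keyed by `myprop` stands in for the index on `myprop`, and for each new node the loop links to the first candidate whose `uniqueId` differs, then stops. The node shape and the helper names `build_index` and `pick_links` are hypothetical. As in the query, the new nodes are already stored, so they appear among the candidates, and the `uniqueId` check is what keeps a node from linking to itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Node = Dict[str, object]  # hypothetical shape: {"uniqueId": ..., "myprop": ...}


def build_index(nodes: Iterable[Node]) -> Dict[object, List[object]]:
    """Map each myprop value to the uniqueIds carrying it (stand-in for the myprop index)."""
    index: Dict[object, List[object]] = defaultdict(list)
    for node in nodes:
        index[node["myprop"]].append(node["uniqueId"])
    return index


def pick_links(
    new_nodes: Iterable[Node], index: Dict[object, List[object]]
) -> List[Tuple[object, object]]:
    """Return (new uniqueId, existing uniqueId) pairs, at most one per new node."""
    links = []
    for node in new_nodes:
        for candidate in index.get(node["myprop"], []):
            if candidate != node["uniqueId"]:  # same role as new.uniqueId <> existing.uniqueId
                links.append((node["uniqueId"], candidate))
                break  # first match wins, so each new node gets at most one link
    return links


if __name__ == "__main__":
    existing = [
        {"uniqueId": "a", "myprop": 1},
        {"uniqueId": "b", "myprop": 1},
        {"uniqueId": "c", "myprop": 2},
    ]
    new = [{"uniqueId": "n1", "myprop": 1}, {"uniqueId": "n2", "myprop": 3}]
    index = build_index(existing + new)  # the new nodes are already in the graph too
    print(pick_links(new, index))
```

The `break` is the in-memory counterpart of "at most one relationship": the loop never looks past the first valid candidate, and a new node whose only candidate is itself gets no link.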

As @InverseFalcon's answer states, you can use aggregation to collect the existing nodes for each distinct new node, and take the first one in each collection.

For better performance, you should always PROFILE a query to see what can be improved. For example, after doing that with some sample data on my Neo4j installation, I saw that the index was not automatically being used when finding new, and that the new.uniqueId <> existing.uniqueId test was causing DB hits. This query fixes both issues, and should have better performance:

MATCH(new:EVENT)
USING INDEX new:EVENT(uniqueId)
WHERE new.uniqueId in [NEWLY CREATED NODES UNIQUE ID]
MATCH (existing:EVENT)
WHERE new.myprop = existing.myprop AND new <> existing
WITH new, COLLECT(existing)[0] AS e
CREATE (new)-[:LINKED]->(e);
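The effect of `WITH new, COLLECT(existing)[0] AS e` can be modelled in plain Python (illustration only; `first_per_new` is a hypothetical helper). The two MATCH clauses produce one row per (new, existing) pair; grouping by `new` and keeping the first `existing` leaves exactly one row per new node, so CREATE runs once per new node. A new node with no match produces no rows at all, because MATCH (unlike OPTIONAL MATCH) drops it, so `e` is never null here. Which existing node counts as "first" depends on the order in which rows reach the aggregation, which is not specified unless the query orders them first.

```python
from typing import Hashable, Iterable, List, Tuple


def first_per_new(
    rows: Iterable[Tuple[Hashable, Hashable]]
) -> List[Tuple[Hashable, Hashable]]:
    """Mimic WITH new, COLLECT(existing)[0] AS e: one (new, e) row per distinct new."""
    first = {}
    for new, existing in rows:
        first.setdefault(new, existing)  # keep only the first existing seen for each new
    return list(first.items())  # dicts keep insertion order (Python 3.7+)


if __name__ == "__main__":
    # Rows as the two MATCH clauses might produce them (hypothetical IDs).
    rows = [("n1", "a"), ("n1", "b"), ("n2", "c"), ("n1", "d")]
    for new, e in first_per_new(rows):
        print(f"CREATE ({new})-[:LINKED]->({e})")
```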

It uses USING INDEX as a hint to use the index on uniqueId. Also, since uniqueId is supposed to be unique, it compares the new and existing nodes directly to see whether they are the same node, instead of comparing their uniqueId properties.

To ensure that uniqueness is actually enforced by Neo4j, you should create a uniqueness constraint:

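CREATE CONSTRAINT ON (e:EVENT) ASSERT e.uniqueId IS UNIQUE;

(On Neo4j 4.4 and later, the equivalent syntax is CREATE CONSTRAINT FOR (e:EVENT) REQUIRE e.uniqueId IS UNIQUE; the ON ... ASSERT form was removed in Neo4j 5.)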

You can collect the existing node matches per new node and just grab the first:

MATCH (new:EVENT) where new.uniqueId in [NEWLY CREATED NODES UNIQUE ID]
MATCH (existing:EVENT) where new.myprop = existing.myprop and new.uniqueId <> existing.uniqueId
WITH new, head(collect(existing)) as existing
CREATE (new)-[:LINKED]->(existing)
