How to fetch nodes and all their children in Spring Data Neo4j

Question

I have this Neo4j Node class:

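import java.util.List;

import lombok.AllArgsConstructor;
import lombok.Data;
import org.springframework.data.neo4j.core.schema.GeneratedValue;
import org.springframework.data.neo4j.core.schema.Id;
import org.springframework.data.neo4j.core.schema.Node;
import org.springframework.data.neo4j.core.schema.Relationship;

@Node
@Data
@AllArgsConstructor
public class Person {

    @Id
    @GeneratedValue
    private Long id;

    // null for the root of a genealogical tree
    private Long parentId;

    @Relationship(type = "PARENT_OF", direction = Relationship.Direction.OUTGOING)
    private List<Person> children;

    // Links the child to this person and records this person's id on the child
    public Person addChild(Person person) {
        person.setParentId(this.id);
        this.children.add(person);
        return this;
    }

}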

I would like to write a query for the Spring Data @Query annotation that fetches a list of genealogical trees whose roots have a null parentId. For each root I would also like to fetch its children, for each child its own children, and so on.

The best I could come up with, so far, is the following:

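import java.util.List;

import org.springframework.data.neo4j.repository.Neo4jRepository;
import org.springframework.data.neo4j.repository.query.Query;

public interface PersonRepository extends Neo4jRepository<Person, Long> {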
    
    @Query("""
        MATCH (person:Person) 
        WHERE person.parentId IS NULL
        OPTIONAL MATCH (person)-[parentOf:PARENT_OF]->(children) 
        RETURN person, collect(parentOf), collect(children) 
        SKIP 0 
        LIMIT 10
    """)
    List<Person> findAllGenealogicalTrees();
}

However, it doesn't do what I'm looking for: it seems to fetch only the children of the roots, not the children of the children.

Is something wrong with my query?

EDIT:

Tried the suggested query:

MATCH path=(person)-[parentOf:PARENT_OF*]->(child)
WHERE person.parentId IS NULL
      AND NOT (child)-[:PARENT_OF]->()
RETURN path

but the resulting list seems to be the following:

Person(id=0, parentId=null, children=[Person(id=1, parentId=0, children=[Person(id=3, parentId=1, children=[])])])
Person(id=1, parentId=0, children=[Person(id=3, parentId=1, children=[])])
Person(id=3, parentId=1, children=[])

I was expecting only the first record, since parentId should be null. Why is it returning two other records that have a non-null parentId?

Answer 1

To get all the paths, you can use a variable-length pattern:

MATCH path=(person)-[parentOf:PARENT_OF*]->(child)
WHERE person.parentId IS NULL
      AND NOT (child)-[:PARENT_OF]->()
RETURN path

Answer 2

I think we can agree that the first query, by its nature, does only one hop: with (person)-[parentOf:PARENT_OF]->(children) you explicitly say that you only want to find the direct children.

The suggestion @Graphileon gave goes in the right direction, but its RETURN clause provides only an unordered set of nodes and relationships. Spring Data Neo4j can only assume that all Persons have the same importance and thus returns a collection of all Persons, which is why the two non-root records show up as well.

What I would suggest is to stay with the path-based approach but modify the return statement in a way that Spring Data Neo4j and you agree on ;)

MATCH path=(person)-[:PARENT_OF*]->(child:Person)
WHERE person.parentId IS NULL
RETURN person, collect(nodes(path)), collect(relationships(path))

Reference: https://docs.spring.io/spring-data/neo4j/docs/current/reference/html/#custom-queries.for-relationships.long-paths
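
To see why the RETURN clause matters, it can help to picture what the variable-length pattern matches before anything is collected. The following is only a plain-Java sketch, not Neo4j or Spring Data Neo4j code: the class VariableLengthPathSketch and its pathsFrom method are made-up names, and the id-to-children map stands in for the PARENT_OF relationships of the sample data in the question (0 -> 1 -> 3). It assumes the data is a tree without cycles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; it does not talk to Neo4j.
final class VariableLengthPathSketch {

    // Enumerates every path of length >= 1 starting at `start`, mimicking the
    // rows matched by (person)-[:PARENT_OF*]->(child) on an acyclic tree.
    static List<List<Long>> pathsFrom(Long start, Map<Long, List<Long>> parentOf) {
        List<List<Long>> paths = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        current.add(start);
        extend(current, parentOf, paths);
        return paths;
    }

    // Depth-first walk: every extension of the current path is its own match.
    private static void extend(List<Long> current,
                               Map<Long, List<Long>> parentOf,
                               List<List<Long>> paths) {
        Long last = current.get(current.size() - 1);
        for (Long child : parentOf.getOrDefault(last, List.of())) {
            current.add(child);
            paths.add(new ArrayList<>(current)); // one matched path per descendant
            extend(current, parentOf, paths);
            current.remove(current.size() - 1);  // backtrack (removes by index)
        }
    }

    public static void main(String[] args) {
        // Sample data from the question: 0 is the root, 1 its child, 3 its grandchild.
        Map<Long, List<Long>> parentOf = Map.of(0L, List.of(1L), 1L, List.of(3L));
        System.out.println(pathsFrom(0L, parentOf));
    }
}
```

For the sample data this prints [[0, 1], [0, 1, 3]]: one path per descendant of the root, with Person 1 repeated in the longer path. Grouping by person and collecting nodes(path) and relationships(path) hands all of these paths to the mapper as a single record per root.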

Another approach is to use the so-called derived finder methods in your repository:

List<Person> findAllByParentIdIsNull();

Or, if you want it to be pageable (don't forget to specify an ordering, because otherwise the data may be returned in arbitrary order):

Page<Person> findAllByParentIdIsNull(Pageable pageable);

This makes Spring Data Neo4j generate the queries internally; it performs an exploratory (non-path-based) search over the data with multiple cascading queries.

There are, in general, a few things to keep in mind when making the decision:

The path-based approach can significantly increase memory usage in the database if you have a lot of hops and branches, which leads to relatively slow response times. I assume that won't be a problem for your domain above, but it is always something I would keep an eye on.
In both cases (path-based or cascading queries) you end up with three buckets of data: the root node, all relationships, and all related nodes. The mapping takes some time because Spring Data Neo4j has to match every returned relationship with the right related node. There is nothing wrong with this; it is simply the consequence of having a cyclically mapped domain.
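
The matching work described in the last point can be pictured with a small plain-Java sketch. This is not how Spring Data Neo4j is actually implemented; TreeAssemblySketch, PersonNode, ParentOf and assemble are hypothetical names invented for illustration, and the sketch assumes Java 16 or later (for the record). It takes the three buckets (root id, node ids, relationships), de-duplicates what the overlapping paths repeat, and wires each relationship to the right parent and child.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: a hand-rolled stand-in for the mapping step.
final class TreeAssemblySketch {

    // Hypothetical, simplified counterpart of the Person entity.
    static final class PersonNode {
        final long id;
        final List<PersonNode> children = new ArrayList<>();

        PersonNode(long id) {
            this.id = id;
        }
    }

    // Hypothetical representation of one returned PARENT_OF relationship.
    record ParentOf(long parentId, long childId) {}

    static PersonNode assemble(long rootId, List<Long> nodeIds, List<ParentOf> relationships) {
        // Bucket 1 and 3: the root and all related nodes, one object per id.
        Map<Long, PersonNode> byId = new HashMap<>();
        byId.put(rootId, new PersonNode(rootId));
        for (Long id : nodeIds) {
            byId.computeIfAbsent(id, key -> new PersonNode(key));
        }

        // Bucket 2: each distinct relationship is matched to its two end nodes.
        Set<ParentOf> seen = new HashSet<>();
        for (ParentOf rel : relationships) {
            if (!seen.add(rel)) {
                continue; // the same relationship appears in several paths
            }
            PersonNode parent = byId.computeIfAbsent(rel.parentId(), key -> new PersonNode(key));
            PersonNode child = byId.computeIfAbsent(rel.childId(), key -> new PersonNode(key));
            parent.children.add(child);
        }
        return byId.get(rootId);
    }

    public static void main(String[] args) {
        // Buckets as they would look for the paths 0->1 and 0->1->3.
        PersonNode root = assemble(
                0L,
                List.of(0L, 1L, 0L, 1L, 3L),
                List.of(new ParentOf(0, 1), new ParentOf(0, 1), new ParentOf(1, 3)));
        System.out.println(root.children.get(0).children.get(0).id);
    }
}
```

With the sample data from the question this prints 3, the grandchild reached through root 0 and child 1. The more hops and branches the paths contain, the more duplicated entries this step has to sift through.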

2 answers

solution1
1 2021-07-22 11:59:09

solution2
1 ACCEPTED 2021-07-23 06:40:49
