Neo4J node traversal cypher where clause for each node

Question

I've been playing with Neo4j for a genealogy site and it's worked great!

I've run into a snag where finding the starting node isn't as easy. Looking through the docs and posts online, I haven't seen anything that hints at how to do this, so maybe it isn't possible.

What I would like to do is pass in a list of genders and from that list follow a specific path through the nodes to get a single node.

In the context of the family:

I want to get my mother's father's mother's mother. I have my own id, so I would start there and traverse four nodes from mine.

So the pseudo-query would be:

select person (follow childof relationship)
where starting node is me
where firstNode.gender == female 
 AND secondNode.gender == male 
 AND thirdNode.gender == female 
 AND fourthNode.gender == female
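To make the intent of the pseudo-query concrete, here is a small in-memory model of the walk it describes. This is plain Python, not Neo4j: the family data, the names PARENTS and GENDER, and the helper follow_genders are all hypothetical. Each step moves up one generation and keeps only the parent whose gender matches the next entry in the list.

```python
from typing import Dict, List, Optional

# Hypothetical toy family: each person maps to the parents recorded for them.
PARENTS: Dict[str, List[str]] = {
    "me": ["mom", "dad"],
    "mom": ["mom_mom", "mom_dad"],
    "mom_dad": ["mom_dad_mom", "mom_dad_dad"],
    "mom_dad_mom": ["mom_dad_mom_mom", "mom_dad_mom_dad"],
}
GENDER: Dict[str, str] = {
    "me": "male", "mom": "female", "dad": "male",
    "mom_mom": "female", "mom_dad": "male",
    "mom_dad_mom": "female", "mom_dad_dad": "male",
    "mom_dad_mom_mom": "female", "mom_dad_mom_dad": "male",
}


def follow_genders(start: str, genders: List[str]) -> Optional[str]:
    """Walk up one generation per entry in `genders`, picking the parent
    whose gender matches; return None as soon as no parent matches."""
    current = start
    for wanted in genders:
        matches = [p for p in PARENTS.get(current, []) if GENDER.get(p) == wanted]
        if not matches:
            return None
        current = matches[0]  # at most one mother and one father per person
    return current


# mother -> her father -> his mother -> her mother
print(follow_genders("me", ["female", "male", "female", "female"]))  # mom_dad_mom_mom
```

Because each person has at most one mother and one father, a list of genders identifies at most one ancestor, which is why the question expects a single node back.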

Answer 1

Focusing on the general solution:

MATCH p = (me:Person)-[:IS_CHILD_OF*]->(ancestor:Person)
WHERE me.uuid = {uuid}
  AND length(p) = size({genders})
  AND extract(x in tail(nodes(p)) | x.gender) = {genders}
RETURN ancestor

Here's how it works:

- match the starting node by its uuid
- match all the variable-length paths going to any ancestor
- constrain the length of the path (i.e. the number of relationships, which is the same as the number of ancestors) to the number of genders, since the length of a variable-length pattern can't be passed as a parameter
- extract the genders in the path:
  1. nodes(p) returns all the nodes in the path, including the starting node
  2. tail(nodes(p)) skips the first element of the list, i.e. the starting node, so only the ancestors remain
  3. extract() transforms the list of ancestor nodes into the list of their genders
  4. the extracted list of genders can then be compared to the parameter
- if the path matched, return the bound ancestor, which is the end of the path
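The steps above can be mirrored in plain Python to check the logic outside the database. This is only a sketch: the toy graph, the names CHILD_OF and GENDER, and the helpers all_paths and find_ancestors are hypothetical, and the model says nothing about how Neo4j's planner actually executes the query.

```python
from typing import Dict, Iterator, List

# Hypothetical toy data: IS_CHILD_OF edges point from a child to each parent.
CHILD_OF: Dict[str, List[str]] = {
    "me": ["mom", "dad"],
    "mom": ["grandma", "grandpa"],
    "grandpa": ["great_grandma"],
}
GENDER: Dict[str, str] = {
    "me": "M", "mom": "F", "dad": "M",
    "grandma": "F", "grandpa": "M", "great_grandma": "F",
}


def all_paths(start: str) -> Iterator[List[str]]:
    """Yield every upward path of one or more hops, like (me)-[:IS_CHILD_OF*]->(ancestor)."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        for parent in CHILD_OF.get(path[-1], []):
            longer = path + [parent]
            yield longer
            stack.append(longer)


def find_ancestors(start: str, genders: List[str]) -> List[str]:
    found = []
    for nodes in all_paths(start):
        if len(nodes) - 1 != len(genders):             # length(p) = size({genders})
            continue
        tail_genders = [GENDER[n] for n in nodes[1:]]  # genders of tail(nodes(p))
        if tail_genders == genders:                    # list compared to the parameter
            found.append(nodes[-1])                    # RETURN ancestor
    return found


print(find_ancestors("me", ["F", "M", "F"]))  # ['great_grandma']
```

Note that len(nodes) - 1 is the path length, because a path with n relationships contains n + 1 nodes.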

However, I don't think it will be faster than the explicit solution, though the performance could remain comparable. On my small test data (just 5 nodes), the general solution does 26 DB accesses whereas the specific solution only does 22, as reported by PROFILE. Further profiling on a larger database would be needed to compare their performance:

PROFILE MATCH p = (me:Person)-[:IS_CHILD_OF*]->(ancestor:Person)
WHERE me.uuid = {uuid}
  AND length(p) = size({genders})
  AND extract(x in tail(nodes(p)) | x.gender) = {genders}
RETURN ancestor

The general solution has the advantage of being a single query string, which won't need to be parsed again by the Cypher engine (its plan can be cached and reused), whereas each distinct generated query (one per number of generations) needs to be parsed on its own.

Answer 2

It was simpler than I thought. Maybe there is still a better way, so I'll leave this open for a bit.

The query would be:

MATCH (n1:Person { Id: 'f59c40de-506d-4829-a765-7a3ae94af8d1' })
<-[:CHILDOF]-(n2 { Gender:'0'})
<-[:CHILDOF]-(n3 { Gender:'1'})
<-[:CHILDOF]-(n4 { Gender:'1'})
RETURN n4

For each additional generation back, I would add another <-[:CHILDOF]- line to the pattern.

Answer 3

The equivalent query would look something like this:

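// find the start node first, by a unique property
MATCH (me:Person)
WHERE me.ID = {id}
WITH me
// walk exactly four childof hops up to an ancestor
MATCH (me)-[r:childof*4]->(ancestor:Person)
// collect the gender of the end node of each hop, in path order
WITH ancestor, EXTRACT(rel IN r | endNode(rel).gender) AS genders
WHERE genders = {genders}
RETURN ancestor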

Disclaimer: I haven't double-checked the syntax.

In Neo4j you typically find your start node first, usually by an ID of some sort (modify as required to match on a unique property). We then traverse a number of relationships to an ancestor, extract the gender property of the end node of each traversed relationship, and compare those genders to the expected list of genders (you'll need to make sure the argument is a list, e.g. ["female", "male", "female", "female"], in the desired order).
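Taking the end node of every traversed relationship yields the same gender list as taking every node of the path except the first, as long as all relationships point away from the start node, as they do in (me)-[r:childof*4]->(ancestor). A small Python sketch of that equivalence, using a hypothetical path and hypothetical gender values:

```python
# Hypothetical path, listed from the start node up to the ancestor.
path_nodes = ["me", "mom", "grandpa", "great_grandma", "great_great_grandma"]
gender = {
    "me": "M", "mom": "F", "grandpa": "M",
    "great_grandma": "F", "great_great_grandma": "F",
}

# Every hop is stored as (startNode, endNode), pointing from child to parent.
relationships = list(zip(path_nodes, path_nodes[1:]))

# This answer's approach: gender of the end node of each traversed relationship.
via_end_nodes = [gender[end] for (_start, end) in relationships]

# Alternative: gender of every node except the first, like tail(nodes(p)).
via_tail = [gender[node] for node in path_nodes[1:]]

print(via_end_nodes)              # ['F', 'M', 'F', 'F']
print(via_end_nodes == via_tail)  # True
```

If some relationships were stored in the opposite direction, endNode() would return the child instead of the ancestor for those hops, and the two lists would no longer agree.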

Note that this approach expands all possible paths with that degree of childof relationship and then filters them, as opposed to walking your graph one parent at a time. Since each person has up to two parents, the number of candidate paths can double with each generation, so the higher the degree of ancestry you're querying, the slower the call will get.
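To put numbers on that growth, the sketch below counts the candidate paths of a given depth, assuming a complete family tree in which every person has both parents recorded (real data is usually sparser, so this is a worst case). The helper names are hypothetical. A direct walk touches one ancestor per generation, while expand-then-filter faces 2^depth candidates, only one of which can match a given gender list.

```python
from itertools import product
from typing import List, Tuple


def candidate_paths(depth: int) -> List[Tuple[str, ...]]:
    """All ancestor paths of exactly `depth` hops in a complete family tree:
    one choice of parent (F = mother, M = father) per generation."""
    return list(product(["F", "M"], repeat=depth))


def surviving(depth: int, genders: List[str]) -> List[Tuple[str, ...]]:
    """Paths left after filtering on the expected gender sequence."""
    return [p for p in candidate_paths(depth) if list(p) == genders]


for depth in (1, 2, 4, 8):
    print(depth, len(candidate_paths(depth)))   # 2, 4, 16 and 256 candidates
print(len(surviving(4, ["F", "M", "F", "F"])))  # 1
```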

I'm also unsure if you can parameterize the degree of the variable relationship, so that might prevent this from being a generalized solution for any degree of ancestry.
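As the accepted answer also points out, the bound of a variable-length pattern such as *4 can't be supplied as a query parameter. One way around that is to write the hop count into the query text in application code while still passing the id and the gender list as parameters. The helper below is a hypothetical sketch in Python. It assumes parameters named {id} and {genders} and reuses this answer's childof type and ID property.

```python
from typing import List


def build_fixed_depth_query(genders: List[str]) -> str:
    """Hypothetical helper: put the hop count into the query text, because a
    variable-length bound cannot be a parameter. The id and the genders list
    are still sent separately as the {id} and {genders} parameters."""
    depth = len(genders)  # an int, so it is safe to interpolate
    if depth < 1:
        raise ValueError("genders must not be empty")
    return (
        "MATCH (me:Person)\n"
        "WHERE me.ID = {id}\n"
        "WITH me\n"
        f"MATCH (me)-[r:childof*{depth}]->(ancestor:Person)\n"
        "WITH ancestor, EXTRACT(rel IN r | endNode(rel).gender) AS genders\n"
        "WHERE genders = {genders}\n"
        "RETURN ancestor"
    )


print(build_fixed_depth_query(["female", "male", "female", "female"]))
```

Only the line containing the f-string is formatted, so the literal {id} and {genders} placeholders in the other lines reach Cypher unchanged.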

Answer 4

I'm not sure if you want a generic query which can work whatever the collection of genders you pass, or a specific solution.

Here's the specific solution: you match the path with the wanted length, and match each gender, as you've already noted in your own answer.

MATCH (me:Person)-[:IS_CHILD_OF]->(p1:Person)
      -[:IS_CHILD_OF]->(p2:Person)
      -[:IS_CHILD_OF]->(p3:Person)
      -[:IS_CHILD_OF]->(p4:Person)
WHERE me.uuid = {uuid}
  AND p1.gender = {genders}[0]
  AND p2.gender = {genders}[1]
  AND p3.gender = {genders}[2]
  AND p4.gender = {genders}[3]
RETURN p4
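If the specific solution is used for varying depths, the query text has to be generated per depth. A hypothetical Python sketch that produces the same shape as the query above for any number of generations (IS_CHILD_OF, {uuid} and {genders} are taken from that query; the function name is made up):

```python
def build_specific_query(generations: int) -> str:
    """Build the fixed-depth query: one IS_CHILD_OF hop and one gender
    check per generation, returning the last ancestor in the chain."""
    if generations < 1:
        raise ValueError("need at least one generation")
    lines = ["MATCH (me:Person)"]
    for i in range(1, generations + 1):
        lines.append(f"      -[:IS_CHILD_OF]->(p{i}:Person)")
    lines.append("WHERE me.uuid = {uuid}")
    for i in range(1, generations + 1):
        # {{genders}} renders as the literal Cypher parameter {genders}
        lines.append(f"  AND p{i}.gender = {{genders}}[{i - 1}]")
    lines.append(f"RETURN p{generations}")
    return "\n".join(lines)


print(build_specific_query(4))
```

The caller would pass len(genders) as the number of generations. Each distinct depth produces a distinct query string, which is the parsing cost the general solution avoids.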

Now, if you want to pass in a list of genders of an arbitrary length, it's actually possible. You match a variable-length path, make sure it has the right length (matching the number of genders), then match each gender in sequence.

MATCH p = (me:Person)-[:IS_CHILD_OF*]->(ancestor:Person)
WHERE me.uuid = {uuid}
  AND length(p) = size({genders})
  AND all(i IN range(0, size({genders}) - 1)
          WHERE {genders}[i] = extract(x in tail(nodes(p)) | x.gender)[i])
RETURN ancestor

Building on @InverseFalcon's answer, you can actually compare collections, which simplifies the query:

MATCH p = (me:Person)-[:IS_CHILD_OF*]->(ancestor:Person)
WHERE me.uuid = {uuid}
  AND length(p) = size({genders})
  AND extract(x in tail(nodes(p)) | x.gender) = {genders}
RETURN ancestor
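The two WHERE clauses above accept the same paths. The index-by-index all() check only works together with the length check, while list equality already requires both lists to have the same length and the same items in order. A Python sketch of both checks, with hypothetical function names and sample data:

```python
from typing import List


def match_indexwise(path_genders: List[str], genders: List[str]) -> bool:
    # length(p) = size({genders}) AND all(i IN range(0, size({genders}) - 1) WHERE ...)
    # Cypher's range() includes its end bound, so it covers the same indices
    # as Python's range(len(genders)).
    return len(path_genders) == len(genders) and all(
        genders[i] == path_genders[i] for i in range(len(genders))
    )


def match_whole_list(path_genders: List[str], genders: List[str]) -> bool:
    # extract(...) = {genders}: equal only with the same length and items in order
    return path_genders == genders


cases = [
    (["F", "M", "F", "F"], ["F", "M", "F", "F"]),
    (["F", "M", "F", "M"], ["F", "M", "F", "F"]),
    (["F", "M", "F"], ["F", "M", "F", "F"]),
]
for path_genders, genders in cases:
    print(match_indexwise(path_genders, genders), match_whole_list(path_genders, genders))
```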

4 answers

Answer 1: score 2, accepted, 2016-07-25 21:52:18

Answer 2: score 1, 2016-07-25 20:17:03

Answer 3: score 1, 2016-07-25 20:26:21

Answer 4: score 1, 2016-07-25 20:31:24
