How to optimise a Neo4j Cypher query for a large graph?

Question

I wrote this query to find the possible paths between two nodes. However, when I allow more than 3 steps, it never finishes. The graph I am using contains more than 4 million nodes and 49 million relationships.

match (src:T047 {CUI:"C0030920"}), 
      (trg:T059 {CUI:"C1294944"}),
      p = (src)-[*..3]-(trg)
where 
      all(relI in relationships(p) 
      where type(relI) in ["RO","CHD","PAR","RB","RL","RO","SIB","RU","SY"])
and
      all(nodeI in nodes(p)
      where labels(nodeI) in ["T004", "T005", "T007", "T016", "T017", "T018", "T019", "T020",
            "T021", "T022", "T023", "T024", "T025", "T026", "T028", "T029", "T030", "T031", "T032", 
            "T033", "T034", "T037", "T038", "T039", "T040", "T041", "T042", "T043", "T045", "T046",
            "T047", "T048", "T049", "T053", "T054", "T055", "T056", "T057", "T059", "T060", "T061", 
            "T074", "T080", "T081", "T098", "T099", "T100", "T101", "T103", "T109", "T114", "T116", 
            "T121", "T123", "T125", "T126", "T127", "T129", "T131", "T168", "T184", "T190", "T191", 
            "T195", "T196", "T197", "T200", "T201"])
return p

Here is the plan for this query: https://imgur.com/PpWePOz

Is there any way to optimise this query, or at least to estimate how long it will take?

Answer 1

First, your query plan shows that you aren't using indexes: it uses a NodeByLabelScan for :T059 nodes and then runs a filter over all of them to find the one with the property in question. The src node isn't found by an index lookup either; instead, the results of the variable-length expand are filtered for the label and property.

You will need indexes on :T047(CUI) and :T059(CUI) to improve performance here. Make sure these exist first.
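
As a minimal sketch, the two indexes could be created as shown here. The syntax assumes Neo4j 4.x or later, and the index names t047_cui and t059_cui are hypothetical; on Neo4j 3.x the older form CREATE INDEX ON :T047(CUI) applies instead.

```cypher
// Assumes Neo4j 4.x+; the index names are illustrative only.
CREATE INDEX t047_cui FOR (n:T047) ON (n.CUI);
CREATE INDEX t059_cui FOR (n:T059) ON (n.CUI);
```

Indexes are populated in the background, so on a graph of this size it is worth checking that they are online (for example with CALL db.indexes() or, in more recent versions, SHOW INDEXES) before re-running the query.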

Also, to force index lookup (as opposed to a var-length-expand and filter, which would be more expensive) you can provide index hints to the planner.
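
One way to check that the hints take effect is to prefix a cut-down version of the query with EXPLAIN, which shows the plan without executing it. This is only a sketch and assumes both indexes already exist (a hint that names a missing index raises an error):

```cypher
// EXPLAIN returns the query plan only; nothing is executed.
EXPLAIN
MATCH (src:T047 {CUI: "C0030920"}),
      (trg:T059 {CUI: "C1294944"})
USING INDEX src:T047(CUI)
USING INDEX trg:T059(CUI)
RETURN src, trg
```

If the indexes are being used, the plan should show NodeIndexSeek operators for src and trg instead of NodeByLabelScan.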

We can also adjust the list predicate for labels on nodes in the path such that they will be filtered during expansion instead of afterward.

WITH ["T004", "T005", "T007", "T016", "T017", "T018", "T019", "T020", "T021", "T022", "T023", "T024", "T025", "T026", "T028", "T029", "T030", "T031", "T032", "T033", "T034", "T037", "T038", "T039", "T040", "T041", "T042", "T043", "T045", "T046", "T047", "T048", "T049", "T053", "T054", "T055", "T056", "T057", "T059", "T060", "T061", "T074", "T080", "T081", "T098", "T099", "T100", "T101", "T103", "T109", "T114", "T116", "T121", "T123", "T125", "T126", "T127", "T129", "T131", "T168", "T184", "T190", "T191", "T195", "T196", "T197", "T200", "T201"] as allowedLabels
MATCH (src:T047 {CUI:"C0030920"}), 
      (trg:T059 {CUI:"C1294944"})
USING INDEX src:T047(CUI)
USING INDEX trg:T059(CUI)
MATCH p = (src)-[*..3]-(trg)
WHERE 
      all(relI IN relationships(p) WHERE type(relI) IN ["RO","CHD","PAR","RB","RL","SIB","RU","SY"])
    AND all(node IN nodes(p) WHERE labels(node)[0] IN allowedLabels)
RETURN p

This also assumes that every node on the path has exactly one label, since labels(node)[0] checks only the first label. If nodes can have multiple labels, the query may need to be restructured.
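
If nodes can carry several labels, one possible restructuring is to test every label of each node rather than only the first. The sketch below assumes a node is acceptable when at least one of its labels is in the allowed list (replace any with all if every label must be allowed). The label list is abbreviated here purely for illustration; the full list from the query above should be used in practice.

```cypher
// allowedLabels is shortened for readability; use the full list in practice.
WITH ["T047", "T059", "T184"] AS allowedLabels
MATCH (src:T047 {CUI: "C0030920"}),
      (trg:T059 {CUI: "C1294944"})
USING INDEX src:T047(CUI)
USING INDEX trg:T059(CUI)
MATCH p = (src)-[*..3]-(trg)
WHERE all(relI IN relationships(p)
          WHERE type(relI) IN ["RO","CHD","PAR","RB","RL","SIB","RU","SY"])
  // Assumption: a node passes if any one of its labels is allowed.
  AND all(node IN nodes(p)
          WHERE any(l IN labels(node) WHERE l IN allowedLabels))
RETURN p
```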

solution1
0 ACCEPTED 2021-04-17 20:03:23
