
Neo4j Cypher execution plan when query has WHERE and WITH clause

I have a Neo4j graph database that stores staffing nodes and relationships. I need to write a Cypher query that finds the home and office addresses of a resource (employee) along with their empId and name, so that the staffing solution can staff resources according to their home location as well as their proximity to the office.

MATCH (employee:Employee) <-[:ADDRESS_TO_EMPLOYEE]- (homeAddress:HomeAddress) 
WHERE employee.id = '70' 
WITH  employee, homeAddress 
MATCH (employee)-[:EMPLOYEE_TO_OFFICEADDRESS]->(officeAddress:OfficeAddress) 
RETURN employee.empId, employee.name,  
homeAddress.street, homeAddress.area, homeAddress.city,  
officeAddress.street, officeAddress.area, officeAddress.city

This query returns the desired results.

However, suppose I move the WHERE condition to the end, just before the RETURN clause:

MATCH (employee:Employee) <-[:ADDRESS_TO_EMPLOYEE]- (homeAddress:HomeAddress) 
WITH  employee, homeAddress  
MATCH (employee)-[:EMPLOYEE_TO_OFFICEADDRESS]->(officeAddress:OfficeAddress) 
WHERE employee.id = '70' 
RETURN employee.empId, employee.name,  
homeAddress.street, homeAddress.area, homeAddress.city,  
officeAddress.street, officeAddress.area, officeAddress.city 

It again gives me the same result.

So which one is better optimized, given that the query execution plan is the same in both cases (the same number of DB hits and returned records)?

Now, if I remove the WITH clause:

MATCH (employee:Employee) <-[:ADDRESS_TO_EMPLOYEE]- (homeAddress:HomeAddress) 
MATCH (employee)-[:EMPLOYEE_TO_OFFICEADDRESS]->(officeAddress:OfficeAddress) 
WHERE employee.id = '70' 
RETURN employee.empId, employee.name, 
homeAddress.street, homeAddress.area, homeAddress.city, 
officeAddress.street, officeAddress.area, officeAddress.city

Then again the results are the same, and the execution plan is also the same.

Do I really need WITH in this case?

Any help would be greatly appreciated.

First, you can use PROFILE and EXPLAIN to inspect the performance of your query. That said, as long as you get the results you want in the time you want, the exact wording of the Cypher doesn't matter much, because the plan can change depending on the version of the Cypher planner running in the database. So as long as the query passes unit and load tests (assuming the tests are reasonably accurate), the rest doesn't matter.
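As a minimal sketch, this is the first query from the question prefixed with PROFILE. PROFILE runs the query and reports rows and DB hits per operator, while EXPLAIN shows the plan without executing anything. The actual plan and numbers depend on your data and Neo4j version, so none are shown here.

// PROFILE executes the query and reports rows and DB hits for each operator.
// Replace PROFILE with EXPLAIN to see the plan without running the query.
PROFILE
MATCH (employee:Employee)<-[:ADDRESS_TO_EMPLOYEE]-(homeAddress:HomeAddress)
WHERE employee.id = '70'
WITH employee, homeAddress
MATCH (employee)-[:EMPLOYEE_TO_OFFICEADDRESS]->(officeAddress:OfficeAddress)
RETURN employee.empId, employee.name,
       homeAddress.street, homeAddress.area, homeAddress.city,
       officeAddress.street, officeAddress.area, officeAddress.city

Running each variant of the query this way and comparing the operator trees is the most direct way to confirm whether the planner treats them identically.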

Second, in general, less is more. Imagine you had to read your own Cypher and look up the information yourself on paper printouts. Isn't MATCH (officeAddress:OfficeAddress)<-[:EMPLOYEE_TO_OFFICEADDRESS]-(employee:Employee {id:'70'})<-[:ADDRESS_TO_EMPLOYEE]-(homeAddress:HomeAddress) much easier to read as a statement of exactly what you are looking for? The easier it is for the Cypher planner to see what you want, the more likely it is to choose the most efficient lookup strategy. Keeping the WHERE clause close to the MATCH it applies to also helps the planner. So try to keep your queries as simple as possible while still being accurate for what you want.

In your Cypher, the only part that really matters is the WITH. WITH creates a logical break in the query and a scope change for variables. As you aren't doing anything with the WITH, it's better to drop it. The only side effect it can produce in this case is tricking the planner into doing more work than necessary for the first MATCH, only to filter it down later. If an Employee is expected to have more than one home address, then WITH employee, COLLECT(homeAddress) AS homeAddresses will reduce that match to one row per employee, making the next MATCH cheaper. But since both sides of the match should presumably yield only one result each, it doesn't matter what the planner does first. (In general, you use WITH to aggregate results down to fewer rows, to make the rest of the query cheaper, which shouldn't apply in this context.)
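For the multiple-home-address case, a sketch of a WITH that does real work might look like the following. It assumes the labels and relationship types from the question; the alias homeAddresses is only illustrative.

// Collapse all home addresses into one list so that each employee
// produces a single row before the second MATCH runs.
MATCH (employee:Employee {id: '70'})<-[:ADDRESS_TO_EMPLOYEE]-(homeAddress:HomeAddress)
WITH employee, collect(homeAddress) AS homeAddresses
MATCH (employee)-[:EMPLOYEE_TO_OFFICEADDRESS]->(officeAddress:OfficeAddress)
RETURN employee.empId, employee.name,
       [h IN homeAddresses | h {.street, .area, .city}] AS homeAddresses,
       officeAddress.street, officeAddress.area, officeAddress.city

Here the WITH is not a no-op: it aggregates, so the office address lookup runs once per employee rather than once per (employee, home address) pair.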

  1. You should always put a WHERE clause as early as possible in a query. That will filter out data that the rest of the query will not have to deal with, avoiding possible unneeded work.

  2. You should avoid writing a WITH clause that is just passing forward all the defined variables (and is not required syntactically), since it is essentially a no-op. It wastes (a little bit of) time for the planner to process, and makes the Cypher code a bit harder to understand.

This simpler version of your query should produce the same query plan:

MATCH (officeAddress:OfficeAddress)<-[:EMPLOYEE_TO_OFFICEADDRESS]-(employee:Employee)<-[:ADDRESS_TO_EMPLOYEE]-(homeAddress:HomeAddress) 
WHERE employee.id = '70' 
RETURN
  employee.empId, employee.name,  
  homeAddress.street, homeAddress.area, homeAddress.city,  
  officeAddress.street, officeAddress.area, officeAddress.city

And the following version (using the map projection syntax) is even simpler (with a similar query plan).

MATCH (officeAddress:OfficeAddress)<-[:EMPLOYEE_TO_OFFICEADDRESS]-(employee:Employee)<-[:ADDRESS_TO_EMPLOYEE]-(homeAddress:HomeAddress) 
WHERE employee.id = '70' 
RETURN
  employee{.empId, .name},  
  homeAddress{.street, .area, .city},  
  officeAddress{.street, .area, .city}

The results of the above query have a different structure, though:

╒═══════════════════════════╤══════════════════════════════════════╤══════════════════════════════════════╕
│"employee"                 │"homeAddress"                         │"officeAddress"                       │
╞═══════════════════════════╪══════════════════════════════════════╪══════════════════════════════════════╡
│{"name":"sam","empId":"70"}│{"area":1,"city":"foo","street":"123"}│{"area":2,"city":"bar","street":"345"}│
└───────────────────────────┴──────────────────────────────────────┴──────────────────────────────────────┘
