
Neo4j Cypher query performance

I have the following two Cypher queries, each followed by its execution plan.

Before optimization:

match (o:Order {statusId:74}) <- [:HAS_ORDERS] - (m:Member)
with m,o 
match (m:Member) - [:HAS_WALLET] -> (w:Wallet) where w.currentBalance < 250 
return m as Members,collect(o) as Orders,w as Wallets order by m.createdAt desc limit 10

[Image: execution plan of the query before optimization]


After optimization (db hits reduced by 40-50%):

match (m:Member) - [:HAS_ORDERS]->(o:Order {statusId:74})
with m, collect(o) as Orders
match (m) - [:HAS_WALLET] - (w:Wallet) where w.currentBalance < 250
return m as Members, Orders, w as Wallets 
order by m.createdAt desc limit 10

[Image: execution plan of the query after optimization]

There are three types of nodes: Member, Order and Wallet. The relationships between them are:

  • Member - [:HAS_ORDERS] -> Order
  • Member - [:HAS_WALLET] -> Wallet

I have around 100k Member nodes (and 100k Wallet nodes) and almost 570k Order nodes for those members. I want to fetch all the members who have an order with status 74 and a wallet balance below 250. The query above gives the desired result, but it takes 1.5 seconds on average to respond.

I suspect there is still scope for optimization here, but I'm not able to figure it out. I've added indexes on the properties I'm filtering on.

I've just started exploring Neo4j and I'm not sure how to optimize this further.

We can leverage index-backed ordering to try a different approach here. By providing a type hint (a predicate that tells the planner the property value is a string) along with ordering by the indexed property, we can have the planner use the index to visit :Member nodes in the order you want (by m.createdAt DESC) for free, meaning we don't need to read every :Member node and sort them. We then check each node in that order against the criteria until we have the 10 you need.

From some back-and-forth on the Neo4j users Slack, you mentioned that of your 100k :Member nodes, about 52k fit the criteria you're looking for. That is a good indicator that we may not have to look very far down the ordered :Member nodes before finding 10 that meet the criteria.
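To re-check that proportion on your own data, a count along these lines could be used. This is only a sketch: it reuses the labels, relationship types and property names from the question, and it assumes a Neo4j version that supports existential subqueries (4.0 or later).

```cypher
// Count the members that satisfy both conditions.
// A high count relative to the total number of :Member nodes suggests
// that walking members in createdAt order will hit 10 matches quickly.
MATCH (m:Member)-[:HAS_WALLET]->(w:Wallet)
WHERE w.currentBalance < 250
  AND EXISTS { MATCH (m)-[:HAS_ORDERS]->(:Order {statusId: 74}) }
RETURN count(DISTINCT m) AS matchingMembers
```

If the count turns out to be small compared with the total, the ordered walk may have to visit many members before the limit is reached, and the approach is likely to pay off less.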

Here's the query:

MATCH (m:Member)
WHERE m.createdAt > ''  // type hint
WITH m
ORDER BY m.createdAt DESC
MATCH (m)-[:HAS_WALLET]->(w) 
WHERE w.currentBalance < 250 AND EXISTS {
    MATCH (m)-[:HAS_ORDERS]->(:Order {statusId:74})  
} 
WITH m, w
LIMIT 10
RETURN m as member, w as wallet, [(m)-[:HAS_ORDERS]->(o:Order {statusId:74}) | o] as orders

Note that by using an existential subquery, we only have to find one order that satisfies the condition for each member. We wait until after the limit of 10 members is reached before using a pattern comprehension to grab all the matching orders for those 10 members.
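To see whether the planner actually uses the index for ordering, one option is to profile a stripped-down version of the first part of the query. The sketch below assumes an index exists on :Member(createdAt); the exact operator names shown in the plan can vary between Neo4j versions.

```cypher
// Profile only the ordered lookup.
// With index-backed ordering, the plan should show an index seek or scan
// that already delivers rows ordered by m.createdAt DESC, with no separate Sort step.
PROFILE
MATCH (m:Member)
WHERE m.createdAt > ''  // type hint, same as in the full query
RETURN m
ORDER BY m.createdAt DESC
LIMIT 10
```

If a Sort (or Top) operator still appears in the plan, the index is probably not being used for ordering, and it is worth checking that the index exists and that createdAt really is stored as a string.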

Have you tried subqueries? You may be able to use a subquery to shrink the number of nodes before passing them along to the rest of the query. (It would seem that an omniscient query planner could do this, but Cypher isn't there yet.) You may have to experiment to see which subquery filters out the most nodes.

An example of using a subquery is here: https://community.neo4j.com/t/slow-query-with-very-limited-data-and-boolean-false/31555

Another one is here: https://community.neo4j.com/t/why-is-this-geospatial-search-so-slow/31952/24
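Applied to the data model in the question, a subquery-based version might look roughly like the sketch below. It is untested against the asker's data, filters on the wallet first as one possible choice, and assumes Neo4j 4.1 or later, where a CALL subquery can import variables with a leading WITH.

```cypher
// Filter on the wallet balance first, then collect matching orders per member
// inside a correlated subquery.
MATCH (m:Member)-[:HAS_WALLET]->(w:Wallet)
WHERE w.currentBalance < 250
CALL {
    WITH m
    MATCH (m)-[:HAS_ORDERS]->(o:Order {statusId: 74})
    RETURN collect(o) AS orders
}
// Drop members that have no order with status 74.
WITH m, w, orders
WHERE size(orders) > 0
RETURN m AS member, w AS wallet, orders
ORDER BY m.createdAt DESC
LIMIT 10
```

Whether this beats the other variants depends on which filter is more selective, so it is worth comparing the db hits of each with PROFILE.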

(Of course, I assume you already have the appropriate properties indexed.)
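For reference, indexes covering the properties used in this thread could be created along these lines. The index names are made up for illustration, and the syntax shown is the Neo4j 4.x form; older versions use a different CREATE INDEX syntax.

```cypher
// Hypothetical index names; labels and properties are the ones from the question.
CREATE INDEX order_status_id FOR (o:Order) ON (o.statusId);
CREATE INDEX wallet_current_balance FOR (w:Wallet) ON (w.currentBalance);
// Needed for ordering by m.createdAt to be index-backed.
CREATE INDEX member_created_at FOR (m:Member) ON (m.createdAt);
```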


 