
Optimizing Cypher Query - Neo4j

I have the following query:

MATCH (User1)-[:VIEWED]->(page)<-[:VIEWED]-(User2)
RETURN User1.userId, User2.userId, count(page) AS cnt

It's a relatively simple query to find co-page-view counts between users. It's just too slow, and I have to terminate it after some time.

Details

User consists of about 150k nodes.

Page consists of about 180k nodes.

User -[:VIEWED]-> Page has about 380k relationships.

User has 7 attributes, and Page has about 5 attributes.

Both User and Page are indexed on UserId and PageId respectively.

Heap size is 512 MB (I also tried 1 GB).

What are some ways to optimize this query? The number of nodes and relationships does not seem large.

Use Labels

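Always use node labels in your patterns. Without labels, the planner has to consider every node in the graph as a starting point; with labels, it can restrict the scan to User and Page nodes.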

MATCH (u1:User)-[:VIEWED]->(p:Page)<-[:VIEWED]-(u2:User)
RETURN u1.userId, u2.userId, count(p) AS cnt;

Don't match on duplicate pairs of users

Because the pattern is symmetric, every pair of users that shares a viewed page is matched twice: once as (u1, u2) and once as (u2, u1). To count each pair only once, compare the internal node ids:

MATCH (u1:User)-[:VIEWED]->(p:Page)<-[:VIEWED]-(u2:User)
WHERE id(u1) > id(u2)
RETURN u1.userId, u2.userId, count(p) AS cnt;
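As a variant, the same de-duplication can be sketched by comparing a property instead of the internal id. This assumes userId is unique per user and comparable (a number or a string), which the question implies but does not state. It may be worth trying on recent Neo4j versions, where id() is deprecated.

```cypher
// Sketch: keep only one ordering of each user pair.
// Assumes userId is unique per User node.
MATCH (u1:User)-[:VIEWED]->(p:Page)<-[:VIEWED]-(u2:User)
WHERE u1.userId < u2.userId
RETURN u1.userId, u2.userId, count(p) AS cnt;
```

Comparing properties requires reading userId for every matched pair, so this may be somewhat slower than the id() comparison; PROFILE can confirm which is cheaper on your data.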

Query for a specific user

If you can bind either side of the pattern to a specific node, the query will be much faster, because the traversal starts from one user instead of all 150k. Do you need to execute this query for all pairs of users? Would it make sense to execute it relative to a single user only? For example:

MATCH (u1:User {name: "Bob"})-[:VIEWED]->(p:Page)<-[:VIEWED]-(u2:User)
WHERE NOT u1=u2
RETURN u1.userId, u2.userId, count(p) AS cnt;
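Since the question says User is indexed on its user id, a lookup on that property is likely the better anchor. The sketch below assumes the property is spelled userId, as in the RETURN clause, and uses a parameter named $userId chosen here for illustration (older Neo4j versions write parameters as {userId}). The ORDER BY and LIMIT are also additions, to keep the result small.

```cypher
// Sketch: anchor the pattern on one user via the indexed property.
// $userId is a hypothetical parameter supplied by the caller.
MATCH (u1:User {userId: $userId})-[:VIEWED]->(p:Page)<-[:VIEWED]-(u2:User)
WHERE u1 <> u2
RETURN u2.userId, count(p) AS cnt
ORDER BY cnt DESC
LIMIT 10;
```

Passing the id as a parameter rather than a literal also lets Neo4j reuse the cached query plan across different users.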

As you try different queries, you can prepend EXPLAIN or PROFILE to the Cypher query to see the execution plan. EXPLAIN shows the plan without running the query; PROFILE runs it and also reports the rows and database hits for each operator.
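For instance, profiling the de-duplicated query from above looks like this. Note that PROFILE executes the query in full, so on the all-pairs version it may take as long as the original; EXPLAIN is the safer first step.

```cypher
// Sketch: inspect the plan and db hits for the labelled, de-duplicated query.
PROFILE
MATCH (u1:User)-[:VIEWED]->(p:Page)<-[:VIEWED]-(u2:User)
WHERE id(u1) > id(u2)
RETURN u1.userId, u2.userId, count(p) AS cnt;
```

In the resulting plan, a label scan on User or Page as the leaf operator (rather than a scan over all nodes) is a sign that the labels are being used.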
