
Neo4j Performance - IN Operator Cypher Query

Suppose I have a million users and I search them using the IN operator with more than 1000 custom ids, which are covered by a unique index.

For example, in the movie database provided by Neo4j, let's say I need to get all the movies in which the actors from my list (more than 1000 of them) acted, ordered by movie release date and with distinct movie results.

Is it really a good idea to run that operation on the database, and what are the time complexities if I execute it on a single-node instance and on an HA cluster?

This will give you a rough guide on the computational complexity involved in your calculation.

For each of your Actors, Neo will look for all the Acted_In relationships going out from that node. Let's assume that the average number of Acted_In relationships is 4 per Actor.

Neo will therefore require 4 traversals per Actor, so for 1000 Actors that will be 4000 traversals. For Neo that is not a lot (they claim to do about 1 million traversals a second, but of course this depends upon hardware).

Then, the Distinct aspect of the query is trivial for Neo, as it knows which Nodes it has visited. Neo would automatically have the unique list of Movie nodes, so this would be very quick.

If the release date of the movie is indexed in Neo, the ordering of the results would also be very quick.

So theoretically this query should run quickly (well under a second) and have minimal impact on the database.
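
The estimate above can be checked with a small in-memory sketch. It is written in Python rather than Cypher and does not claim to show how Neo4j executes the query internally. The `acted_in` mapping, the movie names and the release years are made up for illustration, and the 1,000,000 traversals per second figure is simply the claim quoted above, not a measurement. Like the estimate, it collects the movies reachable from any of the listed actors.

```python
# Toy model of the cost estimate: one traversal per ACTED_IN relationship followed.
# All data below is hypothetical.
acted_in = {
    "actor1": ["MovieA", "MovieB"],
    "actor2": ["MovieB", "MovieC"],
    "actor3": ["MovieC"],
}
released = {"MovieA": 1999, "MovieB": 2003, "MovieC": 1995}


def movies_for_actors(actor_ids):
    """Return (distinct movies ordered by release year, traversal count)."""
    traversals = 0
    seen = set()  # a set gives the DISTINCT behaviour for free
    for actor in actor_ids:
        for movie in acted_in.get(actor, []):
            traversals += 1
            seen.add(movie)
    ordered = sorted(seen, key=lambda m: released[m])
    return ordered, traversals


def estimated_seconds(actors, avg_degree, traversals_per_second=1_000_000):
    """Back-of-the-envelope time: total traversals divided by the claimed rate."""
    return actors * avg_degree / traversals_per_second


movies, traversals = movies_for_actors(["actor1", "actor2", "actor3"])
print(movies, traversals)
print(estimated_seconds(1000, 4))  # 4000 traversals at the claimed rate
```

With 1000 actors and an average of 4 relationships each, the model gives 4000 traversals, or roughly 4 milliseconds at the claimed rate, which is where the "well under a second" figure comes from.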

Here is what I'd do: start traversing from the actor with the lowest degree, i.e. the most selective one in your dataset. Then find the movies that actor acted in and check those movies against the rest of the actors.

The second option (the "or" variant below) might be more efficient implementation-wise. (There is also another trick that can speed that one up even more; let me know via email when you have the dataset to test it on.)

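// {ids} is the legacy parameter syntax; newer Neo4j versions write $ids
MATCH (n:Actor) WHERE n.id IN {ids}
WITH n, SIZE( (n)-[:ACTED_IN]->() ) AS degree
ORDER BY degree ASC
// first = the actor with the lowest degree; number = how many other actors remain
WITH collect(n) AS actors
WITH head(actors) AS first, tail(actors) AS rest, size(actors)-1 AS number
// either: for each movie of the first actor, check that every other actor acted in it
MATCH (first)-[:ACTED_IN]->(m)
WHERE size( (m)<-[:ACTED_IN]-() ) > number AND ALL(a IN rest WHERE (a)-[:ACTED_IN]->(m))
RETURN m;

// or (after the same clauses up to the second WITH): count the matching cast members per movie
MATCH (first)-[:ACTED_IN]->(m)
WHERE size( (m)<-[:ACTED_IN]-() ) > number
MATCH (m)<-[:ACTED_IN]-(a)
WHERE a IN rest
WITH m, count(*) AS c, number
WHERE c = number
RETURN m;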
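
The lowest-degree-first idea can also be sketched outside the database. The following is a minimal in-memory Python illustration of the same logic, not a translation of what Neo4j does internally. The function name `movies_with_all_actors` and the two dictionaries are hypothetical, the sample data is made up, and the actor ids are assumed to be unique.

```python
def movies_with_all_actors(actor_ids, acted_in, cast):
    """Return the movies in which every actor in actor_ids appears.

    acted_in maps actor -> set of movies, cast maps movie -> set of actors.
    """
    if not actor_ids:
        return []
    # Start from the actor with the fewest movies (the most selective one).
    ordered = sorted(actor_ids, key=lambda a: len(acted_in.get(a, ())))
    first, rest = ordered[0], ordered[1:]
    result = []
    for movie in acted_in.get(first, ()):
        members = cast[movie]
        # Cheap pre-filter: the cast must be at least as large as the actor list.
        if len(members) < len(ordered):
            continue
        # Check the candidate movie against all remaining actors.
        if all(a in members for a in rest):
            result.append(movie)
    return sorted(result)


# Hypothetical sample data.
acted_in = {"a": {"M1", "M2", "M3"}, "b": {"M1", "M2"}, "c": {"M2"}}
cast = {"M1": {"a", "b"}, "M2": {"a", "b", "c"}, "M3": {"a"}}

print(movies_with_all_actors(["a", "b", "c"], acted_in, cast))
print(movies_with_all_actors(["a", "b"], acted_in, cast))
```

Only the movies of the most selective actor are ever examined, so the work is bounded by that actor's degree times the size of the list, rather than by the total number of relationships of all listed actors.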


 