I am trying to create a simple Cypher query that should find all instances in the graph matching roughly this structure: (BlogPost A) -> (Term) <- (BlogPost B). In other words, I am trying to find all pairs of blog posts that are flagged with the same term and, in addition, count the number of terms each pair shares. A term is a categorization mechanism in this context.
Here is my query proposal:
MATCH (blogA:content {entitySubType:'blog'})
WITH blogA MATCH (blogA) -[]-> (t:term) <-[]- (blogB:content)
WHERE blogB.entitySubType='blog' AND NOT (ID(blogA) = ID(blogB))
RETURN ID(blogA), ID(blogB), count(t) ;
This query returns null after running for about a day.
Is the usage of blogA in the second MATCH not possible in the way I am using it? When I run the same query with limits, I do get results:
MATCH (blogA:content {entitySubType:'blog'})
WITH blogA
LIMIT 10
MATCH (blogA) -[]-> (t:term) <-[]- (blogB:content)
WHERE blogB.entitySubType='blog' AND NOT (ID(blogA) = ID(blogB))
RETURN ID(blogA), ID(blogB), count(t)
LIMIT 20;
My Neo4j instance has ~500 GB of RAM, and the whole graph including all properties is ~30 GB with ~15 million vertices in total, of which 101k are blog vertices and 108k are terms.
I would be grateful for every hint about possible problems or suggestions for improvements.
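Also make sure to consume that query with a client driver (e.g. Java) that can stream the billions of results. Here is a query that would use the compiled runtime, which should be the fastest and most memory-efficient option. Note that it is written against a :Blog label and a :TAGGED relationship type rather than the :content label with an entitySubType property: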
MATCH (blogA:Blog)-[:TAGGED]->(t:Term)<-[:TAGGED]-(blogB:Blog)
WHERE blogA <> blogB
RETURN ID(blogA), ID(blogB), count(t);
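Because the pattern is symmetric, the query above returns every pair twice, once as (A, B) and once as (B, A). If one row per unordered pair is enough, a possible variant is sketched below. It assumes the same :Blog, :Term and :TAGGED names as the query above, and the column aliases are only illustrative:

```cypher
// Sketch: same pattern as above, but each unordered pair is kept only once.
// id(blogA) < id(blogB) also rules out blogA = blogB, so no separate <> check is needed.
MATCH (blogA:Blog)-[:TAGGED]->(t:Term)<-[:TAGGED]-(blogB:Blog)
WHERE id(blogA) < id(blogB)
RETURN id(blogA) AS blogA, id(blogB) AS blogB, count(t) AS sharedTerms;
```

This should roughly halve the number of rows that have to be streamed to the client. Whether it still runs on the compiled runtime is an assumption on my part; prefixing the query with EXPLAIN shows which runtime the planner picks.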