
Neo4j cypher query filtering on relationship count

I have a neo4j database schema that looks like:

(a:Author)<-[r:HAS_AUTHOR]-(n:Article)-[rel:HAS_DESCRIPTOR]->(d:Descriptor)

I'd like to write a query showing the links between authors and descriptors, filtered for authors that have published more than once (count(r)>1) and for descriptors that occur in more than one article (count(rel)>1).

Here is the query that I wrote:

MATCH (a:Author)<-[r:HAS_AUTHOR]-(n:Article)-[rel:HAS_DESCRIPTOR]->(d:Descriptor)
WITH a,count(r) as cnt WHERE cnt>1
MATCH (a:Author)<-[r:HAS_AUTHOR]-(n:Article)-[rel:HAS_DESCRIPTOR]->(d:Descriptor)
WITH d,count(rel) as cnt1 WHERE cnt1>1
MATCH (a:Author)<-[r:HAS_AUTHOR]-(n:Article)-[rel:HAS_DESCRIPTOR]->(d:Descriptor)
RETURN * limit 100

It doesn't seem to do what I'm expecting. I'm still seeing Authors or Descriptors linked to a single article.

Note that the relationship counts should be considered only in the context of the query (i.e., with LIMIT 100, all authors should be linked to more than one article in the query output graph).

Is that the right way to write this query? Thanks

EDIT

I apologize for not being clear enough.

If I run a simple query showing all author--article--descriptor graphs, I get some of the scenarios shown in the images below.

In all images, yellow nodes are articles, green are authors and pink are descriptors.

Scenario 1: An article that is the only one mentioning the descriptor. I'd like to filter out those descriptors that are mentioned in only one article.

[image]

Scenario 2: A descriptor mentioned by more than one article but whose authors have not published any other articles. I'd like to filter out those authors that have published only one article.

[image]

These two filters should apply at the sub-graph level. For example: if I filter down to a particular descriptor type, then the two conditions (author and descriptor with more than one article) should be fulfilled in this new sub-graph.

The first query that was proposed generates graphs as in the image below:

MATCH (a:Author)
WHERE size((a)<-[:HAS_AUTHOR]-()) > 1
MATCH (a)<-[:HAS_AUTHOR]-(n:Article)-[:HAS_DESCRIPTOR]->(d:Descriptor)
WITH a, d, collect(n) as articles
WHERE size(articles) > 1
RETURN a, d, articles

Using collect(n) as articles per (a, d) pair forces the author to have published twice on the same descriptor, which is not desirable. I'd like an author who has published papers on 2 different descriptors to appear as well.

[image]

The second query that was proposed generates graphs as in the image below:

MATCH (d:Descriptor)
WHERE size((d)<-[:HAS_DESCRIPTOR]-()) > 1
WITH collect(d) as descriptors
MATCH (a:Author)
WHERE size((a)<-[:HAS_AUTHOR]-()) > 1
MATCH (a)<-[:HAS_AUTHOR]-(n:Article)-[:HAS_DESCRIPTOR]->(d)
WHERE d in descriptors
RETURN a, n, d

Note that I added a filter on descriptor type so that the query could run, and I'm not sure whether that affects the filtering condition. Here it shows descriptors and authors linked to a single article.

[image]

The first optimization is the filter for :Authors that have published more than once. All this requires is a degree check on the :HAS_AUTHOR relationships of the author, which is cheap because a node knows the types and counts of the relationships attached to it. You can use the size() function on the pattern to get this: WHERE size((author)<-[:HAS_AUTHOR]-()) > 1.
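For reference, the degree check on its own might look like the sketch below. The first form is the one used in this answer. The second is an assumption about the reader's setup: Neo4j 5 removed size() over a pattern expression, and a COUNT { ... } subquery is the usual replacement there. The post does not say which server version is in use.

```cypher
// Neo4j 4.x and earlier: size() over a pattern expression
MATCH (a:Author)
WHERE size((a)<-[:HAS_AUTHOR]-()) > 1
RETURN a
LIMIT 100;

// Neo4j 5 and later (assumed alternative): COUNT subquery
MATCH (a:Author)
WHERE COUNT { (a)<-[:HAS_AUTHOR]-() } > 1
RETURN a
LIMIT 100;
```

Both forms count every :HAS_AUTHOR relationship on the author across the whole database, not only those inside a filtered sub-graph.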

Next, to get the patterns involving descriptors that occur in more than one article, we need to aggregate the articles by author and descriptor, keeping only rows that have more than one article.

Try this out:

MATCH (a:Author)
WHERE size((a)<-[:HAS_AUTHOR]-()) > 1
MATCH (a)<-[:HAS_AUTHOR]-(n:Article)-[:HAS_DESCRIPTOR]->(d:Descriptor)
WITH a, d, collect(n) as articles
WHERE size(articles) > 1
RETURN a, d, articles

This returns rows featuring the author, the descriptor, and the collection of articles (more than one) by that author with the given descriptor.

EDIT

Looks like you want to filter for :Descriptors that have been mentioned more than once total, regardless of author, and not per the subgraph we're forming in the query.

In that case, it may be best to pre-match to these and filter, then collect, and use that collection for some set operations as we expand out the subgraph.

MATCH (d:Descriptor)
WHERE size((d)<-[:HAS_DESCRIPTOR]-()) > 1
WITH collect(d) as descriptors
MATCH (a:Author)
WHERE size((a)<-[:HAS_AUTHOR]-()) > 1
MATCH (a)<-[:HAS_AUTHOR]-(n:Article)-[:HAS_DESCRIPTOR]->(d)
WHERE d in descriptors
RETURN a, n, d
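
The query above checks each node's degree across the whole database. If the counts should instead hold within the matched sub-graph, as the question's edit asks, one possible approach is to collect the matched rows once, derive the qualifying authors and descriptors from those rows, and keep only the rows where both qualify. This is a minimal sketch under that assumption, not tested against the asker's data. The names rows, row, authors, descriptors and articleCount are introduced here for illustration.

```cypher
// Gather every (author, article, descriptor) row of the sub-graph once.
// Any extra filter (for example on descriptor type) belongs on this MATCH.
MATCH (a:Author)<-[:HAS_AUTHOR]-(n:Article)-[:HAS_DESCRIPTOR]->(d:Descriptor)
WITH collect({a: a, n: n, d: d}) AS rows

// Authors linked to more than one distinct article within those rows.
UNWIND rows AS row
WITH rows, row.a AS a, count(DISTINCT row.n) AS articleCount
WHERE articleCount > 1
WITH rows, collect(a) AS authors

// Descriptors linked to more than one distinct article within those rows.
UNWIND rows AS row
WITH rows, authors, row.d AS d, count(DISTINCT row.n) AS articleCount
WHERE articleCount > 1
WITH rows, authors, collect(d) AS descriptors

// Keep only rows whose author and descriptor both qualified.
UNWIND rows AS row
WITH row, authors, descriptors
WHERE row.a IN authors AND row.d IN descriptors
RETURN row.a AS a, row.n AS n, row.d AS d
LIMIT 100
```

Two caveats apply. The filter runs in a single pass, so dropping rows for a rejected descriptor can leave an author with only one article in the final result, and LIMIT 100 can cut off rows that would have shown an author's or descriptor's second article. The sketch also holds all matched rows in memory, which may be impractical on a large graph without a narrowing filter on the first MATCH.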
