Pipeline result from one query to another in Cypher (Neo4j)

The following query returns a list of researchers with the papers each has written and, next to each paper, the number of citations it received from other papers.

MATCH (p:Paper) - [c:CITED_BY] -> (p2:Paper)
MATCH (p) - [w:WRITTEN_BY] -> (a:Author)
WITH a, p, count(c) as numCitations
ORDER BY a.authorName
RETURN a.authorName, p.paperTitle, numCitations

The next query is meant to return the same list of authors, but this time I want to know, for each author, the lowest number of citations that any of his/her papers received. Note that this needs the numCitations value computed in the previous query (I want to take the minimum over that column).

MATCH (p:Paper) - [c:CITED_BY] -> (p2:Paper)
MATCH (p) - [w:WRITTEN_BY] -> (a:Author)
WITH a, count(c) as numCit
ORDER BY a.authorName
RETURN a.authorName, min(numCitations)

What I want is something like this:

Query 1
Author    Paper     numCitations
Alan      A         8
Alan      B         6
Alan      C         4
Alan      D         2 (this is the minimum for Alan's papers)

Query 2
Author   min(numCitations)
Alan     2 (I do not know how to get this number in Neo4j)

In the end, I want to compute the h-index of each author (but I need this input first). Thanks!!!

Looks like you're close. Two things need fixing. First, use the same variable name you defined earlier (numCit, not numCitations). Second, numCit has to be calculated per paper per author, so you need to include p in your WITH clause: aggregations are grouped by the non-aggregation variables in the same clause, so without p you get one citation total per author instead of one count per paper.
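The grouping rule is easier to see outside Cypher. The following is a minimal Python sketch (not Cypher) that mimics it on made-up rows; the data and the names rows, per_author, per_paper and min_per_author are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical rows, one per (author, paper, citing paper) match,
# mirroring what the two MATCH clauses would produce.
rows = [
    ("Alan", "A", "X"), ("Alan", "A", "Y"), ("Alan", "A", "Z"),
    ("Alan", "B", "X"), ("Alan", "B", "Y"),
    ("Alan", "C", "X"),
]

# Like WITH a, count(c): the only grouping key is the author,
# so every citation of every paper lands in one bucket.
per_author = Counter(author for author, _paper, _citing in rows)

# Like WITH a, p, count(c): the grouping key is (author, paper),
# giving one count per paper.
per_paper = Counter((author, paper) for author, paper, _citing in rows)

# Like a second WITH a, min(numCit): regroup by author, keep the smallest count.
min_per_author = {}
for (author, _paper), n in per_paper.items():
    min_per_author[author] = min(n, min_per_author.get(author, n))
```

With only the author as grouping key there is a single number per author, so there is nothing left to take a minimum over; keeping the paper in the key is what makes the later min() meaningful.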

It is also more efficient to take the size() of the :CITED_BY pattern rather than matching those relationships in your MATCH, because this uses a degree lookup (nodes know how many relationships they have by type and direction). However, you can ONLY do this if :Paper nodes are the only nodes that can cite a paper; if other kinds of nodes can cite papers, this optimization would count them too. This approach also takes into account papers that have no citations, which the original MATCH would drop. Note that size() on a pattern works in Neo4j 3.x and 4.x; Neo4j 5 replaces it with a COUNT { ... } subquery expression.
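The point about papers without citations can be sketched in Python (again, not Cypher). The dictionaries written_by and cited_by below are made-up data, assumed only for this illustration.

```python
# Hypothetical data: who wrote which paper, and which papers cite each paper.
written_by = {"A": "Alan", "B": "Alan", "C": "Alan"}
cited_by = {"A": ["X", "Y"], "B": ["X"]}  # paper C has no citations

# Pattern-match style: one row per existing citation, so C produces no row.
matched = {}
for paper, citers in cited_by.items():
    for _citer in citers:
        matched[paper] = matched.get(paper, 0) + 1

# Degree style: start from every written paper and look up its citation count.
degree = {paper: len(cited_by.get(paper, [])) for paper in written_by}

min_matched = min(matched.values())  # ignores the uncited paper C
min_degree = min(degree.values())    # includes C with a count of 0
```

Here the pattern-match style reports a minimum of 1, while the degree style reports 0 because the uncited paper is still counted.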

Your query would look something like this:

MATCH (p:Paper)-[:WRITTEN_BY]->(a:Author)
WITH a, p, size((p)-[:CITED_BY]->()) as numCit
WITH a, min(numCit) as minCitations
RETURN a.authorName as authorName, minCitations
ORDER BY authorName

EDIT

To return both the author's minimum number of citations and a row per paper with its citation count, collect each paper together with its number of citations in the same WITH clause that calculates the minimum, so that the :Author variable a is the only non-aggregation variable in scope. Then you can UNWIND the collection and project out its fields:

MATCH (p:Paper)-[:WRITTEN_BY]->(a:Author)
WITH a, p, size((p)-[:CITED_BY]->()) as numCit
WITH a, min(numCit) as minCitations, collect(p {.paperTitle, numCit}) as papers
UNWIND papers as paper
RETURN a.authorName as authorName, minCitations, paper.paperTitle as paperTitle, paper.numCit as numCit
ORDER BY authorName
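Since the stated end goal is the h-index, here is a minimal Python sketch of that last step, assuming the per-paper citation counts have already been fetched per author (for example from a query like the one above). The function h_index and the dictionary papers_by_author are hypothetical names, and the counts are Alan's figures from the question.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    # Walk the counts from most cited to least cited; rank starts at 1.
    for rank, count in enumerate(sorted(citation_counts, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only get smaller, so no later rank can qualify
    return h

# Per-author lists of per-paper citation counts.
papers_by_author = {"Alan": [8, 6, 4, 2]}
h_by_author = {author: h_index(counts) for author, counts in papers_by_author.items()}
```

For Alan, three papers have at least three citations each, but there are not four papers with at least four, so the sketch yields an h-index of 3.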
