
Filtering out nodes based on outgoing relationship in Cypher query (Similar to SQL outer join)

I have a simple database with three types of nodes (t: transcripts, f: protein families, and g: genes). There are two types of relationships: PFAM_MRNA, (t)-[r]->(f), and Parent, (t)-[p]->(g).

    (g:Gene{Name:'g1'})<-[p:Parent]-(t:transcript{Name:'t1'})
    (g:Gene{Name:'g1'})<-[p:Parent]-(t:transcript{Name:'t2'})
    (g:Gene{Name:'g2'})<-[p:Parent]-(t:transcript{Name:'t3'})
    (g:Gene{Name:'g3'})<-[p:Parent]-(t:transcript{Name:'t4'})
    (g:Gene{Name:'g4'})<-[p:Parent]-(t:transcript{Name:'t5'})

    (f:PFAM{ID:'PF0752'})<-[r:PFAM_MRNA]-(t:transcript{Name:'t1'})
    (f:PFAM{ID:'PF0752'})<-[r:PFAM_MRNA]-(t:transcript{Name:'t2'})
    (f:PFAM{ID:'PF0752'})<-[r:PFAM_MRNA]-(t:transcript{Name:'t3'})
    (f:PFAM{ID:'PF0752'})<-[r:PFAM_MRNA]-(t:transcript{Name:'t4'})
    (f:PFAM{ID:'PF0752'})<-[r:PFAM_MRNA]-(t:transcript{Name:'t5'})

    (f:PFAM{ID:'PF1040'})<-[r:PFAM_MRNA]-(t:transcript{Name:'t4'})
    (f:PFAM{ID:'PF1040'})<-[r:PFAM_MRNA]-(t:transcript{Name:'t5'})

Next, I am trying to get the transcripts (and their Parent genes) connected to PF0752, while excluding the transcripts (and their Parent genes) that are also connected to PF1040.

So my Cypher query looks like this:

    MATCH (f)<-[rel:PFAM_MRNA]-(t)-[p:Parent]->(g) 
    WHERE f.ID IN ['PF0752'] 
    AND NOT f.ID IN ['PF1040'] 
    RETURN *

However, I got a graph like

    (g:Gene{Name:'g1'})<-[p:Parent]-(t:transcript{Name:'t1'})
    (g:Gene{Name:'g1'})<-[p:Parent]-(t:transcript{Name:'t2'})
    (g:Gene{Name:'g2'})<-[p:Parent]-(t:transcript{Name:'t3'})
    (g:Gene{Name:'g3'})<-[p:Parent]-(t:transcript{Name:'t4'})
    (g:Gene{Name:'g4'})<-[p:Parent]-(t:transcript{Name:'t5'})

    (f:PFAM{ID:'PF0752'})<-[r:PFAM_MRNA]-(t:transcript{Name:'t1'})
    (f:PFAM{ID:'PF0752'})<-[r:PFAM_MRNA]-(t:transcript{Name:'t2'})
    (f:PFAM{ID:'PF0752'})<-[r:PFAM_MRNA]-(t:transcript{Name:'t3'})
    (f:PFAM{ID:'PF0752'})<-[r:PFAM_MRNA]-(t:transcript{Name:'t4'})
    (f:PFAM{ID:'PF0752'})<-[r:PFAM_MRNA]-(t:transcript{Name:'t5'})

Instead of

    (g:Gene{Name:'g1'})<-[p:Parent]-(t:transcript{Name:'t1'})
    (g:Gene{Name:'g1'})<-[p:Parent]-(t:transcript{Name:'t2'})
    (g:Gene{Name:'g2'})<-[p:Parent]-(t:transcript{Name:'t3'})

    (f:PFAM{ID:'PF0752'})<-[r:PFAM_MRNA]-(t:transcript{Name:'t1'})
    (f:PFAM{ID:'PF0752'})<-[r:PFAM_MRNA]-(t:transcript{Name:'t2'})
    (f:PFAM{ID:'PF0752'})<-[r:PFAM_MRNA]-(t:transcript{Name:'t3'})

Any hint or idea on how to make this work is really appreciated.

Thanks,

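The WHERE clause in your query tests the same node f twice: once f.ID is 'PF0752' it can never also be 'PF1040', so the second condition filters nothing out. Instead, you can add a WHERE NOT clause on a pattern from t to the PF1040 protein family: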

    MATCH (f:PFAM {ID: 'PF0752'}), (pf:PFAM {ID: 'PF1040'})
    MATCH (f)<-[rel:PFAM_MRNA]-(t)-[p:Parent]->(g)
    WHERE NOT (pf)<-[:PFAM_MRNA]-(t)
    RETURN *
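
As a variation, the excluded family can be written inline in the pattern predicate, so the query does not depend on first matching a PF1040 node (with the two-MATCH form, nothing is returned at all if no PF1040 node exists). This is a minimal sketch, assuming the labels and property names from the question (PFAM, ID, PFAM_MRNA, Parent). The second query assumes Neo4j 4.0 or later, where EXISTS subqueries are available, and uses a hypothetical variable name (other) to keep the list-style IN filters from the original query.

```cypher
// Variant 1: inline the excluded family in the pattern predicate.
// Keeps only transcripts linked to PF0752 that have no link to PF1040.
MATCH (f:PFAM {ID: 'PF0752'})<-[rel:PFAM_MRNA]-(t)-[p:Parent]->(g)
WHERE NOT (t)-[:PFAM_MRNA]->(:PFAM {ID: 'PF1040'})
RETURN *;

// Variant 2 (assumes Neo4j 4.0+): same idea with lists of IDs,
// mirroring the IN [...] filters used in the question.
MATCH (f:PFAM)<-[rel:PFAM_MRNA]-(t)-[p:Parent]->(g)
WHERE f.ID IN ['PF0752']
  AND NOT EXISTS {
    MATCH (t)-[:PFAM_MRNA]->(other:PFAM)
    WHERE other.ID IN ['PF1040']
  }
RETURN *;
```

On the sample data in the question, both variants should keep t1, t2 and t3 (with genes g1 and g2) and drop t4 and t5, since only those two transcripts are connected to PF1040.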
