Neo4j Cypher query efficiency and syntax
I am attempting to query a health ontology represented as a directed acyclic graph in Neo4j v2.1.5. The database consists of 2 million nodes and 5 million edges/relationships. The following query identifies all nodes that are subsumed by a disease concept and are not caused by a particular bacterium or any of its subtypes:
MATCH p = (a:ObjectConcept{disease}) <-[:ISA*]- (b:ObjectConcept),
q=(c:ObjectConcept{bacteria})<-[:ISA*]-(d:ObjectConcept)
WHERE NOT (b)-->()--(c) AND NOT (b)-->()-->(d)
RETURN distinct b.sctid, b.FSN
This query runs in under 1 second and returns the correct answers. However, adding one additional parameter adds substantial time (20 minutes). Example:
MATCH p = (a:ObjectConcept{disease}) <-[:ISA*]- (b:ObjectConcept),
q=(c:ObjectConcept{bacteria})<-[:ISA*]-(d:ObjectConcept),
t=(e:ObjectConcept{bacteria})<-[:ISA*]-(f:ObjectConcept)
WHERE NOT (b)-->()--(c)
AND NOT (b)-->()-->(d)
AND NOT (b)-->()-->(e)
AND NOT (b)-->()-->(f)
RETURN distinct b.sctid, b.FSN
I am new to Cypher, but I have to imagine there is a better way to write this query so that it is more efficient. How would collections improve this?
Thanks
I already answered that on the Google group:
Hi Scott,
I presume you created indexes or constraints for :ObjectConcept(name)?
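For reference, a minimal sketch of what such an index or constraint could look like in Neo4j 2.x syntax. The property `name` is taken from the question above and `sctid` from the queries in this thread; whether `sctid` is really unique per concept is an assumption about the data, not something stated here.

```cypher
// Schema index so the anchor concept is found by lookup rather than a label scan.
CREATE INDEX ON :ObjectConcept(name);

// Alternative for the identifier property: a uniqueness constraint,
// which also backs the property with an index.
// Assumption: every ObjectConcept has a distinct sctid.
CREATE CONSTRAINT ON (c:ObjectConcept) ASSERT c.sctid IS UNIQUE;
```

Either statement only helps with locating the starting node; it does not reduce the cost of the variable-length [:ISA*] expansion itself.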
I am working with a directed acyclic graph (an ontology) that models human health, and I need to identify certain diseases (example: Pneumonia) that are infectious but NOT caused by certain bacteria (staph or streptococcus). All concepts are nodes defined as ObjectConcepts. ObjectConcepts are connected by relationships such as [ISA], [Pathological_process], [Causative_agent], etc.
The query requires:
a) Identification of all concepts subsumed by the concept Pneumonia as follows:
MATCH p = (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)
This already returns a number of paths, potentially millions. Can you check that with:
MATCH p = (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept) return count(*)
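Because the ontology is a DAG, the same concept can be reached over several ISA paths, so the path count can be much larger than the number of distinct concepts. A small sketch that reports both numbers side by side; the `name: "Pneumonia"` lookup is an assumed property and value, standing in for the `{Pneumonia}` placeholder used above.

```cypher
// Compare the number of matched paths with the number of distinct descendants.
// Assumption: the concept is identified by a `name` property.
MATCH (a:ObjectConcept {name: "Pneumonia"})<-[:ISA*]-(b:ObjectConcept)
RETURN count(*) AS paths, count(DISTINCT b) AS concepts;
```

If `paths` is far above `concepts`, reducing the rows to distinct nodes before any further matching is likely to matter.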
b) Identification of all concepts subsumed by Genus Staph and Genus Strep (including the concepts Genus Staph and Genus Strep themselves) as follows. Note:
with b MATCH (b) q = (c:ObjectConcept{Strep})<-[:ISA*]-(d:ObjectConcept), h = (e:ObjectConcept{Staph})<-[:ISA*]-(f:ObjectConcept)
This is then the cross product of the paths from "p", "q" and "h"; e.g. if all 3 of them return 1000 paths, you're at 1bn paths!
c) Identify all nodes(p) that do not have a causative agent of Strep (i.e., nodes(q)) or Staph (nodes(h)) as follows:
with b,c,d,e,f MATCH (b),(c),(d),(e),(f) WHERE (b)--()-->(c) OR (b)-->()-->(d) OR (b)-->()-->(e) OR (b)-->()-->(f) RETURN distinct b.Name;
You don't need the WITH or even the MATCH (b),(c),(d),(e),(f).
What connections are there between b and the other nodes? Do you have concrete ones? The first pattern is also missing one direction.
The WHERE clause can be a problem; in general, this query is perhaps better expressed as a UNION of simpler matches,
e.g.
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(c:ObjectConcept{name:Strep}) RETURN b.name
UNION
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(e:ObjectConcept{name:Staph}) RETURN b.name
UNION
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(d:ObjectConcept)-[:ISA*]->(c:ObjectConcept{name:Strep}) return b.name
UNION
MATCH (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)-->()-->(d:ObjectConcept)-[:ISA*]->(c:ObjectConcept{name:Staph}) return b.name
Another option would be to utilize the shortestPath() function to find one or all shortest path(s) between Pneumonia and the bacteria with certain rel-types and direction.
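A rough sketch of that idea, under stated assumptions: the `name` values are placeholders as in the queries above, the relationship types are the ones named in the question, and the upper bound of 10 hops is an arbitrary illustrative limit.

```cypher
// Find one shortest path between the disease and a bacterium,
// restricted to the relationship types named in the question.
// Assumptions: `name` lookups and the hop limit of 10 are illustrative.
MATCH (a:ObjectConcept {name: "Pneumonia"}),
      (c:ObjectConcept {name: "Strep"}),
      p = shortestPath((a)-[:ISA|Causative_agent*..10]-(c))
RETURN p;
```

Replacing shortestPath with allShortestPaths returns every path of minimal length instead of a single one. The pattern here is undirected; adding an arrow restricts the direction in which the relationships are traversed.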
Perhaps you can share the dataset and the expected result.
The query was successfully completed using UNION as follows:
MATCH p = (a:ObjectConcept{sctid:233604007}) <-[:ISA*]- (b:ObjectConcept),
q = (c:ObjectConcept{sctid:58800005})<-[:ISA*]-(d:ObjectConcept)
WHERE NOT (b)-->()--(c) AND NOT (b)-->()-->(d)
RETURN distinct b
UNION
MATCH p = (a:ObjectConcept{sctid:233604007}) <-[:ISA*]- (b:ObjectConcept),
t = (e:ObjectConcept{sctid:65119002}) <-[:ISA*]- (f:ObjectConcept)
WHERE NOT (b)-->()-->(e) AND NOT (b)-->()-->(f)
RETURN distinct b
By reducing the cardinality of the objects being queried, the query now runs in under 20 seconds instead of 20 minutes.
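On the original question of how collections might help: one possible sketch is to gather the excluded bacteria once into a collection and then filter the disease concepts against it, so that the two hierarchies are never combined as a cross product. This has not been run against the dataset in this thread. It uses the sctid values from the query above and assumes a two-hop link (b)-->()-->(x) is what marks a causative agent, as in the earlier patterns.

```cypher
// 1) Collect each excluded bacterium together with all of its subtypes, once.
//    [:ISA*0..] includes the root concept itself.
MATCH (root:ObjectConcept)<-[:ISA*0..]-(bact:ObjectConcept)
WHERE root.sctid IN [58800005, 65119002]
WITH collect(DISTINCT bact) AS excluded

// 2) Expand the disease hierarchy separately, one row per distinct concept.
MATCH (:ObjectConcept {sctid: 233604007})<-[:ISA*]-(b:ObjectConcept)
WITH DISTINCT b, excluded
// 3) Keep only concepts with no two-hop link to any excluded bacterium.
WHERE NONE(x IN excluded WHERE (b)-->()-->(x))
RETURN b.sctid, b.FSN;
```

Note that this keeps only concepts linked to none of the collected bacteria, which may not return exactly the same rows as the UNION query above, so compare the results before relying on it. The per-concept check against the collection can itself be costly if the collection is large.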