
Gremlin traversal queries on a Spark graph

I have built a property graph (60 million nodes, 40 million edges) from S3 using the Apache Spark GraphX framework. I want to run traversal queries on this graph.

My queries look like this:

g.V().has("name","xyz").out('parent').out().has('name','abc')
g.V().has('proc_name','serv.exe').out('file_create').
has('file_path',containing('Tsk04.txt')).in().in('parent').values('proc_name')
g.V().has('md5','935ca12348040410e0b2a8215180474e').values('files')

Most queries are of the form g.V().out().out().out().
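A chain of out() steps can be read as repeated frontier expansion: start from a set of vertices and, at each step, follow outgoing edges, optionally restricted to one edge label. The sketch below illustrates that idea in plain Scala on in-memory collections; it is not Spark, GraphX, or TinkerPop code, and the `Edge` case class and `OutChain` helpers are hypothetical names chosen for the example.

```scala
// Hypothetical in-memory edge model, for illustration only.
final case class Edge(src: Long, dst: Long, label: String)

object OutChain {
  // One out(label) step: every vertex in the frontier follows its outgoing edges.
  // As in Gremlin, duplicates are kept (one traverser per path), so there is no distinct().
  def out(frontier: Seq[Long], edges: Seq[Edge], label: Option[String] = None): Seq[Long] = {
    val adjacency: Map[Long, Seq[Edge]] = edges.groupBy(_.src)
    frontier.flatMap { v =>
      adjacency.getOrElse(v, Seq.empty).collect {
        case e if label.forall(_ == e.label) => e.dst
      }
    }
  }

  // Rough analogue of g.V(start...).out().out()...out() with `hops` unlabeled steps.
  def outN(start: Seq[Long], edges: Seq[Edge], hops: Int): Seq[Long] =
    (1 to hops).foldLeft(start)((frontier, _) => out(frontier, edges))
}
```

Here each hop is a local map lookup; on a distributed graph, each hop generally turns into a join or a message-passing superstep instead.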

Such queries are easy to run on graph databases like Neo4j, Titan, and AWS Neptune, since they support Gremlin.

Can we traverse Spark graphs in this way? I tried the Spark Pregel API, but it's a bit complex compared to Gremlin.

The reason I am looking at Spark graphs is that the cloud offerings of the graph databases mentioned above are costly.

The Spark GraphFrames library should be the most convenient option for you. It provides a Neo4j Cypher-like syntax for describing traversal patterns (motif finding) and uses the Spark DataFrame API for filtering: https://graphframes.github.io/graphframes/docs/_site/user-guide.html#motif-finding

Here is an example:

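import org.apache.spark.sql.DataFrame
import org.graphframes.GraphFrame
val g: GraphFrame = GraphFrame.fromGraphX(gx) // or GraphFrame(vertices, edges); the filter below assumes name/label columns
val motifs: DataFrame = g.find("(a)-[e]->(b); (b)-[e2]->(c)") // find() returns a DataFrame, one row per a -> b -> c path
motifs.filter("a.name = 'xyz' and e.label = 'parent' and c.name = 'abc'").show()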
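Conceptually, `find` with this two-hop pattern behaves like a self-join of the edge list on the shared vertex `b` (`e.dst == e2.src`), after which the filter is applied. The plain-Scala sketch below illustrates those semantics on small in-memory collections, roughly matching g.V().has('name','xyz').out('parent').out().has('name','abc'). It is not the GraphFrames implementation (GraphFrames works on Spark DataFrames), and the `Vertex`, `Edge`, and `MotifSketch` names are hypothetical.

```scala
// Hypothetical in-memory model; GraphFrames itself works on Spark DataFrames.
final case class Vertex(id: Long, props: Map[String, String])
final case class Edge(src: Long, dst: Long, label: String)

object MotifSketch {
  // Semantics of g.find("(a)-[e]->(b); (b)-[e2]->(c)") followed by the filter
  // "a.name = ... and e.label = ... and c.name = ...":
  // a self-join of the edge list on the shared vertex b (e.dst == e2.src).
  // Assumes every edge endpoint exists in `vertices` (byId throws otherwise).
  def twoHop(vertices: Seq[Vertex], edges: Seq[Edge],
             aName: String, eLabel: String, cName: String): Seq[(Vertex, Vertex, Vertex)] = {
    val byId: Map[Long, Vertex] = vertices.map(v => v.id -> v).toMap
    for {
      e  <- edges
      e2 <- edges if e2.src == e.dst
      a = byId(e.src)
      b = byId(e.dst)
      c = byId(e2.dst)
      if a.props.get("name").contains(aName) && e.label == eLabel && c.props.get("name").contains(cName)
    } yield (a, b, c)
  }
}
```

The filter runs after the pattern match here, just as `motifs.filter(...)` does in the GraphFrames example.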

TinkerPop itself has Spark support, so you can issue Spark OLAP queries from the Gremlin console: https://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer

There are also some closed-source solutions. DataStax Enterprise has good Gremlin support for Spark: https://www.datastax.com/blog/2017/05/introducing-dse-graph-frames (disclosure: I'm a former author of it).

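Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.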
