How to process a large Titan graph using Spark

Question

I have loaded a very large graph into Titan 1.0.0 with a Cassandra 2.1.13 backend. I have to perform some operations on the graph using Spark.

For example:

I want to find subgraphs in a very large graph using Apache Spark.
I want to run some clustering (machine learning code) on a graph stored in Titan, etc.

Basically, I will be applying some algorithm to a TitanGraph using Spark (which I suppose will be faster on a big graph).

I am not able to find any docs on this, i.e. on how to process the graph. Is Spark the right approach for applying algorithms (machine learning) to a large graph? What should my next steps be? How do I run my Spark code on Titan? (I am not able to find the exact methods or functions through which I should be inserting/using Spark code.)

Any help is appreciated.

Answer 1

Have you had a look at SparkGraphComputer? It lets you write Gremlin queries that are executed on the Spark framework. Have a look at this example:

gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
==>hadoopgraph[gryoinputformat->gryooutputformat]
gremlin> g = graph.traversal(computer(SparkGraphComputer))
==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]
gremlin> :remote connect tinkerpop.hadoop graph g
==>useTraversalSource=graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]
==>useSugar=false
gremlin> :> g.V().group().by{it.value('name')[1]}.by('name')
==>[a:[marko, vadas], e:[peter], i:[ripple], o:[josh, lop]]
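To make the closure in the traversal above easier to read: `by{it.value('name')[1]}` keys each vertex by the second character of its name, and `by('name')` collects the names under that key. A minimal plain-Python sketch of the same grouping follows. It assumes the six names shown in the console output are the only `name` values, and the helper name `group_by_second_letter` is made up for illustration. It runs locally and does not involve Spark or Titan.

```python
from collections import defaultdict

# Vertex names taken from the console output above (assumed to be the
# complete set of 'name' values in the example graph).
names = ["marko", "vadas", "lop", "josh", "ripple", "peter"]


def group_by_second_letter(values):
    """Group strings by their second character.

    Mirrors the key function by{it.value('name')[1]} and the value
    selector by('name') of the Gremlin traversal.
    """
    groups = defaultdict(list)
    for value in values:
        groups[value[1]].append(value)
    return dict(groups)


grouped = group_by_second_letter(names)
for key in sorted(grouped):
    # Sort the members: element order inside a group is not something to rely on.
    print(key, sorted(grouped[key]))
```

The groups contain the same members as the console result. The order inside each group may differ, since a distributed run does not promise a particular order.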

Another way to go is to use the GraphComputer. This helps a lot with applying OLAP and OLTP on the graph using Spark/Hadoop. Here is an example:

gremlin> result = graph.compute().program(PageRankVertexProgram.build().create()).submit().get()
==>result[tinkergraph[vertices:6 edges:0],memory[size:0]]
gremlin> result.memory().runtime
==>95
gremlin> g = result.graph().traversal(standard())
==>graphtraversalsource[tinkergraph[vertices:6 edges:0], standard]
gremlin> g.V().valueMap('name',PageRankVertexProgram.PAGE_RANK)
==>[gremlin.pageRankVertexProgram.pageRank:[0.15000000000000002], name:[marko]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.19250000000000003], name:[vadas]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.4018125], name:[lop]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.19250000000000003], name:[josh]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.23181250000000003], name:[ripple]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.15000000000000002], name:[peter]]
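As a rough illustration of what the PageRank run above computes, here is a small plain-Python sketch of an iterative, unnormalized PageRank. It is a simplified stand-in, not TinkerPop's implementation. Two things are assumptions that do not appear in the answer: the edge list (taken to be TinkerPop's "modern" toy graph, which has these six vertex names) and the damping factor `alpha=0.85`. The function name `page_rank` is also made up for this sketch.

```python
# Directed edges of TinkerPop's "modern" toy graph. This is an assumption:
# the console output above only shows the vertex names, not the edges.
EDGES = [
    ("marko", "vadas"), ("marko", "josh"), ("marko", "lop"),
    ("josh", "ripple"), ("josh", "lop"), ("peter", "lop"),
]
VERTICES = ["marko", "vadas", "lop", "josh", "ripple", "peter"]


def page_rank(vertices, edges, alpha=0.85, iterations=30):
    """Unnormalized PageRank: rank(v) = (1 - alpha) + alpha * sum(rank(u) / outdeg(u))."""
    out_degree = {v: 0 for v in vertices}
    for src, _ in edges:
        out_degree[src] += 1

    # Every vertex starts with the base rank (1 - alpha).
    rank = {v: 1.0 - alpha for v in vertices}
    for _ in range(iterations):
        # Each vertex splits its current rank evenly over its outgoing edges.
        incoming = {v: 0.0 for v in vertices}
        for src, dst in edges:
            incoming[dst] += rank[src] / out_degree[src]
        rank = {v: (1.0 - alpha) + alpha * incoming[v] for v in vertices}
    return rank


ranks = page_rank(VERTICES, EDGES)
for name in VERTICES:
    print(name, ranks[name])
```

Under these assumptions the ranks settle after two rounds, because the graph has no cycles, and they agree with the console values above up to floating-point rounding (for example, `lop` at about 0.4018 and `ripple` at about 0.2318). On a real Titan graph the same idea runs as a vertex program distributed by the GraphComputer rather than as a local loop.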

Answer 2

Consider using Mizo for OLAP on Titan with Spark. This answer may help.

Answer 1
0 2016-02-19 13:12:57

Answer 2
0 2017-01-17 21:08:50
