
DFS performance of Spark GraphX vs simple Java DFS implementation

Considering a graph with 14,000 vertices and 14,000 edges, I wonder why GraphX takes so much longer than a plain Java graph implementation to get all the paths from a vertex to the leaves?

The Java implementation: a few seconds

The GraphX implementation: several minutes
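
The Java implementation itself is not shown in the question. As a point of reference only, a minimal sketch of a single-threaded DFS that collects every path from a start vertex to the leaves might look like the following. The class name `AllPathsDfs`, the method `pathsToLeaves`, and the adjacency-list representation are all assumptions made for illustration, and the sketch assumes the graph is acyclic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the "simple Java DFS"; not the asker's actual code.
class AllPathsDfs {

    // Collects every path from `start` to a leaf (a vertex with no outgoing edges).
    // Assumption: the graph is acyclic; a cycle would make the recursion loop forever.
    static List<List<Long>> pathsToLeaves(Map<Long, List<Long>> adjacency, long start) {
        List<List<Long>> result = new ArrayList<>();
        dfs(adjacency, start, new ArrayDeque<>(), result);
        return result;
    }

    private static void dfs(Map<Long, List<Long>> adjacency, long vertex,
                            Deque<Long> current, List<List<Long>> result) {
        current.addLast(vertex); // extend the current path
        List<Long> children = adjacency.getOrDefault(vertex, List.of());
        if (children.isEmpty()) {
            result.add(new ArrayList<>(current)); // leaf reached: store a copy of the path
        } else {
            for (long child : children) {
                dfs(adjacency, child, current, result);
            }
        }
        current.removeLast(); // backtrack before returning to the parent
    }
}
```

Calling `AllPathsDfs.pathsToLeaves(adjacency, 42L)` walks the whole structure in one thread over in-memory maps, with no serialization, shuffling, or job scheduling. On a very deep graph the recursion could overflow the stack, so an explicit stack may be preferable there.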

Is Spark GraphX really suitable for this kind of processing?

My system: i5-7500 @ 3.40GHz, 8GB RAM

The Pregel algorithm:

import org.apache.spark.graphx._

// `graph` is assumed to be an existing Graph with Double edge attributes, built elsewhere.
val sourceId: VertexId = 42 // The ultimate source
// Initialize the graph such that all vertices except the root have canReach = false.
val initialGraph: Graph[Boolean, Double] = graph.mapVertices((id, _) => id == sourceId)
// Despite its name, sssp holds one reachability flag per vertex, not shortest paths.
val sssp = initialGraph.pregel(false)(
  (id, canReach, newCanReach) => canReach || newCanReach, // Vertex Program
  triplet => { // Send Message
    if (triplet.srcAttr && !triplet.dstAttr) {
      Iterator((triplet.dstId, true))
    } else {
      Iterator.empty
    }
  },
  (a, b) => a || b // Merge Message
)
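
Note that the Pregel snippet marks which vertices are reachable from the source rather than enumerating paths, and it pushes the `canReach` flag one hop further per superstep. A rough single-machine analogue in plain Java is sketched below, assuming the same kind of adjacency-list map; the names `Reachability` and `reachableFrom` are hypothetical. Each pass of the `while` loop plays the role of one superstep. In GraphX every such round is a distributed computation with its own scheduling overhead, which is likely a large part of the gap on a graph this small.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical single-machine analogue of the reachability computation above.
class Reachability {

    // Returns every vertex reachable from `source`, expanding one frontier per round.
    static Set<Long> reachableFrom(Map<Long, List<Long>> adjacency, long source) {
        Set<Long> reached = new HashSet<>();
        reached.add(source); // only the source starts with canReach = true
        Set<Long> frontier = new HashSet<>(reached);
        while (!frontier.isEmpty()) { // one iteration is comparable to one superstep
            Set<Long> next = new HashSet<>();
            for (long src : frontier) {
                for (long dst : adjacency.getOrDefault(src, List.of())) {
                    // Mirrors srcAttr && !dstAttr: only newly reached vertices move on.
                    if (reached.add(dst)) {
                        next.add(dst);
                    }
                }
            }
            frontier = next;
        }
        return reached;
    }
}
```

The number of rounds equals the depth of the graph below the source, so a long, thin graph needs many supersteps even though each one does very little work.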

I ran into the same thing when implementing some algorithms on GraphX. I believe GraphX is well suited to a distributed environment, where a big graph is split across multiple machines. Since you say you are using a single node, have you checked the number of workers used? The number of executors? The amount of memory used by each executor? These configuration parameters definitely play an important role in the performance of your GraphX application.
