
Scala/Spark: Why do I get different results when I run a Spark program locally and on a cluster using broadcast?

I have an RDD, and I want each partition to see the previous partition's last value, so I use a broadcast variable. This is my code:

    import scala.collection.mutable

    val arr = Array((1, 1, 1), (7, 2, 1), (3, 3, 2), (5, 4, 2),
                    (7, 5, 3), (9, 6, 3), (7, 7, 4), (9, 8, 4))
    var rdd = sc.parallelize(arr, 4)
    val bro = sc.broadcast(new mutable.HashMap[Int, Int])

    // First pass: put each partition's last value into the broadcast map.
    rdd = rdd.mapPartitionsWithIndex((partIdx, iter) => {
      val iterArray = iter.toArray
      bro.value += (partIdx -> iterArray.last._1)
      iterArray.toIterator
    })

    // Second pass: wait until the previous partition's value appears, then print it.
    rdd = rdd.mapPartitionsWithIndex((partIdx, iter) => {
      val iterArray = iter.toArray
      var flag = true
      if (partIdx != 0) {
        while (flag) {
          if (bro.value.contains(partIdx - 1)) {
            flag = false
          }
        }
        println(bro.value.get(partIdx - 1).get)
      }
      iterArray.toIterator
    })

    rdd.collect()

In the first mapPartitionsWithIndex function I put each partition's last value into the broadcast map; in the second mapPartitionsWithIndex function I read that value back. The code runs fine in local mode, but it does not work on the cluster: the program cannot get the previous partition's value. Why do I get different results when I run the Spark program locally and on a cluster using broadcast?

You get different results because your code is incorrect. Broadcast variables must not be modified:

Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks.

It only seems to work because you are relying on an implementation detail of local mode: all tasks run as threads in a single JVM, so the "broadcast" value is really just the driver's local HashMap, which every thread can mutate and read. On a cluster, each executor gets its own copy of the broadcast value, and modifications made inside a task are never propagated back to the driver or to other executors. This makes it similar to the mistakes described in Understanding closures in the Spark programming guide.
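If the goal is to make the previous partition's last value visible to every task, one option is to compute those values on the driver first and broadcast the finished map, which tasks then only read. Below is a minimal sketch of that idea; it assumes an existing SparkContext sc, and names such as lastPerPartition and result are only illustrative:

    // Compute each partition's last value on the driver first, then broadcast
    // the completed, read-only map.
    val arr = Array((1, 1, 1), (7, 2, 1), (3, 3, 2), (5, 4, 2),
                    (7, 5, 3), (9, 6, 3), (7, 7, 4), (9, 8, 4))
    val rdd = sc.parallelize(arr, 4)

    // Collect the last element of every non-empty partition back to the driver.
    val lastPerPartition: scala.collection.Map[Int, Int] = rdd
      .mapPartitionsWithIndex((partIdx, iter) =>
        if (iter.isEmpty) Iterator.empty
        else Iterator((partIdx, iter.toArray.last._1)))
      .collectAsMap()

    // Broadcast the finished map; tasks only read it, never modify it.
    val bro = sc.broadcast(lastPerPartition)

    // Every task can now safely look up the previous partition's value.
    val result = rdd.mapPartitionsWithIndex((partIdx, iter) => {
      bro.value.get(partIdx - 1)          // None for partition 0
        .foreach(v => println(s"partition $partIdx sees previous value $v"))
      iter
    })
    result.collect()

Because the map is built with collectAsMap() before it is broadcast, every executor receives the same complete, read-only copy, so the behaviour is the same in local mode and on a cluster.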
