
Returning a list from a function on an RDD

I am stuck on something silly. I have an RDD, and for each of its elements I have to call a function that takes the element and appends it to a list:

    var list1 = scala.collection.mutable.MutableList[String]()
    def listfinal(x: String): scala.collection.mutable.MutableList[String] = {
      list1 += x
      return list1
    }
    val s = rdd.map(x => listfinal(x))
    print(s.count())

I want only the final list, the one to which all elements of the RDD have been added, not every intermediate list containing the elements added so far. How do I do that?

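The problem with your code is that Spark runs the function you pass to map on copies of the variables it references: the closure, including list1, is serialized and sent to the executors. Each task therefore updates its own copy of the list, and none of those updates are propagated back to the driver program, where list1 is defined. See here for more details.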
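The copying behavior can be approximated outside Spark. The sketch below is plain Scala, not Spark code: it assumes Java serialization as a stand-in for how a closure's captured variables reach an executor task, uses a ListBuffer in place of MutableList, and the names ClosureCopySketch and roundTrip are hypothetical.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable.ListBuffer

// Hypothetical illustration, not Spark code: Java serialization stands in for
// the way Spark ships a closure's captured variables to each executor task.
object ClosureCopySketch {
  // Serialize a value to bytes and read it back, producing an independent copy.
  def roundTrip[T](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val driverList = ListBuffer[String]() // plays the role of list1 on the driver
    val taskCopy = roundTrip(driverList)  // what a task would receive
    taskCopy += "a"                       // the task mutates only its own copy
    println(driverList.isEmpty)           // prints true: the driver never sees "a"
    println(taskCopy)                     // prints ListBuffer(a)
  }
}
```

In a real job every task deserializes its own copy, so with several partitions there are several independent lists, none of which is the driver's list1.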

To gather all elements of an RDD into a list, consider the aggregate() action. Assuming you have an RDD of Strings, the solution looks like this:

    rdd.aggregate(List[String]())((list, element) => element :: list, _ ++ _)
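To see how the two functions passed to aggregate() work together, here is a plain-Scala model of its semantics; it is not Spark's implementation. It assumes the RDD is represented as a Seq of partitions, and aggregateLike is a hypothetical name. Each partition is folded with the first function starting from the zero value, and the per-partition results are then merged with the second function, again starting from the zero value.

```scala
// Hypothetical model of RDD.aggregate semantics in plain Scala (no Spark).
// An "RDD" is represented here as a Seq of partitions, each a Seq of elements.
object AggregateSketch {
  def aggregateLike[T, U](partitions: Seq[Seq[T]])(zeroValue: U)(
      seqOp: (U, T) => U, // folds the elements within one partition
      combOp: (U, U) => U // merges the per-partition results
  ): U = {
    val perPartition = partitions.map(_.foldLeft(zeroValue)(seqOp))
    perPartition.foldLeft(zeroValue)(combOp)
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq("a", "b"), Seq("c", "d"))
    val result = aggregateLike(partitions)(List[String]())(
      (list, element) => element :: list, // same functions as the answer above
      _ ++ _
    )
    println(result) // prints List(b, a, d, c)
  }
}
```

Because element :: list prepends, each partition's elements come out reversed, and in Spark the per-partition results may be merged in whichever order the tasks finish, so the final list should not be assumed to follow the RDD's order.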
