Can anyone help me get mapPartitions to accept the Iterator returned by the listWords() method?
object MapPartitionExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MapPartitionExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val input: RDD[String] = sc.parallelize(List("ABC", "DEF", "GHU", "YHG"))
    val x = input.mapPartitions(word => listWords(word))
  }

  def listWords(words: Iterator[String]): util.Iterator[String] = {
    val arrList = new util.ArrayList[String]()
    while (words.hasNext) {
      arrList.add(words.next())
    }
    return arrList.iterator()
  }
}
The return type of the function passed to mapPartitions should be scala.collection.Iterator, not java.util.Iterator. I don't see much point in your current code, but you can use Scala mutable collections:
import scala.collection.mutable.ArrayBuffer

def listWords(words: Iterator[String]): Iterator[String] = {
  val arr = ArrayBuffer[String]()
  while (words.hasNext) {
    arr += words.next()
  }
  arr.toIterator
}
Personally I'd just use map:
def listWords(words: Iterator[String]): Iterator[String] = {
  // Some init code
  words.map(someFunction)
}
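To make the map variant concrete, here is a minimal sketch that runs without Spark, since the function passed to mapPartitions is just an ordinary function from a Scala Iterator to a Scala Iterator. The prefix value and the lower-casing step are hypothetical stand-ins for "some init code" and someFunction; neither comes from the original post.

```scala
object ListWordsSketch {
  def listWords(words: Iterator[String]): Iterator[String] = {
    // Hypothetical "init code": set up once per partition, not once per element.
    val prefix = "word:"
    // map is lazy: each element is transformed only when the returned
    // iterator is consumed, so the partition is never buffered in memory.
    words.map(w => prefix + w.toLowerCase)
  }

  def main(args: Array[String]): Unit = {
    // A plain Scala iterator plays the role of one partition here.
    val partition = Iterator("ABC", "DEF", "GHU", "YHG")
    listWords(partition).foreach(println)
  }
}
```

With an RDD[String] named input, as in the question, the same function could be passed as input.mapPartitions(listWords).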
mapPartitions expects a function that returns a Scala Iterator[U], but you are returning a java.util.Iterator[String]. You need to convert the java.util.Iterator to a Scala Iterator by importing scala.collection.JavaConversions._, as below:
def listWords(words: Iterator[String]): Iterator[String] = {
  val arrList = new util.ArrayList[String]()
  while (words.hasNext) {
    arrList.add(words.next())
  }
  import scala.collection.JavaConversions._
  return arrList.toList.iterator
}
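One caveat about the import: scala.collection.JavaConversions has been deprecated since Scala 2.12 and was removed in 2.13, so the implicit conversion will not compile on newer versions. Below is a minimal sketch of the explicit alternative, assuming scala.collection.JavaConverters is available (it is itself deprecated in 2.13 in favour of scala.jdk.CollectionConverters, which offers the same asScala call). The object name JavaIteratorSketch is only for illustration.

```scala
import java.util
import scala.collection.JavaConverters._

object JavaIteratorSketch {
  def listWords(words: Iterator[String]): Iterator[String] = {
    val arrList = new util.ArrayList[String]()
    while (words.hasNext) {
      arrList.add(words.next())
    }
    // asScala wraps the java.util.Iterator in a scala.collection.Iterator,
    // which is the type mapPartitions expects its function to return.
    arrList.iterator().asScala
  }
}
```

The conversion is explicit here, so it is visible at the call site exactly where the Java iterator becomes a Scala one.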
The rest of the code stays as it is.
I hope this answer is helpful.