
Map Partition Iterator return

Can anyone help me get mapPartitions to accept the Iterator returned by my listWords() method?

import java.util

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MapPartitionExample {

  def main(args: Array[String]): Unit = {

    val conf= new SparkConf().setAppName("MapPartitionExample").setMaster("local[*]")
    val sc= new SparkContext(conf)

    val input:RDD[String] = sc.parallelize(List("ABC","DEF","GHU","YHG"))

    val x= input.mapPartitions(word => listWords(word))


  }

  def listWords(words: Iterator[String]) : util.Iterator[String] = {

    val arrList = new util.ArrayList[String]()
    while( words.hasNext ) {
      arrList.add( words.next())
    }
    return arrList.iterator()
  }

}

The function passed to mapPartitions must return a scala.collection.Iterator, not a java.util.Iterator. I don't see much point in your current code, but you can use Scala mutable collections:

import scala.collection.mutable.ArrayBuffer

def listWords(words: Iterator[String]) : Iterator[String] = {
  val arr = ArrayBuffer[String]()
  while( words.hasNext ) {
    arr += words.next()
  }
  arr.iterator
}

Personally, I'd just map:

def listWords(words: Iterator[String])  : Iterator[String] = {
   // Some init code
   words.map(someFunction)
}
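
To make the map-based version concrete, here is a minimal sketch that runs without Spark. The prefix value and the lowercase transformation are hypothetical stand-ins for "some init code" and someFunction; they are not part of the original question.

```scala
object LazyMapSketch {
  // Same shape that mapPartitions expects: Iterator[String] => Iterator[String].
  def listWords(words: Iterator[String]): Iterator[String] = {
    // Hypothetical init code: evaluated once per call (once per partition),
    // not once per element.
    val prefix = "w_"
    // Iterator.map is lazy, so nothing is buffered in memory;
    // each element is transformed only when the consumer pulls it.
    words.map(word => prefix + word.toLowerCase)
  }
}
```

Assuming the RDD from the question, this should be usable as input.mapPartitions(LazyMapSketch.listWords), since the function already returns a Scala Iterator.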

A Scala Iterator is expected (reported as Iterable[NotInferU]), but you are returning java.util.Iterator[String].

You need to convert the java.util.Iterator to a Scala Iterator, for example by importing scala.collection.JavaConversions._ as below:

  def listWords(words: Iterator[String]) : Iterator[String] = {
    val arrList = new util.ArrayList[String]()
    while( words.hasNext ) {
      arrList.add( words.next())
    }
    import scala.collection.JavaConversions._
    return arrList.toList.iterator
  }
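
Note that scala.collection.JavaConversions is deprecated in Scala 2.12 and removed in 2.13. A possible alternative, sketched here under the assumption that you want to keep the java.util.ArrayList, is the explicit asScala converter from scala.collection.JavaConverters (on Scala 2.13 and later, scala.jdk.CollectionConverters is the replacement). The object name below is only illustrative.

```scala
import java.util
// Explicit converters; on Scala 2.13+ prefer scala.jdk.CollectionConverters._
import scala.collection.JavaConverters._

object JavaIteratorSketch {
  def listWords(words: Iterator[String]): Iterator[String] = {
    // Copy the partition's elements into a Java list, as in the question.
    val arrList = new util.ArrayList[String]()
    while (words.hasNext) {
      arrList.add(words.next())
    }
    // Wrap the java.util.Iterator as a scala.collection.Iterator.
    arrList.iterator().asScala
  }
}
```

This avoids the intermediate toList copy, and the conversion is visible at the call site instead of happening implicitly.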

The rest of the code stays as it is.

I hope the answer is helpful.
