Scala: map over RDD throws java.lang.NullPointerException

I get a strange error when I map over RDDs read from HDFS. Here is my code (simplified).

I tried to put the RDD data into a Scala Map and then select certain key-value pairs to write to HDFS. Every time the code below reaches the "map over rdd" step (marked with a comment), it throws a NullPointerException, which is so weird!

I can't solve this problem. I have rewritten my code in a clumsy, ugly way to make it work, but I still want to know why the "map over rdd" step matters!

BTW: all the keys I use exist in the Scala Map, and there are no null values. I removed the "map over rdd" step and wrote the key-values to strings directly to make my code work, but I am eager to know why this weird problem happens... T_T

val featureRdd = hdfsRDD
  .flatMap { item =>
    val result = new ArrayBuffer[String]()
    val itemInfo = collection.mutable.Map[String, String]()

    // x below is a string tuple: (s1, s2, s3)
    item._2.asScala.foreach(x => {
      itemInfo.put(x._2, x._3)
    })

    val f1 = itemInfo.getOrElse("f1", "")
    val f2 = itemInfo.getOrElse("f2", "")
    if (f1.equals("true") && f2.nonEmpty) {
      val c1 = itemInfo.getOrElse("c1", "0")
      val c2 = itemInfo.getOrElse("c2", "0")

      val featureInfo = collection.mutable.Map[String, String]()
      featureInfo.put("c1", c1)
      featureInfo.put("c2", c2)

      // Every time I add this map code, I will get NPE ERROR
      // And I'm sure this is no null values because I filter all null values ahead of time
      featureInfo.map(item => {
        val featureName = item._1
        val featureValue = item._2
        result += List(featureName, featureValue).mkString(",")
      })
      result
    } else {
      null
    }
  }.filter(_!=null)

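This is clearly due to returning null from your flatMap. RDD.flatMap defers to Iterator.flatMap, which tries to iterate over every value your function returns, so the null is dereferenced before the trailing filter(_ != null) ever sees an element. Check this out: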

println(List(List(1), null).iterator.flatMap(identity).toList) // throws NPE
Exception in thread "main" java.lang.NullPointerException
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:480)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:486)
    at scala.collection.Iterator.foreach(Iterator.scala:937)
    at scala.collection.Iterator.foreach$(Iterator.scala:937)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
    at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:58)
    at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:49)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:185)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:43)
    at scala.collection.TraversableOnce.to(TraversableOnce.scala:309)
    at scala.collection.TraversableOnce.to$(TraversableOnce.scala:307)
    at scala.collection.AbstractIterator.to(Iterator.scala:1425)
    at scala.collection.TraversableOnce.toList(TraversableOnce.scala:293)
    at scala.collection.TraversableOnce.toList$(TraversableOnce.scala:293)
    at scala.collection.AbstractIterator.toList(Iterator.scala:1425)
    at com.dici.collection.ScalaArrayUtils$.main(ScalaArrayUtils.scala:22)
    at com.dici.collection.ScalaArrayUtils.main(ScalaArrayUtils.scala)

More specifically, this is where I expect your NPE to come from: link. To fix it, remove the null return and return an empty iterable instead (for example an empty ArrayBuffer). The trailing filter(_ != null) is then unnecessary.
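
The suggested fix can be sketched without Spark, using a plain Iterator the same way the one-liner above does. This is a minimal sketch, not the asker's actual job: the object name FlatMapNullSketch, the helper toFeatures, and the sample records are hypothetical, and the feature keys are listed in a fixed Seq so the output order is deterministic (the mutable Map in the question does not guarantee iteration order).

```scala
object FlatMapNullSketch {
  // Hypothetical helper: turns one record's parsed fields into feature strings.
  // It returns an empty Seq, never null, when the record does not qualify.
  def toFeatures(itemInfo: Map[String, String]): Seq[String] = {
    val f1 = itemInfo.getOrElse("f1", "")
    val f2 = itemInfo.getOrElse("f2", "")
    if (f1 == "true" && f2.nonEmpty) {
      // Build "name,value" strings; missing counters default to "0".
      Seq("c1", "c2").map(k => s"$k,${itemInfo.getOrElse(k, "0")}")
    } else {
      Seq.empty // flatMap simply contributes nothing for this record
    }
  }

  // Made-up sample records standing in for the data read from HDFS.
  val records: List[Map[String, String]] = List(
    Map("f1" -> "true", "f2" -> "x", "c1" -> "3"),
    Map("f1" -> "false")
  )

  // Same shape as the failing one-liner, but no null is ever flattened.
  val features: List[String] =
    records.iterator.flatMap(r => toFeatures(r)).toList

  def main(args: Array[String]): Unit =
    features.foreach(println)
}
```

Because RDD.flatMap accepts a function that returns a collection, a function shaped like toFeatures should be usable there too, with no null check afterwards. Returning an Option or an Iterator.empty for the non-matching case would work just as well; the only requirement is that the function never returns null.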
