I get a weird error when I try to map over RDDs read from HDFS. Here's my code (simplified).
I tried to put the RDD's values into a Scala Map and then select certain key-value pairs to write to HDFS. Every time the code reaches the "map over rdd" step (marked with a comment below), I get a NullPointerException, which is really weird!
I can't solve this problem. I had to rewrite my code in a clumsy, ugly way to make it work, but I still want to know why the "map over rdd" step matters!
BTW: all the keys I use exist in the Scala Map, and there are no null values. I removed the "map over rdd" step and wrote the key-values to a string instead, which works, but I'm eager to know why this weird problem happens... T_T
```scala
val featureRdd = hdfsRDD
  .flatMap { item =>
    val result = new ArrayBuffer[String]()
    val itemInfo = collection.mutable.Map[String, String]()
    // x below is a string tuple: (s1, s2, s3)
    item._2.asScala.foreach(x => {
      itemInfo.put(x._2, x._3)
    })
    val f1 = itemInfo.getOrElse("f1", "")
    val f2 = itemInfo.getOrElse("f2", "")
    if (f1.equals("true") && f2.nonEmpty) {
      val c1 = itemInfo.getOrElse("c1", "0")
      val c2 = itemInfo.getOrElse("c2", "0")
      val featureInfo = collection.mutable.Map[String, String]()
      featureInfo.put("c1", c1)
      featureInfo.put("c2", c2)
      // Every time I add this map code, I get the NPE
      // And I'm sure there are no null values, because I filter out all null values ahead of time
      featureInfo.map(item => {
        val featureName = item._1
        val featureValue = item._2
        result += List(featureName, featureValue).mkString(",")
      })
      result
    } else {
      null
    }
  }.filter(_ != null)
```
This is clearly due to returning `null` in your `flatMap`. `RDD.flatMap` defers to `Iterator.flatMap`, and check this out:
```scala
println(List(List(1), null).iterator.flatMap(identity).toList) // throws NPE
```

```
Exception in thread "main" java.lang.NullPointerException
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:480)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:486)
    at scala.collection.Iterator.foreach(Iterator.scala:937)
    at scala.collection.Iterator.foreach$(Iterator.scala:937)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
    at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:58)
    at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:49)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:185)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:43)
    at scala.collection.TraversableOnce.to(TraversableOnce.scala:309)
    at scala.collection.TraversableOnce.to$(TraversableOnce.scala:307)
    at scala.collection.AbstractIterator.to(Iterator.scala:1425)
    at scala.collection.TraversableOnce.toList(TraversableOnce.scala:293)
    at scala.collection.TraversableOnce.toList$(TraversableOnce.scala:293)
    at scala.collection.AbstractIterator.toList(Iterator.scala:1425)
    at com.dici.collection.ScalaArrayUtils$.main(ScalaArrayUtils.scala:22)
    at com.dici.collection.ScalaArrayUtils.main(ScalaArrayUtils.scala)
```
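Here is a minimal plain-Scala sketch of the same failure, with no Spark involved. It assumes, as described above, that `RDD.flatMap` ends up calling `Iterator.flatMap` on each partition. `expand` is a hypothetical stand-in for the question's flatMap body that returns `null` for some records, and the pipeline has the question's shape: `flatMap` followed by `filter(_ != null)`.

```scala
// Plain-Scala sketch: no Spark needed. Assumes RDD.flatMap behaves like
// Iterator.flatMap on each partition, as the answer states.
object NullInFlatMapDemo {

  // Hypothetical stand-in for the question's flatMap body:
  // returns a collection for "good" records and null otherwise.
  def expand(n: Int): Seq[String] =
    if (n % 2 == 0) Seq(s"c1,$n", s"c2,$n") else null

  // Same pipeline shape as the question: flatMap, then filter(_ != null).
  // Returns true if forcing the iterator throws a NullPointerException.
  def throwsNpe(records: Seq[Int]): Boolean =
    try {
      records.iterator.flatMap(expand).filter(_ != null).toList
      false
    } catch {
      case _: NullPointerException => true
    }

  def main(args: Array[String]): Unit = {
    println(throwsNpe(Seq(2, 4)))    // every record yields a non-null Seq
    println(throwsNpe(Seq(2, 3, 4))) // 3 makes expand return null
  }
}
```

With only even inputs it prints `false`; once `3` is in the input, `expand` returns `null` and forcing the iterator throws, even though a filter follows.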
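More specifically, this is where I expect your NPE to come from: link. Note that the `.filter(_ != null)` after the `flatMap` cannot help: it runs on the elements that `flatMap` has already flattened, so the `null` collection is dereferenced before the filter ever sees anything. Just remove this `null` return and replace it with an empty iterable (e.g. `Seq.empty[String]`).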
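Applied to the question's code, the fix might look like the following sketch. It uses plain Scala collections so it runs without a cluster. The `Record` type and the sample data are hypothetical stand-ins for `hdfsRDD`'s elements (with `item._2` assumed to be already converted via `asScala`); with Spark you would pass the same function to `hdfsRDD.flatMap`.

```scala
// Sketch only: Record and the sample data below are hypothetical stand-ins
// for the question's hdfsRDD elements.
object EmptyInsteadOfNull {

  // item._2 holds (s1, s2, s3) tuples, assumed already converted with asScala.
  type Record = (String, Seq[(String, String, String)])

  // Same logic as the question's flatMap body, but the "no features" branch
  // returns an empty Seq instead of null, so flatMap simply emits nothing.
  def features(item: Record): Seq[String] = {
    // Later tuples win for duplicate keys, like repeated itemInfo.put calls.
    val itemInfo: Map[String, String] = item._2.map(x => x._2 -> x._3).toMap
    val f1 = itemInfo.getOrElse("f1", "")
    val f2 = itemInfo.getOrElse("f2", "")
    if (f1 == "true" && f2.nonEmpty) {
      val c1 = itemInfo.getOrElse("c1", "0")
      val c2 = itemInfo.getOrElse("c2", "0")
      // Fixed output order here; the original iterated a mutable Map,
      // whose iteration order is unspecified.
      Seq(s"c1,$c1", s"c2,$c2")
    } else {
      Seq.empty[String] // instead of null
    }
  }

  def main(args: Array[String]): Unit = {
    val records: Seq[Record] = Seq(
      ("a", Seq(("x", "f1", "true"), ("x", "f2", "yes"), ("x", "c1", "3"))),
      ("b", Seq(("x", "f1", "false")))
    )
    // With Spark: hdfsRDD.flatMap(features), with no filter(_ != null) needed.
    println(records.iterator.flatMap(features).toList)
  }
}
```

This prints `List(c1,3, c2,0)`: the second record produces an empty `Seq`, which `flatMap` simply skips. Building the feature strings directly also avoids calling `map` only for its side effect on `result`.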