
Scala: java.lang.NullPointerException

The following code is causing a java.lang.NullPointerException.

val sqlContext = new SQLContext(sc)
val dataFramePerson = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").schema(CustomSchema1).load("c:\\temp\\test.csv")
val dataFrameAddress = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").schema(CustomSchema2).load("c:\\temp\\test2.csv")

val personData = dataFramePerson.map(data => {
  val addressData = dataFrameAddress.filter(i => i.getAs("ID") == data.getAs("ID"));
  var address:Address = null;
  if (addressData != null) {
    val addressRow = addressData.first;
    address = addressRow.asInstanceOf[Address];
  }
  Person(data.getAs("Name"),data.getAs("Phone"),address)
})

I narrowed it down to the following line as the one causing the exception.

val addressData = dataFrameAddress.filter(i => i.getAs("ID") == data.getAs("ID"));

Can someone point out what the issue is?

Your code has a big structural flaw: you can only refer to DataFrames in code that executes on the driver, not in code that runs on the executors. Your code references another DataFrame from within a map, and that closure is executed on the executors. See this link: Can I use Spark DataFrame inside regular Spark map operation?

val personData = dataFramePerson.map(data => { // WITHIN A MAP
  val addressData = dataFrameAddress.filter(i => // <--- REFERRING TO OTHER DATAFRAME WITHIN A MAP
          i.getAs("ID") == data.getAs("ID"));  
  var address:Address = null;
  if (addressData != null) {

What you want to do instead is a left outer join, then do further processing.

dataFramePerson.join(dataFrameAddress, Seq("ID"), "left_outer")

Note also that when using getAs you want to specify the type, like getAs[String]("ID").
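
Putting both points together, here is a minimal sketch of the join-then-process approach, using the names from your question. The address column names "Street" and "City" are hypothetical placeholders; substitute whatever CustomSchema2 actually defines:

val joined = dataFramePerson.join(dataFrameAddress, Seq("ID"), "left_outer")

val personData = joined.map(row => {
  // After a left outer join, the address columns are null when there was no match.
  // "Street" and "City" are placeholder column names, not from the original post.
  val address =
    if (row.isNullAt(row.fieldIndex("Street"))) null
    else Address(row.getAs[String]("Street"), row.getAs[String]("City"))
  Person(row.getAs[String]("Name"), row.getAs[String]("Phone"), address)
})

This keeps all the per-row work inside a single DataFrame, so nothing in the closure refers back to a second DataFrame.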

The only thing that can be said is that either dataFrameAddress, or i, or data is null. Use your favorite debugging technique to find out which one it actually is, e.g. a debugger, print statements, or logs.

Note that if you see the filter call in the stack trace of your NullPointerException, it means that only i or data could be null. On the other hand, if you don't see the filter call, it rather means that dataFrameAddress itself is null.
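
As a quick sketch of the print-statement approach, assuming the variable names from the question, you can rule out the first case on the driver before the map even runs:

// Runs on the driver, before any executor code is invoked:
println(s"dataFrameAddress is null? ${dataFrameAddress == null}")
// If this prints false, the NPE must come from i or data inside the closure.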
