Spark throws java.lang.NullPointerException when mapping an RDD with a Java phonetic matching library on null values
I have an RDD that I obtained from a DataFrame using map:
case class Record(id_1: Int, fnam_1: String, lnam_1: String, id_2: Long, fnam_2: String, lnam_2: String)
val rdd = df.map {
  case Row(id_1: Int, fnam_1: String, lnam_1: String, id_2: Long, fnam_2: String, lnam_2: String) =>
    Record(id_1, fnam_1, lnam_1, id_2, fnam_2, lnam_2)
}
Then I run a filter on this RDD using a Java phonetic matching library, as shown below:
import edu.ualr.oyster.utilities.DoubleMetaphone
def matchFirstName(rec: Record) = {
  val s1 = Option(rec.fnam_1).getOrElse("")
  val s2 = Option(rec.fnam_2).getOrElse("")
  if (s1.isEmpty || s2.isEmpty)
    false
  else
    new DoubleMetaphone().compareDoubleMetaphone(s1, s2)
}
val rdd_filtered = rdd.filter(matchFirstName(_))
When I run this, I get a NullPointerException:
17/04/06 19:06:31 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 4.0 (TID 160, my.work.cluster.com): java.lang.NullPointerException
at edu.ualr.oyster.utilities.DoubleMetaphone.compareDoubleMetaphone(DoubleMetaphone.java:1020)
at funpackage.EntityResolution$.phoneticMatching(EntityResolution.scala:106)
at esurance.EntityResolution$.esurance$EntityResolution$$matchNames$1(EntityResolution.scala:118)
at esurance.EntityResolution$$anonfun$8.apply(EntityResolution.scala:137)
at esurance.EntityResolution$$anonfun$8.apply(EntityResolution.scala:137)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
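I tried the phonetic matching on a single pair of strings within the project and it works fine. I have also used the same library inside a user-defined function in Spark SQL without any problems. I suspect the issue is that some of my values may be missing (null), but I tried to handle that with Option in the function above. Any idea why this fails?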
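I did not dig into the edu.ualr.oyster library to confirm that it is what raises the exception, but that appears to be the case. I switched to the org.apache.commons.codec.language library (which provides the same Double Metaphone functionality), and the program now runs fine on Spark.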
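For anyone making the same switch, here is a minimal sketch of what the null-safe filter could look like. It assumes commons-codec is on the classpath and uses its `DoubleMetaphone.isDoubleMetaphoneEqual(String, String)` method; the `clean` helper and the `sameSound` parameter are names made up for this illustration, not part of either library. The comparison is passed in as a function so the snippet itself compiles without the dependency.

```scala
// Record mirrors the case class from the question.
case class Record(id_1: Int, fnam_1: String, lnam_1: String,
                  id_2: Long, fnam_2: String, lnam_2: String)

object FirstNameMatch {
  // Hypothetical helper: None for null or blank input, otherwise the trimmed value.
  private def clean(s: String): Option[String] =
    Option(s).map(_.trim).filter(_.nonEmpty)

  // `sameSound` stands in for the phonetic comparison (an assumption of this sketch).
  // With commons-codec available it could be built like this:
  //   val dm = new org.apache.commons.codec.language.DoubleMetaphone()
  //   val sameSound = (a: String, b: String) => dm.isDoubleMetaphoneEqual(a, b)
  def matchFirstName(rec: Record, sameSound: (String, String) => Boolean): Boolean =
    (clean(rec.fnam_1), clean(rec.fnam_2)) match {
      case (Some(a), Some(b)) => sameSound(a, b) // both names present: compare them
      case _                  => false           // a missing name never matches
    }
}
```

It would then be used as `rdd.filter(rec => FirstNameMatch.matchFirstName(rec, sameSound))`. Building the encoder inside the function that runs on the executors, as the question's code already does, avoids having to ship the encoder object with the closure.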