Spark java.lang.NullPointerException when using tuples
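I am using Spark's GraphX API to build a graph and process it with the Pregel API. If vprog returns its tuple argument unchanged, no error occurs, but if it returns a new tuple built from the members of that same tuple, I get a NullPointerException. Here is the relevant code: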
val verticesRDD = cleanDtaDF.select("ChildHash", "DN").rdd.map(row => (row(0).toString.toLong, (row(1).toString.toDouble,row(0).toString.toLong)))
val edgesRDD = (rawDtaDF.select("ChildHash", "ParentHash", "dealer_code", "dealer_customer_number", "parent_dealer_cust_number").rdd
.map(row => Edge(row.get(0).toString.toLong, row.get(1).toString.toLong, (row(3) + " is a child of " + row(4), " when dealer is " + row.get(2)))))
val myGraph = Graph(verticesRDD, edgesRDD)
def vprog(vertexId: VertexId, vertexDTA:(Double, Long), msg: Double): (Double, Long) = {
(vertexDTA._1, vertexDTA._2)
}
val result = myGraph.pregel(0.0, 1, activeDirection = EdgeDirection.Out)(vprog,t => Iterator((t.dstId, t.srcAttr._2)),(x, y) => x + y)
If I make one simple change to vprog(...) so that it no longer accesses the tuple's members, the error does not occur:
def vprog(vertexId: VertexId, vertexDTA:(Double, Long), msg: Double): (Double, Long) = {
vertexDTA
}
The error is:
[Stage 101:> (0 + 0) / 200][Stage 102:> (0 + 4) / 200]18/03/10 20:43:16 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 102.0 (TID 5959, ue1lslaved25.na.aws.cat.com, executor 146): java.lang.NullPointerException
at $line69.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.vprog(<console>:60)
at $line70.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$2.apply(<console>:75)
at $line70.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$2.apply(<console>:75)
at org.apache.spark.graphx.Pregel$$anonfun$1.apply(Pregel.scala:125)
at org.apache.spark.graphx.Pregel$$anonfun$1.apply(Pregel.scala:125)
at org.apache.spark.graphx.impl.VertexPartitionBaseOps.map(VertexPartitionBaseOps.scala:61)
at org.apache.spark.graphx.impl.GraphImpl$$anonfun$5.apply(GraphImpl.scala:129)
at org.apache.spark.graphx.impl.GraphImpl$$anonfun$5.apply(GraphImpl.scala:129)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:216)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:988)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:979)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:919)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:979)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:697)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
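There is a simple explanation for this, and it has nothing to do with Spark or GraphX.
Take this function (the original with the irrelevant parameters removed):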
def vprog(vertexDTA:(Double, Long)): (Double, Long) = {
(vertexDTA._1, vertexDTA._2)
}
If the argument vertexDTA is null, then vertexDTA._1 and vertexDTA._2 will both throw a NullPointerException.
If we change the function to
def vprog(vertexDTA:(Double, Long)): (Double, Long) = {
vertexDTA
}
then when the argument is null the function simply returns it. The tuple's members are never accessed, so there is no NPE.
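The difference can be reproduced in plain Scala, without Spark. The sketch below is illustrative only: the object name NullTupleDemo, the function names, and the fallback value passed to vprogSafe are hypothetical and do not appear in the original post. It also shows one possible null-safe variant that wraps the argument in Option.

```scala
object NullTupleDemo {
  // Same shape as the simplified vprog above: reads both tuple members,
  // so it dereferences the argument.
  def vprogRebuild(vertexDTA: (Double, Long)): (Double, Long) =
    (vertexDTA._1, vertexDTA._2)

  // Returns the reference untouched, so a null passes straight through.
  def vprogPassThrough(vertexDTA: (Double, Long)): (Double, Long) =
    vertexDTA

  // Hypothetical null-safe variant (an assumption, not from the post):
  // Option(null) is None, so the caller-supplied default is returned.
  def vprogSafe(vertexDTA: (Double, Long), default: (Double, Long)): (Double, Long) =
    Option(vertexDTA).map(t => (t._1, t._2)).getOrElse(default)

  def main(args: Array[String]): Unit = {
    val missing: (Double, Long) = null

    // No member access, so no exception: the result is simply null.
    println(vprogPassThrough(missing) == null)

    // Member access on null throws; Try captures the NullPointerException.
    println(scala.util.Try(vprogRebuild(missing)).isFailure)

    // The guarded version falls back to the default instead of throwing.
    println(vprogSafe(missing, (0.0, -1L)))
  }
}
```

As for where the null might come from in the question's code: Graph(vertices, edges) takes an optional defaultVertexAttr parameter that defaults to null, and it is assigned to any vertex ID that appears in an edge but not in the vertex RDD. Since verticesRDD is built from cleanDtaDF and edgesRDD from rawDtaDF, that is a plausible source here, though the post does not confirm it.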