
Why is Spark's countByValue() causing a FileNotFoundException?

I am new to Spark. I installed Spark on a Windows machine and am running commands in spark-shell. I created a simple RDD and tried to find the number of occurrences of each value. Here is my code block; the countByValue() call throws a FileNotFoundException:

scala>  val inputrdd = sc.parallelize{ Seq("a", "b", "c", "d","a","a") }
inputrdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[70] at parallelize at <console>:24

scala>  inputrdd.count
res102: Long = 6

scala>  inputrdd.first
res103: String = a

scala>

scala>  inputrdd.countByValue()
2018-09-26 19:32:03 ERROR Executor:91 - Exception in task 5.0 in stage 72.0 (TID 113)
java.io.FileNotFoundException: C:\Users\hadoop\AppData\Local\Temp\blockmgr-910b1c57-9f3a-4dea-a80b-701ad0a32ead\1f\shuffle_6_5_0.data.6bac0b0d-93a6-4b57-a1d6-8dbe379c264f (The system cannot find the path specified)
        at java.io.FileOutputStream.open0(Native Method)
        at java.io.FileOutputStream.open(FileOutputStream.java:270)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
        at org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:103)
        at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:116)
        at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:237)
        at org.apache.spark.util.collection.WritablePartitionedPairCollection$$anon$1.writeNext(WritablePartitionedPairCollection.scala:56)
        at org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:699)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:72)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
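For context, countByValue() returns a Map from each distinct element to its count, and it triggers a shuffle, which is why the failure surfaces while writing shuffle files. The same counting logic on a plain Scala collection (a sketch with no Spark involved, so no shuffle files are written) shows the result the call should have produced:

```scala
// Count occurrences of each value in a local collection; this is the
// Map that countByValue() would return for the RDD above.
val data = Seq("a", "b", "c", "d", "a", "a")
val counts: Map[String, Int] =
  data.groupBy(identity).map { case (k, vs) => (k, vs.size) }
println(counts)
```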

Re-running the same code from the beginning works fine.

Cause: the directory C:\Users\hadoop\AppData\Local\Temp\blockmgr-910b1c57-9f3a-4dea-a80b-701ad0a32ead was deleted. It holds the temporary data (shuffle/block files) that Spark writes at runtime, and that data was lost.
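Since the deleted directory lives under the user's Temp folder, one common workaround (an assumption on my part, not stated in the answer) is to launch spark-shell with `spark.local.dir` pointed at a directory that Windows temp cleanup or antivirus tools won't touch, so the blockmgr files survive for the whole session:

```shell
# Hypothetical workaround: keep Spark's scratch space out of %TEMP%,
# where cleanup tools may delete blockmgr-* directories mid-job.
# spark.local.dir must be set before the SparkContext starts, so pass
# it on the spark-shell command line (C:\spark-tmp is an example path):
spark-shell --conf spark.local.dir=C:\spark-tmp
```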

