Spark Job failed on YARN
I am trying to run a Spark job on a YARN cluster with the following configuration.
/usr/bin/spark-submit
--class com.example.DriverClass
--master yarn-cluster
app.jar
hdfs:///user/spark/file1.parquet
hdfs:///user/spark/file2.parquet
hdfs:///user/spark/output
20151217052915
--num-executors 20
--executor-memory 12288M
--executor-cores 5
--driver-memory 6G
--conf spark.yarn.executor.memoryOverhead=1332
We are running with 20 executors, and each executor is given 12 GB of memory for this job.
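One thing worth checking in the command above: spark-submit treats everything after the application jar as arguments to the application itself, so options such as --num-executors and --executor-memory need to come before app.jar to take effect. The following is a minimal sketch of assembling the command in that order. The helper name build_spark_submit is hypothetical and used only for illustration; the option values are the ones from the command above.

```python
# Sketch: place every spark-submit option before the application jar.
# build_spark_submit is a hypothetical helper, not part of Spark.

def build_spark_submit(options, app_jar, app_args):
    """Return an argv list: launcher, options, jar, then application arguments."""
    argv = ["/usr/bin/spark-submit"]
    for flag, value in options:
        argv += [flag, value]
    argv.append(app_jar)   # anything after the jar is handed to the application
    argv += app_args
    return argv

options = [
    ("--class", "com.example.DriverClass"),
    ("--master", "yarn-cluster"),
    ("--num-executors", "20"),
    ("--executor-memory", "12288M"),
    ("--executor-cores", "5"),
    ("--driver-memory", "6G"),
    ("--conf", "spark.yarn.executor.memoryOverhead=1332"),
]
app_args = [
    "hdfs:///user/spark/file1.parquet",
    "hdfs:///user/spark/file2.parquet",
    "hdfs:///user/spark/output",
    "20151217052915",
]

argv = build_spark_submit(options, "app.jar", app_args)
print(" ".join(argv))
```

If the original command was run exactly as posted, the executors may have been started with default settings rather than the intended 12 GB, which would be worth confirming in the Spark UI's Environment tab.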
Do we have to increase the value of the spark.yarn.executor.memoryOverhead property?
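For context on what that property controls: on YARN, each executor container is requested as the executor heap plus the memory overhead, so the overhead is extra off-heap headroom rather than additional heap. A rough sketch of the arithmetic with the values from the question is given below. The 1024 MB minimum allocation is an assumed value for illustration (the real one comes from the cluster's yarn.scheduler.minimum-allocation-mb setting), and the rounding step assumes the scheduler rounds requests up to a multiple of it.

```python
import math

# Values taken from the question.
executor_memory_mb = 12288   # --executor-memory 12288M (JVM heap)
memory_overhead_mb = 1332    # spark.yarn.executor.memoryOverhead
num_executors = 20           # --num-executors 20

# Assumed value for illustration only; check the cluster's YARN configuration.
assumed_min_allocation_mb = 1024

# Memory requested from YARN for one executor container.
container_request_mb = executor_memory_mb + memory_overhead_mb

# Assumption: the request is rounded up to a multiple of the minimum allocation.
rounded_container_mb = (
    math.ceil(container_request_mb / assumed_min_allocation_mb)
    * assumed_min_allocation_mb
)

total_executor_mb = rounded_container_mb * num_executors

print("per-executor request (MB):", container_request_mb)
print("rounded container size (MB):", rounded_container_mb)
print("total for all executors (MB):", total_executor_mb)
```

Since "GC overhead limit exceeded" is raised when the JVM heap is nearly exhausted, raising the overhead alone may not help; the heap size and the amount of data per task are probably the first things to look at.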
Error log:
15/12/18 15:47:39 WARN scheduler.TaskSetManager: Lost task 2.0 in stage 5.0 (TID 117, lpdn0185.com): java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator$$anonfun$next$1.apply(ExternalAppendOnlyMap.scala:336)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator$$anonfun$next$1.apply(ExternalAppendOnlyMap.scala:331)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:331)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:227)
at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at org.apache.spark.rdd.SubtractedRDD.integrate$1(SubtractedRDD.scala:110)
at org.apache.spark.rdd.SubtractedRDD.compute(SubtractedRDD.scala:119)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
15/12/18 15:47:39 INFO scheduler.TaskSetManager: Starting task 2.1 in stage 5.0 (TID 119, lpdn0185.com, PROCESS_LOCAL, 4237 bytes)
15/12/18 15:47:39 WARN scheduler.TaskSetManager: Lost task 3.0 in stage 5.0 (TID 118, lpdn0185.com): FetchFailed(BlockManagerId(2, lpdn0185..com, 37626), shuffleId=4, mapId=42, reduceId=3, message=
org.apache.spark.shuffle.FetchFailedException: Error in opening FileSegmentManagedBuffer{file=/hdfs1/yarn/nm/usercache/phdpentcustcdibtch/appcache/application_1449986083135_60217/blockmgr-34a2e882-6b36-42c6-bcff-03d9bc5ef80b/0c/shuffle_4_42_0.data, offset=5899394, length=46751}
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:91)
at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:44)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.SubtractedRDD.integrate$1(SubtractedRDD.scala:110)
at org.apache.spark.rdd.SubtractedRDD.compute(SubtractedRDD.scala:119)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Error in opening FileSegmentManagedBuffer{file=/hdfs1/yarn/nm/usercache/user1/appcache/application_1449986083135_60217/blockmgr-34a2e882-6b36-42c6-bcff-03d9bc5ef80b/0c/shuffle_4_42_0.data, offset=5899394, length=46751}
at org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:113)
at org.apache.spark.storage.ShuffleBlockFetcherIterator$$anonfun$3.apply(ShuffleBlockFetcherIterator.scala:300)
at org.apache.spark.storage.ShuffleBlockFetcherIterator$$anonfun$3.apply(ShuffleBlockFetcherIterator.scala:300)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:300)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:53)
... 30 more
Caused by: java.io.FileNotFoundException: /hdfs1/yarn/nm/usercache/user1/appcache/application_1449986083135_60217/blockmgr-34a2e882-6b36-42c6-bcff-03d9bc5ef80b/0c/shuffle_4_42_0.data (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:98)
... 35 more
)
I would appreciate your help on this.
I had the same issue for several weeks. Strictly speaking, I got slightly different errors each time, including the ones you got. In my case, I think the data was simply too big for the cluster's capacity.
In brief, what I tried was
Hope this works!