
Apache Spark worker executor EXITED with exit status 1

I have a Spark standalone setup (v1.4.1) with 3 workers.

I have an application that reads a stream from a Kafka topic, processes the data, and stores the result in another Kafka topic.

Last night the application crashed and all the workers went down.

The worker logs report the following:

16/02/04 21:02:10 INFO ExecutorRunner: Launch command: "/opt/jdk1.8.0_45/bin/java" "-cp" "/dati/spark-1.4.1-bin-hadoop2.4/sbin/../conf/:/dati/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=52180" "-DenabledWorkerLog=false" "-Dcom.sun.management.jmxremote.port=54330" "-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@driverHost:52180/user/CoarseGrainedScheduler" "--executor-id" "24279" "--hostname" "worker2" "--cores" "1" "--app-id" "app-20160201182749-0007" "--worker-url" "akka.tcp://sparkWorker@worker2:57853/user/Worker"
16/02/04 21:02:10 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160201182749-0007/24279/stdout with daily rolling
16/02/04 21:02:10 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160201182749-0007/24279/stderr with daily rolling
16/02/04 21:02:10 INFO Worker: Executor app-20160129184621-0001/1430 finished with state EXITED message Command exited with code 1 exitStatus 1
16/02/04 21:02:10 INFO Worker: Asked to launch executor app-20160129184621-0001/1431 for stream-elaboration
16/02/04 21:02:10 INFO ExecutorRunner: Launch command: "/opt/jdk1.8.0_45/bin/java" "-cp" "/dati/spark-1.4.1-bin-hadoop2.4/sbin/../conf/:/dati/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=57297" "-DenabledWorkerLog=false" "-Dcom.sun.management.jmxremote.port=54326" "-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@driverHost:57297/user/CoarseGrainedScheduler" "--executor-id" "1431" "--hostname" "worker2" "--cores" "1" "--app-id" "app-20160129184621-0001" "--worker-url" "akka.tcp://sparkWorker@worker2:57853/user/Worker"
16/02/04 21:02:10 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160129184621-0001/1431/stdout with daily rolling
16/02/04 21:02:10 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160129184621-0001/1431/stderr with daily rolling
16/02/04 21:02:11 INFO Worker: Executor app-20160201182749-0007/24279 finished with state EXITED message Command exited with code 1 exitStatus 1
16/02/04 21:02:11 INFO Worker: Asked to launch executor app-20160201182749-0007/24280 for stream-elaboration
16/02/04 21:02:11 INFO ExecutorRunner: Launch command: "/opt/jdk1.8.0_45/bin/java" "-cp" "/dati/spark-1.4.1-bin-hadoop2.4/sbin/../conf/:/dati/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=52180" "-DenabledWorkerLog=false" "-Dcom.sun.management.jmxremote.port=54330" "-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@driverHost:52180/user/CoarseGrainedScheduler" "--executor-id" "24280" "--hostname" "worker2" "--cores" "1" "--app-id" "app-20160201182749-0007" "--worker-url" "akka.tcp://sparkWorker@worker2:57853/user/Worker"
16/02/04 21:02:11 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160201182749-0007/24280/stdout with daily rolling
16/02/04 21:02:11 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160201182749-0007/24280/stderr with daily rolling
16/02/04 21:02:11 INFO Worker: Executor app-20160129184621-0001/1431 finished with state EXITED message Command exited with code 1 exitStatus 1
16/02/04 21:02:11 INFO Worker: Asked to launch executor app-20160129184621-0001/1432 for stream-elaboration
16/02/04 21:02:11 INFO ExecutorRunner: Launch command: "/opt/jdk1.8.0_45/bin/java" "-cp" "/dati/spark-1.4.1-bin-hadoop2.4/sbin/../conf/:/dati/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=57297" "-DenabledWorkerLog=false" "-Dcom.sun.management.jmxremote.port=54326" "-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@driverHost:57297/user/CoarseGrainedScheduler" "--executor-id" "1432" "--hostname" "worker2" "--cores" "1" "--app-id" "app-20160129184621-0001" "--worker-url" "akka.tcp://sparkWorker@worker2:57853/user/Worker"
16/02/04 21:02:11 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160129184621-0001/1432/stdout with daily rolling
16/02/04 21:02:11 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160129184621-0001/1432/stderr with daily rolling
16/02/04 21:02:11 INFO Worker: Executor app-20160201182749-0007/24280 finished with state EXITED message Command exited with code 1 exitStatus 1
16/02/04 21:02:11 INFO Worker: Asked to launch executor app-20160201182749-0007/24281 for stream-elaboration

At the end of the log:

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp291507283-42"
Exception in thread "qtp291507283-37" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "ExecutorRunner for app-20160201182749-0007/29488" java.lang.OutOfMemoryError: GC overhead limit exceeded

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkWorker-scheduler-1"
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "qtp291507283-38" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "JMX server connection timeout 81" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "JMX server connection timeout 81"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkWorker-10"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp291507283-40"
Exception in thread "qtp291507283-35" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)"
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)"
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)"

Exception in thread "qtp291507283-39" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "qtp291507283-41" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)"
Exception in thread "RMI TCP Connection(idle)" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)"
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "qtp291507283-36" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)"
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)"
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" 

....

Running:

ps aux | grep "worker"

the process is still active, but I can't see it in the Spark UI.

Why do the worker's executors restart so frequently?

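The logs show multiple java.lang.OutOfMemoryError: GC overhead limit exceeded messages, which indicates that your executors are hitting errors that cause them to exit.

This error means the JVM is spending too much time in garbage collection while reclaiming very little memory (by default, HotSpot raises it when more than 98% of the time goes to GC and less than 2% of the heap is recovered). To resolve this, you can try one of these paths: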

  • The brute-force way would be disabling this safety by adding -XX:-UseGCOverheadLimit to your executors' JVM options, but it might leave your application doing mostly GC, hence running very slowly
  • Analyze your job's memory usage and optimize it - your code might be consuming more memory than needed, forcing the GC to work too hard
  • Tune your memory settings - for example, if you can increase heap space for the executors, pressure on GC might be reduced
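As a rough illustration of the first and third options, the sketch below assembles `spark-submit` arguments that raise the executor heap and, optionally, disable the GC overhead check. `spark.executor.memory` and `spark.executor.extraJavaOptions` are standard Spark properties; the helper name `build_submit_args` and the `2g` value are assumptions made for this example, and the application jar and its arguments are left out.

```python
import shlex


def build_submit_args(executor_memory, disable_gc_overhead_limit=False):
    """Return spark-submit arguments that adjust executor memory settings."""
    # Heap size goes through spark.executor.memory; Spark does not accept
    # heap flags such as -Xmx inside spark.executor.extraJavaOptions.
    args = ["spark-submit", "--conf", "spark.executor.memory=" + executor_memory]
    if disable_gc_overhead_limit:
        # Brute-force option from the first bullet: switch off the GC overhead check.
        args += ["--conf", "spark.executor.extraJavaOptions=-XX:-UseGCOverheadLimit"]
    return args


# "2g" is an illustrative value, not a recommendation for this cluster.
command = build_submit_args("2g", disable_gc_overhead_limit=True)
print(" ".join(shlex.quote(arg) for arg in command))
```

The same two properties can also be placed in `conf/spark-defaults.conf` instead of being passed on the command line.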

You're essentially running out of memory, so the process cannot run smoothly. Options that come to mind:

  1. Specify more memory, as you mentioned; try an intermediate value such as -Xmx512m first.
  2. Debug your code to look for possible memory leaks.

Why do the worker's executors restart so frequently?

You are using Spark Streaming, which is built on Spark and therefore inherits its fault tolerance: when an executor exits because of an unexpected error, as in your case, Spark attempts to launch a replacement.
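To get a feel for how often this relaunch cycle happens, the worker log can be tallied per application. The following is a minimal sketch that assumes the line format shown in the question's log excerpt; the function name `count_executor_exits` and the regular expression are written for this example and are not part of Spark.

```python
import re
from collections import Counter

# Matches worker log lines such as:
# "... INFO Worker: Executor app-20160129184621-0001/1430 finished with state EXITED ..."
EXITED = re.compile(r"Worker: Executor (app-\d+-\d+)/(\d+) finished with state EXITED")


def count_executor_exits(lines):
    """Count EXITED executors per application ID in worker log lines."""
    counts = Counter()
    for line in lines:
        match = EXITED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines in the same format as the worker log in the question.
sample = [
    "16/02/04 21:02:10 INFO Worker: Executor app-20160129184621-0001/1430 finished with state EXITED message Command exited with code 1 exitStatus 1",
    "16/02/04 21:02:10 INFO Worker: Asked to launch executor app-20160129184621-0001/1431 for stream-elaboration",
    "16/02/04 21:02:11 INFO Worker: Executor app-20160201182749-0007/24279 finished with state EXITED message Command exited with code 1 exitStatus 1",
    "16/02/04 21:02:11 INFO Worker: Executor app-20160129184621-0001/1431 finished with state EXITED message Command exited with code 1 exitStatus 1",
]
for app_id, exits in sorted(count_executor_exits(sample).items()):
    print(app_id, exits)
```

Passing an open log file instead of `sample` gives the totals for a whole worker log; a count in the thousands for one application, as the executor IDs in the question suggest, points to executors failing right after launch.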
