
JVM crash on hadoop reducer

I am running Java code on Hadoop, but I encounter this error:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f2ffe7e1904, pid=31718, tid=139843231057664
#
# JRE version: Java(TM) SE Runtime Environment (8.0_72-b15) (build 1.8.0_72-b15)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.72-b15 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x813904]  PhaseIdealLoop::build_loop_late_post(Node*)+0x144
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /hadoop/nm-local-dir/usercache/ihradmin/appcache/application_1479451766852_3736/container_1479451766852_3736_01_000144/hs_err_pid31718.log
#
# Compiler replay data is saved as:
# /hadoop/nm-local-dir/usercache/ihradmin/appcache/application_1479451766852_3736/container_1479451766852_3736_01_000144/replay_pid31718.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp

When I go to the node manager, all the logs have been aggregated because yarn.log-aggregation-enable is true, and the log files hs_err_pid31718.log and replay_pid31718.log cannot be found.

Normally: 1) the JVM crashes after the reducer has been running for several minutes, 2) sometimes the automatic retry of the reducer succeeds, and 3) some reducers succeed without any failure.

The Hadoop version is 2.6.0 and Java is Java 8. This is not a new environment; we have lots of jobs running on the cluster.

My questions:

  1. Can I find hs_err_pid31718.log anywhere after YARN aggregates the logs and removes the folder? Or is there a setting to keep all the local logs, so that I can check hs_err_pid31718.log even while YARN is aggregating the logs?

  2. What are the common steps to narrow down the scope of a deep dive? Since the JVM crashed, I cannot see any exception in the code. I have tried the args -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp, but no heap dump is produced on the host where the reduce tasks fail.

Thanks for any suggestion.

Answers

  1. Use -XX:ErrorFile=<your preferred location>/hs_err_pid<pid>.log to write the hs_err file to a location of your choice (see the sketch after this list).
  2. The crash is due to JDK bug JDK-6675699, which has already been fixed in JDK 9; backports are available from JDK 8 update 74 onwards.

You are using JDK 8 update 72. Please upgrade to a later release to avoid this crash.
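For reference, here is a minimal sketch of how such flags can be passed to the reducer JVMs from a MapReduce driver. It assumes the standard Hadoop 2.x property mapreduce.reduce.java.opts; the directory /var/log/hadoop-crash and the class name are placeholders, and the directory must be writable by the container user on every node.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReduceJvmOptsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Point the HotSpot fatal error log at a directory that survives
        // container cleanup; the JVM expands %p to the process id.
        // Note: the heap-dump flags only fire on OutOfMemoryError, not on
        // SIGSEGV, so they are kept here purely for the OOM case.
        conf.set("mapreduce.reduce.java.opts",
                "-XX:ErrorFile=/var/log/hadoop-crash/hs_err_pid%p.log "
              + "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp");

        Job job = Job.getInstance(conf, "reduce-jvm-opts-example");
        // ... set jar, mapper, reducer, input and output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The same options can also be supplied per job on the command line, for example with -D mapreduce.reduce.java.opts="..." when the driver uses ToolRunner.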

