JVM crashes when calling JNI function during gc
We have a Java application with a multi-threaded (pthread) JNI layer that calls back into the Java level whenever messages are received from the underlying network.
We notice that every time it crashes, the crash is caused by a gc. We can even simulate such a crash by manually triggering a gc by calling
jmap -histo <pid>
while the JNI layer is receiving messages from the network.
Given what we have read about the behaviour of the JVM during GC in this post, https://stackoverflow.com/a/39401467/4523221 , we still can't figure out why such a crash is related to gc, since JNI function calls are blocked during gc.
If anyone can shed light on this, that would be great. Thanks in advance.
The following is a stack trace that we collected after a crash in our application.
Program terminated with signal 6, Aborted.
#0 0x0000003cdce325e5 in raise () from /lib64/libc.so.6
#1 0x0000003cdce33dc5 in abort () from /lib64/libc.so.6
#2 0x00007fdafe2516b5 in os::abort(bool) () from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#3 0x00007fdafe3efbf3 in VMError::report_and_die() ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#4 0x00007fdafde2f3e2 in report_vm_error(char const*, int, char const*, char const*) ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#5 0x00007fdafe24c1ff in os::PlatformEvent::park() ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#6 0x00007fdafe20c538 in Monitor::ILock(Thread*) ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#7 0x00007fdafe20c73f in Monitor::lock_without_safepoint_check() ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#8 0x00007fdafe2e7a1f in SafepointSynchronize::block(JavaThread*) ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#9 0x00007fdafe39bcdd in JavaThread::check_safepoint_and_suspend_for_native_trans(JavaThread*) ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#10 0x00007fdafe0123d8 in jni_NewByteArray ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#11 0x00007fdaa447b7d1 in JNIEnv_::NewByteArray (this=0x7fdaf800c9f8, len=7)
at /usr/java/jdk1.8.0_65/include/jni.h:1643
---omitted---
#19 0x0000003cdd20b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#20 0x00007fdafe24c133 in os::PlatformEvent::park() ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#21 0x00007fdafe20ce27 in Monitor::IWait(Thread*, long) ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#22 0x00007fdafe20d5f0 in Monitor::wait(bool, long, bool) ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#23 0x00007fdafe39ed51 in Threads::destroy_vm() ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#24 0x00007fdafdfff931 in jni_DestroyJavaVM ()
from /usr/java/jdk1.8.0_65/jre/lib/amd64/server/libjvm.so
#25 0x00007fdafe91a63d in JavaMain () from /usr/java/jdk1.8.0_65/bin/../lib/amd64/jli/libjli.so
#26 0x0000003cdd207aa1 in start_thread () from /lib64/libpthread.so.0
#27 0x0000003cdcee8aad in clone () from /lib64/libc.so.6
The way we obtained JNIEnv*, e.g.:
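JNIEnv *env = 0;
// Ask the JVM for this thread's JNIEnv; this fails if the thread is not attached.
jint result = jvm->GetEnv((void **) &env, JNI_VERSION_1_8);
if (result != JNI_OK) {
    // Not attached yet: attach the current native thread to obtain a JNIEnv.
    result = jvm->AttachCurrentThread((void **) &env, NULL);
}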
After spending days investigating this JNI issue, we have finally found the cause, and I would like to share our experience here in the hope that it helps others.
First of all, the reason we needed JNI in the first place was that we had to use a 3rd party network library that is a Linux native lib, and unfortunately that library was the cause of our problem.
The library provided us with a callback handle that we implemented to receive incoming network messages from it, and this callback, we later found out, was simply a signal handler. This means that the signal handler would get called whenever a signal popped up, even during gc.
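Code that runs in signal-handler context is limited to async-signal-safe functions, and JNI functions are not on that POSIX list. As a rough sketch of how a callback invoked in signal context could hand work off instead of calling into the JVM itself, the snippet below uses the classic self-pipe pattern: the handler only writes one byte to a pipe, and an ordinary thread drains it later. The names `on_signal`, `install_self_pipe` and `drain_notifications` are hypothetical and belong neither to the third-party library nor to the code in this post; the library's real callback signature is not shown here, so a plain `sigaction` handler stands in for it.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical self-pipe: [0] is the read end, [1] is the write end.
static int g_pipe_fds[2] = {-1, -1};

// Runs in signal context, so it only calls write(), which is async-signal-safe.
extern "C" void on_signal(int) {
    int saved_errno = errno;          // the interrupted code may still need errno
    const char byte = 1;
    ssize_t ignored = write(g_pipe_fds[1], &byte, 1);
    (void)ignored;                    // if the pipe is full, this notification is dropped
    errno = saved_errno;
}

// Creates the pipe and installs on_signal for signo. Returns 0 on success.
int install_self_pipe(int signo) {
    if (pipe(g_pipe_fds) != 0) return -1;
    for (int fd : g_pipe_fds) {
        // Non-blocking, so neither the handler nor the drain loop can hang.
        int flags = fcntl(fd, F_GETFL);
        if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) return -1;
    }
    struct sigaction sa{};
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, nullptr);
}

// Called from an ordinary thread: consumes all pending notifications
// and returns how many there were.
int drain_notifications() {
    char buf[64];
    int total = 0;
    for (;;) {
        ssize_t n = read(g_pipe_fds[0], buf, sizeof buf);
        if (n > 0) { total += static_cast<int>(n); continue; }
        if (n == -1 && errno == EINTR) continue;
        break;                        // EAGAIN: the pipe is empty
    }
    return total;
}
```

After `install_self_pipe(SIGUSR1)`, each delivered signal leaves one byte in the pipe. A thread that was attached to the JVM in the normal way could then call `drain_notifications()` (or wait on the read end with `poll`) and make its JNI calls from there, outside signal context.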
Since C threads keep running during JVM safepoints, it would have been fine if those C threads weren't attached to the JVM. Otherwise, disaster would certainly strike.
Here is roughly what we think happened. (Everything below happened in the JNI layer.)
The gdb stacktrace that we were seeing was basically what happens when a gc thread that is in the middle of doing some work on the heap gets a call from our application to do some application work, followed by a few JNI API calls... BOOM
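A process-directed signal may be delivered to any thread that does not block it, which can include threads that belong to the JVM. One common POSIX pattern for keeping an asynchronous signal away from such threads is to block it and receive it synchronously with `sigwait` on a dedicated thread. The sketch below shows only that pattern in isolation, using SIGUSR1 and hypothetical names (`on_message`, `signal_thread`, `run_sigwait_demo`). Whether it can be combined with a library that installs its own handler, and how the mask would be applied to threads the JVM has already created, are open assumptions.

```cpp
#include <atomic>
#include <pthread.h>
#include <sched.h>
#include <signal.h>

static std::atomic<int>  g_handled{0};
static std::atomic<bool> g_stop{false};

// Hypothetical stand-in for the real work (e.g. calling back into Java).
// It runs in normal thread context, not in a signal handler.
static void on_message() { g_handled.fetch_add(1); }

// Dedicated thread: receives the blocked signal synchronously with sigwait().
static void* signal_thread(void* arg) {
    const sigset_t* set = static_cast<const sigset_t*>(arg);
    int sig = 0;
    while (sigwait(set, &sig) == 0) {
        if (g_stop.load()) break;     // the final signal is only a wake-up
        on_message();
    }
    return nullptr;
}

// Sends `count` signals to the dedicated thread and returns how many
// were handled, or -1 if the setup failed.
int run_sigwait_demo(int count) {
    g_handled = 0;
    g_stop = false;

    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    // Block before creating the thread: new threads inherit the creator's mask.
    if (pthread_sigmask(SIG_BLOCK, &set, &old) != 0) return -1;

    pthread_t tid;
    if (pthread_create(&tid, nullptr, signal_thread, &set) != 0) return -1;

    for (int i = 0; i < count; ++i) {
        pthread_kill(tid, SIGUSR1);
        // Standard signals do not queue, so wait until this one is handled.
        while (g_handled.load() <= i) sched_yield();
    }

    g_stop = true;
    pthread_kill(tid, SIGUSR1);       // wake the thread so it sees g_stop
    pthread_join(tid, nullptr);
    pthread_sigmask(SIG_SETMASK, &old, nullptr);  // restore the caller's mask
    return g_handled.load();
}
```

In this sketch `run_sigwait_demo(3)` delivers three signals one at a time and each is handled by `on_message` on the dedicated thread, so no application code ever runs on top of whatever another thread happened to be doing when the signal arrived.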
Solution:
P.S. Some of the details may not be exactly accurate, so any advice from JVM experts is welcome. I will try to correct them as advised.
Thanks
Update 1 (@apangin): We have another gdb stacktrace here. Just wondering whether the GangWorker at #18 is a parallel GC thread.
#0 0x00000035b90325e5 in raise () from /lib64/libc.so.6
#1 0x00000035b9033dc5 in abort () from /lib64/libc.so.6
#2 0x00007febd60813b5 in os::abort(bool) () from /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
#3 0x00007febd6223673 in VMError::report_and_die() () from /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
#4 0x00007febd60868bf in JVM_handle_linux_signal () from /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
#5 0x00007febd607ce13 in signalHandler(int, siginfo*, void*) () from /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
#6 <signal handler called>
#7 0x00007feb9fcf551c in JNIEnv_::NewByteArray (this=0x7febd001d9f8, len=8) at /usr/java/jdk1.8.0_131/include/jni.h:1643
*<omitted app specific calls>*
#13 <signal handler called>
#14 0x00000035b980b68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#15 0x00007febd607b7e3 in os::PlatformEvent::park() () from /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
#16 0x00007febd603c037 in Monitor::IWait(Thread*, long) () from /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
#17 0x00007febd603c956 in Monitor::wait(bool, long, bool) () from /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
#18 0x00007febd6244d6b in GangWorker::loop() () from /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
#19 0x00007febd6082568 in java_start(Thread*) () from /usr/java/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
#20 0x00000035b9807aa1 in start_thread () from /lib64/libpthread.so.0
#21 0x00000035b90e8aad in clone () from /lib64/libc.so.6