
SIGSEGV signal not caught by handler in multi-threaded process

I am working on a Vulkan layer that intercepts all Vulkan calls and writes them to a file. When it comes to device memory mapped into user space, the layer detects reads and writes by the user by protecting the memory regions with mprotect.

The idea is that when the user calls vkMapMemory on some memory, the layer allocates memory using mmap with the PROT_READ | PROT_WRITE flags set, installs a signal handler for SIGSEGV with sigaction, mprotects the region against both reading and writing, and returns the base pointer to the user. Any access then triggers a SIGSEGV, and the handler takes care of the rest. So far so good.
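A minimal sketch of the page-guard scheme just described, assuming a single guarded region kept in globals (the real layer would track one region per vkMapMemory call). The helper names guard_region_create, guard_page_size and guard_fault_count are hypothetical; only mmap, sigaction and mprotect are the calls the post actually mentions. The handler re-enables read/write access on the faulting page only, which is the pattern visible in the 4096-byte, mask 0x3 mprotect calls in the logs.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical single guarded region; the real layer keeps one per mapping. */
static char *g_base;
static size_t g_size;
static size_t g_page;
static volatile sig_atomic_t g_faults;

/* SA_SIGINFO handler: re-enable read/write on the faulting page only. */
static void guard_handler(int sig, siginfo_t *info, void *ctx) {
    (void)sig;
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)g_base;
    if (addr < base || addr >= base + g_size) {
        /* Not a guarded address: restore the default action so the faulting
           instruction re-runs and terminates the process as usual. */
        signal(SIGSEGV, SIG_DFL);
        return;
    }
    uintptr_t page = addr & ~(uintptr_t)(g_page - 1);
    /* mprotect is not on POSIX's async-signal-safe list, but on Linux it is a
       plain system call and guard-page schemes commonly call it here. */
    mprotect((void *)page, g_page, PROT_READ | PROT_WRITE);
    g_faults++;
}

size_t guard_page_size(void) { return g_page; }
int guard_fault_count(void) { return (int)g_faults; }

/* mmap read/write, install the handler, then revoke all access (mask 0x0). */
char *guard_region_create(size_t pages) {
    g_page = (size_t)sysconf(_SC_PAGESIZE);
    g_size = pages * g_page;
    void *p = mmap(NULL, g_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    g_base = p;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = guard_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0 ||
        mprotect(g_base, g_size, PROT_NONE) != 0) {
        munmap(p, g_size);
        return NULL;
    }
    return g_base;
}
```

With this in place, a write to base[page + 10] faults once, the handler opens that page, and the store is retried and succeeds; a second write to the same page does not fault again.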

The problem I'm dealing with is a case where two threads allocate and access these memory regions. The moment one of the threads accesses one of the regions, the resulting SIGSEGV is not delivered to the handler; instead, the application terminates with a segmentation fault. The other thread works as expected.

Below is an extract of what happens before the crash:

[59588] AddExceptionHandler() -> sigaction SIGSEGV *******************
[59588] SetMemoryProtection() mprotect: ptr: 0x7fec0819d000 - 0x7fec0839d000 size: 2097152 mask: 0x0
[59588] MapMemory <-- *ppData: 0x7fec0819d000
[59588] PageGuardExceptionHandler()
  [59588] HandleGuardPageViolation() address: 0x7fec0819d000 is_write: 1 clear_guard: 1
  [59588] 1107:HandleGuardPageViolation() -> SetMemoryProtection()
  [59588] SetMemoryProtection() mprotect: ptr: 0x7fec0819d000 - 0x7fec0819e000 size: 4096 mask: 0x3
[59588] PageGuardExceptionHandler() <-- (handled: true)
[59588] MapMemory(memory: 0x5575f2d5cb20 size: 2097152)
[59588] mmap: 0x7fec013fb000 - 0x7fec015fb000 size: 2097152
[59588] SetMemoryProtection() mprotect: ptr: 0x7fec013fb000 - 0x7fec015fb000 size: 2097152 mask: 0x0
[59588] MapMemory <-- *ppData: 0x7fec013fb000
[59588] PageGuardExceptionHandler()
  [59588] HandleGuardPageViolation() address: 0x7fec013fb000 is_write: 1 clear_guard: 1
  [59588] 1107:HandleGuardPageViolation() -> SetMemoryProtection()
  [59588] SetMemoryProtection() mprotect: ptr: 0x7fec013fb000 - 0x7fec013fc000 size: 4096 mask: 0x3
[59588] PageGuardExceptionHandler() <-- (handled: true)
[59588] AllocateMemory(size: 4194304) *pMemory: 0x5575f2f5c080
[59588] AllocateMemory(size: 537600) *pMemory: 0x5575f2f5c710

Test case 'dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.dynamically_uniform.geometry.samplercubearray'..
[59607] AllocateMemory(size: 2097152) *pMemory: 0x7febe80016c0
[59607] MapMemory(memory: 0x7febe80016c0 size: 2097152)
[59607] mmap: 0x7fec007fa000 - 0x7fec009fa000 size: 2097152
[59588] PageGuardExceptionHandler()
  [59588] HandleGuardPageViolation() address: 0x7fec0143b9e0 is_write: 1 clear_guard: 1
  [59588] 1107:HandleGuardPageViolation() -> SetMemoryProtection()
  [59588] SetMemoryProtection() mprotect: ptr: 0x7fec0143b000 - 0x7fec0143c000 size: 4096 mask: 0x3
[59588] PageGuardExceptionHandler() <-- (handled: true)
[59588] PageGuardExceptionHandler()
  [59588] HandleGuardPageViolation() address: 0x7fec0829d000 is_write: 1 clear_guard: 1
  [59588] 1107:HandleGuardPageViolation() -> SetMemoryProtection()
  [59588] SetMemoryProtection() mprotect: ptr: 0x7fec0829d000 - 0x7fec0829e000 size: 4096 mask: 0x3
[59588] PageGuardExceptionHandler() <-- (handled: true)
[59607] SetMemoryProtection() mprotect: ptr: 0x7fec007fa000 - 0x7fec009fa000 size: 2097152 mask: 0x0
[59607] MapMemory <-- *ppData: 0x7fec007fa000
[59607] util_copy_rect() 1 src: 0x5575f2e6c468 dst: 0x7fec007fa000 size: 4
Segmentation fault (core dumped)

At the end of the log you can see the second thread (59607) enter the picture: it requests one of the memory regions, is interrupted by the first thread's signal handler dealing with SIGSEGVs raised in the first thread (59588), then goes on to mprotect the new region and access it, but the handler is not called.
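As a worked check of the numbers in the first log, assuming 4 KiB pages, a 64-bit build, and that the mask printed by SetMemoryProtection() is the raw PROT_* value passed to mprotect (0x0 = PROT_NONE, 0x3 = PROT_READ | PROT_WRITE): the fault at 0x7fec0143b9e0 rounds down to the page 0x7fec0143b000, so the handler reopens exactly the range 0x7fec0143b000 to 0x7fec0143c000. The helper names below are illustrative only.

```c
#include <stdint.h>
#include <sys/mman.h>

/* Round an address down to the start of its page; page_size must be a power of two. */
uintptr_t page_floor(uintptr_t addr, uintptr_t page_size) {
    return addr & ~(page_size - 1);
}

/* Assumption: the log's "mask" is the raw PROT_* value handed to mprotect(). */
int mask_is_guarded(int mask) { return mask == PROT_NONE; }
int mask_is_open(int mask) { return mask == (PROT_READ | PROT_WRITE); }
```

The same arithmetic confirms the mapping sizes: 0x7fec0839d000 - 0x7fec0819d000 = 0x200000 = 2097152 bytes, the size passed to MapMemory.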

When running under Valgrind, the order of events before the crash is slightly different:

[59840] PageGuardExceptionHandler()
  [59840] HandleGuardPageViolation() address: 0x14e08000 is_write: 1 clear_guard: 1
  [59840] 1107:HandleGuardPageViolation() -> SetMemoryProtection()
  [59840] SetMemoryProtection() mprotect: ptr: 0x14e08000 - 0x14e09000 size: 4096 mask: 0x3
[59840] PageGuardExceptionHandler() <-- (handled: true)
[59840] PageGuardExceptionHandler()
  [59840] HandleGuardPageViolation() address: 0x14ac8000 is_write: 1 clear_guard: 1
  [59840] 1107:HandleGuardPageViolation() -> SetMemoryProtection()
  [59840] SetMemoryProtection() mprotect: ptr: 0x14ac8000 - 0x14ac9000 size: 4096 mask: 0x3
[59840] PageGuardExceptionHandler() <-- (handled: true)
[59881] AllocateMemory(size: 2097152) *pMemory: 0xe695c80
[59881] MapMemory(memory: 0xe695c80 size: 2097152)
[59881] mmap: 0x161c9000 - 0x163c9000 size: 2097152
[59881] SetMemoryProtection() mprotect: ptr: 0x161c9000 - 0x163c9000 size: 2097152 mask: 0x0
[59881] MapMemory <-- *ppData: 0x161c9000
[59881] util_copy_rect() 1 src: 0xec7a0d8 dst: 0x161c9000 size: 4
==59840== 
==59840== Process terminating with default action of signal 11 (SIGSEGV)
==59840==  Bad permissions for mapped region at address 0x161C9000
==59840==    at 0x4842B33: memmove (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==59840==    by 0x5EFB4CE: util_copy_box (u_surface.c:78)
==59840==    by 0x611CA0B: u_default_texture_subdata (u_transfer.c:71)
==59840==    by 0x5F05BCC: tc_call_texture_subdata (u_threaded_context.c:2529)
==59840==    by 0x5F01031: tc_batch_execute (u_threaded_context.c:213)
==59840==    by 0x584334B: util_queue_thread_func (u_queue.c:313)
==59840==    by 0x5842F5A: impl_thrd_routine (threads_posix.h:87)
==59840==    by 0x49A7608: start_thread (pthread_create.c:477)
==59840==    by 0x4E51132: clone (clone.S:95)

The only difference in this case is that the second thread is not interrupted.

Running the process with taskset 1, which pins it to a single CPU and essentially serializes execution, makes the problem go away.

I don't understand why the second thread's SIGSEGV is not caught by the handler.

Edit: An important thing I forgot to mention is that Mesa is also involved. The logs come from a GLES application (a CTS test) running on Zink, which translates the GL calls into Vulkan calls. If I'm not mistaken, the additional threads are created by Mesa.

I have more or less figured out my problem: somehow SIGSEGV gets blocked. I'm not sure how this happens, since a gdb breakpoint on sigprocmask never triggers.
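A likely reason the breakpoint never fires is that glibc's pthread_sigmask issues the rt_sigprocmask system call itself instead of going through sigprocmask, so gdb's `catch syscall rt_sigprocmask` is a more reliable way to find who changes the mask. On Linux the per-thread mask is also visible in the SigBlk: line of /proc/<pid>/task/<tid>/status (SIGSEGV is bit 0x400). Independently of the debugger, the layer can check the calling thread's mask at run time. In this sketch (hypothetical helper names), pthread_sigmask is called with a NULL set, which only reports the current mask; calling log_sigsegv_mask at the top of the intercepted vkMapMemory would show which thread has SIGSEGV blocked.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stdio.h>

/* Returns 1 if SIGSEGV is blocked in the calling thread, 0 if not, -1 on error.
   With a NULL set, pthread_sigmask leaves the mask unchanged and only reports it. */
int sigsegv_blocked_in_this_thread(void) {
    sigset_t cur;
    if (pthread_sigmask(SIG_BLOCK, NULL, &cur) != 0)
        return -1;
    return sigismember(&cur, SIGSEGV);
}

/* Hypothetical debug hook for the layer's vkMapMemory entry point. */
void log_sigsegv_mask(const char *where) {
    if (sigsegv_blocked_in_this_thread() == 1)
        fprintf(stderr, "[%s] warning: SIGSEGV is blocked in this thread\n", where);
}
```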

Edit: SIGSEGV is blocked with pthread_sigmask.
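Why a thread can have SIGSEGV blocked even though the handler was installed: sigaction dispositions are process-wide, but the signal mask is per-thread, and a thread created with pthread_create starts with a copy of its creator's mask. A library that blocks all signals around thread creation (so its workers never receive asynchronous signals) therefore produces worker threads with SIGSEGV blocked. The crashing thread in the Valgrind trace runs util_queue_thread_func, i.e. a Mesa queue thread; that Mesa's thread-creation helper blocks signals this way is an assumption worth verifying in the Mesa source. The function name below is hypothetical; it reproduces the pattern and reports what the new thread sees.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Thread body: record whether SIGSEGV is blocked in this new thread. */
static void *report_mask(void *out) {
    sigset_t cur;
    pthread_sigmask(SIG_BLOCK, NULL, &cur);
    *(int *)out = sigismember(&cur, SIGSEGV);
    return NULL;
}

/* Mimics a library that blocks signals around thread creation.
   Returns 1 if the worker starts with SIGSEGV blocked, -1 on error. */
int worker_inherits_blocked_sigsegv(void) {
    sigset_t all, saved;
    int blocked = -1;
    pthread_t t;

    sigfillset(&all);                             /* SIGKILL/SIGSTOP are silently ignored */
    pthread_sigmask(SIG_BLOCK, &all, &saved);     /* block everything in the creator... */
    int rc = pthread_create(&t, NULL, report_mask, &blocked);
    pthread_sigmask(SIG_SETMASK, &saved, NULL);   /* ...then restore the creator's mask */
    if (rc != 0)
        return -1;
    pthread_join(t, NULL);
    return blocked;
}
```

The creator's own mask is back to normal afterwards, so only the worker is affected; that matches one thread crashing while the other keeps working.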

Re-enabling SIGSEGV each time with:

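/* Unblock SIGSEGV in the calling thread only. POSIX leaves sigprocmask()
   unspecified in multi-threaded processes; pthread_sigmask() (declared in
   <signal.h>) is the portable call and behaves the same on Linux/glibc. */
sigset_t x;
sigemptyset(&x);
sigaddset(&x, SIGSEGV);
pthread_sigmask(SIG_UNBLOCK, &x, NULL);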

solves my issue (and possibly creates another one).
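Why a blocked SIGSEGV ends the process instead of reaching the handler: a SIGSEGV raised by a faulting instruction cannot usefully stay pending, because the instruction would just fault again. POSIX leaves this case undefined, and Linux forces delivery with the default action, which matches Valgrind's "Process terminating with default action of signal 11". The sketch below (hypothetical names, a single guarded page) reproduces the situation under those assumptions: a worker thread is created with SIGSEGV blocked, unblocks it with pthread_sigmask before touching guarded memory, and the fault then reaches the handler instead of killing the process.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *g_region;                  /* hypothetical single guarded page */
static size_t g_pagesz;
static volatile sig_atomic_t g_handled;

/* Guard handler: reopen the page if the fault is ours, else use the default action. */
static void on_segv(int sig, siginfo_t *info, void *ctx) {
    (void)sig;
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)g_region;
    if (addr < base || addr >= base + g_pagesz) {
        signal(SIGSEGV, SIG_DFL);
        return;
    }
    mprotect(g_region, g_pagesz, PROT_READ | PROT_WRITE);
    g_handled++;
}

/* The workaround from the post: unblock SIGSEGV in the calling thread.
   Returns 1 if it had been blocked, 0 if not, -1 on error. */
int unblock_sigsegv(void) {
    sigset_t s, old;
    sigemptyset(&s);
    sigaddset(&s, SIGSEGV);
    if (pthread_sigmask(SIG_UNBLOCK, &s, &old) != 0)
        return -1;
    return sigismember(&old, SIGSEGV);
}

struct job {
    volatile char *guarded;
    int was_blocked;
};

static void *worker(void *arg) {
    struct job *j = arg;
    j->was_blocked = unblock_sigsegv();  /* must happen before touching guarded memory */
    j->guarded[0] = 7;                   /* faults into on_segv, which reopens the page */
    return NULL;
}

/* Returns the number of faults handled (1 on success) or -1 on setup failure. */
int demo_unblock_fix(int *was_blocked) {
    g_pagesz = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, g_pagesz, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    g_region = p;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || mprotect(p, g_pagesz, PROT_NONE) != 0)
        return -1;

    /* Create the worker with SIGSEGV blocked, mimicking an inherited mask. */
    sigset_t s, saved;
    sigemptyset(&s);
    sigaddset(&s, SIGSEGV);
    pthread_sigmask(SIG_BLOCK, &s, &saved);
    struct job j = { p, -1 };
    pthread_t t;
    int rc = pthread_create(&t, NULL, worker, &j);
    pthread_sigmask(SIG_SETMASK, &saved, NULL);
    if (rc != 0)
        return -1;
    pthread_join(t, NULL);

    *was_blocked = j.was_blocked;
    munmap(p, g_pagesz);
    return (int)g_handled;
}
```

Without the unblock_sigsegv() call in worker(), the store would take the default action and terminate the process, as in the post.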
