Shared library SIGSEGV on dlopen / static init when calling @plt function

My app dlopens a library that has static initialization code. All the other libraries do the same and load fine before it, but this one crashes when it calls a function from another library. The backtrace looks something like this:

0x12311 <-- bad address
_static_initialization_0 <-- function call
....
dlopen

Now, the function call in the disassembly looks like this:

call _Z6MyFuncRA37_Kc@plt

However, this call ends up jumping to the invalid address 0x12311, i.e. the PLT entry resolves to the wrong address.

The cause is quite possibly that the library in question is a third-party one, i.e. it comes in prebuilt binary form even though it depends on other libraries. Last week we did a big optimization pass and changed a lot of headers and so on. The function MyFunc, whose PLT entry is wrong, lives in another library of ours, which received massive optimization changes.

How is this possible? The exact questions are:

  1. What is the mechanism that causes the PLT mismatch?
  2. Is there a way to fix it without touching the precompiled library? (Optional, as I could get a rebuilt version, but I'm still curious why it crashes.)

Also, the same app works fine when compiled with -O2 optimization, which is what I find strange (the binary library is the same in both cases).

PS: Ubuntu 12.04 x86_64, but the app is i386.

UPDATE: The suggestion in the comments (deleted for some reason) to check LD_DEBUG was good. With LD_DEBUG=bindings I see this in the "crashing" version of the app:

 10272:  /media/EXT/work/build32/bin/libMyLib.so: error: 
    symbol lookup error: undefined symbol: omp_set_num_threads (fatal)

And then it stops binding libMyLib.so symbols, while in the non-failing version it keeps binding other symbols. But I don't understand why it then continues execution and tries to load the parent library. The dependency chain is actually as follows:

libA -> libB -> libMyLib

libMyLib fails (as indicated by the LD_DEBUG output above), so the loader skips it, also skips libB completely (!), and continues with binding libA symbols. The non-failing version fully binds the libMyLib symbols, then continues with the libB symbols, and then with the libA symbols.

Frankly, to me it looks like an ld bug.

As for why the optimized version works, I suppose the omp_ function is not really needed and is thrown out by linker optimization, so there is nothing left to fail to resolve at runtime.

Here's what I see in the LD_DEBUG=all log after the omp_ symbol is not found for libC:

19225: symbol=omp_set_num_threads; lookup in file=/usr/lib/i386-linux-gnu/libXdmcp.so.6 [0]
19225: /media/EXT/Work/libC.so: error: symbol lookup error: undefined symbol: omp_set_num_threads (fatal)
19225:
19225: file=/media/EXT/libA.so [0]; destroying link map
19225:
19225: file=/media/EXT/libA.so [0]; dynamically loaded by /media/EXT/libX.so [0]
19225: file=/media/EXT/libA.so [0]; generating link map
19225: dynamic: 0xf2fdb764 base: 0xf2f81000 size: 0x00064a28
19225: entry: 0xf2f8ffd0 phdr: 0xf2f81034 phnum: 7
19225:
19225: checking for version `GCC_3.0' in file /lib/i386-linux-gnu/libgcc_s.so.1 [0] required by file /media/EXT/libA.so [0]
... few more checking
19225: object=/media/EXT/libA.so [0]
19225: scope 0: bin/mainapp /lib/i386-linux-gnu/libpthread.so.0 /media/EXT/libX.so ...
19225: scope 1:...
19225:
19225:
19225: relocation processing: /media/EXT/libA.so
19225: symbol=_ZTVN10__cxxabiv117__class_type_infoE; lookup in file=bin/mainapp [0]
19225: symbol=_ZTVN10__cxxabiv117__class_type_infoE; lookup in file=/lib/i386-linux-gnu/libpthread.so.0 [0]
19225: symbol=_ZTVN10__cxxabiv117__class_type_infoE; lookup in file=/media/EXT/libX.so [0]
19225: binding file /media/EXT/libA.so [0] to /media/EXT/libX.so [0]: normal symbol `_ZTVN10__cxxabiv117__class_type_infoE'

... here it continues to bind libA symbols, and after finishing that

19225:
19225:
19225: calling init: /media/EXT/libC.so
19225:

it calls init for the non-initialized libC.so module.

(Just to mention: libX.so is the base module that calls dlopen and also contains basic methods used by all the other libs.)

After destroying the link map for libA, the log shows that it is generated again. I just don't understand whether the loader continues loading libA or starts from scratch this time without bothering about libB/libC. Well, in any case it ignores libB/libC until init is called for libC.
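The log suggests that the first attempt to load libA failed (its link map is destroyed) before a second attempt was made. One way to confirm this from the calling side is to check the return value of dlopen and print dlerror(). The following is a minimal sketch, assuming the dlopen call site in libX.so can be edited; `load_now` is a hypothetical helper name, not something from the original code. RTLD_NOW asks the loader to resolve all undefined symbols before dlopen returns, so a missing symbol such as omp_set_num_threads is reported as a load failure rather than surfacing later.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: open `path` with eager binding (RTLD_NOW).
 * On failure, copy the loader's error message into `err` (capacity
 * `errlen`) and return NULL. On success, `err` is set to "". */
void *load_now(const char *path, char *err, size_t errlen)
{
    dlerror();  /* clear any stale error state */
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);

    if (err != NULL && errlen > 0) {
        if (handle == NULL) {
            const char *msg = dlerror();  /* e.g. "... undefined symbol: ..." */
            snprintf(err, errlen, "%s", msg ? msg : "unknown dlopen error");
        } else {
            err[0] = '\0';
        }
    }
    return handle;
}
```

If the returned handle is NULL, the caller should log the message and stop instead of retrying the same dlopen, since a retry after a failed load is where the log above starts to diverge from the working case.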

omp_set_num_threads belongs to the OpenMP support in GCC.

You should probably pass the -fopenmp flag to gcc at both compile and link time (even if you are just dlopen-ing a library that uses OpenMP).

Maybe the original library provider forgot that.

(OpenMP alters the entire behavior of the compilation process.)
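If the prebuilt library cannot be relinked, one possible workaround is to make the OpenMP runtime globally visible before the failing dlopen. The sketch below is untested against the asker's setup and rests on two assumptions: that GCC's OpenMP runtime is installed under the soname libgomp.so.1, and that the only missing symbol is omp_set_num_threads. All three function names are hypothetical.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Returns 1 if `name` resolves in the process's global symbol scope:
 * the main program, its startup dependencies, and anything already
 * loaded with RTLD_GLOBAL. Returns 0 otherwise. */
int symbol_is_visible(const char *name)
{
    void *self = dlopen(NULL, RTLD_NOW);  /* handle for the main program */
    if (self == NULL)
        return 0;
    int found = dlsym(self, name) != NULL;
    dlclose(self);
    return found;
}

/* Load `soname` and add its symbols to the global scope so that
 * libraries opened later can bind against them. Returns 1 on success. */
int preload_global(const char *soname)
{
    return dlopen(soname, RTLD_NOW | RTLD_GLOBAL) != NULL;
}

/* Make sure omp_set_num_threads is resolvable before opening the
 * third-party library. Returns 1 if the symbol is visible afterwards. */
int ensure_openmp_runtime(void)
{
    if (symbol_is_visible("omp_set_num_threads"))
        return 1;
    /* Assumption: the OpenMP runtime is installed as libgomp.so.1. */
    return preload_global("libgomp.so.1")
        && symbol_is_visible("omp_set_num_threads");
}
```

Calling `ensure_openmp_runtime()` before the dlopen of the top-level library would show whether the missing symbol is the only thing preventing the load. Linking the application itself with -fopenmp, as suggested above, should have the same effect without extra code.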

