
GCC, -flto, -fno-builtin and custom function implementation of glibc functions

I'm observing unexpected behaviour (at least I can't find an explanation for it) with the GCC flag -flto and jemalloc / tcmalloc. Once -flto is used and I link with the above libraries, malloc/calloc and friends are not replaced by the je/tc malloc implementation; the glibc implementation is called instead. Once I remove the -flto flag, everything works as expected. I tried to use -fno-builtin / -fno-builtin-* with -flto, but it still doesn't pick the je/tc malloc implementation.

How does the -flto machinery work? Why doesn't the binary pick the new implementation? How does it even link with -fno-builtin, when it should fail on an unresolved external for, say, printf?

EDIT001:
GCC 7.3
Sample code

int main()
{
    auto p = malloc(1024);
    free(p);
    return 0;
}

Compilation:

/usr/bin/c++ -O2 -g -DNDEBUG -flto -std=gnu++14 -o CMakeFiles/flto.dir/main.cpp.o -c /home/user/Development/CPPJunk/flto/main.cpp

Linkage:

/usr/bin/c++ -O2 -g -DNDEBUG -flto CMakeFiles/flto.dir/main.cpp.o -o flto -L/home/user/Development/jemalloc -Wl,-rpath,/home/user/Development/jemalloc -ljemalloc

EDIT002:
More suitable sample code

#include <cstdlib>

int main()
{
    auto p = malloc(1024);
    if (p) {
        free(p);
    }

    auto p1 = new int;
    if (p1) {
        delete p1;
    }

    auto p2 = new int[32];
    if (p2) {
        delete[] p2;
    }
    return 0;
}

First, your sample code is wrong. Read carefully the C11 standard n1570. When you want to use the standard malloc, you should #include <stdlib.h>.

In C++11 (read n3337), malloc is frowned upon and should not be used (prefer new). If you still want to use std::malloc in C++, you should #include <cstdlib> (which, in GCC, internally includes <stdlib.h>).

Then your sample code is almost C code (once you replace auto with void*), not C++. It could be optimized (once you include <stdlib.h>), even without -flto but with just -O3, to an empty main, according to the as-if rule. (I have even written a public report, bismon-chariot-doc.pdf, which has a section §1.4.2 explaining over several pages how that optimization happens.)
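To make the point about the as-if rule concrete, here is a minimal sketch of a test function whose allocation the optimizer should not be able to drop, because the pointer is stored in and read back from a volatile object. The function name allocate_observably is hypothetical and not part of the question's code; whether a particular GCC version keeps or removes a given malloc/free pair is best confirmed by inspecting the generated assembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper (not from the original post).
// Allocates and frees a block while keeping the allocation observable:
// accesses to a volatile object are side effects that the as-if rule
// does not allow the compiler to remove.
// Returns true when the allocation succeeded.
bool allocate_observably(std::size_t size)
{
    assert(size > 0);

    // The pointer itself lives in a volatile object, so the value returned
    // by malloc has to be produced and stored.
    void *volatile p = std::malloc(size);
    if (p == nullptr) {
        return false;
    }

    // Write through a volatile-qualified pointer so the block is really used.
    static_cast<volatile unsigned char *>(p)[0] = 1;

    std::free(p);
    return true;
}
```

Calling allocate_observably(1024) from main, instead of a bare malloc/free pair, gives a test program in which the allocator call is expected to survive -O2 or -O3.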

To optimize around malloc and free, GCC uses the __attribute__((malloc)) function attribute in the declaration of malloc (inside <stdlib.h>).
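As an illustration of that attribute, the sketch below declares a wrapper with __attribute__((malloc)), which tells GCC that the returned pointer does not alias any other pointer that is live when the function returns. The names my_alloc and sum_after_alloc are hypothetical and exist only for this example; they are not part of glibc or jemalloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical allocation wrapper (an assumption for illustration only).
// The attribute promises that the returned pointer does not alias any
// other live pointer, which gives the optimizer more freedom.
__attribute__((malloc)) void *my_alloc(std::size_t size);

void *my_alloc(std::size_t size)
{
    assert(size > 0);
    return std::malloc(size);
}

// Because of the attribute, the compiler may assume that the store to
// *fresh cannot modify *existing, so *existing need not be reloaded.
int sum_after_alloc(int *existing)
{
    *existing = 1;
    int *fresh = static_cast<int *>(my_alloc(sizeof(int)));
    if (fresh == nullptr) {
        return -1;
    }
    *fresh = 2;
    int result = *existing + *fresh;
    std::free(fresh);
    return result;
}
```

The attribute changes only what the optimizer may assume, not what the function does at run time.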

How does the -flto machinery work?

LTO is explained in GCC internals §25.

It works by using some internal (GIMPLE-like and/or SSA-like) representation of the code both at "compile" and at "link" time (actually, the linking step becomes another compilation with whole-program optimization, so your code gets "compiled" twice in practice).

LTO should always (in practice) be used with some optimization flag (e.g. -O2 or even -O3) both at compile time and at link time. So you should compile and link with g++ -flto -O2 (it makes no practical sense to use -flto without at least -O2, and the exact same optimization flags should be used at compile time and at link time).

More precisely, -flto also embeds in the object files some internal (GIMPLE-like) representation of the source code, and that is also used "at link time" (notably for optimization and inlining happening again when "linking" your entire program, re-using its GIMPLE). Actually, GCC contains an LTO front-end and compiler called lto1 (in addition to the C++ front-end and compiler called cc1plus), and lto1 is used at link time (when you link with g++ -flto -O2) to reprocess these GIMPLE representations.

Probably, libjemalloc has its own headers, and might have inline (or inlinable) functions. Then you also need to use -flto -O2 when compiling that library from its source code (so that its GIMPLE is stored in the library).

Finally, the fact that the usual malloc gets called is independent of -flto. It is a linker issue, not a compiler one. You could try to link -ljemalloc statically (and then you had better also build that library with gcc -flto -O2; if you don't build it like that, you won't get LTO optimizations across malloc calls).

You could also pass -v to your compilation and linking commands to understand what g++ is doing. You could even pass -Wl,--verbose to ask ld (started by g++) to be verbose.
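Besides the verbose compiler and linker output, a run-time check can help narrow down a linking problem. The sketch below is Linux-specific and assumes /proc/self/maps is available; the helper name is_mapped is hypothetical. It only reports whether a shared object whose path contains a given substring is loaded into the process, which is a necessary condition for its malloc to be used but does not prove which malloc definition is actually bound.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical diagnostic helper (an assumption, not from the original post).
// Returns true if any memory mapping of the current process comes from a
// file whose path contains `needle`. Relies on the Linux /proc filesystem;
// if /proc/self/maps cannot be opened, the function returns false.
bool is_mapped(const std::string &needle)
{
    assert(!needle.empty());

    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        // Each line ends with the path of the mapped file, if there is one.
        if (line.find(needle) != std::string::npos) {
            return true;
        }
    }
    return false;
}
```

In the test program from the question, printing the result of is_mapped("libjemalloc") from main would show whether the dynamic linker loaded the library at all, for example when the link step dropped it as unneeded.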

Notice that LTO (and the internal representations that it uses) is compiler- and version-specific. The internal (GIMPLE and SSA) representation is slightly different between GCC 7 and GCC 8 (and in Clang it is very different, so of course incompatible). The dynamic linker ld-linux(8) does not know about LTO.

PS. You could install the libjemalloc-dev package and add #include <jemalloc/jemalloc.h> in your code. See also the jemalloc(3) man page. Probably libjemalloc could be configured or patched to define some je_malloc symbol as a replacement for malloc. Then it would be simpler (for LTO) to use je_malloc in your code (to avoid conflicts between several malloc ELF symbols). To learn more about symbols in shared libraries, read Drepper's How to Write Shared Libraries paper. And of course you should expect LTO to change the behavior of linking!
