How do C++ compilers optimize template code?

Question

How do compilers avoid linear growth in the size of the compiled binary with each new type instantiation of a template?

I don't see how we can avoid making a copy of all the templated code when a new instantiation is used.

I feel that in a reasonably large code base, compile times and binary sizes would become extremely unwieldy for all but the simplest templates. But their prevalence suggests that compilers are able to do some magic to make them practical.

Answer 1

Many template functions are small enough to inline effectively, so you do get linear growth in the binary, but it is no more than you would get with equivalent non-template functions.

The One Definition Rule is important here, as it allows the compiler to assume that any template instantiation with identical template parameters generates identical code. If it detects that the template function has already been instantiated earlier in a source file, it can use that copy instead of generating a new one. Name mangling makes it possible for a linker to recognize the same function coming from different compiled sources. None of this is guaranteed, since your program shouldn't be able to tell the difference between identical copies of a function, but compilers do harder optimizations than this every day.
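A minimal single-file sketch of the "same parameters, same instantiation" point, using a hypothetical function template `twice` (not from the answer): both helper functions name `twice<int>`, so they refer to one specialization and return equal function pointers. A single file can only show the within-translation-unit half; merging copies across translation units is the linker's job described above.

```cpp
// Hypothetical function template used only for illustration.
template <typename T>
T twice(T x) {
    return x + x;
}

using IntFn = int (*)(int);

// Two unrelated places that both name twice<int>. Each refers to the
// same specialization, so only one body for twice<int> is needed.
IntFn first_use() { return &twice<int>; }
IntFn second_use() { return twice<int>; }  // implicit function-to-pointer conversion
```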

The one case where duplicates must be filtered out is when a function contains a static variable: there can be only one copy of it. That can be achieved either by filtering out the duplicate functions or by filtering out the static variables themselves.
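A small sketch of the static-variable case, assuming a hypothetical `counter<T>`: each specialization has exactly one static `calls` object, shared by all of its callers, while a different specialization gets a separate one. This is the observable behaviour a toolchain has to preserve when it merges duplicates.

```cpp
// Hypothetical function template: each specialization owns exactly one
// static `calls` object, shared by every call to that specialization.
template <typename T>
int counter() {
    static int calls = 0;  // counter<int> and counter<double> each get their own
    return ++calls;
}
```

Calling `counter<int>()` three times returns 1, 2 and 3, while a first call to `counter<double>()` in between still returns 1.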

Answer 2

Several factors keep multiple instantiations from being too harmful to the executable size:

Many templates just pass things through to another layer. Although there may be quite a bit of code, most of it disappears once the code is instantiated and inlined. Inlining (and some other optimizations) can easily produce bigger code, though. On the other hand, inlining small functions often produces smaller (and faster) code, mainly because the calling sequence it replaces often needs more instructions than the inlined body, and because the optimizer gets a more holistic view of what's going on and thus a better chance to reduce the code further.
Where template code isn't inlined, duplicate instantiations in different translation units need to be merged into a single instantiation. I'm not a linker expert, but my understanding is that, e.g., ELF places them in separate sections and the linker can choose to include only those sections that are actually used.
In bigger executables you'll need some vocabulary types and instantiations that are used in many places and are thus effectively shared. Using a custom type for everything would be a bad idea, and type erasure is certainly an important tool for avoiding too many types.
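One common way to combine the pass-through and type-erasure points is the classic "thin template over a type-erased core" idiom, sketched here with hypothetical names `PtrStackBase` and `PtrStack` (a swapped-in illustration, not something the answer spells out). All real work lives in the non-template `PtrStackBase`, which is compiled once; `PtrStack<T>` only forwards and casts, so each new `T` adds little or nothing once the wrappers are inlined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Non-template core: compiled once and shared by every PtrStack<T>.
class PtrStackBase {
public:
    void push(void* p) { items_.push_back(p); }
    void* pop() {
        assert(!items_.empty() && "pop() on empty stack");
        void* top = items_.back();
        items_.pop_back();
        return top;
    }
    std::size_t size() const { return items_.size(); }

private:
    std::vector<void*> items_;
};

// Thin typed wrapper: every member just forwards to the base and casts.
// T must be a non-const object type so that T* converts implicitly to void*.
template <typename T>
class PtrStack : private PtrStackBase {
public:
    void push(T* p) { PtrStackBase::push(p); }
    T* pop() { return static_cast<T*>(PtrStackBase::pop()); }
    using PtrStackBase::size;
};
```

The trade-off is that `PtrStackBase` knows nothing about `T`; type safety comes entirely from the wrapper's casts.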

That said, where possible it does pay off to pre-instantiate templates, especially if only a small number of instantiations is commonly used. A great example is the IOStreams library, which is unlikely to be used with more than 4 types (typically it is used with just one): moving the template definitions and their instantiations into separate translation units may not reduce the executable size, but it will certainly reduce compile time! Starting with C++11, template instantiations can be declared extern, which keeps the definitions visible without implicitly instantiating specializations that are known to be instantiated elsewhere.
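A minimal sketch of the C++11 extern template mechanism, using the `mymax` template from the next answer with `int` as the pre-instantiated type. The split into a header and a single .cpp file (the names mymax.h and mymax.cpp are hypothetical) is shown as comments so the snippet stays one compilable file.

```cpp
// --- Would normally live in a header, e.g. a hypothetical mymax.h ---
template <typename Type>
Type mymax(Type a, Type b) {
    return a > b ? a : b;
}

// Explicit instantiation declaration (C++11): translation units that see
// this do not implicitly instantiate mymax<int>; they use the copy that is
// explicitly instantiated elsewhere.
extern template int mymax<int>(int, int);

// --- Would normally live in exactly one file, e.g. a hypothetical mymax.cpp ---
// Explicit instantiation definition: mymax<int> is emitted here only.
template int mymax<int>(int, int);
```

In a real project, every other file including the header would see only the extern declaration and call the single copy from mymax.cpp. Types not covered, such as `double`, are still instantiated implicitly as usual.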

Answer 3

I think you're misunderstanding how templates are implemented. Templates are compiled on a need-to-use basis into a corresponding class or function.

Consider the following code:

template <typename Type>
Type mymax(Type a, Type b) {
    return a > b ? a : b;
}

int main(int argc, char** argv)
{
}

Compiling this to assembly (the .ident line shows GCC 4.8.1, without optimization), I get the following.

    .file   "example.cpp"
    .text
    .globl  main
    .type   main, @function
main:
.LFB1:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    %edi, -4(%rbp)
    movq    %rsi, -16(%rbp)
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE1:
    .size   main, .-main
    .ident  "GCC: (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1"
    .section    .note.GNU-stack,"",@progbits

You'll notice it contains only the main function: since mymax is never used, no code is generated for it. Now I update my code to use the template function.

int main(int argc, char** argv)
{
    mymax<double>(3,4);
}

Compiling that, I get a much longer assembly output, including the template function instantiated for double. The compiler saw that the template function was used with the type double, so it generated a function to handle that case.

    .file   "example.cpp"
    .text
    .globl  main
    .type   main, @function
main:
.LFB1:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $32, %rsp
    movl    %edi, -4(%rbp)
    movq    %rsi, -16(%rbp)
    movabsq $4616189618054758400, %rdx
    movabsq $4613937818241073152, %rax
    movq    %rdx, -24(%rbp)
    movsd   -24(%rbp), %xmm1
    movq    %rax, -24(%rbp)
    movsd   -24(%rbp), %xmm0
    call    _Z5mymaxIdET_S0_S0_
    movl    $0, %eax
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE1:
    .size   main, .-main
    .section    .text._Z5mymaxIdET_S0_S0_,"axG",@progbits,_Z5mymaxIdET_S0_S0_,comdat
    .weak   _Z5mymaxIdET_S0_S0_
    .type   _Z5mymaxIdET_S0_S0_, @function
_Z5mymaxIdET_S0_S0_:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movsd   %xmm0, -8(%rbp)
    movsd   %xmm1, -16(%rbp)
    movsd   -8(%rbp), %xmm0
    ucomisd -16(%rbp), %xmm0
    jbe .L9
    movq    -8(%rbp), %rax
    jmp .L6
.L9:
    movq    -16(%rbp), %rax
.L6:
    movq    %rax, -24(%rbp)
    movsd   -24(%rbp), %xmm0
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE2:
    .size   _Z5mymaxIdET_S0_S0_, .-_Z5mymaxIdET_S0_S0_
    .ident  "GCC: (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1"
    .section    .note.GNU-stack,"",@progbits
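The symbol `_Z5mymaxIdET_S0_S0_` above is the mangled name GCC gave `mymax<double>`; together with the `.weak` and `comdat` markers, it is what lets the linker recognize copies of this instantiation from other translation units as the same function. Assuming a GCC or Clang toolchain (Itanium C++ ABI; this header is not available with MSVC), `abi::__cxa_demangle` from `<cxxabi.h>` turns it back into a readable signature. The wrapper name `demangle` is a hypothetical helper.

```cpp
#include <cstdlib>   // std::free
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI only)
#include <memory>
#include <string>

// Deleter for the malloc'd buffer returned by abi::__cxa_demangle.
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Converts an Itanium-ABI mangled symbol into a readable C++ signature.
// Returns the input unchanged if it is not a valid mangled name.
std::string demangle(const char* mangled) {
    int status = 0;
    std::unique_ptr<char, FreeDeleter> readable(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status));
    if (status != 0 || !readable) {
        return mangled;
    }
    return readable.get();
}
```

`demangle("_Z5mymaxIdET_S0_S0_")` should yield `double mymax<double>(double, double)`, the same text `c++filt _Z5mymaxIdET_S0_S0_` prints on the command line.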

Now let's say I change the code to use that function twice.

int main(int argc, char** argv)
{
    mymax<double>(3,4);
    mymax<double>(4,5);

}

Again, let's look at the assembly it creates. It is comparable to the previous output, because most of that code was just the compiler generating the function mymax with Type replaced by double. No matter how many times I call that function, it is instantiated only once.

    .file   "example.cpp"
    .text
    .globl  main
    .type   main, @function
main:
.LFB1:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $32, %rsp
    movl    %edi, -4(%rbp)
    movq    %rsi, -16(%rbp)
    movabsq $4616189618054758400, %rdx
    movabsq $4613937818241073152, %rax
    movq    %rdx, -24(%rbp)
    movsd   -24(%rbp), %xmm1
    movq    %rax, -24(%rbp)
    movsd   -24(%rbp), %xmm0
    call    _Z5mymaxIdET_S0_S0_
    movabsq $4617315517961601024, %rdx
    movabsq $4616189618054758400, %rax
    movq    %rdx, -24(%rbp)
    movsd   -24(%rbp), %xmm1
    movq    %rax, -24(%rbp)
    movsd   -24(%rbp), %xmm0
    call    _Z5mymaxIdET_S0_S0_
    movl    $0, %eax
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE1:
    .size   main, .-main
    .section    .text._Z5mymaxIdET_S0_S0_,"axG",@progbits,_Z5mymaxIdET_S0_S0_,comdat
    .weak   _Z5mymaxIdET_S0_S0_
    .type   _Z5mymaxIdET_S0_S0_, @function
_Z5mymaxIdET_S0_S0_:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movsd   %xmm0, -8(%rbp)
    movsd   %xmm1, -16(%rbp)
    movsd   -8(%rbp), %xmm0
    ucomisd -16(%rbp), %xmm0
    jbe .L9
    movq    -8(%rbp), %rax
    jmp .L6
.L9:
    movq    -16(%rbp), %rax
.L6:
    movq    %rax, -24(%rbp)
    movsd   -24(%rbp), %xmm0
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE2:
    .size   _Z5mymaxIdET_S0_S0_, .-_Z5mymaxIdET_S0_S0_
    .ident  "GCC: (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1"
    .section    .note.GNU-stack,"",@progbits

So templates basically don't affect the executable size any more than writing the functions by hand would; they are just a convenience. The compiler creates one function per type used, so whether I call it 1 or 1000 times, there is only one instance of it. Copies of that instance in other translation units are merged by the linker, which is what the .weak and comdat markers on the _Z5mymaxIdET_S0_S0_ section are for. If I update my code to also handle a new type such as float, I get another function in my executable, but again only one, no matter how many times I call it.
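A compile-time check of the last point, reusing the answer's `mymax`: `mymax<double>` and `mymax<float>` are distinct specializations with distinct signatures, i.e. two separate functions, however many call sites use each. The `static_assert`s only inspect types; they say nothing about code size, which the assembly above shows directly.

```cpp
#include <type_traits>

template <typename Type>
Type mymax(Type a, Type b) {
    return a > b ? a : b;
}

// Each distinct Type yields its own specialization with its own signature:
// one function for double and a second one for float.
static_assert(std::is_same<decltype(&mymax<double>), double (*)(double, double)>::value,
              "mymax<double> takes and returns double");
static_assert(std::is_same<decltype(&mymax<float>), float (*)(float, float)>::value,
              "mymax<float> takes and returns float");
```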


3 answers

Answer 1
6 votes, 2013-12-11 20:47:01

Answer 2
5 votes, 2013-12-11 20:50:18

Answer 3
3 votes, 2013-12-11 20:47:04
