
std::array with aggregate initialization on g++ generates huge code

On g++ 4.9.2 and 5.3.1, this code takes several seconds to compile and produces a 52,776 byte executable:

#include <array>
#include <iostream>

int main()
{
    constexpr std::size_t size = 4096;

    struct S
    {
        float f;
        S() : f(0.0f) {}
    };

    std::array<S, size> a = {};  // <-- note aggregate initialization

    for (auto& e : a)
        std::cerr << e.f;

    return 0;
}

Increasing size seems to increase compilation time and executable size linearly. I cannot reproduce this behaviour with either clang 3.5 or Visual C++ 2015. Using -Os makes no difference.

$ time g++ -O2 -std=c++11 test.cpp
real    0m4.178s
user    0m4.060s
sys     0m0.068s

Inspecting the assembly code reveals that the initialization of a is unrolled, generating 4096 movl instructions:

main:
.LFB1313:
    .cfi_startproc
    pushq   %rbx
    .cfi_def_cfa_offset 16
    .cfi_offset 3, -16
    subq    $16384, %rsp
    .cfi_def_cfa_offset 16400
    movl    $0x00000000, (%rsp)
    movl    $0x00000000, 4(%rsp)
    movq    %rsp, %rbx
    movl    $0x00000000, 8(%rsp)
    movl    $0x00000000, 12(%rsp)
    movl    $0x00000000, 16(%rsp)
       [...skipping 4000 lines...]
    movl    $0x00000000, 16376(%rsp)
    movl    $0x00000000, 16380(%rsp)

This only happens when T has a non-trivial constructor and the array is initialized using {}. If I do any of the following, g++ generates a simple loop:

  1. Remove S::S();
  2. Remove S::S() and initialize S::f in-class;
  3. Remove the aggregate initialization (= {});
  4. Compile without -O2.
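
The first three variants in the list can be put side by side as a minimal sketch. The names S1, S2, kSize, sum_f, variant1..variant3 and check_variants are illustrative and not part of the original test case. Whether a given variant really avoids the unrolled stores depends on the g++ version, so the output of g++ -O2 -S still has to be inspected.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kSize = 4096;  // same element count as the test case

// Original element type: user-provided default constructor.
struct S {
    float f;
    S() : f(0.0f) {}
};

// Variant 1: no constructor at all; "= {}" zero-initializes f.
struct S1 {
    float f;
};

// Variant 2: no constructor, f initialized in-class.
struct S2 {
    float f = 0.0f;
};

// Hypothetical helper: sums the f members so each variant can be
// checked at run time.
template <typename Array>
float sum_f(const Array& a) {
    float sum = 0.0f;
    for (const auto& e : a)
        sum += e.f;
    return sum;
}

float variant1() { std::array<S1, kSize> a = {}; return sum_f(a); }
float variant2() { std::array<S2, kSize> a = {}; return sum_f(a); }

// Variant 3: keep S::S() but drop "= {}"; default-initialization still
// runs the constructor for every element.
float variant3() { std::array<S, kSize> a; return sum_f(a); }

// Every variant should leave all elements at 0.0f.
void check_variants() {
    assert(variant1() == 0.0f);
    assert(variant2() == 0.0f);
    assert(variant3() == 0.0f);
}
```

All three variants produce the same observable values as the original program; only the generated initialization code is expected to differ.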

I'm all for loop unrolling as an optimization, but I don't think this is a very good one. Before I report this as a bug, can someone confirm whether this is the expected behaviour?

[edit: I've opened a new bug for this because the others don't seem to match. They were more about long compilation time than weird codegen.]

There appears to be a related bug report, Bug 59659 - large zero-initialized std::array compile time excessive. It was considered "fixed" for 4.9.0, so I consider this test case either a regression or an edge case not covered by the patch. For what it's worth, two of the bug report's test cases (1, 2) exhibit symptoms for me on both GCC 4.9.0 and 5.3.1.

There are two more related bug reports:

Bug 68203 - About infinite compilation time on struct with nested array of pairs with -std=c++11

Andrew Pinski 2015-11-04 07:56:57 UTC

This is most likely a memory hog which is generating lots of default constructors rather than a loop over them.

That one claims to be a duplicate of this one:

Bug 56671 - Gcc uses large amounts of memory and processor power with large C++11 bitsets

Jonathan Wakely 2016-01-26 15:12:27 UTC

Generating the array initialization for this constexpr constructor is the problem:

 constexpr _Base_bitset(unsigned long long __val) noexcept : _M_w{ _WordT(__val) } { }

Indeed, if we change it to S a[4096] {}; we don't get the problem.
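
For comparison, here is a minimal sketch of the built-in array form mentioned above, keeping the same S. The function names sum_builtin_array and check_builtin_array are made up for illustration. The claim that this form avoids the problem comes from the answer, and the sketch only shows that the program's observable values stay the same.

```cpp
#include <cassert>
#include <cstddef>

// Same element type as in the question.
struct S {
    float f;
    S() : f(0.0f) {}
};

// Built-in array instead of std::array; "{}" value-initializes each
// element, which calls S::S().
float sum_builtin_array() {
    S a[4096] {};
    float sum = 0.0f;
    for (const auto& e : a)
        sum += e.f;
    return sum;
}

// Hypothetical self-check: every element should read back as 0.0f.
void check_builtin_array() {
    assert(sum_builtin_array() == 0.0f);
}
```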


Using perf we can see where GCC is spending most of its time. First:

perf record g++ -std=c++11 -O2 test.cpp

Then perf report:

  10.33%  cc1plus   cc1plus                 [.] get_ref_base_and_extent
   6.36%  cc1plus   cc1plus                 [.] memrefs_conflict_p
   6.25%  cc1plus   cc1plus                 [.] vn_reference_lookup_2
   6.16%  cc1plus   cc1plus                 [.] exp_equiv_p
   5.99%  cc1plus   cc1plus                 [.] walk_non_aliased_vuses
   5.02%  cc1plus   cc1plus                 [.] find_base_term
   4.98%  cc1plus   cc1plus                 [.] invalidate
   4.73%  cc1plus   cc1plus                 [.] write_dependence_p
   4.68%  cc1plus   cc1plus                 [.] estimate_calls_size_and_time
   4.11%  cc1plus   cc1plus                 [.] ix86_find_base_term
   3.41%  cc1plus   cc1plus                 [.] rtx_equal_p
   2.87%  cc1plus   cc1plus                 [.] cse_insn
   2.77%  cc1plus   cc1plus                 [.] record_store
   2.66%  cc1plus   cc1plus                 [.] vn_reference_eq
   2.48%  cc1plus   cc1plus                 [.] operand_equal_p
   1.21%  cc1plus   cc1plus                 [.] integer_zerop
   1.00%  cc1plus   cc1plus                 [.] base_alias_check

This won't mean much to anyone but GCC developers, but it's still interesting to see what's taking up so much compilation time.


Clang 3.7.0 does a much better job at this than GCC. At -O2 it takes less than a second to compile, and produces a much smaller executable (8960 bytes) and this assembly:

0000000000400810 <main>:
  400810:   53                      push   rbx
  400811:   48 81 ec 00 40 00 00    sub    rsp,0x4000
  400818:   48 8d 3c 24             lea    rdi,[rsp]
  40081c:   31 db                   xor    ebx,ebx
  40081e:   31 f6                   xor    esi,esi
  400820:   ba 00 40 00 00          mov    edx,0x4000
  400825:   e8 56 fe ff ff          call   400680 <memset@plt>
  40082a:   66 0f 1f 44 00 00       nop    WORD PTR [rax+rax*1+0x0]
  400830:   f3 0f 10 04 1c          movss  xmm0,DWORD PTR [rsp+rbx*1]
  400835:   f3 0f 5a c0             cvtss2sd xmm0,xmm0
  400839:   bf 60 10 60 00          mov    edi,0x601060
  40083e:   e8 9d fe ff ff          call   4006e0 <_ZNSo9_M_insertIdEERSoT_@plt>
  400843:   48 83 c3 04             add    rbx,0x4
  400847:   48 81 fb 00 40 00 00    cmp    rbx,0x4000
  40084e:   75 e0                   jne    400830 <main+0x20>
  400850:   31 c0                   xor    eax,eax
  400852:   48 81 c4 00 40 00 00    add    rsp,0x4000
  400859:   5b                      pop    rbx
  40085a:   c3                      ret    
  40085b:   0f 1f 44 00 00          nop    DWORD PTR [rax+rax*1+0x0]
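
Read back into C++, the Clang listing above has roughly this shape: one memset over the 0x4000-byte array, then a loop that widens each float to double and inserts it into the stream. This is a hand-written sketch, not Clang's output. The names print_zeroed and zeroed_output and the std::ostream parameter are illustrative, and zeroing floats with memset assumes an IEEE-754 representation in which all-zero bytes mean 0.0f.

```cpp
#include <cstddef>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>

// Writes 4096 zero floats to `out`, mirroring the memset-plus-loop
// structure of the listing.
void print_zeroed(std::ostream& out) {
    float a[4096];
    std::memset(a, 0, sizeof a);           // the memset call with edx = 0x4000
    for (std::size_t i = 0; i < 4096; ++i)
        out << static_cast<double>(a[i]);  // cvtss2sd, then insertion of a double
}

// Hypothetical convenience wrapper that captures the output in a string.
std::string zeroed_output() {
    std::ostringstream os;
    print_zeroed(os);
    return os.str();
}
```

The original program writes to std::cerr; passing std::cerr to print_zeroed gives the same stream of characters.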

On the other hand, GCC 5.3.1 with no optimizations compiles very quickly but still produces a 95,328-byte executable. Compiling with -O2 reduces the executable size to 53,912 bytes, but compilation takes 4 seconds. I would definitely report this to their bugzilla.

Your GCC bug 71165, later merged with 92385, has been fixed in GCC 12.

https://gcc.godbolt.org/z/eGMq16esP
