Understanding C++ Function Inlining

I am using a Microsoft-specific keyword (__forceinline) to force a global function to be inlined, but I noticed that the function is not inlined if it uses an object whose class has an explicitly defined, empty destructor. (Strictly speaking, a user-provided destructor is never trivial in the C++ sense, even when its body is empty.)

Quoting from MSDN:

Even with __forceinline, the compiler cannot inline code in all circumstances. The compiler cannot inline a function if:

  • The function or its caller is compiled with /Ob0 (the default option for debug builds).

  • The function and the caller use different types of exception handling (C++ exception handling in one, structured exception handling in the other).

  • The function has a variable argument list.

  • The function uses inline assembly, unless compiled with /Og, /Ox, /O1, or /O2.

  • The function is recursive and not accompanied by #pragma inline_recursion(on). With the pragma, recursive functions are inlined to a default depth of 16 calls. To reduce the inlining depth, use inline_depth pragma.

  • The function is virtual and is called virtually. Direct calls to virtual functions can be inlined.

  • The program takes the address of the function and the call is made via the pointer to the function. Direct calls to functions that have had their address taken can be inlined.

  • The function is also marked with the naked __declspec modifier.

I wrote the following self-contained program to test the behavior:

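#include <iostream>

// Force inlining: __forceinline on MSVC; GCC/Clang spell it always_inline.
#if defined(_MSC_VER)
#define INLINE __forceinline
#else
#define INLINE inline __attribute__((always_inline))
#endif

// Proxy holding a reference, used to "move" a Spam out of a temporary
// (pre-C++11 move emulation in the style of std::auto_ptr's auto_ptr_ref).
template <class T>
struct rvalue
{
    T& r_;
    explicit INLINE rvalue(T& r) : r_(r) {}
};

// Emulated std::move: constructs a T from a proxy wrapping t.
template <class T>
INLINE
T movz(T& t)
{
    return T(rvalue<T>(t));
}
template <class T>
class Spam
{
public:
    // Lets a Spam (including a temporary) convert to the proxy.
    INLINE operator rvalue<Spam>()  { return rvalue<Spam>(*this); }
    INLINE Spam() : m_value(0)  {}
    // "Move" constructor and "move" assignment via the proxy.
    INLINE Spam(rvalue<Spam> p) : m_value(p.r_.m_value) {}
    INLINE Spam& operator= (rvalue<Spam> p) 
    {
        m_value = p.r_.m_value;
        return *this; 
    }
    INLINE explicit Spam(T value) : m_value(value) {    }
    INLINE operator T() { return m_value; }
    // E cannot be deduced, so this overload is never viable for p2 = ...
    template <class U, class E> INLINE  Spam& operator= (Spam<U> u) { return *this; }
    // The empty destructor in question: with it, MSVC does not inline movz().
    INLINE ~Spam() {}
private:
    // Copying is disabled (pre-C++11 idiom): declared private, never defined.
    Spam(Spam<T>&); // not defined
    Spam& operator= (Spam&); // not defined
private:
    T m_value; 
};
INLINE int foo()
{
    Spam<int> p1(int(5)), p2;
    p2 = movz(p1);
    return p2;
}

int main()
{
    std::cout << foo() << std::endl;
}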

With the empty destructor INLINE ~Spam() {} in place, I get the following disassembly, in which main() still contains a call to movz<Spam<int> >:

int main()
{
000000013F4B1010  sub         rsp,28h  
    std::cout << foo() << std::endl;
000000013F4B1014  lea         rdx,[rsp+30h]  
000000013F4B1019  lea         rcx,[rsp+38h]  
000000013F4B101E  mov         dword ptr [rsp+30h],5  
000000013F4B1026  call        movz<Spam<int> > (013F4B1000h)  
000000013F4B102B  mov         rcx,qword ptr [__imp_std::cout (013F4B2050h)]  
000000013F4B1032  mov         edx,dword ptr [rax]  
000000013F4B1034  call        qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (013F4B2040h)]  
000000013F4B103A  mov         rdx,qword ptr [__imp_std::endl (013F4B2048h)]  
000000013F4B1041  mov         rcx,rax  
000000013F4B1044  call        qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (013F4B2058h)]  
}

Without the destructor INLINE ~Spam() {}, on the other hand, I get the following disassembly, in which everything folds down to printing the constant 5:

int main()
{
000000013FF01000  sub         rsp,28h  
    std::cout << foo() << std::endl;
000000013FF01004  mov         rcx,qword ptr [__imp_std::cout (013FF02050h)]  
000000013FF0100B  mov         edx,5  
000000013FF01010  call        qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (013FF02040h)]  
000000013FF01016  mov         rdx,qword ptr [__imp_std::endl (013FF02048h)]  
000000013FF0101D  mov         rcx,rax  
000000013FF01020  call        qword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (013FF02058h)]  
}
000000013FF01026  xor         eax,eax  
}

I fail to understand why, in the presence of the destructor, the compiler does not inline the function T movz(T& t).

  • Note: The behavior is consistent from Visual Studio 2008 through 2013.
  • Note: I checked with Cygwin GCC, and that compiler does inline the code. I cannot verify other compilers at the moment, but will update within the next 12 hours if required.

Yes, it's a bug. I tested it with Qt using the MinGW compiler, and MinGW optimizes everything very well.

First, I changed your code slightly, as below, to make the assembly easier to read:

int main()
{
    int i = foo();
    std::cout << i << std::endl;
}

And here is the disassembly from the Qt debugger:

        45  int main()
        46  {
0x401600                    lea    0x4(%esp),%ecx
0x401604  <+0x0004>         and    $0xfffffff0,%esp
0x401607  <+0x0007>         pushl  -0x4(%ecx)
0x40160a  <+0x000a>         push   %ebp
0x40160b  <+0x000b>         mov    %esp,%ebp
0x40160d  <+0x000d>         push   %ecx
0x40160e  <+0x000e>         sub    $0x54,%esp
0x401611  <+0x0011>         call   0x402160 <__main>
0x401616  <+0x0016>         movl   $0x5,-0x10(%ebp)
        47      int i = foo();
0x401683  <+0x0083>         mov    %eax,-0xc(%ebp)
        48      std::cout << i << std::endl;
0x401686  <+0x0086>         mov    -0xc(%ebp),%eax
0x401689  <+0x0089>         mov    %eax,(%esp)
0x40168c  <+0x008c>         mov    $0x6fcba2c0,%ecx
0x401691  <+0x0091>         call   0x401714 <_ZNSolsEi>
0x401696  <+0x0096>         sub    $0x4,%esp
0x401699  <+0x0099>         movl   $0x40171c,(%esp)
0x4016a0  <+0x00a0>         mov    %eax,%ecx
0x4016a2  <+0x00a2>         call   0x401724 <_ZNSolsEPFRSoS_E>
0x4016a7  <+0x00a7>         sub    $0x4,%esp
        49  }
0x4016aa  <+0x00aa>         mov    $0x0,%eax
0x4016af  <+0x00af>         mov    -0x4(%ebp),%ecx
0x4016b2  <+0x00b2>         leave
0x4016b3  <+0x00b3>         lea    -0x4(%ecx),%esp
0x4016b6  <+0x00b6>         ret

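You can see that foo() has been inlined into main(): there is no call to foo(), the constant 5 is written directly into the stack frame, and the result is stored in variable i and printed. (Some instructions between offsets +0x16 and +0x83 are not shown in the listing.)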

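Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM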