Why does the 64-bit VC++ compiler add nop instructions after function calls?

Question

I've compiled the following using the Visual Studio C++ 2008 SP1 x64 C++ compiler:

I'm curious: why did the compiler add those nop instructions after those calls?

PS1. I would understand if the 2nd and 3rd nops were there to align the code on a 4-byte boundary, but the 1st nop breaks that assumption.

PS2. The C++ code that was compiled had no loops or special optimization stuff in it:

CTestDlg::CTestDlg(CWnd* pParent /*=NULL*/)
    : CDialog(CTestDlg::IDD, pParent)
{
    m_hIcon = AfxGetApp()->LoadIcon(IDR_MAINFRAME);

    //This makes no sense. I used it to set a debugger breakpoint
    ::GdiFlush();
    srand(::GetTickCount());
}

PS3. Additional info: First off, thank you everyone for your input.

Here are additional observations:

1. My first guess was that incremental linking could've had something to do with it. But the Release build settings in Visual Studio for the project have incremental linking off.

2. This seems to affect x64 builds only. The same code built as x86 (or Win32) does not have those nops, even though the instructions used are very similar:

3. I tried to build it with a newer linker, and even though the x64 code produced by VS 2013 looks somewhat different, it still adds those nops after some calls:

4. Also, dynamic vs. static linking to MFC made no difference to the presence of those nops. This one was built with dynamic linking to the MFC DLLs with VS 2013:

5. Also note that those nops can appear after near and far calls alike, and they have nothing to do with alignment. Here's part of the code that I got from IDA when I step a little further on:

As you can see, the nop is inserted after a far call, and it happens to "align" the next lea instruction on an address ending in B! That makes no sense if those nops were added for alignment only.
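The alignment argument can be checked with a little arithmetic. The sketch below is only an illustration: the address is hypothetical (the real one comes from the IDA listing), and the helper names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// True if `address` is a multiple of `alignment`.
constexpr bool is_aligned(std::uint64_t address, std::uint64_t alignment) {
    return address % alignment == 0;
}

// Number of padding bytes needed to move `address` up to the next
// multiple of `alignment` (0 if it is already aligned).
constexpr std::uint64_t padding_to_align(std::uint64_t address,
                                         std::uint64_t alignment) {
    return (alignment - (address % alignment)) % alignment;
}

// Hypothetical address ending in B, standing in for the lea after the nop.
constexpr std::uint64_t kLeaAddress = 0x14000100BULL;

static_assert(!is_aligned(kLeaAddress, 4), "an address ending in B is not 4-byte aligned");
static_assert(padding_to_align(kLeaAddress, 4) == 1, "one more byte would be needed");
```

An address ending in B is odd, so it is not aligned to 2, 4, 8 or 16 bytes; a padding nop meant for alignment would not leave the next instruction there.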

I was originally inclined to believe that, since near relative calls (i.e. those that start with E8) are somewhat faster than far calls (the ones that start with FF 15 in this case), the linker may try to go with near calls first, and since those are one byte shorter than far calls, if it succeeds, it may pad the remaining space with a nop at the end. But then example (5) above kind of defeats this hypothesis.
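The size argument behind that hypothesis can be sketched as follows. The opcode bytes E8 and FF 15 are the ones quoted above (FF 15 on x64 is an indirect call through a RIP-relative memory operand, which the question calls a far call); the zeroed operand bytes and all identifier names are placeholders for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// call rel32: opcode E8 followed by a 32-bit relative displacement.
constexpr std::array<std::uint8_t, 5> kNearRelativeCall = {0xE8, 0, 0, 0, 0};

// call qword ptr [rip+disp32]: FF 15 followed by a 32-bit displacement.
constexpr std::array<std::uint8_t, 6> kIndirectCall = {0xFF, 0x15, 0, 0, 0, 0};

// Single-byte nop.
constexpr std::uint8_t kNop = 0x90;

// Bytes left over if a 6-byte slot is filled with the 5-byte encoding.
constexpr std::size_t leftover_bytes() {
    return kIndirectCall.size() - kNearRelativeCall.size();
}

static_assert(leftover_bytes() == 1, "exactly one nop would fill the gap");
```

Under the hypothesis, a nop would only ever follow the 5-byte E8 form, which is why a nop after an FF 15 call contradicts it.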

So I still don't have a clear answer to this.

Answer 1

This is purely a guess, but it might be some kind of SEH optimization. I say optimization because SEH seems to work fine without the NOPs too. A NOP might help speed up unwinding.

In the following example (live demo with VC2017), a NOP is inserted after the call to basic_string::assign in test1 but not in test2 (identical, but declared as non-throwing ¹).

#include <stdio.h>
#include <string>

int test1() {
  std::string s = "a";  // NOP inserted here
  s += getchar();
  return (int)s.length();
}

int test2() throw() {
  std::string s = "a";
  s += getchar();
  return (int)s.length();
}

int main()
{
  return test1() + test2();
}

Assembly:

test1:
    . . .
    call     std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign
    npad     1         ; nop
    call     getchar
    . . .
test2:
    . . .
    call     std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign
    call     getchar

Note that MSVS compiles by default with the /EHsc flag (synchronous exception handling). Without that flag the NOPs disappear, and with /EHa (synchronous and asynchronous exception handling), throw() no longer makes a difference because SEH is always on.

¹ For some reason only throw() seems to reduce the code size; using noexcept makes the generated code even bigger and summons even more NOPs. MSVC...

Answer 2

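This is a special filler that lets the exception handler/unwind function correctly detect whether it is in the prologue, epilogue, or body of a function.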
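One way to picture this is a toy model of the lookup. It is a simplified sketch, not the actual Windows unwinder: the struct, the function names and all addresses are hypothetical. The only real fact it relies on is that a call pushes the address of the instruction that follows it, so an unwinder looking at a return address sees the byte after the call.

```cpp
#include <cassert>
#include <cstdint>

enum class Region { Prologue, Body, Epilogue, Outside };

// Hypothetical layout of one function as half-open address ranges.
struct FunctionLayout {
    std::uint32_t begin;           // first byte of the prologue
    std::uint32_t body_begin;      // first byte after the prologue
    std::uint32_t epilogue_begin;  // first byte of the epilogue
    std::uint32_t end;             // one past the last byte
};

// Decide which part of the function an address falls into.
Region classify(const FunctionLayout& f, std::uint32_t address) {
    if (address < f.begin || address >= f.end) return Region::Outside;
    if (address < f.body_begin) return Region::Prologue;
    if (address < f.epilogue_begin) return Region::Body;
    return Region::Epilogue;
}

// Size of an E8 rel32 call; the return address is the next instruction.
constexpr std::uint32_t kCallSize = 5;

std::uint32_t return_address(std::uint32_t call_address) {
    return call_address + kCallSize;
}
```

Take a call at 0x1010 that is the last instruction of the body. With the layout {0x1000, 0x1004, 0x1015, 0x101A}, its return address 0x1015 is classified as Epilogue even though the call was made from the body. With a one-byte nop after the call, the layout becomes {0x1000, 0x1004, 0x1016, 0x101B}, and the same return address is classified as Body.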

Answer 3

This is due to a calling convention on x64 which requires the stack to be 16-byte aligned before any call instruction. This is not (to my knowledge) a hardware requirement but a software one. It provides a way to be sure that when entering a function (that is, right after a call instruction), the value of the stack pointer is always 8 modulo 16, thus permitting simple data alignment and stores/reads from aligned locations on the stack.
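The "8 modulo 16" arithmetic can be checked with a short sketch. The stack pointer value is made up and the helper names are hypothetical. Note that this concerns the value of RSP, not the addresses at which instructions are placed.

```cpp
#include <cassert>
#include <cstdint>

// On x64 a call pushes an 8-byte return address.
constexpr std::uint64_t kReturnAddressSize = 8;

constexpr bool is_16_byte_aligned(std::uint64_t address) {
    return address % 16 == 0;
}

// RSP as the callee sees it on entry, given RSP just before the call.
constexpr std::uint64_t rsp_on_entry(std::uint64_t rsp_before_call) {
    return rsp_before_call - kReturnAddressSize;
}

// Hypothetical, 16-byte-aligned stack pointer in the caller.
constexpr std::uint64_t kCallerRsp = 0x7FFE0000ULL;

static_assert(is_16_byte_aligned(kCallerRsp), "aligned before the call");
static_assert(rsp_on_entry(kCallerRsp) % 16 == 8, "8 modulo 16 on entry");
```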


Answer 1
3 2017-09-14 21:04:33

Answer 2
0 2019-02-08 07:59:48

Answer 3
-2 2017-09-13 17:08:32