Why does the 64-bit VC++ compiler add nop instructions after function calls?

Question

I've compiled the following using Visual Studio C++ 2008 SP1, x64 C++ compiler:

I'm curious: why did the compiler add those nop instructions after those calls?

PS1. I would understand if the 2nd and 3rd nops were there to align the code on a 4-byte boundary, but the 1st nop breaks that assumption.

PS2. The C++ code that was compiled had no loops or special optimization stuff in it:

CTestDlg::CTestDlg(CWnd* pParent /*=NULL*/)
    : CDialog(CTestDlg::IDD, pParent)
{
    m_hIcon = AfxGetApp()->LoadIcon(IDR_MAINFRAME);

    //This makes no sense. I used it to set a debugger breakpoint
    ::GdiFlush();
    srand(::GetTickCount());
}

PS3. Additional Info: First off, thank you everyone for your input.

Here are some additional observations:

1. My first guess was that incremental linking could have had something to do with it. But the Release build settings for the project in Visual Studio have incremental linking off.

2. This seems to affect x64 builds only. The same code built as x86 (or Win32) does not have those nops, even though the instructions used are very similar:

3. I tried to build it with a newer linker, and even though the x64 code produced by VS 2013 looks somewhat different, it still adds those nops after some calls:

4. Dynamic vs. static linking to MFC also made no difference to the presence of those nops. This one is built with dynamic linking to the MFC DLLs with VS 2013:

5. Also note that those nops can appear after both near and far calls, and they have nothing to do with alignment. Here's a part of the code that I got from IDA if I step a little bit further on:

As you can see, the nop is inserted after a far call, and it happens to "align" the next lea instruction on an address ending in B! That makes no sense if those nops were added for alignment only.
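To make the alignment argument concrete, here is a minimal sketch of the check. The address is hypothetical (only its last hex digit, B, comes from the observation above), and `is_aligned` is a helper name made up for this illustration.

```cpp
#include <cstdint>

// True when addr is a multiple of alignment (alignment must be a power of two).
constexpr bool is_aligned(std::uint64_t addr, std::uint64_t alignment) {
    return (addr & (alignment - 1)) == 0;
}

// Hypothetical address; only the trailing B matches the case described above.
constexpr std::uint64_t kLeaAddress = 0x14000102B;

static_assert(!is_aligned(kLeaAddress, 4), "an address ending in B is not 4-byte aligned");
static_assert(!is_aligned(kLeaAddress, 2), "it is not even 2-byte aligned");
static_assert(is_aligned(kLeaAddress + 1, 4), "the next 4-byte boundary is one byte later");
```

If the nop were alignment padding, the instruction after it would land on a boundary. An address ending in B is odd, so it cannot be on any 2-, 4-, 8- or 16-byte boundary.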

I was originally inclined to believe that since near relative calls (i.e. those that start with E8) are somewhat faster than far calls (the ones that start with FF 15 in this case), the linker may try to go with near calls first, and since those are one byte shorter than far calls, if it succeeds, it may pad the remaining space with nops at the end. But example (5) above rather defeats this hypothesis.
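The size difference behind that hypothesis can be sketched as follows. This is not a general x86-64 decoder: it only recognizes the two encodings mentioned here, and `call_length` and `padding_for_slot` are names invented for the illustration.

```cpp
#include <cstdint>

// Length in bytes of the two call encodings discussed above,
// or 0 if the bytes are neither of them.
constexpr int call_length(std::uint8_t b0, std::uint8_t b1) {
    if (b0 == 0xE8) return 5;                // E8 rel32: opcode + 4-byte displacement
    if (b0 == 0xFF && b1 == 0x15) return 6;  // FF 15 disp32: opcode + ModRM + 4-byte displacement
    return 0;
}

// One-byte nops needed if a slot reserved for one encoding is filled with a shorter one.
constexpr int padding_for_slot(int slot_size, int actual_length) {
    return slot_size - actual_length;
}

static_assert(call_length(0xE8, 0x00) == 5, "near relative call is 5 bytes");
static_assert(call_length(0xFF, 0x15) == 6, "FF 15 call is 6 bytes");
static_assert(padding_for_slot(6, call_length(0xE8, 0x00)) == 1, "one nop would fill the gap");
```

Under the hypothesis, a nop could only ever follow the 5-byte E8 form. A nop after a 6-byte FF 15 call, as in observation (5), has no leftover byte to explain it.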

So I still don't have a clear answer to this.

Answer 1

This is purely a guess, but it might be some kind of a SEH optimization. I say optimization because SEH seems to work fine without the NOPs too. NOP might help speed up unwinding.

In the following example (live demo with VC2017), there is a NOP inserted after a call to basic_string::assign in test1 but not in test2 (identical but declared as non-throwing¹).

#include <stdio.h>
#include <string>

int test1() {
  std::string s = "a";  // NOP inserted here
  s += getchar();
  return (int)s.length();
}

int test2() throw() {
  std::string s = "a";
  s += getchar();
  return (int)s.length();
}

int main()
{
  return test1() + test2();
}

Assembly:

test1:
    . . .
    call     std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign
    npad     1         ; nop
    call     getchar
    . . .
test2:
    . . .
    call     std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign
    call     getchar

Note that MSVS compiles by default with the /EHsc flag (synchronous exception handling). Without that flag the NOPs disappear, and with /EHa (synchronous and asynchronous exception handling), throw() no longer makes a difference because SEH is always on.

¹ For some reason only throw() seems to reduce the code size; using noexcept makes the generated code even bigger and summons even more NOPs. MSVC...

Answer 2

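This is special padding that lets the exception handler/unwinder correctly detect whether it is in the prologue, epilogue, or body of a function.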
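A rough way to picture this: the unwinder only sees the return address, which points at the instruction after the call. The sketch below is a simplified model under loud assumptions. The `Range` table, the `classify` helper and all offsets are invented for illustration; the real Windows unwinder works from unwind data and the instruction stream, not from a table like this.

```cpp
#include <cstdint>

// Simplified model (an assumption, not MSVC's real data structures):
// each region covers the half-open byte range [begin, end) within a function.
enum class Region { Body, Epilogue, None };

struct Range {
    std::uint32_t begin;
    std::uint32_t end;
    Region kind;
};

// Finds which region a code offset falls into.
constexpr Region classify(const Range* ranges, int count, std::uint32_t offset) {
    for (int i = 0; i < count; ++i) {
        if (offset >= ranges[i].begin && offset < ranges[i].end) {
            return ranges[i].kind;
        }
    }
    return Region::None;
}

// Hypothetical layout 1: a 5-byte call occupies 0x1B..0x1F and the epilogue starts at 0x20.
constexpr Range kWithoutNop[] = {{0x00, 0x20, Region::Body}, {0x20, 0x26, Region::Epilogue}};

// Hypothetical layout 2: a one-byte nop at 0x20 pushes the epilogue to 0x21.
constexpr Range kWithNop[] = {{0x00, 0x21, Region::Body}, {0x21, 0x27, Region::Epilogue}};

// The return address of the call is 0x20 in both layouts.
static_assert(classify(kWithoutNop, 2, 0x20) == Region::Epilogue, "looks like the epilogue");
static_assert(classify(kWithNop, 2, 0x20) == Region::Body, "still inside the body");
```

In this model the same return address is misclassified when the call is the last instruction before the boundary, and classified correctly once a single nop follows it. That would also be consistent with the nop showing up regardless of alignment.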

Answer 3

This is due to the x64 calling convention, which requires the stack to be 16-byte aligned before any call instruction. This is not (to my knowledge) a hardware requirement but a software one. It guarantees that on entry to a function (that is, right after a call instruction), the value of the stack pointer is always 8 modulo 16, which permits simple data alignment and stores/reads at aligned locations on the stack.
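The arithmetic in that claim can be written out as a small sketch. The stack pointer value is hypothetical and `rsp_after_call` is a name made up for this example. It only illustrates the "8 modulo 16" statement, not the nop itself.

```cpp
#include <cstdint>

// A call pushes an 8-byte return address, so the stack pointer drops by 8.
constexpr std::uint64_t rsp_after_call(std::uint64_t rsp_before_call) {
    return rsp_before_call - 8;
}

// Hypothetical stack pointer value that is 16-byte aligned just before the call.
constexpr std::uint64_t kRspBeforeCall = 0x7FFFFFFFE000;

static_assert(kRspBeforeCall % 16 == 0, "aligned before the call");
static_assert(rsp_after_call(kRspBeforeCall) % 16 == 8, "8 modulo 16 on function entry");
```

Note that this adjustment happens to the stack pointer, not to the instruction stream, so a nop placed after the call does not change either value.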

3 answers

solution1
3 2017-09-14 21:04:33

solution2
0 2019-02-08 07:59:48

solution3
-2 2017-09-13 17:08:32
