Will Thread.SpinWait be inlined when called?

I have the following code:

while(flag)
{
  Thread.SpinWait(1);
}

The following is the implementation of SpinWait in Rotor (sscli20\clr\src\vm\comsynchronizable.cpp):

FCIMPL1(void, ThreadNative::SpinWait, int iterations)
{
    WRAPPER_CONTRACT;
    STATIC_CONTRACT_SO_TOLERANT;

    for(int i = 0; i < iterations; i++)
        YieldProcessor();
}
FCIMPLEND

Will Thread.SpinWait be inlined when called?

If not, each loop cycle will spend more time on stack operations (push and pop) and consume more of the CPU's execution resources.

If yes, how does the CLR accomplish that, given that ThreadNative::SpinWait is implemented as a standard function instruction sequence, including stack operations (push and pop)?

According to Eren's test, no inlining occurs in debug mode. Is it possible for the CLR to optimize and produce inlined code?

Summary: Thanks for your answer. I wish that one day the CLR could inline pre-compiled code through some mechanism such as MethodImplOptions.InternalCall. It could then eliminate the stack operations and spend most of its time checking the flag and spin-waiting (which consumes fewer CPU resources than nop).

Best to try and see. Sample code:

static void Main(string[] args)
{
    while (true) 
        Thread.SpinWait(1);
} 

The optimized disassembly shows:

x86:

00000000  push        ebp 
00000001  mov         ebp,esp 
00000003  mov         ecx,1 
00000008  call        6F11D3FE 
0000000d  jmp         00000003 

x64:

00000000  sub         rsp,28h 
00000004  mov         ecx,1 
00000009  call        000000005F815434 
0000000e  jmp         0000000000000004 
00000010  add         rsp,28h 
00000014  ret 

So there is no inlining in either case.

Maybe I'm missing something, but I don't quite understand why you care about the stack operations, since spinning consumes CPU cycles anyway (the whole purpose is to not yield).

No, the jitter is not capable of inlining pre-compiled C++ code, only managed code that started as IL.
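
One way to see whether a method is backed by pre-compiled runtime code rather than IL is to read its implementation flags through reflection. The sketch below is only an illustration: the class name ImplFlagsProbe and its helpers are hypothetical, and the flags reported for Thread.SpinWait depend on the runtime version. On some runtimes the public method is a thin managed wrapper that forwards to a separate internal call, so a false result here does not mean the spin itself gets inlined.

```csharp
using System;
using System.Reflection;
using System.Threading;

// Hypothetical helper, not part of the framework.
public static class ImplFlagsProbe
{
    // Returns the implementation flags the runtime reports for Thread.SpinWait(int).
    public static MethodImplAttributes SpinWaitFlags()
    {
        MethodInfo method = typeof(Thread).GetMethod("SpinWait", new[] { typeof(int) });
        if (method == null)
            throw new MissingMethodException("Thread", "SpinWait");
        return method.GetMethodImplementationFlags();
    }

    // True when the InternalCall bit is set, i.e. the body lives inside the runtime
    // rather than in IL that the jitter compiles (and could inline).
    public static bool IsInternalCall(MethodImplAttributes flags)
    {
        return (flags & MethodImplAttributes.InternalCall) != 0;
    }
}
```

Calling ImplFlagsProbe.IsInternalCall(ImplFlagsProbe.SpinWaitFlags()) and printing the result on the runtime you care about shows what that runtime reports; the optimized disassembly remains the authoritative check.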

This is entirely irrelevant for a SpinWait() call. The point of spin-waiting is to have the processor execute code rather than pay the cost of a thread context switch, with the expectation that flag will turn false in 10,000 CPU cycles or less. It doesn't matter what kind of code it executes. CALL is a fine way to execute code.
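
As a rough illustration of the pattern described above, here is a minimal sketch of a spin loop on a flag that another thread clears. The names SpinDemo and SpinUntilCleared are made up for this example, and the delay is passed in milliseconds purely to make the demo observable; a real spin-wait would only be appropriate when the flag is expected to change far sooner than that.

```csharp
using System;
using System.Threading;

// Hypothetical demo class, not taken from the question.
public static class SpinDemo
{
    // volatile so the spinning thread re-reads the field on every pass
    private static volatile bool flag;

    // Spins until another thread clears the flag and returns the number of loop passes.
    public static long SpinUntilCleared(int delayMs)
    {
        flag = true;

        // Second thread that clears the flag after a short delay.
        var setter = new Thread(() =>
        {
            Thread.Sleep(delayMs);
            flag = false;
        });
        setter.Start();

        long passes = 0;
        while (flag)
        {
            Thread.SpinWait(1); // an ordinary CALL on every pass, as in the disassembly
            passes++;
        }

        setter.Join();
        return passes;
    }
}
```

The spinning thread stays on the processor for the whole wait, so the per-pass cost of the call only changes how many passes fit into the wait, not how long the wait lasts.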
