
What's the comparative cost of an atomic RMW operation and a function call?

My understanding is that atomic machine instructions may be up to two orders of magnitude slower than a non-atomic operation. For example, given

int x;
x++;

and

std::atomic<int> y;
y++;

my understanding is that x++ typically runs much faster than y++. (I'm assuming that the increment operation maps to an underlying machine instruction. I'm sure the exact comparative cost varies from architecture to architecture, but I'm talking about a rule of thumb.)

I'm interested in the relative cost of an atomic RMW operation and a non-inline function call, again, as a general rule of thumb. For example, given this non-inline function,

void f(){}

what can we generally say about the cost of y++ (i.e., the atomic increment) compared to the cost of executing a non-inline call to f?

My motivation is to try to put the common claim that "atomic operations are much more expensive than non-atomic operations" in perspective. One way to do that is to try to get some idea how expensive an atomic RMW operation is compared to calling and returning from a non-inline function.

Please don't reply with "the only way to know is to measure." I'm not asking about an atomic RMW operation versus a function call in a particular context on a particular architecture. I'm asking about a general rule of thumb that could be used as the basis of discussion for people who might think, "We can never use atomic instructions, because they're too expensive," yet who wouldn't think twice about making function calls.

The question as asked has issues.

One is that your pseudocode has no clear storage class and appears to operate on local objects. Atomic operations on a purely local object are meaningless; atomic operations exist for objects shared between threads.

A compiler could well notice that a non-volatile local atomic variable is used only within one function and generate no special atomic operations for it at all (though I don't know of any compiler that presently does that).

We have to assume that the object is not local (or is volatile).

The cost of any memory operation depends heavily on caching: if the location is not in our cache, the operation will be much more costly.

The top of the stack (its most recently used portion) is almost always in our cache.

By definition, the values of shared objects must travel between caches, since they are read and modified by multiple threads.

So what are you really comparing here? Until you say precisely, the question can't be answered.
