I am using this simple example to illustrate a problem in which I am trying to optimize stack usage. Let's say I have a struct like this:
// Something.h
struct Something {
    int val;
    bool operator==(const Something& rhs);
    bool operator!=(const Something& rhs);
};
// Something.cpp
bool Something::operator==(const Something& rhs) {
    return val == rhs.val;
}
bool Something::operator!=(const Something& rhs) {
    return !(*this == rhs);
}
Calling operator!=() will push two stack frames onto the stack (one for != and another for ==). Should I inline operator!=() so that both == and != use the same amount of stack?
You should use link-time optimization (LTO) so either of them can fully inline into the call site, especially when the function is near-trivial like this.
But if you don't want to use LTO for cross-file inlining, then yes, it would be a good idea to put the operator!= definition (return !(*this == rhs);) inside the class definition (in the .h) so it's visible to every caller and can inline into files that just included the .h. Then the asm for callers will call the same operator== definition but use the result the opposite way, e.g. test al,al / jnz instead of jz if you're branching on the result.
If you don't use LTO and don't make the definition visible for compile-time inlining, the best that will happen is that the compiler inlines operator== into the stand-alone operator!= definition when compiling that one .cpp. Then you have two similar-sized functions in the machine code that differ only by one boolean inversion. Users of these functions (from other files) will call one or the other, so they're both taking up space in your I-cache / code footprint.
// Something.h
struct Something {
    int val;
    bool operator==(const Something& rhs);
    bool operator!=(const Something& rhs) { return !(*this == rhs); }
};
// simulated #include for one-file demo purposes
// Some other .cpp file, operator== definition not visible.
int foo(Something &a, Something &b)
{
    if (a != b) {
        return a.val;
    } else {
        return b.val;
    }
}
GCC -O3 for x86-64 (Godbolt) compiles this as follows:
foo(Something&, Something&):
        push    rbp
        mov     rbp, rsi
        push    rbx
        mov     rbx, rdi                  # save the pointers in call-preserved regs
        sub     rsp, 8
        call    Something::operator==(Something const&)
        test    al, al                    # set FLAGS from the bool retval
        cmovne  rbx, rbp                  # select the right pointer
        mov     eax, DWORD PTR [rbx]      # and load from it
        add     rsp, 8                    # epilogue
        pop     rbx
        pop     rbp
        ret
Notice that this code calls Something::operator==, which couldn't inline at compile time (it could at link time with LTO). It just uses cmovne instead of the cmove it would have used if it had called an actual separate operator!=.
operator!= inlined at literally zero extra cost, and all calls to either operator use the same stand-alone operator== definition, saving code footprint. That is good for performance, especially if you have code that uses both operators often enough for the definition to stay hot in cache.
Of course, letting operator== inline as well would give significant savings when the class is just an int; no call at all is often a lot better because there's no need to preserve registers around a call.
(Of course, in this case my example is too trivial: if they are equal, then it can still return a.val because it knows that's the same as b.val. So if you uncomment the operator== definition in the Godbolt link, foo compiles to mov eax, DWORD PTR [rdi] / ret, never even touching b.)
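For reference, here is a minimal sketch of the "let both operators inline" variant described above. It assumes you are free to move the operator== body into the header; the const qualifiers and the demo() helper are additions for illustration and are not part of the original code.

```cpp
#include <cassert>

// Something.h (sketch): both operators are defined inside the class body, so
// they are implicitly inline and visible to every file that includes this header.
struct Something {
    int val;
    // const-qualified so const objects can be compared too (an addition here).
    bool operator==(const Something& rhs) const { return val == rhs.val; }
    bool operator!=(const Something& rhs) const { return !(*this == rhs); }
};

// Same shape as foo above, but taking const references.
int foo(const Something& a, const Something& b) {
    if (a != b) {
        return a.val;
    } else {
        return b.val;  // equal objects: b.val holds the same value as a.val
    }
}

// Hypothetical usage check, not from the original post.
void demo() {
    Something a{1}, b{2}, c{1};
    assert(a != b);
    assert(a == c);
    assert(foo(a, b) == 1);
    assert(foo(a, c) == 1);
}
```

With both bodies visible, the compiler has everything it needs to remove the calls at the call site without LTO; whether it does so still depends on the optimization level.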
You have a misconception about what the inline keyword does. For clarification, see this answer: https://stackoverflow.com/a/66379889/15284149
Answering your question: yes, it is probably faster to "inline", which is why the compiler will in fact do this optimization automatically (as long as you compile with at least -O1): https://godbolt.org/z/9Kn3hE
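As a rough illustration of what the inline keyword controls (a sketch of my own, not taken from the linked answer): it allows a function definition to appear in a header that several .cpp files include, without breaking the one-definition rule. Whether a call is actually inlined is still the optimizer's decision. The file names in the comments and the helpers differ() and check_inline_sketch() are hypothetical.

```cpp
#include <cassert>

// Something.h (sketch)
struct Something {
    int val;
    bool operator==(const Something& rhs) const;  // declared here, defined below
    bool operator!=(const Something& rhs) const;
};

// Defined in the header but outside the class body. The inline keyword is what
// permits this definition to appear in every .cpp that includes the header.
// It does not by itself force calls to be expanded in place.
inline bool Something::operator==(const Something& rhs) const {
    return val == rhs.val;
}
inline bool Something::operator!=(const Something& rhs) const {
    return !(*this == rhs);
}

// SomeUser.cpp (sketch): a hypothetical caller that sees both definitions.
bool differ(const Something& a, const Something& b) {
    return a != b;
}

// Hypothetical usage check, not from the original post.
void check_inline_sketch() {
    Something a{1}, b{2};
    assert(differ(a, b));
    assert(!differ(a, a));
}
```

Member functions defined inside the class body are implicitly inline, so the keyword is only needed for definitions written outside it, as here.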