I was curious about the cost of accessing a data member through a pointer compared with accessing it directly, so I came up with this test:
#include <iostream>

struct X {
    int a;
};

int main() {
    X* xheap = new X();
    std::cin >> xheap->a;
    volatile int x = xheap->a;

    X xstack;
    std::cin >> xstack.a;
    volatile int y = xstack.a;
}
The generated x86-64 code (disassembly with the source lines interleaved) is:
int main(){
    push rbx
    sub rsp,20h
X* xheap = new X();
    mov ecx,4
    call qword ptr [__imp_operator new (013FCD3158h)]
    mov rbx,rax
    test rax,rax
    je main+1Fh (013FCD125Fh)
    xor eax,eax
    mov dword ptr [rbx],eax
    jmp main+21h (013FCD1261h)
    xor ebx,ebx
std::cin >> xheap->a;
    mov rcx,qword ptr [__imp_std::cin (013FCD3060h)]
    mov rdx,rbx
    call qword ptr [__imp_std::basic_istream<char,std::char_traits<char> >::operator>> (013FCD3070h)]
volatile int x = xheap->a;
    mov eax,dword ptr [rbx]
X xstack;
std::cin >> xstack.a;
    mov rcx,qword ptr [__imp_std::cin (013FCD3060h)]
    mov dword ptr [x],eax
    lea rdx,[xstack]
    call qword ptr [__imp_std::basic_istream<char,std::char_traits<char> >::operator>> (013FCD3070h)]
volatile int y = xstack.a;
    mov eax,dword ptr [xstack]
    mov dword ptr [x],eax
It looks like the direct (non-pointer) access takes two instructions, compared with one instruction for the access through a pointer. Could somebody please explain why this is, and which of the two would take fewer CPU cycles to retrieve the value?
I am trying to understand whether accessing data members through a pointer costs more CPU instructions or cycles than accessing them directly.
That's a terrible test.
The complete assignment to x is this:
    mov eax,dword ptr [rbx]
    mov dword ptr [x],eax
(The compiler is allowed to reorder instructions somewhat, and has done so here: the store to x appears after the load of std::cin for the next statement.)
The assignment to y (which the compiler has given the same address as x) is:
    mov eax,dword ptr [xstack]
    mov dword ptr [x],eax
which is almost the same: read memory addressed through a register (rbx for the heap object, the stack pointer for xstack) and write the result to the stack.
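The first one would be more complicated (it would need an extra load to fetch the pointer itself), except that the compiler kept xheap in register rbx after the call to new, so it doesn't need to reload it. Because rbx is a callee-saved register, its value also survives the intervening call to operator>>.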
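To see the extra indirection in isolation, here is a minimal sketch: two hypothetical functions (the names are illustrative, not from the original) meant to be compiled on their own, for example in Compiler Explorer with optimizations on, so the optimizer cannot fold the loads into surrounding code as it did in the test above. The instruction sequences in the comments are what GCC or Clang typically emit at -O2 on x86-64 Linux; other compilers, flags and calling conventions (for example MSVC, which passes the first argument in rcx) will differ.

```cpp
#include <cassert>

struct X {
    int a;
};

// The pointer is already in a register (the first argument), so reading the
// field is typically a single load:
//   mov eax, DWORD PTR [rdi]
int load_via_pointer(const X* p) {
    return p->a;
}

// The pointer itself lives in memory, so it must be fetched first and then
// the field read through it: one extra load, typically
//   mov rax, QWORD PTR [rdi]
//   mov eax, DWORD PTR [rax]
int load_via_pointer_in_memory(const X* const* pp) {
    return (*pp)->a;
}
```

The second function is the "worst case" of one extra indirection; in the original test the compiler avoided it by keeping xheap in rbx.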
In either case I would be more worried about whether any of those accesses misses the L1 or L2 cache than about the precise instructions. (The processor doesn't even execute those instructions directly: modern x86 cores decode them into internal micro-operations, and may execute those in a different order.)
Accessing via a pointer instead of directly accessing from the stack costs you one extra indirection in the worst case (fetching the pointer). This is almost always irrelevant in itself; you need to look at your whole algorithm and how it works with the processor's caches and branch prediction logic.
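The point about caches can be made concrete with a rough microbenchmark sketch (not part of the original answer): the same values are summed once from a contiguous std::vector<X> and once through a vector of individually heap-allocated objects visited in shuffled order. The helper names, the element count and the shuffle are assumptions chosen for illustration. A single timed run like this is noisy, and the gap you see depends on the hardware, the allocator and the optimization level.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <random>
#include <vector>

struct X {
    int a;
};

// Objects stored contiguously: each access is a direct load and the
// sequential pattern is easy for the hardware prefetcher to follow.
long long sum_contiguous(const std::vector<X>& xs) {
    long long total = 0;
    for (const X& x : xs) total += x.a;
    return total;
}

// Separately allocated objects reached through pointers: each access needs
// the pointer first, and the objects are visited in no particular memory order.
long long sum_through_pointers(const std::vector<std::unique_ptr<X>>& ps) {
    long long total = 0;
    for (const auto& p : ps) total += p->a;
    return total;
}

// Hypothetical helper: builds both layouts with identical values, times one
// pass over each, prints the timings and returns the (shared) sum.
long long compare_layouts(std::size_t n) {
    std::vector<X> values(n);
    std::vector<std::unique_ptr<X>> pointers;
    pointers.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        values[i].a = static_cast<int>(i % 1000);
        pointers.push_back(std::make_unique<X>(X{values[i].a}));
    }
    // Shuffle the visiting order to imitate a cache-unfriendly layout
    // (an assumption for illustration; real pointer-heavy code varies).
    std::mt19937 rng(12345);
    std::shuffle(pointers.begin(), pointers.end(), rng);

    using Clock = std::chrono::steady_clock;
    auto t0 = Clock::now();
    long long s1 = sum_contiguous(values);
    auto t1 = Clock::now();
    long long s2 = sum_through_pointers(pointers);
    auto t2 = Clock::now();

    assert(s1 == s2);  // same data, different layout
    auto us = [](Clock::duration d) {
        return static_cast<long long>(
            std::chrono::duration_cast<std::chrono::microseconds>(d).count());
    };
    std::printf("contiguous: %lld us, through pointers: %lld us\n",
                us(t1 - t0), us(t2 - t1));
    return s1;
}
```

Calling compare_layouts(1'000'000) from main prints both timings. Once the data no longer fits in cache, the pointer version is often noticeably slower, and the reason is the cache misses on the scattered objects rather than the one extra load per element.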