Understanding assembly instructions for a function summing three ints of an std::array

Question

I have the following C++ function, which simply sums the three elements of the given input array.

#include <array>
using namespace std;

int sum(array<int, 3> ar) {
    int res = 0;
    for(int idx = 0; idx < ar.size(); idx++){
        res += ar[idx];
    }
    return res;
}

Compiled with Clang and the compiler flag -O3 (gcc and icc produce the same code), this produces the following x86-64 assembly:

sum(std::array<int, 3ul>):
        mov     rax, rdi
        shr     rax, 32
        add     eax, edi
        add     eax, esi
        ret

My current interpretation of the assembly is that the following happens:

64 bits are moved from the 64 bit input register rdi into the 64 bit output register rax. This corresponds to two 32 bit ints.
shr shifts the contents of rax by 32 bits thus keeping only the first 32 bit int contained in rdi.
the contents of the 32 bit input register edi are added to the 32 bit output register eax
the contents of the second 32 bit input register esi are added to eax
eax is returned

I am however left with some questions:

Can the computer simply shift between 32 and 64 bit registers as is done in the first two instructions?
Shouldn't the use of shr result in the first int being added two times because the second int is shifted out? (Does this have to do with endianness?)

As an extra note: the compiler produces the same assembly instructions when supplied with a range-based for loop.

#include <array>
using namespace std;

int sum(array<int, 3> ar) {
    int res = 0;
    for(const auto& in: ar){
        res += in;
    }
    return res;
}

You can find the example here: https://godbolt.org/z/s3fera7ca

Answer 1

The array is packed into registers for parameter passing as if it were a simple struct of 3 ints.

So, two 32-bit int elements are passed in the first argument register, and the remaining one in the second argument register.
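The packing described above can be imitated in plain C++. This is only a sketch: ArgRegs and pack_into_registers are hypothetical names standing in for the two argument registers, a little-endian machine such as x86-64 is assumed, and the upper half of the second value is zeroed here for simplicity, whereas a real caller may leave anything in those bits.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the two argument registers rdi and rsi.
struct ArgRegs {
    std::uint64_t rdi;
    std::uint64_t rsi;
};

// Copy the 12 bytes of the array into two 8-byte values,
// mirroring how the three ints are split across two registers.
// Assumes a little-endian machine.
ArgRegs pack_into_registers(const std::array<int, 3>& ar) {
    ArgRegs regs{0, 0};
    std::memcpy(&regs.rdi, ar.data(), 8);      // elements 0 and 1
    std::memcpy(&regs.rsi, ar.data() + 2, 4);  // element 2 in the low 32 bits
    return regs;
}
```

For the array {1, 2, 3}, this places 1 in the lower half of rdi, 2 in the upper half of rdi, and 3 in the lower half of rsi.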

How those first two are packed into one register may seem somewhat arbitrary, given that no memory is involved in this example. To be clear, registers by themselves have no notion of endianness. Endianness arises when numeric data spans more than one memory address, not from anything in the registers: registers can only be named (in machine code instructions), not addressed, so there is no concept of endianness within them.

However, other operations do involve storing and loading that same structure to and from memory, and those are more efficient if the packing follows the endianness of the processor. That makes it the logical choice for the designers of an ABI, who specify (by rules) where the first, second and third elements of a struct go when passed as parameters in registers.

When the processor endianness is followed, then programs can use a quad word load or store and a double word load or store to copy the struct — a 64-bit operation followed by a 32-bit operation. If the processor's natural endianness weren't followed in the registers (which would actually still work) then three double word load or store operations would be needed instead, to get the proper order of the array elements from/into memory.

By following the natural endianness, machine code can mix 64-bit and 32-bit load and store operations even though the structure holds only 32-bit items.
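As a rough illustration of that mixing, the sketch below rebuilds the array from two register-sized values using one 8-byte copy and one 4-byte copy. The name store_registers is hypothetical, the memcpy calls merely stand in for the quad word and double word stores, and a little-endian machine is assumed.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: write two register-sized values back to memory
// as an array of three ints. Assumes a little-endian machine.
std::array<int, 3> store_registers(std::uint64_t rdi, std::uint64_t rsi) {
    std::array<int, 3> ar{};
    std::memcpy(ar.data(), &rdi, 8);      // one 64-bit store: elements 0 and 1
    std::memcpy(ar.data() + 2, &rsi, 4);  // one 32-bit store: element 2
    return ar;
}
```

Because the lower half of rdi lands at the lower address, the elements come out in their original order without any per-element shuffling.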

How does edi fit into this?

edi is the first element of the array/structure. rdi >> 32 is the second, as it is packed into the upper 32 bits of rdi, while the first element is packed into the lower 32 bits of rdi. And esi is the third.
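To make the mapping concrete, here is a sketch that mirrors the four instructions from the question one by one. The function name sum_from_registers is made up for illustration, and its parameters simply model the register contents. Unsigned arithmetic is used so that overflow wraps the way the add instruction does.

```cpp
#include <cstdint>

// Hypothetical emulation of the compiled function; rdi and rsi model
// the contents of the two argument registers.
int sum_from_registers(std::uint64_t rdi, std::uint64_t rsi) {
    std::uint64_t rax = rdi;                              // mov rax, rdi
    rax >>= 32;                                           // shr rax, 32  (second element)
    std::uint32_t eax = static_cast<std::uint32_t>(rax);  // eax is the low half of rax
    eax += static_cast<std::uint32_t>(rdi);               // add eax, edi (first element)
    eax += static_cast<std::uint32_t>(rsi);               // add eax, esi (third element)
    return static_cast<int>(eax);                         // ret
}
```

The shift discards the first element from rax and leaves the second, so nothing is added twice: the first element is read separately from edi, which is untouched by the shift.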
