
'Bus Error' on ARMv6 when working with doubles

I'm creating a C++ program for ARMv6 which crashes with BUS ERROR. Using GDB I have traced the problem to the following code

// char *pData points into the received message
double d = *(double*)pData;   // bus error here on ARMv6
pData += sizeof(int64_t);

The program walks through a received message and has to extract some double values using the code above. The message has several fields, some doubles, some not.

On x86 this works fine, but on ARM I get the bus error. So I suspect my problem is data alignment: on ARM, the double fields have to be aligned to word boundaries in memory.
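A quick way to test the alignment hypothesis is to check the address before dereferencing it. This is a minimal sketch that assumes a flat address space where converting a pointer to std::uintptr_t yields its numeric address (true on ARM and x86, but implementation-defined as far as the standard is concerned); is_aligned_for is a hypothetical helper, not a library function.

```cpp
#include <cstdint>

// Hypothetical helper (not from any library): returns true if p meets the
// alignment requirement of T on the current ABI. Assumes pointer-to-integer
// conversion gives the plain numeric address, as it does on ARM and x86.
template <typename T>
bool is_aligned_for(const void *p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

With alignas(double) char buf[16], is_aligned_for<double>(buf) is true but is_aligned_for<double>(buf + 1) is false. That is exactly where a double lands when it follows a 1-byte field in the message.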

I have tried the following as a fix, which did not work (still got the error):

int64_t i = *(int64_t*)pData;
double d = *((double*)&i);

The following worked (so far):

double d = 0;
memcpy(&d, pData, sizeof(double));
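The working memcpy version can be wrapped so that it also advances the cursor, mirroring the original pData += ... pattern. This is a minimal sketch: read_double is a hypothetical name, and like the original code it assumes sender and receiver share the same double representation and byte order.

```cpp
#include <cstring>

// Hypothetical helper: reads a double from a possibly misaligned position
// and advances the cursor past it. memcpy copies the bytes into a properly
// aligned local object, so no misaligned double load is done through pData.
inline double read_double(const char *&pData) {
    double d;
    std::memcpy(&d, pData, sizeof d);
    pData += sizeof d;
    return d;
}
```

The cursor is taken as const char *&, so a char * cursor like the one in the question would need to be declared as const char * first.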

Is using 'memcpy' the best approach, or is there a better way?

In my case I do not have control over the packing of the data in the buffer or the order of the fields in the message.
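Since the packing and field order are fixed by the sender, the same idea extends to a whole message: read each field in turn through memcpy, whatever its offset. This is a minimal sketch. The layout, the Message struct and the read_field/parse_message names are all hypothetical, chosen only so that every multi-byte field sits at a misaligned offset, and it assumes the sender uses the receiver's byte order.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical generic reader: copies sizeof(T) bytes from the cursor into a
// local T and advances the cursor. Only valid for trivially copyable T.
template <typename T>
T read_field(const char *&cursor) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy-based reads need a trivially copyable type");
    T value;
    std::memcpy(&value, cursor, sizeof value);
    cursor += sizeof value;
    return value;
}

// Hypothetical packed wire layout (illustration only, 21 bytes total):
//   offset 0:  uint8_t  type
//   offset 1:  double   price     (misaligned)
//   offset 9:  int32_t  quantity  (misaligned)
//   offset 13: double   limit     (misaligned)
struct Message {
    std::uint8_t type;
    double price;
    std::int32_t quantity;
    double limit;
};

inline Message parse_message(const char *data) {
    const char *cursor = data;
    Message m;
    m.type = read_field<std::uint8_t>(cursor);
    m.price = read_field<double>(cursor);
    m.quantity = read_field<std::int32_t>(cursor);
    m.limit = read_field<double>(cursor);
    return m;
}
```

Only the wire buffer is packed here; the Message struct itself is laid out and aligned normally by the compiler, so its fields can be used without further care.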


Is using 'memcpy' the best approach?

In general it's the only correct approach, unless you're targeting a single ABI in which no type requires greater than 1-byte alignment.

The C++ standard is rather verbose, so I'll quote the C standard expressing the same thing much more succinctly:

A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.

There it is: that ever-present spectre of undefined behaviour. Even an x86 compiler is perfectly well allowed to break into your house and rub jam into your hair while you sleep instead of loading that data the way you expect, if its ABI says so.

One thing to note, though, is that modern compilers tend to be clever enough that correctness doesn't necessarily come at the cost of performance. Let's flesh out that example code:

#include <string.h>

double func(char *data) {
    double d;
    memcpy(&d, data, sizeof d);
    return d;
}

...and throw it at a compiler:

$ clang -target arm -march=armv6 -mfpu=vfpv3 -mfloat-abi=hard -O1 -S test.c
...
func:                                   @ @func
        .fnstart
@ BB#0:
        push    {r4, r5, r11, lr}
        sub     sp, sp, #8
        mov     r2, r0
        ldrb    r1, [r0, #3]
        ldrb    r3, [r0, #2]
        ldrb    r12, [r0]
        ldrb    lr, [r0, #1]
        ldrb    r4, [r2, #4]!
        orr     r5, r3, r1, lsl #8
        ldrb    r3, [r2, #2]
        ldrb    r2, [r2, #3]
        ldrb    r0, [r0, #5]
        orr     r1, r12, lr, lsl #8
        orr     r2, r3, r2, lsl #8
        orr     r0, r4, r0, lsl #8
        orr     r1, r1, r5, lsl #16
        orr     r0, r0, r2, lsl #16
        str     r1, [sp]
        str     r0, [sp, #4]
        vpop    {d0}
        pop     {r4, r5, r11, pc}

OK, so it's playing things safe with a bytewise memcpy; at least it's inlined. But hey, ARMv6 does at least support unaligned word and halfword accesses if the CPU is configured appropriately, so let's tell the compiler we're cool with that:

$ clang -target arm -march=armv6 -mfpu=vfpv3 -mfloat-abi=hard -O1 -S -munaligned-access test.c
...
func:                                   @ @func
        .fnstart
@ BB#0:
        sub     sp, sp, #8
        ldr     r1, [r0]
        ldr     r0, [r0, #4]
        str     r0, [sp, #4]
        str     r1, [sp]
        vpop    {d0}
        bx      lr

There we go, that's about the best you can do with just integer word loads. Now, what if we compile it for something a bit newer?

$ clang -target arm -march=armv7 -mfpu=neon-vfpv4 -mfloat-abi=hard -O1 -S test.c
...
func:                                   @ @func
        .fnstart
@ BB#0:
        vld1.8  {d0}, [r0]
        bx      lr

I can guarantee that, even on a machine where it would "work", no undefined-behaviour hackery would correctly load that unaligned double in fewer than one instruction. Note that NEON is the key player here: vld1 only requires the base address to be aligned to the element size, so for 8-bit elements it can never be unaligned. In the more general case (say, if it were a long long instead of a double) you might still need -munaligned-access to convince the compiler, as before.

For comparison, let's just see how everyone's favourite mutant-grandchild-of-a-1970s-calculator-chip fares as well:

$ clang -m32 -O1 -S test.c
...
func:                                   # @func
# BB#0:
        movl    4(%esp), %eax
        fldl    (%eax)
        retl

Yup, the correct code still also looks like the best code.
