clang/LLVM ARM ABI, non-volatile registers being destroyed

I am trying to use clang/LLVM as a cross compiler for ARM Cortex-M.

Based on a few LLVM documentation pages, this is how I am building the toolchain:

rm -rf /opt/llvm/llvm10armv6m
rm -rf llvm-project
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout llvmorg-10.0.0
mkdir build
cd build
cmake -DLLVM_ENABLE_PROJECTS='clang;lld' -DCMAKE_CROSSCOMPILING=True -DCMAKE_INSTALL_PREFIX=/opt/llvm/llvm10armv6m -DLLVM_DEFAULT_TARGET_TRIPLE=armv6m-none-eabi -DLLVM_TARGET_ARCH=ARM -DLLVM_TARGETS_TO_BUILD=ARM -G "Unix Makefiles" ../llvm
make -j 8
make -j 4
make
sudo make install

test.c

void fun ( unsigned int, unsigned int );
int test ( void )
{
    unsigned int ra;
    unsigned int rx;

    for(rx=0;;rx++)
    {
        ra=rx;
        ra|=((~rx)&0xFF)<<16;
        fun(0x12345678,ra);
    }
    return(0);
}

clang -Wall -O2 -nostdlib -ffreestanding -fomit-frame-pointer -c test.c -o test.o
arm-none-eabi-objdump -D test.o


Disassembly of section .text:

00000000 <test>:
   0:   20ff        movs    r0, #255    ; 0xff
   2:   0405        lsls    r5, r0, #16
   4:   2600        movs    r6, #0
   6:   4c06        ldr r4, [pc, #24]   ; (20 <test+0x20>)
   8:   4637        mov r7, r6
   a:   4629        mov r1, r5
   c:   43b1        bics    r1, r6
   e:   4339        orrs    r1, r7
  10:   4620        mov r0, r4
  12:   f7ff fffe   bl  0 <fun>
  16:   2001        movs    r0, #1
  18:   0400        lsls    r0, r0, #16
  1a:   1836        adds    r6, r6, r0
  1c:   1c7f        adds    r7, r7, #1
  1e:   e7f4        b.n a <test+0xa>
  20:   12345678    eorsne  r5, r4, #120, 12    ; 0x7800000

(gnu's output is much better)

The problem here is that the ARM ABI says not to destroy r4 and above, certainly not r4 and r7 as this code does. It also isn't preserving the link register for a return from this function, although I guess it sees that this is an infinite loop that never returns (please don't tell me I fell into the LLVM infinite loop bug again).
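For readers following the register usage, the loop in the first -O2 listing above can be read back into C. This is only a sketch of one reading of the disassembly, not compiler output: the function names ra_source and ra_as_compiled are made up for illustration, and the local variables are named after the registers they mirror. It shows that r5, r6 and r7 hold a constant mask, rx shifted left by 16, and rx itself, which is why the loop wants several registers that stay live across the call to fun.

```c
#include <stdint.h>

/* The expression exactly as written in test.c. */
uint32_t ra_source(uint32_t rx)
{
    return rx | ((~rx & 0xFFu) << 16);
}

/* The same value computed the way the optimized loop appears to do it.
 * Variable names mirror the registers in the listing (an illustration only). */
uint32_t ra_as_compiled(uint32_t rx)
{
    uint32_t r5 = 0xFFu << 16; /* movs r0,#255 ; lsls r5,r0,#16 */
    uint32_t r6 = rx << 16;    /* starts at 0, grows by 0x10000 per iteration */
    uint32_t r7 = rx;          /* starts at 0, grows by 1 per iteration */
    uint32_t r1 = r5 & ~r6;    /* mov r1,r5 ; bics r1,r6 */
    r1 |= r7;                  /* orrs r1,r7 */
    return r1;                 /* second argument passed to fun */
}
```

Both functions return the same value for any 32-bit rx, because masking with 0xFF0000 discards every bit in which ~(rx << 16) and (~rx) << 16 differ.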

With the frame pointer it doesn't get any better:

00000000 <test>:
   0:   b580        push    {r7, lr}
   2:   af00        add r7, sp, #0
   4:   20ff        movs    r0, #255    ; 0xff
   6:   0405        lsls    r5, r0, #16
   8:   2400        movs    r4, #0
   a:   4626        mov r6, r4
   c:   4629        mov r1, r5
   e:   43a1        bics    r1, r4
  10:   4331        orrs    r1, r6
  12:   4804        ldr r0, [pc, #16]   ; (24 <test+0x24>)
  14:   f7ff fffe   bl  0 <fun>
  18:   2001        movs    r0, #1
  1a:   0400        lsls    r0, r0, #16
  1c:   1824        adds    r4, r4, r0
  1e:   1c76        adds    r6, r6, #1
  20:   e7f4        b.n c <test+0xc>
  22:   46c0        nop         ; (mov r8, r8)
  24:   12345678    eorsne  r5, r4, #120, 12    ; 0x7800000

and building the toolchain for

armv6m-none-gnueabi

didn't make it any better

But if I use a generic clang/LLVM installed through apt:

clang -Wall -O2 -nostdlib -ffreestanding -fomit-frame-pointer -target armv6m-none-gnueabi -mthumb -mcpu=cortex-m0 -c test.c -o test.o
arm-none-eabi-objdump -D test.o

Disassembly of section .text:

00000000 <test>:
   0:   b5f0        push    {r4, r5, r6, r7, lr}
   2:   b081        sub sp, #4
   4:   20ff        movs    r0, #255    ; 0xff
   6:   0405        lsls    r5, r0, #16
   8:   2600        movs    r6, #0
   a:   4c06        ldr r4, [pc, #24]   ; (24 <test+0x24>)
   c:   4637        mov r7, r6
   e:   4629        mov r1, r5
  10:   43b1        bics    r1, r6
  12:   4339        orrs    r1, r7
  14:   4620        mov r0, r4
  16:   f7ff fffe   bl  0 <fun>
  1a:   2001        movs    r0, #1
  1c:   0400        lsls    r0, r0, #16
  1e:   1836        adds    r6, r6, r0
  20:   1c7f        adds    r7, r7, #1
  22:   e7f4        b.n e <test+0xe>
  24:   12345678    eorsne  r5, r4, #120, 12    ; 0x7800000

the problem is gone.

At the time of this writing the built toolchain is v10 and the apt-installed one is v6. (Building v10 myself: why does it take an eternity, and why are the binaries so huge?)

Running the same command line against the built toolchain changes nothing; it still has the ABI problem.

Now if I don't optimize, the ABI problem doesn't show up, but perhaps that is just dumb luck:

00000000 <test>:
   0:   b580        push    {r7, lr}
   2:   b082        sub sp, #8
   4:   2000        movs    r0, #0
   6:   9000        str r0, [sp, #0]
   8:   e7ff        b.n a <test+0xa>
   a:   9800        ldr r0, [sp, #0]
   c:   9001        str r0, [sp, #4]
   e:   4668        mov r0, sp
  10:   7800        ldrb    r0, [r0, #0]
  12:   21ff        movs    r1, #255    ; 0xff
  14:   4048        eors    r0, r1
  16:   0400        lsls    r0, r0, #16
  18:   9901        ldr r1, [sp, #4]
  1a:   4301        orrs    r1, r0
  1c:   9101        str r1, [sp, #4]
  1e:   9901        ldr r1, [sp, #4]
  20:   4803        ldr r0, [pc, #12]   ; (30 <test+0x30>)
  22:   f7ff fffe   bl  0 <fun>
  26:   e7ff        b.n 28 <test+0x28>
  28:   9800        ldr r0, [sp, #0]
  2a:   1c40        adds    r0, r0, #1
  2c:   9000        str r0, [sp, #0]
  2e:   e7ec        b.n a <test+0xa>
  30:   12345678    eorsne  r5, r4, #120, 12    ; 0x7800000

Bare links are frowned upon at SO, so here is the title of the page I used:

How To Cross-Compile Clang/LLVM using Clang/LLVM

It has info like this:

The CMake options you need to add are:

-DCMAKE_CROSSCOMPILING=True
-DCMAKE_INSTALL_PREFIX=<install-dir>
-DLLVM_TABLEGEN=<path-to-host-bin>/llvm-tblgen
-DCLANG_TABLEGEN=<path-to-host-bin>/clang-tblgen
-DLLVM_DEFAULT_TARGET_TRIPLE=arm-linux-gnueabihf
-DLLVM_TARGET_ARCH=ARM
-DLLVM_TARGETS_TO_BUILD=ARM

I started with the GNU triple I normally use, in the style the page mentions, but then saw that LLVM triples can carry a sub-architecture, so I added that. Initially it all looked good, until I wrote a program with more than a few lines in it.

Am I building llvm incorrectly? Or is this simply the llvm infinite loop thing? (or other...)

Edit

Updated build script:

export THEPLACE=/opt/llvm/llvm10armv6m
export THETARGET=armv6m-none-eabi

rm -rf $THEPLACE
rm -rf llvm-project
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout llvmorg-10.0.0
mkdir build
cd build
cmake \
-DLLVM_ENABLE_PROJECTS='clang;lld' \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CROSSCOMPILING=True \
-DCMAKE_INSTALL_PREFIX=$THEPLACE \
-DLLVM_DEFAULT_TARGET_TRIPLE=$THETARGET \
-DLLVM_TARGET_ARCH=ARM \
-DLLVM_TARGETS_TO_BUILD=ARM \
-G "Unix Makefiles" \
../llvm

make -j 8
make -j 4
make
sudo make install

The tblgen options apparently aren't needed. In theory -G "Unix Makefiles" is supposed to produce makefiles that can be built in parallel, but I did have an issue with that: in one or two places the parallel build worked, in one it didn't, and I had to run it again and again or eventually fall back to a serial build. Hence the repeated makes at the end.

With the Release build the binaries are SIGNIFICANTLY smaller: instead of tens of GB, the whole install is around 1.something GB.

I don't think the build is any faster. Still on par with building gcc in the 1990s for duration.

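The answer is pretty easy: your function never returns. Callee-saved registers only have to hold their original values when control goes back to the caller, and that never happens here, so it does not make any sense to save / restore them.

If you change your source so that the function can terminate, like this: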

void fun ( unsigned int, unsigned int );
unsigned bar();
int test ( void )
{
    unsigned int ra;
    unsigned int rx;

    for(rx=0;rx<bar();rx++)
    {
        ra=rx;
        ra|=((~rx)&0xFF)<<16;
        fun(0x12345678,ra);
    }
    return(0);
}

Everything will be saved / restored as you expected.
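Before comparing disassembly, it can help to confirm on a host machine that the bounded variant really does return. The sketch below is an assumption-laden harness, not part of the original answer: the stub bodies of fun and bar, the counters fun_calls, last_a and last_ra, and the fixed bound of 4 are all hypothetical, and the function is named test_bounded here only to keep it distinct from the original test.

```c
/* Hypothetical host-side stubs; on the real target fun and bar live elsewhere. */
static unsigned int fun_calls; /* number of times fun was called */
static unsigned int last_a;    /* first argument of the most recent call */
static unsigned int last_ra;   /* second argument of the most recent call */

void fun(unsigned int a, unsigned int ra)
{
    fun_calls++;
    last_a = a;
    last_ra = ra;
}

/* Assumed loop bound for the demonstration. */
unsigned bar(void)
{
    return 4;
}

/* Same body as the terminating variant shown above. */
int test_bounded(void)
{
    unsigned int ra;
    unsigned int rx;

    for (rx = 0; rx < bar(); rx++)
    {
        ra = rx;
        ra |= ((~rx) & 0xFF) << 16;
        fun(0x12345678, ra);
    }
    return 0;
}
```

With these stubs the loop runs for rx = 0 through 3 and then returns to its caller, which is the property that obliges the compiler to preserve the callee-saved registers it uses.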

PS: I won't comment on whether an infinite loop is UB.

PPS: You will certainly want to compile llvm/clang in Release mode: the binaries will be smaller and the link time will drop dramatically.
