How to make gcc generate stack in bare-metal environment?

Question

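When I use GCC for ARM operating system development, I cannot use local variables because the stack is not initialized. How do I tell the compiler to initialize SP?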

Answer 1

My experience is with the Cortex-M, and as @n-pronouns-m said, it is the linker, not the compiler or assembler, that "sets up" the stack. All that is necessary is to place the initial stack pointer value at location 0x0 in program memory (the first entry of the vector table). This is typically the address just past the end of RAM (highest word-aligned RAM address + 4). Since different processors have different amounts of RAM, the proper address is processor dependent and is usually a literal in the linker file.
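To make that arithmetic concrete, here is a minimal sketch in C. The helper name `initial_sp` and the example figures are assumptions for illustration only; they do not come from any vendor header or linker file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: for a full-descending stack, the initial stack
   pointer is the first address past the end of RAM. */
static uint32_t initial_sp(uint32_t ram_origin, uint32_t ram_length)
{
    uint32_t top = ram_origin + ram_length;
    /* Keep the starting value 8-byte aligned, as the ARM procedure call
       standard expects at function boundaries. */
    assert((top & 0x7u) == 0);
    return top;
}
```

For a part with 64 KiB of RAM at 0x20000000, `initial_sp(0x20000000u, 0x10000u)` gives 0x20010000, which is the value that would go in the first word of the vector table.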

Answer 2

This is a variation on the code I use at global scope in my bare-metal C code (aarch64, Pi 3). It sets up a simple stack and then calls a C function called enter, given a variable stacks and the stack size you want for each core, STACK_SIZE. Since STACK_SIZE is pasted into the asm string, it has to be a string literal macro (you can't use sizeof).

asm (
    "\n.global  _start"
    "\n.type    _start, %function"
    "\n.section .text"
    "\n_start:"
    "\n\tmrs     x0, mpidr_el1"
    "\n\ttst     x0, #0x40000000"
    "\n\tand     x1, x0, #0xff"
    "\n\tcsel    x1, x1, xzr, eq" // core
    "\n\tadr     x0, stacks"
    "\n\tmov     x3, #"STACK_SIZE                                                                                       
    "\n\tmul     x2, x1, x3"
    "\n\tadd     x0, x0, x2"
    "\n\tadd     sp, x0, x3"
    "\n\tb     enter"
    "\n\t.previous"
    "\n.align 10" ); // Alignment to avoid GPU overwriting code

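The address arithmetic in that assembly can be checked on a host machine with a small C sketch. The function name `core_stack_top` and the example addresses are assumptions for illustration; only the formula is taken from the assembly above.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the arithmetic in the assembly:
   sp = stacks + core * STACK_SIZE + STACK_SIZE.
   Each core gets its own region, and sp starts at the upper end of that
   region because the stack grows downward. */
static uint64_t core_stack_top(uint64_t stacks_base, uint64_t core,
                               uint64_t stack_size)
{
    /* AArch64 expects sp to be 16-byte aligned when used to access memory. */
    assert(stacks_base % 16 == 0 && stack_size % 16 == 0);
    return stacks_base + core * stack_size + stack_size;
}
```

With a 4 KiB stack per core and a hypothetical `stacks` array at 0x80000, core 0 starts at 0x81000 and core 3 at 0x84000, so no two cores share a region.
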
Answer 3

Your question is confusing since you do not specify the target; there are different answers for the different flavors of the ARM architecture. But independent of that, gcc has nothing to do with this. Gcc is a C compiler, and as such you need a bootstrap, ideally written in some other language (otherwise it looks bad and you are fighting a chicken-and-egg problem anyway). This is generally done in assembly language.

For the armv4t up through the armv7-a cores you have different processor modes: user, system, supervisor, etc. When you look at the Architectural Reference Manual, you see that the stack pointer is banked: many of the modes have their own, plus a little sharing (user and system share one). Which means you need a way to access each of those registers. For those cores, how that works is you switch modes, set the stack pointer, switch modes, set the stack pointer, until you have set up all the ones you are going to use (see the tens to hundreds of thousands of examples on the internet with respect to how to do this). And then you often come back to supervisor mode to boot into the application/kernel, whatever you want to call it.

Then with the armv8-a, and I think armv7-a as well, you have a hypervisor mode, which is different. And certainly there is the armv8-a, which is the 64-bit core (it has an armv7-a compatible core inside for aarch32 execution).

For all of the above, though, you need to set the stack pointer in your code:

reset:
    mov sp,#0x8000

or some such thing. On the early Pis that is the kind of thing you could do, as the loader would put your kernel.img at 0x8000 unless otherwise instructed, so from just below the entry point to just above the ATAGs is free space, and after boot, once you have used the ATAG entries, you are free down to the exception table (which you need to set up; the easiest way is to let the tools work for you and generate the addresses, then simply copy them to their proper location). This kind of thing:

.globl _start
_start:
    ldr pc,reset_handler
    ldr pc,undefined_handler
    ldr pc,swi_handler
    ldr pc,prefetch_handler
    ldr pc,data_handler
    ldr pc,unused_handler
    ldr pc,irq_handler
    ldr pc,fiq_handler
reset_handler:      .word reset
undefined_handler:  .word hang
swi_handler:        .word hang
prefetch_handler:   .word hang
data_handler:       .word hang
unused_handler:     .word hang
irq_handler:        .word irq
fiq_handler:        .word hang

reset:
    mov r0,#0x8000
    mov r1,#0x0000
    ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9}
    stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9}
    ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9}
    stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9}


    ;@ (PSR_IRQ_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
    mov r0,#0xD2
    msr cpsr_c,r0
    mov sp,#0x8000

    ;@ (PSR_FIQ_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
    mov r0,#0xD1
    msr cpsr_c,r0
    mov sp,#0x4000

    ;@ (PSR_SVC_MODE|PSR_FIQ_DIS|PSR_IRQ_DIS)
    mov r0,#0xD3
    msr cpsr_c,r0
    mov sp,#0x8000000

    ;@ SVC MODE, IRQ ENABLED, FIQ DIS
    ;@mov r0,#0x53
    ;@msr cpsr_c, r0
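The magic numbers written to cpsr_c above are just the mode field ORed with the interrupt mask bits. A small C sketch of how they are composed follows; the PSR_* names are taken from the comments in the assembly, while the helper `cpsr_c_value` is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* CPSR mode field (bits 4:0) and interrupt mask bits for the classic
   ARM cores (armv4t through armv7-a). */
enum {
    PSR_FIQ_MODE = 0x11,
    PSR_IRQ_MODE = 0x12,
    PSR_SVC_MODE = 0x13,
    PSR_FIQ_DIS  = 0x40, /* F bit: FIQ masked */
    PSR_IRQ_DIS  = 0x80  /* I bit: IRQ masked */
};

/* Hypothetical helper: build the byte written with "msr cpsr_c, r0". */
static uint32_t cpsr_c_value(uint32_t mode, int irq_disabled, int fiq_disabled)
{
    assert((mode & ~0x1Fu) == 0); /* mode must fit in the 5-bit field */
    return mode
         | (irq_disabled ? (uint32_t)PSR_IRQ_DIS : 0u)
         | (fiq_disabled ? (uint32_t)PSR_FIQ_DIS : 0u);
}
```

So 0xD2 selects IRQ mode with both interrupts masked, 0xD1 is FIQ mode, 0xD3 is supervisor mode, and the commented-out 0x53 is supervisor mode with IRQ enabled and FIQ still masked.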

The armv8-a (aarch64) has an exception table too, but the entries are spaced out as shown in the ARM documentation.

For all of the above, the well-known address documented by ARM is an entry point: code starts executing there, so you need to place instructions there. If it is the reset handler, that is normally where you would add code to set up the stack pointer, copy .data, zero .bss, and do any other bootstrapping needed before C code can be entered.

The cortex-ms, which are armv6-m, armv7-m and armv8-m (so far compatible with one or the other), use a vector table. Meaning the well-known addresses hold vectors, that is addresses of the handlers, not instructions, so you would do something like this:

.thumb

.globl _start
_start:
.word 0x20001000
.word reset
.word loop
.word loop
.word loop

.thumb_func
reset:
    bl main
    b .
.thumb_func
loop:
    b .

As documented by ARM, the cortex-m vector table has an entry for stack pointer initialization, so you do not have to add code, just put the address there. On reset the logic reads from 0x00000000 and places that value in the stack pointer, then reads from 0x00000004, checks and strips the lsbit, and starts execution at that address (the lsbit needs to be set in the vector table; please do not do the reset + 1 thing, use the tools properly).
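The reset sequence just described can be modeled in a few lines of C. This is a simplified host-side sketch, not a description of any particular core's implementation; `struct core_state` and `cortex_m_reset` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct core_state {
    uint32_t sp; /* main stack pointer after reset */
    uint32_t pc; /* address of the first instruction executed */
};

/* Simplified model of reset: word 0 of the vector table becomes the
   stack pointer, word 1 (with the thumb bit stripped) becomes the pc. */
static struct core_state cortex_m_reset(const uint32_t vectors[2])
{
    struct core_state s;
    /* The reset vector must have its lsbit set to indicate thumb code. */
    assert((vectors[1] & 1u) == 1u);
    s.sp = vectors[0];
    s.pc = vectors[1] & ~1u;
    return s;
}
```

Given a table that starts with 0x20001000 and 0x00000015, the model leaves sp at 0x20001000 and begins execution at 0x00000014. No instruction had to run to get the stack pointer set.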

Note _start is not actually necessary, it is just a distraction. These are bare-metal, so there is no loader that needs to know what an entry point is; likewise you are ideally making your own bootstrap and linker script, so there is no need for _start if you do not put it in your linker script. Including it is just a habit more than anything, and it saves on questions later.

When you read the architectural reference manual, any of them, you notice that the stm/push instruction is described as decrement first, then store, so if you set 0x20001000 then the first thing pushed is at address 0x20000FFC, not 0x20001000. This is not necessarily true for non-ARMs, so as always get and read the docs first, then start coding.
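A tiny host-side simulation shows the decrement-then-store behavior. The array `ram`, the constants, and `push_word` are assumptions made up for this sketch; they stand in for a 4 KiB RAM at 0x20000000.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_BASE  0x20000000u
#define RAM_WORDS 0x400u /* 4 KiB of simulated RAM */

static uint32_t ram[RAM_WORDS];

/* Full-descending push: decrement sp first, then store the word.
   Returns the updated stack pointer. */
static uint32_t push_word(uint32_t sp, uint32_t value)
{
    sp -= 4;
    assert(sp >= RAM_BASE && sp < RAM_BASE + 4u * RAM_WORDS);
    ram[(sp - RAM_BASE) / 4u] = value;
    return sp;
}
```

Starting from sp = 0x20001000, which is one past the last byte of this RAM, the first push lands in the last word at 0x20000FFC. That is why the initial value can legally point just beyond the end of memory.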

You the bare-metal programmer are wholly responsible for the memory map within the implementation by the chip vendor. So if there is 64KBytes of ram from 0x20000000 to 0x20010000 you decide how to slice that up. It is super easy to just go with the traditional stack comes down from the top, data at the bottom, heap in the middle although why would you ever possibly have a heap on an mcu if this is an mcu you are talking about (you did not specify). So for a 64K byte ram cortex-m you would likely just want to put 0x20010000 in the first entry of the vector table, stack pointer init question done. Some folks like to grossly over-complicate linker scripts in general and for some reason I cannot fathom, define the stack in the linker script. In that case you simply use a variable defined in the linker script to indicate top of stack and you use that in your vector table for a cortex-m or in the bootstrap code for a full sized ARM.

Also part of being wholly responsible for the memory space within the limits of the chip implementation means that you setup the linker script to match, you need to know the exception or vector table well known addresses as documented in the documents you already read by this point yes?

For a cortex-m maybe something like this

MEMORY
{
    /* rom : ORIGIN = 0x08000000, LENGTH = 0x1000 *//*AXIM*/
    rom : ORIGIN = 0x00200000, LENGTH = 0x1000 /*ITCM*/
    ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { *(.text*)   } > rom
    .rodata : { *(.rodata*) } > rom
    .bss    : { *(.bss*)    } > ram
}

For a Pi Zero maybe something like this:

MEMORY
{
    ram : ORIGIN = 0x8000, LENGTH = 0x1000
}
SECTIONS
{
    .text : { *(.text*) } > ram
    .rodata : { *(.rodata*) } > ram
    .bss : { *(.bss*) } > ram
    .data : { *(.data*) } > ram
}

and you can overcomplicate it from there.

The stack pointer is the easy part of the bootstrap: you just put in a number you picked when you designed your memory map. Initializing .data and .bss is more complicated, although for a Pi Zero, if you know what you are doing, the linker script can be as above and the bootstrap can be this simple

reset:
    ldr sp,=0x8000
    bl main
hang: b hang

If you do not change modes and do not use argc/argv. You can complicate it from there.

For a cortex-m you can make it simpler than that

reset:
    bl main
hang: b hang

Or if you do not use .data or .bss or do not need them initialized you can technically do this:

.word 0x20001000
.word main
.word handler
.word handler
...

But most folks other than me rely on .bss being zeroed and .data being initialized. You also can return from main, which is perfectly fine for a bare-metal system like an mcu if your software design is event driven and there is no need for the foreground after setting everything up. Most folks think you can't return from main.

gcc has nothing to do with any of this. The gcc command cannot assemble or link, and strictly speaking it does not even compile: it is a front end that calls other tools to do those jobs (a parser, a compiler, an assembler and a linker) unless told not to. The parser and compiler are part of gcc. The assembler and linker are part of a different package called binutils, which has many binary utilities and also happens to include the gnu assembler, or gas. It includes the gnu linker as well. Assembly languages are specific to an assembler, not the target; linker scripts are specific to the linker; and inline assembly is specific to the compiler, so these things are not assumed to port from one toolchain to another. It is generally not wise to use inline assembly, you have to be pretty desperate; better to use real assembly or none at all, depending on what the real problem is. But yes, with gnu you could inline the bootstrap if you really felt the need.

If this is a Raspberry Pi question, the GPU bootloader copies the ARM program to ram for you, so the whole thing is in ram, making it so much easier compared to other bare metal. For an mcu, though, the logic simply boots using the documented solution, and you are responsible for initializing ram, so if you have any .data or .bss that you want initialized you have to do that in the bootstrap. The initial values need to be in non-volatile memory, so you use the linker to do two things: put this info in the non-volatile space (rom/flash), and tell it where you are going to have it in ram. If you use the tools right, the linker will tell you where it put each thing in flash/ram, and you can then init those spaces programmatically using variables (before calling main, of course).

For this reason there is a very intimate relationship between the bootstrap and the linker script on a platform where you are responsible for .data and .bss (plus other complications you create that you use the linker to solve). Certainly with gnu, as you use your memory map design to specify where the .text, .data and .bss sections will live, you create variables in the linker script to know the starting point, end point and/or size, and those variables are used by the bootstrap to copy/init those sections. Since asm and the linker script are tool dependent, these are not expected to be portable, so you possibly have to redo them for each tool (whereas the C is more portable if you use no inline asm, no pragmas, etc.; there is no need for those anyway). So the simpler the solution, the less code you have to port if you wish to try the application on different tools, or wish to support different tools for the end user of the application, etc.
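The copy/zero step itself is short. Below is a sketch in C written as a plain function so it can be exercised on a host; on a real target the five pointers would come from symbols you define in your own linker script (names such as `__data_load` or `__bss_start` would be your choice, not something the tools provide by default), and this work is often done in assembly before any C runs.

```c
#include <assert.h>
#include <stdint.h>

/* Copy the .data image from its load address (flash) to its run address
   (ram), then zero .bss. All pointers are assumed to be word aligned. */
static void init_sections(const uint32_t *data_load,
                          uint32_t *data_start, uint32_t *data_end,
                          uint32_t *bss_start, uint32_t *bss_end)
{
    assert(data_start <= data_end && bss_start <= bss_end);
    while (data_start < data_end)
        *data_start++ = *data_load++;
    while (bss_start < bss_end)
        *bss_start++ = 0;
}
```

The function takes start and end pointers rather than sizes because that is what a linker script most naturally exports; if your script exports sizes instead, the loops change accordingly.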

The newest cores with aarch64 are quite complicated in general, but especially if you want to pick a specific mode there is very delicate bootstrap code you may need to write. The nice thing is that for banked registers you can access them directly from higher privileged modes and do not have to do the mode switchy thing like the armv4t and such. Not much of a savings as the execution levels, all the stuff you need to know and setup and maintain is quite detailed. Including the stacks for each execution layer and for applications when you launch them if you are creating an operating system.

3 answers

solution1
1 ACCEPTED 2020-10-03 12:21:07

solution2
1 2020-10-12 17:29:11

solution3
0 2020-10-03 15:03:09
