Tracing a simple program in assembly

Question

I have created a simple C program to add two numbers:

void main(){
     int a = 4;
     int b = 5;
     int c = a+b;
}

and named it test.c. I used "arm-linux-gcc -S test.c" to create test.s (assembly code). Now I want to see the value of each of the 16 registers after each assembly instruction. What should I do? I don't have any experience in assembly and I am relatively new to Linux, so I am not very familiar with the tools used. Please help. Thanks in advance.

Answer 1

Well, you are talking about two different things. If you want to see the contents of the registers, you need to execute the program, so you need to build a binary targeted at a system and then single-step through it. Yes, gdb will work if you have the right gdb pointing at the right system. You can also use a JTAG debugger: single-step, then dump the registers.

Little of this has anything to do with assembly language. You will want to see the instructions at the assembly language level when single-stepping, sure, but you need to compile to a binary to run it.

Since your program does not do anything, you need to be careful not to optimize; even a -O1 optimization will remove your code.
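If you do want to turn optimization on later and still see the loads, the add, and the stores, one common approach is to mark the locals volatile. This is only a sketch; the function name add_kept is hypothetical and not part of the original program:

```c
/* Hypothetical variant of the question's program. Declaring the
   locals volatile makes every read and write an observable side
   effect, so the compiler has to keep the stores, the loads, and
   the add even when optimizing. */
int add_kept(void)
{
    volatile int a = 4;
    volatile int b = 5;
    volatile int c = a + b;   /* load a, load b, add, store c */
    return c;                 /* load c back from the stack */
}
```

The generated code will still go through the stack for every access, much like the unoptimized build.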

Here is something to try. I have a Thumb instruction set simulator, thumbulator. Thumb is the 16-bit subset of ARM (still in the ARM family, with a one-to-one relationship to ARM instructions). Go to GitHub and download it. In thumbulator.c change this loop:

int run ( void )
{
    unsigned int ra;
    reset();
    while(1)
    {
        printf("-- 0x%08X --\n",reg_norm[15]-3);
        if(execute()) break;
        for(ra=0;ra< 8;ra++) printf("r%u 0x%08X ",ra,reg_norm[ra]); printf("\n");
        for(    ;ra<16;ra++) printf("r%u 0x%08X ",ra,reg_norm[ra]); printf("\n");

    }
    dump_counters();
    return(0);
}

so that it includes the printfs that show the registers.

Go into the blinker directory, change notmain() to resemble your program:

int notmain ( void )
{
    int a = 4;
    int b = 5;
    int c;
    c = a + b;
    return(0);
}

Edit the Makefile

Change this line to use your compiler (I add the -gcc during the build).

ARMGNU = arm-linux

Remove the -O2 from this line:

COPS = -Wall -mthumb -nostdlib -nostartfiles -ffreestanding

And change this to build just the gnu/gcc binary, not the llvm binary:

all: gnotmain.bin

Now build that.

Look at the file gnotmain.list. You will see something like this, though not necessarily exactly this; it depends on your gcc.

00000074 <notmain>:
  74:   b580        push    {r7, lr}
  76:   b084        sub sp, #16
  78:   af00        add r7, sp, #0
  7a:   2304        movs    r3, #4
  7c:   60fb        str r3, [r7, #12]
  7e:   2305        movs    r3, #5
  80:   60bb        str r3, [r7, #8]
  82:   68fa        ldr r2, [r7, #12]
  84:   68bb        ldr r3, [r7, #8]
  86:   18d3        adds    r3, r2, r3
  88:   607b        str r3, [r7, #4]
  8a:   2300        movs    r3, #0
  8c:   1c18        adds    r0, r3, #0
  8e:   46bd        mov sp, r7
  90:   b004        add sp, #16
  92:   bd80        pop {r7, pc}
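The 16-bit values in the second column of the listing can be decoded by hand. As a sketch, assuming the standard Thumb encoding of MOVS Rd, #imm8 (top five bits 00100, then a 3-bit register number, then an 8-bit immediate), the hypothetical helper below splits 0x2304 into r3 and #4:

```c
#include <stdint.h>

/* Fields of a 16-bit Thumb "MOVS Rd, #imm8" instruction.
   Encoding: 0 0 1 0 0 | Rd (3 bits) | imm8 (8 bits). */
struct movs_imm {
    int valid;        /* 1 if the halfword really is MOVS Rd, #imm8 */
    unsigned rd;      /* destination register number, 0..7 */
    unsigned imm;     /* immediate value, 0..255 */
};

/* Hypothetical helper, not part of thumbulator. */
struct movs_imm decode_movs_imm(uint16_t insn)
{
    struct movs_imm d;
    d.valid = (insn >> 11) == 0x04;   /* top five bits must be 00100 */
    d.rd    = (insn >> 8) & 0x7;      /* bits 10..8 */
    d.imm   = insn & 0xFF;            /* bits 7..0 */
    return d;
}
```

Feeding it 0x2305 gives r3 and #5, matching the next movs in the listing, while a halfword such as 0xb580 (the push) is reported as not being a MOVS.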

You will also see some code that boots up the processor:

00000000 <hang-0x50>:
   0:   40080000    andmi   r0, r8, r0
   4:   00000053    andeq   r0, r0, r3, asr r0
   8:   00000051    andeq   r0, r0, r1, asr r0
...

00000052 <_start>:
  52:   f000 f80f   bl  74 <notmain>
  56:   df01        svc 1
  58:   e7fe        b.n 58 <_start+0x6>

This is different from what you will see on an ARM: thumbulator boots like an ARM Cortex-M3, not like a traditional ARM based on the ARM instruction set. So the number at address 4 in this case is the address to reset to (the lsbit is set to indicate Thumb mode, so the address is really 0x52). Then the _start code calls notmain and you get to your code.
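The lsbit rule is easy to check by hand. A minimal sketch, with hypothetical helper names, that separates the Thumb flag from the address in a vector table entry:

```c
#include <stdint.h>

/* In a Cortex-M style vector table, bit 0 of an entry flags Thumb
   state and the remaining bits are the address to branch to.
   Both helpers are hypothetical, written only for illustration. */
uint32_t vector_target(uint32_t vector)
{
    return vector & ~(uint32_t)1;   /* clear bit 0 to get the address */
}

int vector_is_thumb(uint32_t vector)
{
    return (int)(vector & 1u);      /* bit 0 set means Thumb */
}
```

With the table above, the entry 0x00000053 at address 4 yields 0x52, which is _start, and the entry 0x00000051 yields 0x50, which is hang.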

The reason I mention this is that when you run thumbulator (./thumbulator blinker/gnotmain.bin) and the printfs you added dump out all the registers, you will see it doing a few things before and after notmain.

-- 0x00000052 --
r0 0x00000000 r1 0x00000000 r2 0x00000000 r3 0x00000000 r4 0x00000000 r5 0x00000000 r6 0x00000000 r7 0x00000000
r8 0x00000000 r9 0x00000000 r10 0x00000000 r11 0x00000000 r12 0x00000000 r13 0x40080000 r14 0xFFFFFFFF r15 0x00000057
-- 0x00000054 --
r0 0x00000000 r1 0x00000000 r2 0x00000000 r3 0x00000000 r4 0x00000000 r5 0x00000000 r6 0x00000000 r7 0x00000000
r8 0x00000000 r9 0x00000000 r10 0x00000000 r11 0x00000000 r12 0x00000000 r13 0x40080000 r14 0x00000057 r15 0x00000077
-- 0x00000074 --
r0 0x00000000 r1 0x00000000 r2 0x00000000 r3 0x00000000 r4 0x00000000 r5 0x00000000 r6 0x00000000 r7 0x00000000
r8 0x00000000 r9 0x00000000 r10 0x00000000 r11 0x00000000 r12 0x00000000 r13 0x4007FFF8 r14 0x00000057 r15 0x00000079
-- 0x00000076 --
r0 0x00000000 r1 0x00000000 r2 0x00000000 r3 0x00000000 r4 0x00000000 r5 0x00000000 r6 0x00000000 r7 0x00000000
r8 0x00000000 r9 0x00000000 r10 0x00000000 r11 0x00000000 r12 0x00000000 r13 0x4007FFE8 r14 0x00000057 r15 0x0000007B
-- 0x00000078 --
r0 0x00000000 r1 0x00000000 r2 0x00000000 r3 0x00000000 r4 0x00000000 r5 0x00000000 r6 0x00000000 r7 0x4007FFE8
r8 0x00000000 r9 0x00000000 r10 0x00000000 r11 0x00000000 r12 0x00000000 r13 0x4007FFE8 r14 0x00000057 r15 0x0000007D
-- 0x0000007A --
r0 0x00000000 r1 0x00000000 r2 0x00000000 r3 0x00000004 r4 0x00000000 r5 0x00000000 r6 0x00000000 r7 0x4007FFE8
r8 0x00000000 r9 0x00000000 r10 0x00000000 r11 0x00000000 r12 0x00000000 r13 0x4007FFE8 r14 0x00000057 r15 0x0000007F
-- 0x0000007C --

-- 0x00000052 -- is the first instruction executed, which is the first instruction at _start. It is a bl, which is encoded as two 16-bit halfwords, so it shows up at 0x52 and 0x54, and it branches to 0x74, the start of notmain. With my compile, notmain starts with a push of r7 and lr, so r13 changes to reflect that something was pushed. The next instruction is sub sp, #16, so again r13 changes (sp is r13, the stack pointer).
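The r13 values in the dump can be reproduced with a little arithmetic. This is a sketch with hypothetical helper names, assuming the usual full-descending stack where each pushed register takes 4 bytes:

```c
#include <stdint.h>

/* Stack pointer after pushing nregs registers: each register takes
   4 bytes and the stack grows toward lower addresses. */
uint32_t sp_after_push(uint32_t sp, unsigned nregs)
{
    return sp - 4u * nregs;
}

/* Stack pointer after "sub sp, #imm". */
uint32_t sp_after_sub(uint32_t sp, uint32_t imm)
{
    return sp - imm;
}
```

Starting from 0x40080000, push {r7, lr} gives 0x4007FFF8, and sub sp, #16 then gives 0x4007FFE8, which are the r13 values printed after the instructions at 0x74 and 0x76.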

At 0x7A we get to the first bit of your C code: movs r3, #4, and then at 0x7C it stores that to the stack (this is not optimized code). Then comes the b = 5 line of code, then the c = a + b code, with lots of stack traffic involved (not optimized), and it winds its way down.
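To follow the unoptimized body without running anything, you can model it in C. The sketch below is a toy model, not thumbulator code; the struct and function names are hypothetical, and the stack frame is represented as four words indexed from r7:

```c
#include <stdint.h>

/* Toy model: r[] holds the registers, frame[] is the 16-byte stack
   frame as words relative to r7, so frame[3] is [r7, #12],
   frame[2] is [r7, #8], and frame[1] is [r7, #4]. */
struct trace_state {
    uint32_t r[16];
    uint32_t frame[4];
};

void run_unoptimized_body(struct trace_state *s)
{
    s->r[3] = 4;                   /* 7a: movs r3, #4                  */
    s->frame[3] = s->r[3];         /* 7c: str  r3, [r7, #12]  a = 4    */
    s->r[3] = 5;                   /* 7e: movs r3, #5                  */
    s->frame[2] = s->r[3];         /* 80: str  r3, [r7, #8]   b = 5    */
    s->r[2] = s->frame[3];         /* 82: ldr  r2, [r7, #12]  reload a */
    s->r[3] = s->frame[2];         /* 84: ldr  r3, [r7, #8]   reload b */
    s->r[3] = s->r[2] + s->r[3];   /* 86: adds r3, r2, r3              */
    s->frame[1] = s->r[3];         /* 88: str  r3, [r7, #4]   c = a+b  */
}
```

After the last store, r2 holds 4, r3 holds 9, and the three stack slots hold a, b, and c.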

What does optimization have to do with it? Well, your program actually does nothing, so if you were to optimize it (put the -O2 back into the COPS variable in the Makefile) you would get this:

00000074 <notmain>:
  74:   2000        movs    r0, #0
  76:   4770        bx  lr

Basically it is this:

int notmain ( void )
{
    return(0);
}

That is the meat of the program.

If you want to see optimized code for what you are trying to do, put this code in a separate .c file:

int xfun ( int a, int b )
{
    return(a+b);
}

Add that to the project (compile it separately, to its own .o file).

Change notmain to this:

int xfun ( int, int );
int notmain ( void )
{
    return(xfun(4,5));
}

And now you see the kind of thing you are probably interested in.

00000080 <xfun>:
  80:   1808        adds    r0, r1, r0
  82:   4770        bx  lr

Simulate that with thumbulator and look at the registers before and after the instruction at 0x80:

-- 0x0000007C --
r0 0x00000004 r1 0x00000005 r2 0x00000000 r3 0x00000000 r4 0x00000000 r5 0x00000000 r6 0x00000000 r7 0x00000000
r8 0x00000000 r9 0x00000000 r10 0x00000000 r11 0x00000000 r12 0x00000000 r13 0x4007FFF8 r14 0x0000007F r15 0x00000083
-- 0x00000080 --
r0 0x00000009 r1 0x00000005 r2 0x00000000 r3 0x00000000 r4 0x00000000 r5 0x00000000 r6 0x00000000 r7 0x00000000
r8 0x00000000 r9 0x00000000 r10 0x00000000 r11 0x00000000 r12 0x00000000 r13 0x4007FFF8 r14 0x0000007F r15 0x00000085

Before, r0 is 4 and r1 is 5; the add r0 = r0 + r1 happens, and r0 is now 9.

Now, you don't really need to watch the registers to follow the code; it is a fairly painful way to do it. Back to just disassembling, without using thumbulator or anything else to execute the code and dump registers:

00000074 <notmain>:
  74:   b508        push    {r3, lr}
  76:   2004        movs    r0, #4
  78:   2105        movs    r1, #5
  7a:   f000 f801   bl  80 <xfun>
  7e:   bd08        pop {r3, pc}

00000080 <xfun>:
  80:   1808        adds    r0, r1, r0
  82:   4770        bx  lr

The movs is a move with the flags updated (I will get to the ARM code in a sec). It moves the number 4 into r0, then the number 5 into r1. The bl is branch and link: basically a branch with the return address saved in r14 so you can get back (a function call instead of a plain branch). After the bl to xfun we see the adds r0, r1, r0, which means r0 = r1 + r0, so we know that r0 is destroyed: it was a 4, now it is a 9. And basically the meat of your program is finished.
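The value bl leaves in r14 can be worked out from the address of the bl itself. A minimal sketch, with a hypothetical helper name, assuming the 4-byte Thumb bl seen in the listings:

```c
#include <stdint.h>

/* Value a Thumb bl leaves in r14 (lr): the address of the
   instruction after the 4-byte bl, with bit 0 set to mark Thumb
   state. Hypothetical helper, for illustration only. */
uint32_t bl_link_value(uint32_t bl_address)
{
    return (bl_address + 4u) | 1u;
}
```

For the bl at 0x7a this gives 0x7F, and for the bl at 0x52 in _start it gives 0x57, which match the r14 values in the register dumps.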

So now, back to ARM instructions. This was interesting with the compiler I am using; using your compiler variation for demonstration:

arm-linux-gcc -S -O2 notmain.c

Gives

mov r0, #4
mov r1, #5
b   xfun

When you dig into the meat of it, the compiler does the tail call optimization thing: the return value from xfun is the same as the return value from notmain, so it leaves r14 unmodified and lets xfun return to whoever called notmain.

And the other file:

arm-linux-gcc -S -O2 xbox.c

Gives

add r0, r1, r0
bx  lr

Since ARM instructions can choose whether to modify the flags, you don't see movs and adds, you see mov and add: the code is not doing any conditionals related to those instructions, so the compiler did not generate the s versions.
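To make "modify the flags" concrete, here is a sketch of what the s version of a 32-bit add computes beyond the sum. The struct and function names are hypothetical; the flag definitions are the usual N (negative), Z (zero), C (carry), and V (signed overflow):

```c
#include <stdint.h>

/* The four condition flags updated by adds. */
struct nzcv { int n, z, c, v; };

/* Model of "adds rd, ra, rb": returns the sum and fills in the flags.
   A plain add would return the same sum and leave the flags alone. */
uint32_t adds32(uint32_t a, uint32_t b, struct nzcv *f)
{
    uint32_t r = a + b;                         /* wraps modulo 2^32 */
    f->n = (int)(r >> 31);                      /* top bit of the result */
    f->z = (r == 0);                            /* result is zero */
    f->c = (r < a);                             /* unsigned carry out */
    f->v = (int)((~(a ^ b) & (a ^ r)) >> 31);   /* signed overflow */
    return r;
}
```

For 4 + 5 all four flags come out clear, which is why nothing is lost when the compiler picks the plain add here.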

I am literally in the middle of a few GitHub projects as I type this. mbed_samples is for an ARM Cortex-M3 based (thumb/thumb2 instructions only) microcontroller board. I wrote similarly long ramblings there about the thing booting and about using the tools to build and execute binaries (something you will be wanting to do before long). I also just posted lsasim the other day. It is not an ARM and not related to an ARM, but it has LEARNASM.txt in there, which might be useful for learning ASM for the first time. The C compiler backend is very crude and I wouldn't really mess with it much; just walk through the LEARNASM.txt tutorial.

Then go to ARM's website, http://infocenter.arm.com. On the left, under Contents, click to expand ARM architecture, then click on Reference Manual to see the reference manuals available. On the right side it will show you the ARMv5 ARM (Architecture Reference Manual), which used to be called just the ARM ARM before they had so many different cores. That has a listing of the traditional 32-bit ARM instruction set, which is what you were building. It also has the thumb instruction set, which runs on most of the current ARM cores. Thumb2 is only on some cores, and for that you would want something like the ARMv7-M ARM for the Cortex-M3, which is replacing the ARM7 (the ARM7 is an ARMv4T; I know the numbers can be confusing) for embedded microcontrollers.

QEMU can run ARM, thumb and thumb2 instructions, but getting some programs running where you have visibility into what is going on will take you quite a while. gdb has the ARMulator in it, which is what ARM used for a long time for various reasons. I have no use for gdb, so I don't know anything more than that there is an ARM simulator in there (it does thumb and maybe thumb2 as well, if that matters at all). That might be your fastest path to running something and dumping registers: an ARM-based gdb. Perhaps CodeSourcery comes with one; if not, look for emdebian or just build your own.

thumbulator is an option as well, but I have strictly limited it to thumb instructions and it boots like a Cortex-M3, not an ARM (pretty easy to change if you wanted to). MAME has an ARM in it, but getting that to compile anywhere outside of MAME and feeding it programs is probably more work than the ARMulator source in gdb.

A completely different path would be to get something like an Olimex SAM7-H64 board from sparkfun.com. They are selling them off at about $16 or so each; it used to be double or triple that. It is an ARM7 and will run ARM instructions. Get an Olimex wiggler or, as I prefer, the Amontec jtag-tiny. You can then use JTAG to load programs into RAM and single-step through them, dumping any or all of the registers or memory whenever you want. I cannot think offhand of any boards I know of that run ARM instructions and also have JTAG built into the board, though that would be trivial to do today with the FTDI chips. Instead, the Cortex-M based microcontroller boards are showing up with that kind of thing: either an FTDI chip up front that provides serial and bit-banged JTAG or, more commonly, two microcontrollers from that vendor on the board. One is the USB interface, which you cannot reprogram; the other is the chip you bought the board to play with. The up-front microcontroller handles the JTAG stuff. Of those, the stellaris get you in with jtag

Long response and lots to digest, I know. It is worth the effort if you or someone else reading this takes the time to learn assembly and keeps that knowledge alive. Someone has to carry on the ability to develop processors and compilers as the old-timers, who grew up before there was even a C language, retire. Even just the knowledge you appear to be after, taking a few lines of C code and looking at the assembly, teaches you to write better, faster, more reliable programs, even if you never use assembly day to day.

Good luck.

Answer 2

objdump should be able to disassemble your compiled ARM object code. Another option might be gdb, but I am not sure whether it supports ARM.

2 answers

solution1
5 ACCEPTED 2011-08-14 01:46:18

solution2
1 2011-08-13 16:39:48
