C - How to create a pattern in code segment to recognize it in memory dump?

Question

I dump part of my RAM (the code segment only) in order to find out where each C function is placed. I have no map file and I don't know exactly what the boot/init routines do.

I load my program into RAM, but when I then dump the RAM it is very hard to tell which function is where. I'd like to build different patterns into the C source so that I can recognize them in the memory dump.

I've tried to start every function with a different first variable containing the name of the function, like:

char this_function_name[]="main";

but it doesn't work, because this string will be placed in the data segment.

I have a simple 16-bit RISC CPU and an experimental proprietary compiler (not GCC or any other well-known one). The system has 16Mb of RAM, shared with other applications (bootloader, downloader). It is almost impossible to find, say, a unique sequence of N NOPs or something like 0xABCD. I would like to find all functions in RAM, so I need unique identifiers for the functions that are visible in the RAM dump.

What would be the best pattern for the code segment?

Answer 1

If it were me, I'd use the symbol table, e.g. "nm a.out | grep main". That gives you the real address of any function you want.

If you really have no symbol table, make your own.

struct tab {
    void *addr;
    char name[100];  // For ease of searching, use an array.
} symtab[] = {
    { (void*)main, "main" },
    { (void*)otherfunc, "otherfunc" },
};

Search the dump for the name, and the address will immediately precede it. Go to that address. ;-)
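The lookup step could be scripted on the host side roughly like this. It is only a sketch: the helper names find_name and addr_before_name are made up for illustration, and the pointer size and byte order of the target are assumptions you have to supply yourself (they are passed in as parameters here). A short name such as "main" can also match the tail of a longer name, so check the hits by eye.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Find the first occurrence of `name`, including its terminating NUL,
 * in the dump. Returns the byte offset, or -1 if it is not present. */
static long find_name(const uint8_t *dump, size_t len, const char *name)
{
    size_t n = strlen(name) + 1;              /* match the NUL as well */
    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(dump + i, name, n) == 0)
            return (long)i;
    }
    return -1;
}

/* Decode the ptr_size bytes that sit directly before name_off as an
 * address. ptr_size and big_endian describe the TARGET, not the host.
 * Returns 0 on success, -1 if the arguments do not make sense. */
static int addr_before_name(const uint8_t *dump, size_t name_off,
                            size_t ptr_size, int big_endian, uint32_t *out)
{
    if (ptr_size == 0 || ptr_size > 4 || name_off < ptr_size)
        return -1;
    const uint8_t *p = dump + name_off - ptr_size;
    uint32_t v = 0;
    for (size_t i = 0; i < ptr_size; i++) {
        /* Always consume the most significant byte first. */
        uint8_t b = big_endian ? p[i] : p[ptr_size - 1 - i];
        v = (v << 8) | b;
    }
    *out = v;
    return 0;
}
```

This relies on the name array directly following the pointer inside the struct, which holds for the table above because a char array needs no alignment padding.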

Answer 2

If your compiler supports inline asm, you can use it to create a pattern. Write some NOP-like instructions whose opcodes you can easily recognize in the memory dump:

MOV r0,r0
MOV r0,r0
MOV r0,r0
MOV r0,r0
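Finding such a run in the dump can be automated. The following is a rough host-side sketch that assumes 16-bit instruction words aligned to even offsets; the function name find_opcode_run is made up, and the opcode value you pass in must be whatever your assembler really emits for MOV r0,r0 (I don't know your encoding).

```c
#include <stddef.h>
#include <stdint.h>

/* Return the byte offset of the first run of at least min_run consecutive
 * 16-bit words equal to `opcode`, or -1 if there is no such run.
 * Words are read at even offsets; big_endian selects the target byte order. */
static long find_opcode_run(const uint8_t *dump, size_t len, uint16_t opcode,
                            size_t min_run, int big_endian)
{
    size_t run = 0;
    if (min_run == 0)
        return -1;
    for (size_t i = 0; i + 1 < len; i += 2) {
        uint16_t w = big_endian
            ? (uint16_t)((dump[i] << 8) | dump[i + 1])
            : (uint16_t)((dump[i + 1] << 8) | dump[i]);
        if (w == opcode) {
            run++;
            if (run >= min_run)
                return (long)(i + 2 - 2 * run);   /* start of the run */
        } else {
            run = 0;
        }
    }
    return -1;
}
```

Using a different run length per function (4 words in one, 5 in the next, and so on) would let one scan tell the functions apart, at the cost of a few wasted cycles per call.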

Answer 3

Numeric constants are placed in the code segment, encoded in the function's instructions. So you could try to use magic numbers like 0xDEADBEEF and so on.

E.g., here's the disassembly view of a simple C function compiled with Visual C++:

void foo(void)
{
00411380  push        ebp  
00411381  mov         ebp,esp 
00411383  sub         esp,0CCh 
00411389  push        ebx  
0041138A  push        esi  
0041138B  push        edi  
0041138C  lea         edi,[ebp-0CCh] 
00411392  mov         ecx,33h 
00411397  mov         eax,0CCCCCCCCh 
0041139C  rep stos    dword ptr es:[edi] 
    unsigned id = 0xDEADBEEF;
0041139E  mov         dword ptr [id],0DEADBEEFh

You can see the 0xDEADBEEF making it into the function's machine code. Note that the byte order you actually see in the executable depends on the endianness of the CPU (thanks, Richard).

This is an x86 example. But RISC CPUs (MIPS, etc.) have instructions that move immediates into registers, and these immediates can have special recognizable values as well (although only 16-bit for MIPS, IIRC).
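To look for such a constant in a dump you could use something like the sketch below. The name find_u32 is made up, and it assumes the constant is stored as four contiguous bytes, as in the x86 case above. On a CPU that packs immediates into bit fields of the instruction word, or splits a 32-bit value over two instructions, you would have to search for the encoded form instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Search dump[start..len) for the four bytes of `magic` in the given byte
 * order. Returns the byte offset of the first hit, or -1 if there is none. */
static long find_u32(const uint8_t *dump, size_t len, uint32_t magic,
                     int big_endian, size_t start)
{
    uint8_t pat[4];
    for (int i = 0; i < 4; i++) {
        /* Little-endian stores the least significant byte first. */
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        pat[i] = (uint8_t)(magic >> shift);
    }
    for (size_t i = start; i + 4 <= len; i++) {
        if (dump[i] == pat[0] && dump[i + 1] == pat[1] &&
            dump[i + 2] == pat[2] && dump[i + 3] == pat[3])
            return (long)i;
    }
    return -1;
}
```

Calling it again with start set to the previous hit plus one walks through all occurrences, so a different magic number per function gives you one hit per function.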

Psihodelia - it's getting harder and harder to catch your intention. Is it just a single function you want to find? Then can't you just place 5 NOPs one after another and look for them? Do you control the compiler/assembler/linker/loader? What tools are at your disposal?

Answer 4

As you noted, this:

char this_function_name[]="main";

... will end up with the string stored in the data segment and merely copied into the stack variable at run time. However, this:

char this_function_name[]= { 'm', 'a', 'i', 'n' };

... will likely be initialised element by element on the stack, so the character values end up as immediates in the function's instructions and you will be able to recognize the string in your code (I just tried it on my platform).

Hope this helps

Answer 5

How about a completely different approach to your real problem, which is finding a particular block of code: Use diff.

Compile the code once with the function in question included, and once with it commented out. Produce RAM dumps of both. Then, diff the two dumps to see what's changed -- and that will be the new code block. (You may have to do some sort of processing of the dumps to remove memory addresses in order to get a clean diff, but the order of instructions ought to be the same in either case.)
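If the only difference between the two builds is one inserted block, a plain byte-wise diff gets confused because everything after the block is shifted. One way around that, sketched below under exactly that assumption, is to measure the common prefix and the common suffix of the two dumps; whatever lies between them in the dump that contains the function is the candidate block. The names diff_span and changed_span are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open byte range [start, end) inside the dump WITH the function. */
struct diff_span {
    size_t start;
    size_t end;
};

/* Strip the longest common prefix and suffix of the two dumps and return
 * what is left of `with`. An empty span (start == end) means no insertion
 * was found. */
static struct diff_span changed_span(const uint8_t *with, size_t with_len,
                                     const uint8_t *without, size_t without_len)
{
    size_t min = with_len < without_len ? with_len : without_len;
    size_t prefix = 0;
    size_t suffix = 0;
    struct diff_span s;

    while (prefix < min && with[prefix] == without[prefix])
        prefix++;
    /* The suffix must not overlap the prefix in the shorter dump. */
    while (suffix < min - prefix &&
           with[with_len - 1 - suffix] == without[without_len - 1 - suffix])
        suffix++;

    s.start = prefix;
    s.end = with_len - suffix;
    return s;
}
```

Absolute addresses embedded in the code after the inserted block will differ between the two builds and cut the common suffix short, so treat the end of the span as an upper bound rather than an exact boundary.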

Answer 6

Why not get each function to dump its own address? Something like this:

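#include <stdio.h>

/* Print the function's name and address; adapt this to your own logging. */
void* fnaddr( const char* fname, void* addr )
{
    printf( "%s\t0x%p\n", fname, addr ) ;
    return addr ;
}


void test( void )
{
    /* Initialising a local static with a function call is valid C++ but
       not C, so this file has to be compiled as C++. */
    static void* fnaddr_dummy = fnaddr( __FUNCTION__, (void*)test ) ;
}

int main (int argc, char * argv[]) 
{
    /* Standard C++ does not allow taking the address of main, although
       many compilers accept it. */
    static void* fnaddr_dummy = fnaddr( __FUNCTION__, (void*)main ) ;
    test() ;
    test() ;
    return 0 ;
}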

By making fnaddr_dummy static, the dump is done only once per function. Obviously you would need to adapt fnaddr() to support whatever output or logging means you have on your system. Unfortunately, if the system performs lazy initialisation, you'll only get the addresses of the functions that are actually called (which may be good enough).
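A plain C compiler will reject a local static that is initialised with a function call, so with a C-only toolchain the same idea could be written with an explicit flag instead. This is just a sketch: report_fn, REPORT_ONCE and report_count are invented names, printf stands in for whatever output channel you have, and it assumes your compiler provides <stdint.h> and __func__.

```c
#include <stdio.h>
#include <stdint.h>

/* Number of reports made so far (only here to make the effect visible). */
static int report_count = 0;

/* Print the name and address; replace printf with your own logging. */
static void report_fn(const char *name, uintptr_t addr)
{
    printf("%s\t0x%lx\n", name, (unsigned long)addr);
    report_count++;
}

/* Put this at the top of a function body; it reports on the first call
 * only. Converting a function pointer to an integer is implementation-
 * defined, but it works on the usual flat-address targets. */
#define REPORT_ONCE(fn)                                   \
    do {                                                  \
        static int reported_ = 0;                         \
        if (!reported_) {                                 \
            reported_ = 1;                                \
            report_fn(__func__, (uintptr_t)(fn));         \
        }                                                 \
    } while (0)

static void test_fn(void)
{
    REPORT_ONCE(test_fn);
}

static void other_fn(void)
{
    REPORT_ONCE(other_fn);
}
```

As with the version above, a function that is never called never reports itself.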

Answer 7

You could start each function with a call to the same dummy function like:

void identifyFunction( unsigned int identifier) { }

Each of your functions would call identifyFunction with a different parameter (1, 2, 3, ...). This will not give you a magic map file, but when you inspect the code dump you should be able to quickly find out where identifyFunction is, because there will be lots of jumps to that address. Next, scan for those jumps and check the instructions before each jump to see what parameter is passed. Then you can make your own map file. With some scripting this should be fairly automatic.
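On the source side this could look roughly like the following. The function names and the enum are invented for the example, and the volatile store is my own addition: with a completely empty body an optimising compiler might drop or inline the call, which would remove the very jumps you want to search for.

```c
/* Hypothetical identifiers, one per function you want to locate. */
enum fn_id {
    ID_MAIN_LOOP   = 1,
    ID_READ_SENSOR = 2
};

/* Written on every call so the compiler has a reason to keep the call. */
volatile unsigned int last_identifier;

void identifyFunction(unsigned int identifier)
{
    last_identifier = identifier;
}

void read_sensor(void)
{
    identifyFunction(ID_READ_SENSOR);   /* first statement of the function */
    /* ... real work ... */
}

void main_loop(void)
{
    identifyFunction(ID_MAIN_LOOP);
    /* ... real work ... */
}
```

Keeping the identifiers in one enum means the same list can later be used to translate the numbers found in the dump back into function names.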

7 answers

solution1
7 ACCEPTED 2010-01-15 12:13:28

solution2
3 2010-01-15 12:28:06

solution3
1 2010-01-15 12:10:41

solution4
1 2010-01-15 13:58:12

solution5
1 2010-01-16 02:18:57

solution6
1 2010-01-16 11:09:06

solution7
0 2010-01-17 13:08:47
