
C - How to create a pattern in the code segment to recognize it in a memory dump?

I dump my RAM (a piece of it, the code segment only) in order to find where each C function is placed. I have no map file and I don't know exactly what the boot/init routines do.

I load my program into RAM, but when I dump the RAM it is very hard to find exactly which function is where. I'd like to build different patterns into the C source so that I can recognize them in the memory dump.

I've tried to start every function with a different first variable containing the name of the function, like:

char this_function_name[]="main";

but it doesn't work, because this string will be placed in the data segment.

I have a simple 16-bit RISC CPU and an experimental proprietary compiler (not GCC or any other well-known one). The system has 16Mb of RAM, shared with other applications (bootloader, downloader). It is almost impossible to find, say, a unique sequence of N NOPs or something like 0xABCD. I would like to find all functions in RAM, so I need unique function identifiers that are visible in the RAM dump.

What would be the best pattern for the code segment?

If it were me, I'd use the symbol table, e.g. "nm a.out | grep main". That gets you the real address of any function you want.

If you really have no symbol table, make your own.

struct tab {
    void *addr;
    char name[100];  // For ease of searching, use an array.
} symtab[] = {
    { (void*)main, "main" },
    { (void*)otherfunc, "otherfunc" },
};

Search the dump for the name, and the address will immediately precede it. Go to that address. ;-)
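The lookup described above could be scripted roughly as follows. This is only a sketch: `find_symtab_entry` is a made-up helper name, and it assumes the tool reading the dump is compiled with the same `struct tab` layout (pointer size, padding, byte order) as the target, which may well not hold for a 16-bit CPU inspected from a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same layout as the hand-made symbol table entry (assumed, see above). */
struct tab {
    void *addr;
    char name[100];
};

/*
 * Search a raw dump for `name` stored as a NUL-terminated string and return
 * the byte offset of the struct tab entry that holds it, or -1 if not found.
 * Caveat: a name that is the tail of a longer one ("main" in "domain") can
 * match too early, so checking the recovered address is still advisable.
 */
long find_symtab_entry(const unsigned char *dump, size_t len, const char *name)
{
    size_t n, off, i;

    assert(dump != NULL && name != NULL);
    n = strlen(name) + 1;                 /* include the terminating NUL */
    off = offsetof(struct tab, name);     /* name sits this far into the entry */

    for (i = off; i + n <= len; i++) {
        if (memcmp(dump + i, name, n) == 0)
            return (long)(i - off);
    }
    return -1;
}
```

Once the entry offset is known, copying `sizeof(struct tab)` bytes from `dump + offset` into a `struct tab` gives back the stored address.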

If your compiler has inline asm, you can use it to create a pattern. Write some NOP-like instructions whose opcodes you can easily recognize in the memory dump:

MOV r0,r0
MOV r0,r0
MOV r0,r0
MOV r0,r0
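To locate such a run in the dump, something like the sketch below might do. `find_opcode_run` is a hypothetical helper, and the encoding of `MOV r0,r0` is not known here, so the caller has to pass in whatever 16-bit value the assembler actually emits. It also assumes the dump has already been read into an array of 16-bit words in the CPU's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index (counted in 16-bit words) of the first run of at least
 * `min_run` consecutive words equal to `opcode`, or -1 if there is none.
 */
long find_opcode_run(const uint16_t *words, size_t count,
                     uint16_t opcode, size_t min_run)
{
    size_t run = 0;
    size_t i;

    assert(words != NULL && min_run > 0);
    for (i = 0; i < count; i++) {
        /* Extend the current run on a match, otherwise start over. */
        run = (words[i] == opcode) ? run + 1 : 0;
        if (run == min_run)
            return (long)(i + 1 - min_run);
    }
    return -1;
}
```

To find later occurrences, call it again on the part of the array that follows the previous hit.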

Numeric constants are placed in the code segment, encoded in the function's instructions. So you could try to use magic numbers like 0xDEADBEEF and so on.

For example, here's the disassembly view of a simple C function with Visual C++:

void foo(void)
{
00411380  push        ebp  
00411381  mov         ebp,esp 
00411383  sub         esp,0CCh 
00411389  push        ebx  
0041138A  push        esi  
0041138B  push        edi  
0041138C  lea         edi,[ebp-0CCh] 
00411392  mov         ecx,33h 
00411397  mov         eax,0CCCCCCCCh 
0041139C  rep stos    dword ptr es:[edi] 
    unsigned id = 0xDEADBEEF;
0041139E  mov         dword ptr [id],0DEADBEEFh 

You can see the 0xDEADBEEF making it into the function's machine code. Note that what you actually see in the executable depends on the endianness of the CPU (thanks, Richard).

This is an x86 example. But RISC CPUs (MIPS, etc.) have instructions that move immediates into registers, and these immediates can have special recognizable values as well (although only 16-bit for MIPS, IIRC).
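Because of the endianness point, a search for the magic number probably wants to try both byte orders. Here is a minimal sketch; `find_magic32` is an invented name, and it only finds the constant when all four bytes are stored contiguously. On a CPU with 16-bit (or narrower) immediates the value may be split across two instructions, in which case searching for each half separately would be needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return the byte offset of the first occurrence of `magic` in the dump,
 * stored either little-endian or big-endian, or -1 if it is not present.
 * If `is_little` is non-NULL it receives 1 or 0 for the byte order found.
 */
long find_magic32(const unsigned char *dump, size_t len,
                  uint32_t magic, int *is_little)
{
    unsigned char le[4], be[4];
    size_t i;
    int k;

    assert(dump != NULL);
    for (k = 0; k < 4; k++) {
        le[k] = (unsigned char)(magic >> (8 * k));        /* low byte first  */
        be[k] = (unsigned char)(magic >> (8 * (3 - k)));  /* high byte first */
    }
    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(dump + i, le, 4) == 0) {
            if (is_little) *is_little = 1;
            return (long)i;
        }
        if (memcmp(dump + i, be, 4) == 0) {
            if (is_little) *is_little = 0;
            return (long)i;
        }
    }
    return -1;
}
```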


Psihodelia, it's getting harder and harder to grasp your intention. Is it just a single function you want to find? Then can't you just place 5 NOPs one after another and look for them? Do you control the compiler/assembler/linker/loader? What tools are at your disposal?

As you noted, this:

char this_function_name[]="main";

... will end up setting a pointer on your stack to a data segment containing the string. However, this:

char this_function_name[]= { 'm', 'a', 'i', 'n' };

... will likely put all these bytes on your stack, so you will be able to recognize the string in your code (I just tried it on my platform).

Hope this helps.

How about a completely different approach to your real problem, which is finding a particular block of code: use diff.

Compile the code once with the function in question included, and once with it commented out. Produce RAM dumps of both. Then diff the two dumps to see what has changed, and that will be the new code block. (You may have to do some sort of processing of the dumps to remove memory addresses in order to get a clean diff, but the order of instructions ought to be the same in either case.)
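If no diff tool copes well with raw binary, a crude comparison can be written by hand. The sketch below (both function names are invented) measures how many bytes the two code images share at the start and at the end; whatever lies between in the larger image is a candidate for the added function. It assumes the function is inserted as one contiguous block and that you can trim both dumps to just the program image. Embedded addresses that shift after the insertion will shorten the matching tail, as noted above.

```c
#include <assert.h>
#include <stddef.h>

/* Number of leading bytes that are identical in both images. */
size_t common_prefix(const unsigned char *a, size_t alen,
                     const unsigned char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    size_t i = 0;

    assert(a != NULL && b != NULL);
    while (i < n && a[i] == b[i])
        i++;
    return i;
}

/*
 * Number of trailing bytes that are identical in both images, never
 * reaching back into the first `prefix` bytes already counted.
 */
size_t common_suffix(const unsigned char *a, size_t alen,
                     const unsigned char *b, size_t blen, size_t prefix)
{
    size_t n = (alen < blen ? alen : blen);
    size_t i = 0;

    assert(a != NULL && b != NULL && prefix <= n);
    n -= prefix;
    while (i < n && a[alen - 1 - i] == b[blen - 1 - i])
        i++;
    return i;
}
```

With `p = common_prefix(...)` and `s = common_suffix(..., p)`, the bytes at offsets `p` up to `alen - s` (exclusive) in the image that contains the function are the ones to look at.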

Why not get each function to dump its own address? Something like this:

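#include <stdio.h>

/* Print a function's name and address, then hand the address back. */
void* fnaddr( const char* fname, void* addr )
{
    printf( "%s\t0x%p\n", fname, addr ) ;
    return addr ;
}


void test( void )
{
    /* Standard C requires constant static initialisers, so the call
       is made on first entry rather than in the initialiser itself. */
    static void* fnaddr_dummy = NULL ;
    if( fnaddr_dummy == NULL ) fnaddr_dummy = fnaddr( __func__, (void*)test ) ;
}

int main( void )
{
    static void* fnaddr_dummy = NULL ;
    if( fnaddr_dummy == NULL ) fnaddr_dummy = fnaddr( __func__, (void*)main ) ;
    test() ;
    test() ;   /* second call prints nothing */
    return 0 ;
}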

By making fnaddr_dummy static, the dump is done once per function. Obviously you would need to adapt fnaddr() to support whatever output or logging means you have on your system. Unfortunately, if the system performs lazy initialisation, you'll only get the addresses of the functions that are actually called (which may be good enough).

You could start each function with a call to the same dummy function, like:

void identifyFunction( unsigned int identifier) { }

Each of your functions would call identifyFunction with a different parameter (1, 2, 3, ...). This will not give you a magic map file, but when you inspect the code dump you should be able to quickly find out where identifyFunction is, because there will be lots of jumps to that address. Next, scan for those jumps and check the code just before each one to see which parameter is passed. Then you can make your own map file. With some scripting this should be fairly automatic.
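In source form, the tagging might look like the sketch below. The function names and ID values are made up for illustration. One deviation from the empty dummy above, flagged as my assumption: the body stores the identifier to a `volatile` variable, since an optimizing compiler could otherwise drop calls to a function that does nothing. If your compiler does not optimize, the empty body should be enough.

```c
#include <assert.h>

/* Written on every call so the call has a visible side effect (assumption:
   this keeps an optimizer from discarding it). 0 means "nothing yet". */
volatile unsigned int g_last_function_id = 0;

void identifyFunction(unsigned int identifier)
{
    assert(identifier != 0);   /* 0 is reserved for "no function tagged yet" */
    g_last_function_id = identifier;
}

/* Hypothetical IDs, one per function you want to locate in the dump. */
enum { ID_INIT_DEVICE = 1, ID_PROCESS = 2 };

int init_device(void)
{
    identifyFunction(ID_INIT_DEVICE);   /* first statement: the tag */
    return 0;
}

int process(int x)
{
    identifyFunction(ID_PROCESS);
    return x * 2;
}
```

Keeping the IDs in one enum gives you the ID-to-name table you need when turning the scanned call sites into a map file.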
