
Difficulties Understanding Format String Exploitation

I am reading the book Hacking: The Art of Exploitation, 2nd Edition, and I'm at the chapter on format string vulnerabilities. I have read the chapter multiple times, but I still can't clearly understand it, even after some googling.

So, in the book there is this vulnerable code:

 char text[1024];
...
 strcpy(text, argv[1]);
 printf("The right way to print user-controlled input:\n");
 printf("%s", text);
 printf("\nThe wrong way to print user-controlled input:\n");
 printf(text);

Then, after compiling,

reader@hacking:~/booksrc $ ./fmt_vuln $(perl -e 'print "%08x."x40')
The right way to print user-controlled input:
%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.
%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.
%08x.%08x.
The wrong way to print user-controlled input:
bffff320.b7fe75fc.00000000.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252
e78.252e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.2
52e7838.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.30252e78.252e78
38.2e783830.78383025.3830252e.30252e78.252e7838.2e783830.78383025.3830252e.

The bytes 0x25, 0x30, 0x38, 0x78, and 0x2e seem to be repeating a lot.

reader@hacking:~/booksrc $ printf "\x25\x30\x38\x78\x2e\n"
%08x.

First, why does that value keep repeating?

The book says: "As you can see, they're the memory for the format string itself. Because the format function will always be on the highest stack frame, as long as the format string has been stored anywhere on the stack, it will be located below the current frame pointer (at a higher memory address)."

But it seems to me that this contradicts what the author previously wrote and the way stack frames are organized:

"When this printf() function is called (as with any function), the arguments are pushed to the stack in reverse order."

So, shouldn't the format string be at a lower memory address, since it is the first argument? And where is the format string stored?

reader@hacking:~/booksrc $ ./fmt_vuln AAAA%08x.%08x.%08x.%08x
The right way to print user-controlled input:
AAAA%08x.%08x.%08x.%08x
The wrong way to print user-controlled input:
AAAAbffff3d0.b7fe75fc.00000000.41414141

Here again, why is AAAA repeated as 41414141? From what I understand, printf prints AAAA first; then, when it sees the first %08x, it gets a value from a memory address in the preceding stack frame, and does the same for the second %08x, so the second value is located at a higher memory address than the first. Finally it returns to the value of AAAA, located at a lower memory address, in the stack frame of the printf function.

I debugged the first example with $(perl -e 'print "%08x."x40') as the argument. I am running Linux 5.3.0-40-generic, 18.04.1-Ubuntu, x86_64.

(gdb) run $(perl -e 'print "%08x." x 40')
Starting program: /home/kuro/fmt_vuln $(perl -e 'print "%08x." x 40')
The right way to print user-controlled input:
%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.
The wrong way to print user-controlled input:
07a51260.4b3eb8c0.4b10e154.00000000.4b16c3a0.9d357fc8.9d357b10.78383025.30252e78.2e783830.3830252e.252e7838.78383025.30252e78.2e783830.3830252e.252e7838.78383025.30252e78.2e783830.3830252e.252e7838.78383025.30252e78.2e783830.3830252e.252e7838.78383025.30252e78.2e783830.3830252e.252e7838.4b618d00.4b5fd000.00000000.9d357c80.00000000.00000000.00000000.4b3ef6f0.

Breakpoint 1, main (argc=2, argv=0x7ffd9d357fc8) at fmt_vuln.c:19
19      printf("[*] test_val @ 0x%08x = %d 0x%08x\n", &test_val, test_val, test_val);
(gdb) x/-100xw $rsp
0x7ffd9d357940: 0x00000400  0x00000000  0x4b07c1aa  0x00007fb8
0x7ffd9d357950: 0x00000016  0x00000000  0x00000003  0x00000000
0x7ffd9d357960: 0x00000001  0x00000000  0x00002190  0x000003e8
0x7ffd9d357970: 0x00000005  0x00000000  0x00008800  0x00000000
0x7ffd9d357980: 0x00000000  0x00000000  0x00000400  0x00000000
0x7ffd9d357990: 0x00000000  0x00000000  0x5e970730  0x00000000
0x7ffd9d3579a0: 0x65336234  0x30663666  0x90890300  0x79e57be9
0x7ffd9d3579b0: 0x1cd79dbf  0x00000000  0x00000000  0x00000000
0x7ffd9d3579c0: 0x05cec660  0x000055ef  0x9d357fc0  0x00007ffd
0x7ffd9d3579d0: 0x00000000  0x00000000  0x00000000  0x00000000
0x7ffd9d3579e0: 0x9d357ee0  0x00007ffd  0x4b062f26  0x00007fb8
0x7ffd9d3579f0: 0x00000030  0x00000030  0x9d357be8  0x00007ffd
0x7ffd9d357a00: 0x9d357a10  0x00007ffd  0x90890300  0x79e57be9
0x7ffd9d357a10: 0x4b3ea760  0x00007fb8  0x07a51260  0x000055ef
0x7ffd9d357a20: 0x4b3eb8c0  0x00007fb8  0x4b0891bd  0x00007fb8
0x7ffd9d357a30: 0x00000000  0x00000000  0x4b3ea760  0x00007fb8
0x7ffd9d357a40: 0x00000d68  0x00000000  0x00000169  0x00000000
0x7ffd9d357a50: 0x07a51260  0x000055ef  0x4b08af51  0x00007fb8
0x7ffd9d357a60: 0x4b3e62a0  0x00007fb8  0x4b3ea760  0x00007fb8
0x7ffd9d357a70: 0x0000000a  0x00000000  0x05cec660  0x000055ef
0x7ffd9d357a80: 0x9d357fc0  0x00007ffd  0x00000000  0x00000000
0x7ffd9d357a90: 0x00000000  0x00000000  0x4b08b403  0x00007fb8
0x7ffd9d357aa0: 0x4b3ea760  0x00007fb8  0x9d357ee0  0x00007ffd
0x7ffd9d357ab0: 0x05cec660  0x000055ef  0x4b0808f5  0x00007fb8
0x7ffd9d357ac0: 0x00000000  0x00000000  0x05cec824  0x000055ef
(gdb) x/100xw $rsp
0x7ffd9d357ad0: 0x9d357fc8  0x00007ffd  0x9d357b10  0x00000002
0x7ffd9d357ae0: 0x78383025  0x3830252e  0x30252e78  0x252e7838
0x7ffd9d357af0: 0x2e783830  0x78383025  0x3830252e  0x30252e78
0x7ffd9d357b00: 0x252e7838  0x2e783830  0x78383025  0x3830252e
0x7ffd9d357b10: 0x30252e78  0x252e7838  0x2e783830  0x78383025
0x7ffd9d357b20: 0x3830252e  0x30252e78  0x252e7838  0x2e783830
0x7ffd9d357b30: 0x78383025  0x3830252e  0x30252e78  0x252e7838
0x7ffd9d357b40: 0x2e783830  0x78383025  0x3830252e  0x30252e78
0x7ffd9d357b50: 0x252e7838  0x2e783830  0x78383025  0x3830252e
0x7ffd9d357b60: 0x30252e78  0x252e7838  0x2e783830  0x78383025
0x7ffd9d357b70: 0x3830252e  0x30252e78  0x252e7838  0x2e783830
0x7ffd9d357b80: 0x78383025  0x3830252e  0x30252e78  0x252e7838
0x7ffd9d357b90: 0x2e783830  0x78383025  0x3830252e  0x30252e78
0x7ffd9d357ba0: 0x252e7838  0x2e783830  0x4b618d00  0x00007fb8
0x7ffd9d357bb0: 0x4b5fd000  0x00007fb8  0x00000000  0x00000000
0x7ffd9d357bc0: 0x9d357c80  0x00007ffd  0x00000000  0x00000000
0x7ffd9d357bd0: 0x00000000  0x00000000  0x00000000  0x00000000
0x7ffd9d357be0: 0x4b3ef6f0  0x00007fb8  0x4b6184c8  0x00007fb8
0x7ffd9d357bf0: 0x9d357c80  0x00007ffd  0x4b3ef000  0x00007fb8
0x7ffd9d357c00: 0x4b3ef914  0x00007fb8  0x4b3ef3c0  0x00007fb8
0x7ffd9d357c10: 0x4b617048  0x00007fb8  0x00000000  0x00000000
0x7ffd9d357c20: 0x00000000  0x00000000  0x4b6179f0  0x00007fb8
0x7ffd9d357c30: 0x4b0030e8  0x00007fb8  0x00000000  0x00000000
0x7ffd9d357c40: 0x4b3efa00  0x00007fb8  0x00000480  0x00000000
0x7ffd9d357c50: 0x00000027  0x00000000  0x00000000  0x00000000

The values that appear before the "%08x." values in the wrong-way output are at lower addresses than the "%08x." values. Why? The format string is supposed to be at the top of the stack.

The values that appear after the "%08x." values in the wrong-way output are at higher addresses than the "%08x." values, so they are in the preceding stack frame.

Why is it like this? Shouldn't the output begin at the format string values, or after them?

Also, in the book, no values are printed after the "%08x." values, but some are printed in my case. And some values in the output don't even appear in the stack dump, like 4b16c3a0.

I have to recommend against what you're doing. You're focusing on security vulnerabilities in C without a strong understanding of the language itself. That's an exercise in frustration. As evidence, I offer that every question you're posing about the exercise is answered by understanding printf(3), not stack vulnerabilities.

The output of your perl line (the contents of argv[1]) starts with %08x.%08x.%08x.%08x.%08x. That's a format string. Each %08x looks for a further printf argument, an integer to print in hexadecimal. Normally, you might do something like,

int a = 'B';
printf( "%02x\n", a );

which produces 42 much faster than the computer in The Hitchhiker's Guide to the Galaxy.

What you've done is pass a long format string with zero arguments. printf(3) can't know how many arguments it was passed; it has to infer them from the format string. Your format string tells printf to print a long list of integers. Since none were provided, it looks for them "up the stack" (wherever they should have been). You print nonsense because the contents of those memory locations are unpredictable. Or, at any rate, weren't defined by you.

In the "good" case, the format string is "%s", declaring one argument of type string, which you provided. That works much better, yes.

Most compilers nowadays take special care with printf. They can produce warnings if the format string isn't a compile-time constant, and they can verify that each argument is of the correct type for its corresponding format specifier. The whole chapter in your book can thus be made moot simply by using the compiler's capabilities and paying attention to its diagnostics.
