Heap-buffer-overflow on sprintf

Question

I'm getting a heap-buffer-overflow error on this code:

// ast.c
char *not_last_prefix = malloc(strlen(next_prefix) + 4); // line 204

sprintf(not_last_prefix, "%s│  ", next_prefix); // line 206

=================================================================
==3394==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000279 at pc 0x7f0d9e6d7715 bp 0x7fff975bcf60 sp 0x7fff975bc6f0
WRITE of size 11 at 0x602000000279 thread T0
    #0 0x7f0d9e6d7714 in vsprintf (/lib/x86_64-linux-gnu/libasan.so.5+0x9e714)
    #1 0x7f0d9e6d7bce in sprintf (/lib/x86_64-linux-gnu/libasan.so.5+0x9ebce)
    #2 0x55708e40b909 in print_ast_impl src/ast.c:206
    #3 0x55708e40b7ef in print_ast src/ast.c:192
    #4 0x55708e4112ad in main src/main.c:50
    #5 0x7f0d9e46f1e2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x271e2)
    #6 0x55708e40a5cd in _start (/home/michael/Code/Baby-C/debug/bcc+0x65cd)

0x602000000279 is located 0 bytes to the right of 9-byte region [0x602000000270,0x602000000279)
allocated by thread T0 here:
    #0 0x7f0d9e746ae8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dae8)
    #1 0x55708e40b8cd in print_ast_impl src/ast.c:204
    #2 0x55708e40b7ef in print_ast src/ast.c:192
    #3 0x55708e4112ad in main src/main.c:50
    #4 0x7f0d9e46f1e2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x271e2)

SUMMARY: AddressSanitizer: heap-buffer-overflow (/lib/x86_64-linux-gnu/libasan.so.5+0x9e714) in vsprintf
Shadow bytes around the buggy address:
  0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff8000: fa fa 00 fa fa fa 02 fa fa fa 00 00 fa fa 00 00
  0x0c047fff8010: fa fa 02 fa fa fa 00 00 fa fa 00 00 fa fa 02 fa
  0x0c047fff8020: fa fa 00 00 fa fa 00 00 fa fa 02 fa fa fa 02 fa
  0x0c047fff8030: fa fa 02 fa fa fa 02 fa fa fa 02 fa fa fa 02 fa
=>0x0c047fff8040: fa fa 02 fa fa fa fd fa fa fa 00 01 fa fa 00[01]
  0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8090: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==3394==ABORTING

Everything I can find suggests that I'm not allocating enough space for the result of the sprintf, but I can't see how that could be the case. I allocate space for the length of next_prefix, 3 bytes for the "│  " that follows it, and 1 for the null terminator. The resulting string should fit. What am I missing here?

Answer 1

The problem is that the length of the string literal is not 3 but 5. The vertical bar is not the standard ASCII character | but a Unicode box-drawing character, which UTF-8 encodes as three bytes; the two spaces after it add one byte each. This matches the report: the allocated region is 9 bytes (so next_prefix is 5 bytes long), while sprintf writes 5 + 5 + 1 = 11 bytes.
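To make the byte-versus-character mismatch concrete, here is a minimal sketch. The helper name utf8_code_points is hypothetical (it is not part of the original code), and it assumes its input is valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: counts Unicode code points in a UTF-8 string.
 * Assumes the input is valid UTF-8. strlen, by contrast, counts bytes. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        /* Continuation bytes look like 10xxxxxx; only count the others. */
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For the separator in the question, strlen reports 5 bytes while this helper reports 3 code points, provided the source file is saved as UTF-8. malloc needs the byte count, not the character count.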

To avoid problems like this, assign the literal to a pointer variable and take its length with strlen, like this:

const char *separator = "│  "; // 5 bytes in UTF-8: 3 for the bar, 1 for each space
// strlen counts bytes, so the allocation tracks the literal; +1 for the terminating null byte
char *not_last_prefix = malloc(strlen(next_prefix) + strlen(separator) + 1);
sprintf(not_last_prefix, "%s%s", next_prefix, separator);
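A related option, not taken from the answer above, is to let snprintf compute the required size itself, so the allocation can never drift out of sync with the format string. The sketch below assumes a C99 or later library; the function name concat_alloc is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a newly allocated "prefix" + "separator"
 * string, or NULL on failure. The caller must free the result. */
char *concat_alloc(const char *prefix, const char *separator)
{
    /* With a NULL buffer and size 0, snprintf writes nothing and returns
     * the number of bytes the output needs, excluding the terminator. */
    int needed = snprintf(NULL, 0, "%s%s", prefix, separator);
    if (needed < 0) {
        return NULL;
    }

    char *out = malloc((size_t)needed + 1);
    if (out == NULL) {
        return NULL;
    }

    /* The second call is bounded by the buffer size, so it cannot overflow. */
    snprintf(out, (size_t)needed + 1, "%s%s", prefix, separator);
    return out;
}
```

In the question's code this would replace the malloc and sprintf pair with a single call such as concat_alloc(next_prefix, separator), followed by a NULL check.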

Answer 2

The problem, as was pointed out to me, was that my format string contained a Unicode character. I wrongly assumed that mallocing one more byte would solve the problem; it turns out a UTF-8 character can be as many as 4 bytes long! The good news is that you can check exactly how many bytes a character takes up with this simple table.

Character code (decimal) | Bytes used
-------------------------|------------
0-127                    | 1 byte
128-2047                 | 2 bytes
2048-65535               | 3 bytes
65536-1114111            | 4 bytes

In my case, the vertical bar character I was using (│) is Unicode "\u2502" (decimal 9474), which means it takes up 3 bytes!
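The table above can be written as a small lookup function. This is only a sketch: the name utf8_encoded_len is hypothetical, and returning 0 for surrogates and out-of-range values is a choice made here, since neither can be encoded in UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes UTF-8 uses for one code point.
 * Returns 0 for values that UTF-8 cannot encode. */
size_t utf8_encoded_len(uint32_t code_point)
{
    if (code_point <= 0x7F) {
        return 1;                 /* 0-127 */
    }
    if (code_point <= 0x7FF) {
        return 2;                 /* 128-2047 */
    }
    if (code_point >= 0xD800 && code_point <= 0xDFFF) {
        return 0;                 /* UTF-16 surrogates are not valid here */
    }
    if (code_point <= 0xFFFF) {
        return 3;                 /* 2048-65535 */
    }
    if (code_point <= 0x10FFFF) {
        return 4;                 /* 65536-1114111 */
    }
    return 0;                     /* beyond the Unicode range */
}
```

For the box-drawing bar, utf8_encoded_len(0x2502) gives 3, which agrees with the table row for 2048-65535.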
