
Memcpy segfaulting with valid pointers

I'm using libcurl in my program, and running into a segfault. Before I filed a bug with the curl project, I thought I'd do a little debugging. What I found seemed very odd to me, and I haven't been able to make sense of it yet.

First, the segfault traceback:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe77f6700 (LWP 592)]
0x00007ffff6a2ea5c in memcpy () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff6a2ea5c in memcpy () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff5bc29e5 in x509_name_oneline (a=0x7fffe3d9c3c0,
    buf=0x7fffe77f4ec0 "C=US; O=The Go Daddy Group, Inc.; OU=Go Daddy Class 2 Certification Authority\375\034<M_r\206\233\261\310\340\371\023.Jg\205\244\304\325\347\372\016#9Ph%", size=255) at ssluse.c:629
#2  0x00007ffff5bc2a6f in cert_verify_callback (ok=1, ctx=0x7fffe77f50b0)
    at ssluse.c:645
#3  0x00007ffff72c9a80 in ?? () from /lib/libcrypto.so.0.9.8
#4  0x00007ffff72ca430 in X509_verify_cert () from /lib/libcrypto.so.0.9.8
#5  0x00007ffff759af58 in ssl_verify_cert_chain () from /lib/libssl.so.0.9.8
#6  0x00007ffff75809f3 in ssl3_get_server_certificate ()
   from /lib/libssl.so.0.9.8
#7  0x00007ffff7583e50 in ssl3_connect () from /lib/libssl.so.0.9.8
#8  0x00007ffff5bc48f0 in ossl_connect_step2 (conn=0x7fffe315e9a8, sockindex=0)
    at ssluse.c:1724
#9  0x00007ffff5bc700f in ossl_connect_common (conn=0x7fffe315e9a8,
    sockindex=0, nonblocking=false, done=0x7fffe77f543f) at ssluse.c:2498
#10 0x00007ffff5bc7172 in Curl_ossl_connect (conn=0x7fffe315e9a8, sockindex=0)
    at ssluse.c:2544
#11 0x00007ffff5ba76b9 in Curl_ssl_connect (conn=0x7fffe315e9a8, sockindex=0)
...

The call to memcpy looks like this:

  memcpy(buf, biomem->data, size);
(gdb) p buf
$46 = 0x7fffe77f4ec0 "C=US; O=The Go Daddy Group, Inc.; OU=Go Daddy Class 2 Certification Authority\375\034<M_r\206\233\261\310\340\371\023.Jg\205\244\304\325\347\372\016#9Ph%"
(gdb) p biomem->data
$47 = 0x7fffe3e1ef60 "C=US; O=The Go Daddy Group, Inc.; OU=Go Daddy Class 2 Certification Authority\375\034<M_r\206\233\261\310\340\371\023.Jg\205\244\304\325\347\372\016#9Ph%"
(gdb) p size
$48 = 255

If I go up a frame, I see that the pointer passed in for buf came from a local variable defined in the calling function:

char buf[256];

Here's where it starts to get weird. I can manually inspect all 256 bytes of both buf and biomem->data without gdb complaining that the memory isn't accessible. I can also manually write all 256 bytes of buf using the gdb set command, without any error. So if all the memory involved is readable and writable, why does memcpy fail?

Also interesting is that I can use gdb to manually call memcpy with the pointers involved. As long as I pass a size <= 160, it runs without a problem. As soon as I pass 161 or higher, gdb gets a SIGSEGV. I know buf is larger than 160, because it was created on the stack as an array of 256. biomem->data is a little harder to figure, but I can read well past byte 160 with gdb.

I should also mention that this function (or rather the curl method I call that leads to this) completes successfully many times before the crash. My program uses curl to repeatedly call a web service API while it runs. It calls the API every five seconds or so, and runs for about 14 hours before it crashes. It's possible that something else in my app is writing out of bounds and stomping on something that creates the error condition. But it seems suspicious that it crashes at exactly the same point every time, although the timing varies. And all the pointers seem fine in gdb, but memcpy still fails. Valgrind doesn't find any bounds errors, but I haven't let my program run under valgrind for 14 hours.
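For what it's worth, if the problem turns out to be an over-read of the source buffer, a defensive version of the copy would clamp the length to what is actually available on both sides rather than trusting `size` alone. A minimal sketch (the function and parameter names here are illustrative, not curl's actual API):

```c
#include <string.h>

/* Hypothetical bounds-safe variant of the copy in x509_name_oneline():
 * copy at most min(datalen, bufsize - 1) bytes and always NUL-terminate,
 * so the copy can never run past either buffer. */
size_t safe_oneline_copy(char *buf, size_t bufsize,
                         const char *data, size_t datalen)
{
    size_t n = datalen < bufsize - 1 ? datalen : bufsize - 1;
    memcpy(buf, data, n);
    buf[n] = '\0';
    return n;
}
```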

Within memcpy itself, the disassembly looks like this:

(gdb) x/20i $rip-10
   0x7ffff6a2ea52 <memcpy+242>: jbe    0x7ffff6a2ea74 <memcpy+276>
   0x7ffff6a2ea54 <memcpy+244>: lea    0x20(%rdi),%rdi
   0x7ffff6a2ea58 <memcpy+248>: je     0x7ffff6a2ea90 <memcpy+304>
   0x7ffff6a2ea5a <memcpy+250>: dec    %ecx
=> 0x7ffff6a2ea5c <memcpy+252>: mov    (%rsi),%rax
   0x7ffff6a2ea5f <memcpy+255>: mov    0x8(%rsi),%r8
   0x7ffff6a2ea63 <memcpy+259>: mov    0x10(%rsi),%r9
   0x7ffff6a2ea67 <memcpy+263>: mov    0x18(%rsi),%r10
   0x7ffff6a2ea6b <memcpy+267>: mov    %rax,(%rdi)
   0x7ffff6a2ea6e <memcpy+270>: mov    %r8,0x8(%rdi)
   0x7ffff6a2ea72 <memcpy+274>: mov    %r9,0x10(%rdi)
   0x7ffff6a2ea76 <memcpy+278>: mov    %r10,0x18(%rdi)
   0x7ffff6a2ea7a <memcpy+282>: lea    0x20(%rsi),%rsi
   0x7ffff6a2ea7e <memcpy+286>: lea    0x20(%rdi),%rdi
   0x7ffff6a2ea82 <memcpy+290>: jne    0x7ffff6a2ea30 <memcpy+208>
   0x7ffff6a2ea84 <memcpy+292>: data32 data32 nopw %cs:0x0(%rax,%rax,1)
   0x7ffff6a2ea90 <memcpy+304>: and    $0x1f,%edx
   0x7ffff6a2ea93 <memcpy+307>: mov    -0x8(%rsp),%rax
   0x7ffff6a2ea98 <memcpy+312>: jne    0x7ffff6a2e969 <memcpy+9>
   0x7ffff6a2ea9e <memcpy+318>: repz retq
(gdb) info registers
rax            0x0      0
rbx            0x7fffe77f50b0   140737077268656
rcx            0x1      1
rdx            0xff     255
rsi            0x7fffe3e1f000   140737016623104
rdi            0x7fffe77f4f60   140737077268320
rbp            0x7fffe77f4e90   0x7fffe77f4e90
rsp            0x7fffe77f4e48   0x7fffe77f4e48
r8             0x11     17
r9             0x10     16
r10            0x1      1
r11            0x7ffff6a28f7a   140737331236730
r12            0x7fffe3dde490   140737016358032
r13            0x7ffff5bc2a0c   140737316137484
r14            0x7fffe3d69b50   140737015880528
r15            0x0      0
rip            0x7ffff6a2ea5c   0x7ffff6a2ea5c <memcpy+252>
eflags         0x10203  [ CF IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
(gdb) p/x $rsi
$50 = 0x7fffe3e1f000
(gdb) x/20x $rsi
0x7fffe3e1f000: 0x00000000      0x00000000      0x00000000      0x00000000
0x7fffe3e1f010: 0x00000000      0x00000000      0x00000000      0x00000000
0x7fffe3e1f020: 0x00000000      0x00000000      0x00000000      0x00000000
0x7fffe3e1f030: 0x00000000      0x00000000      0x00000000      0x00000000
0x7fffe3e1f040: 0x00000000      0x00000000      0x00000000      0x00000000

I'm using libcurl version 7.21.6, c-ares version 1.7.4, and openssl version 1.0.0d. My program is multithreaded, but I have registered mutex callbacks with openssl. The program is running on Ubuntu 11.04 desktop, 64-bit. libc is 2.13.

Clearly libcurl is over-reading the source buffer, and stepping into unreadable memory (the page at 0x7fffe3e1f000 -- you can confirm that the memory is unreadable by looking at /proc/<pid>/maps for the program being debugged).
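As a quick reference for that check (reading the current process's own map here; substitute the debuggee's PID when inspecting a process stopped under gdb):

```shell
# /proc/<pid>/maps lists every mapped range with its permissions.
# An address that falls into no listed range, or into a range whose
# permissions column lacks 'r', is unreadable from within the process.
grep -m1 '\[stack\]' /proc/self/maps
```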

Here's where it starts to get weird. I can manually inspect all 256 bytes of both buf and biomem->data without gdb complaining that the memory isn't accessible.

There is a well-known Linux kernel flaw: even for memory that has PROT_NONE protection (and causes SIGSEGV when the process itself tries to read it), an attempt by GDB to ptrace(PTRACE_PEEKDATA, ...) succeeds. That explains why you can examine all 256 bytes of the source buffer in GDB, even though only 96 of them are actually accessible to the process.
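The in-process side of this is easy to reproduce: map two adjacent pages, revoke access to the second with PROT_NONE, and reads fault exactly at the page boundary, which is what memcpy hit at 0x7fffe3e1f000. A small self-contained demonstration (Linux/POSIX, not specific to this bug; `demo()` returns 0 when the fault lands where expected):

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf jump;

static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(jump, 1);  /* escape the faulting instruction */
}

/* Returns 1 if reading *p faults, 0 if the read succeeds. */
static int read_faults(volatile const char *p)
{
    struct sigaction sa = {0}, old;
    sa.sa_handler = on_segv;
    sigaction(SIGSEGV, &sa, &old);
    int faulted;
    if (sigsetjmp(jump, 1) == 0) {
        (void)*p;          /* volatile forces a real load */
        faulted = 0;
    } else {
        faulted = 1;
    }
    sigaction(SIGSEGV, &old, NULL);
    return faulted;
}

static int demo(void)
{
    long pg = sysconf(_SC_PAGESIZE);
    /* Two adjacent pages; the second one becomes inaccessible. */
    char *mem = mmap(NULL, 2 * (size_t)pg, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    if (mprotect(mem + pg, (size_t)pg, PROT_NONE) != 0)
        return -1;
    int ok = read_faults(mem + pg - 1) == 0   /* last readable byte */
          && read_faults(mem + pg) == 1;      /* first PROT_NONE byte */
    munmap(mem, 2 * (size_t)pg);
    return ok ? 0 : -1;
}
```

GDB attached to this process could still "read" the PROT_NONE page via ptrace, which is exactly why the 256-byte inspection in the question looked fine.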

Try running your program under Valgrind; chances are it will tell you that you are memcpying into a heap-allocated buffer that is too small.

Do you have any possibility of creating a "crumple zone"?

That is, deliberately increasing the size of the two buffers, or in the case of the structure, putting an extra unused element after the destination?

You then seed the source crumple zone with something such as 0xDEADBEEF, and the destination with some other recognizable pattern. If the destination's zone ever changes, you've got something to work with.
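A sketch of what such a crumple zone might look like (the padding size and sentinel value are arbitrary choices, not anything curl uses):

```c
#include <string.h>

#define GUARD 32

/* "Crumple zone": unused sentinel-filled padding placed right after
 * the real buffer, so an overrun can be detected after the fact. */
struct guarded_buf {
    char data[256];
    unsigned char crumple[GUARD];
};

void guard_init(struct guarded_buf *g)
{
    memset(g->crumple, 0xDE, GUARD);   /* seed with a known pattern */
}

/* Returns 1 if the crumple zone is intact, 0 if it was overwritten. */
int guard_intact(const struct guarded_buf *g)
{
    for (int i = 0; i < GUARD; i++)
        if (g->crumple[i] != 0xDE)
            return 0;
    return 1;
}
```

Checking `guard_intact()` after each suspect call narrows down which operation is clobbering memory, without waiting 14 hours under Valgrind.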

256 is a bit suggestive; any possibility it could somehow be treated as a signed quantity, becoming -1 and hence very big? Can't see how gdb wouldn't show it, but...


 