How to find the line causing "malloc(): memory corruption: 0x00"

Question

I have a project that reads data from an Ethernet port and runs a set of algorithms on it. The program runs fine for a couple of hours and then produces the error shown below.

Could someone suggest how to debug this and find the line that is causing the error?

   *** Error in `objs/x64Linux3gcc5.4.0/lidarToBoxes': malloc(): memory corruption: 0x00000000051fc640 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f230dc167e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8213e)[0x7f230dc2113e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7f230dc23184]
/usr/lib/nvidia-384/tls/libnvidia-tls.so.384.111(+0x24c0)[0x7f2304e6e4c0]

======= Memory map: ========
00400000-00dc6000 r-xp 00000000 08:03 38407960                           /home/sai/sai_workspace/avt_17_003/modules/lidarToBoxes/objs/x64Linux3gcc5.4.0/lidarToBoxes
00fc5000-00fcf000 r--p 009c5000 08:03 38407960                           /home/sai/sai_workspace/avt_17_003/modules/lidarToBoxes/objs/x64Linux3gcc5.4.0/lidarToBoxes
00fcf000-00fd5000 rw-p 009cf000 08:03 38407960                           /home/sai/sai_workspace/avt_17_003/modules/lidarToBoxes/objs/x64Linux3gcc5.4.0/lidarToBoxes
00fd5000-00ff0000 rw-p 00000000 00:00 0 
0220b000-0614a000 rw-p 00000000 00:00 0                                  [heap]

7f22d0000000-7f22d0022000 rw-p 00000000 00:00 0 
7f22d0022000-7f22d4000000 ---p 00000000 00:00 0 
7f22d4000000-7f22d4021000 rw-p 00000000 00:00 0 
7f22d4021000-7f22d8000000 ---p 00000000 00:00 0 
7f22d8000000-7f22d8021000 rw-p 00000000 00:00 0 
7f22d8021000-7f22dc000000 ---p 00000000 00:00 0 
7f22dc000000-7f22dc07c000 rw-p 00000000 00:00 0 
7f22dc07c000-7f22e0000000 ---p 00000000 00:00 0 
7f22e0000000-7f22e0021000 rw-p 00000000 00:00 0 
7f22e0021000-7f22e4000000 ---p 00000000 00:00 0 
7f22e6ffe000-7f22e6fff000 ---p 00000000 00:00 0 
7f22e6fff000-7f22e77ff000 rwxp 00000000 00:00 0 
7f22e8000000-7f22e8021000 rw-p 00000000 00:00 0 
7f22e8021000-7f22ec000000 ---p 00000000 00:00 0 
7f22eeffe000-7f22eefff000 ---p 00000000 00:00 0 
7f22eefff000-7f22ef7ff000 rwxp 00000000 00:00 0 
7f22ef7ff000-7f22ef800000 ---p 00000000 00:00 0 
7f22ef800000-7f22f0000000 rwxp 00000000 00:00 0 
7f22f0000000-7f22f00a6000 rw-p 00000000 00:00 0 
7f22f00a6000-7f22f4000000 ---p 00000000 00:00 0 
7f22f4000000-7f22f4021000 rw-p 00000000 00:00 0 
7f22f4021000-7f22f8000000 ---p 00000000 00:00 0 
7f22f8000000-7f22f8021000 rw-p 00000000 00:00 0 
7f22f8021000-7f22fc000000 ---p 00000000 00:00 0 
7f22fc093000-7f22fc291000 rw-p 00000000 00:00 0 
7f22fc291000-7f22fc491000 rw-s 00000000 00:09 323133                     socket:[323133]

Thank you!

Answer 1

Based on this first message, the list of unsorted chunks for one particular arena has become corrupted:

*** Error in `objs/x64Linux3gcc5.4.0/lidarToBoxes': malloc(): memory corruption: 0x00000000051fc640 ***

You can tell because it says "malloc(): memory corruption" and not, for example, "malloc(): memory corruption (fast)". The values at the end of the message vary between releases of glibc, but in your particular case what you are seeing is derived from the value of "victim" in code something like this:

while ( (victim = unsorted_chunks(av)->bk) != unsorted_chunks(av)) {
  bck = victim->bk;
  if (__builtin_expect (victim->size <= 2 * SIZE_SZ, 0)
      || __builtin_expect (victim->size > av->system_mem, 0))
    malloc_printerr (check_action, "malloc(): memory corruption",
                     chunk2mem (victim));
  size = chunksize(victim);

In my particular case I grabbed this from glibc-2.18/malloc/malloc.c because, based on the fact that you had one number after the message, your version of glibc seemed near 2.18, but that was just a guess. Your backtrace specifies "/lib/x86_64-linux-gnu/libc.so.6" as the library, which is a bit vague, but if you want more specific information, one way to get it is to run something like this:

ls -l /lib/x86_64-linux-gnu/libc.so.6

The output will likely show that the path is a symbolic link, and the target of that link will be more informative. In this case I don't think you really need to know the exact version, but having it lets you download the matching glibc source whenever you see an error message from libc malloc and want to understand what it means.

Back to that code: it shows that "victim" is set to point to the last entry of the doubly linked list of unsorted chunks for the arena referenced by av. It also shows that the value at the end of the message comes from "chunk2mem(victim)". In your case, with a 64-bit process, the macro chunk2mem adds 16, so you can reconstruct the value of victim as 0x00000000051fc640 - 16 = 0x00000000051fc630.

You can inspect what victim points to in gdb with:

x/4gx 0x00000000051fc630

The second value shown will be the value of victim->size.

If you happen to have a core dump, you can probably use the free open source tool https://github.com/vmware/chap to gather more information, because chap often detects such corruption at startup. To start it, use:

chap core-file-path

Given the likelihood that the size field is corrupted, it may also help to understand how the adjacent allocation just before the one listed as 0x00000000051fc640 was being used. Possibly the corruption was due to a buffer overrun on that previous allocation. To see the contents of the previous allocation, type show allocation 51fc630 at the chap prompt. If chap tells you that the given address is not part of an allocation, use describe 51fc630 at the chap prompt to get an idea of what that allocation might be.

Answer 2

Before running valgrind, compile your program with the -ggdb3 debug flag:

gcc -o executable -std=c11 -Wall -ggdb3 main.c

To run valgrind, pass the executable as an argument:

valgrind --leak-check=full \
         --show-leak-kinds=all \
         --track-origins=yes \
         --verbose \
         --log-file=valgrind-out.txt \
         ./executable

Answer 3

I use valgrind to detect many memory management errors.
