
Why is AddressSanitizer not indicating a memory leak after malloc() memory was not freed?

(I did not write this code; my professor did.) I was looking at some code that my professor wrote, and it all made sense to me except for one thing. Because we were running out of time, he did not bother to free any of the memory. However, he was compiling with AddressSanitizer enabled, and when he ran the code, no AddressSanitizer error or warning was shown.

We were running gcc 9.3 on an Ubuntu machine. When I comment out the call to add_line, ASan reports leaks, but only for crnt. I guess lines does not throw a memory leak because it was declared in the global space? But why doesn't crnt throw a memory leak when the add_line function is called?

(Also, here are the compile flags that were used: -g -std=c99 -Wall -Wvla -fsanitize=address,undefined)

Here is the code:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>

#define DEBUG 1

#define BUFSIZE 8
#define LISTLEN 16

char **lines;
int line_count, line_array_size;

void add_line(char *p)
{
    if (DEBUG) printf("Adding |%s|\n", p);
    if (line_count == line_array_size) {
        line_array_size *= 2;
        lines = realloc(lines, line_array_size * sizeof(char *));
        // TODO: check whether lines is NULL
    }

    lines[line_count] = p;
    line_count++;
}

int main(int argc, char **argv)
{
    int fd, bytes;
    char buf[BUFSIZE];
    char *crnt;
    int len;
    int pos, start;

    // TODO: move array list management to separate functions
    lines = malloc(sizeof(char *) * LISTLEN);
    if (!lines) {
        printf("malloc failed\n");
        return EXIT_FAILURE;
    }

    line_array_size = LISTLEN;
    line_count = 0;

    if (argc > 1) {
        fd = open(argv[1], O_RDONLY);
        if (fd == -1) {
            perror(argv[1]);
            return EXIT_FAILURE;
        }
    } else {
        fd = 0;
    }

    crnt = NULL;
    len = 0;
    while ((bytes = read(fd, buf, BUFSIZE)) > 0) {
        // read buffer and break file into lines

        start = 0;
        for (pos = 0; pos < bytes; pos++) {
            if (buf[pos] == '\n') {
                if (crnt == NULL) {
                    len = pos - start;
                    crnt = malloc(len + 1);
                    memcpy(crnt, &buf[start], len);
                } else {
                    len += pos;
                    crnt = realloc(crnt, len + 1);
                    memcpy(&crnt[len - pos], buf, pos);
                }
                crnt[len] = '\0';
                // add_line(crnt);  <-- When I uncomment this line, no AddressSanitizer leak is detected.
                //                      With this line commented out, ASan reports a leak only for crnt. Why is that?
                crnt = NULL;
                start = pos + 1;
            }
        }

        if (start < pos) {
            if (crnt == NULL) {
                len = pos - start;
                crnt = malloc(len + 1);
                memcpy(crnt, &buf[start], len);
            } else {
                int newlen = len + (pos - start);
                crnt = realloc(crnt, newlen + 1);
                memcpy(&crnt[len], &buf[start], pos - start);
                len = newlen;
            }
            crnt[len] = '\0';  // technically unnecessary
        }
    }
    if (bytes == -1) {
        perror("read");
        return EXIT_FAILURE;
    }

    // if we reach here, we have read the entire file
    // sort and print the list
    

    return 0;
}

The issue here is the definition of "memory leak". I would have liked to quote a section of LeakSanitizer's documentation that offers a clear and precise definition of the concept, which seems fundamental to its operation, but I couldn't find one, so you'll have to bear with a bit of projection on my part.

A region of dynamically allocated memory (i.e. allocated with malloc or friends) has leaked when there is no possible way for it to be freed. In other words, if your program allocates memory and throws away the address before the allocation has been freed, the memory has leaked.

That's subtly different from what you might think the definition is. You might think that memory has leaked if your program terminates without freeing every block of memory it allocated. That's certainly a possible definition, and I'm not going to criticise it (much), but it's actually not very precise.

At what point has the program terminated? It hasn't really terminated when main() returns, because you might still have clean-up functions registered with atexit(), and those functions don't execute until after main() returns. (Or when exit() is called, which is effectively the same thing.) It's actually pretty common (though, to my mind, pointless) to use atexit() functions precisely in order to free() objects which might not have been deallocated before exit().

OK, so you can't decide whether an allocation has leaked by checking whether it has been freed when main() returns. If you want to do it that way, you need to defer the test until really the last possible moment. But at what's really the last possible moment, the process is about to cease to exist and the operating system is going to reclaim all the memory used by the process, including whatever resources were acquired by the memory allocation library. So at the last possible moment, there is no memory leak, because there is no memory.

(There are embedded systems which have no concept of separate process memory, etc., and so what I wrote up there might not apply to every possible computation system. But it applies to everything on which AddressSanitizer is implemented.)

A key point is that the atexit() handler needs to be able to find the objects it is cleaning up, and since it executes after main() has terminated, it cannot use any automatic (i.e. stack-allocated) object. Only objects with static lifetime are available to it. So for it to be able to do its task, the address of the object to be cleaned up on termination must be stored in global memory. If the region's address is not stored somewhere persistent, the memory has leaked (as per my definition above) and we don't actually have to wait to see whether an atexit() handler manages to free the memory.

Which brings us back to what I claim is a workable definition of a memory leak: dynamically allocated memory whose address is no longer stored anywhere in the running program. That memory region can no longer be used, so it's garbage, but it cannot be freed because the program doesn't know what its address is.

Your lines array is a global variable. Indeed, you point that out in your question:

I guess lines does not throw a memory leak because it was declared in the global space?

That's correct. lines is a global variable, so its contents are still accessible even after main() returns. Not only are its contents accessible, so is any memory pointed to by an element of the array it points to. You could, if you wanted to, free the saved lines in an atexit handler:

void cleanup(void) {
  for (int i = 0; i < line_count; ++i) { free(lines[i]); }
  free(lines);
}

(To use that, you only need to call atexit(cleanup) just after you initialise lines and line_count.)
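To make that concrete, here is a minimal sketch of how the handler could be wired up. The helper init_lines and its capacity parameter are names I have made up for illustration; they are not in the original program. The globals mirror the ones in the question. Resetting the globals at the end of cleanup() is an extra precaution added for this sketch, so that calling it more than once is harmless.

```c
#include <assert.h>
#include <stdlib.h>

char **lines;                      /* global: still reachable after main() returns */
int line_count, line_array_size;

/* Frees every saved line, then the array itself. */
void cleanup(void)
{
    for (int i = 0; i < line_count; ++i) {
        free(lines[i]);
    }
    free(lines);
    /* Reset so that a second call is harmless (an addition for this sketch). */
    lines = NULL;
    line_count = 0;
    line_array_size = 0;
}

/* Hypothetical helper: allocates the list and registers cleanup() with atexit().
   Returns 0 on success, -1 on failure. */
int init_lines(int capacity)
{
    assert(capacity > 0);
    lines = malloc(sizeof(char *) * (size_t)capacity);
    if (lines == NULL) {
        return -1;
    }
    line_array_size = capacity;
    line_count = 0;
    if (atexit(cleanup) != 0) {
        cleanup();                 /* registration failed: release the array now */
        return -1;
    }
    return 0;
}
```

In the question's program, main() would call init_lines(LISTLEN) in place of the initial malloc, and every line handed to add_line() would then be freed when the process exits.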

So that brings us to:

But why doesn't crnt throw a memory leak when the add_line function is called?

crnt contains the address of a dynamically allocated buffer which contains the current line. If you call add_line(crnt), that pointer is stored in lines. So it's available for the clean-up function, as above. You can set crnt to NULL at your convenience, because it is no longer the only pointer to that buffer.

But if you don't call add_line, then crnt is the only pointer to that buffer, and when you set crnt to NULL, there is no longer a pointer to the buffer. The buffer has leaked and AddressSanitizer is there to tell you about it. (AddressSanitizer would have caught the problem even if you hadn't set crnt to NULL, because crnt ceases to exist when main() returns or calls exit(), and at that point the address has been lost. The same applies if you overwrite crnt with a different allocation's address.)

For a much simpler example, try these two very similar programs:

Memory leak

#include <stdlib.h>
int main(void) {
  void* megabyte = malloc(1<<20);
  (void)megabyte; /* Suppress unused variable warning */
}

No memory leak

#include <stdlib.h>
void* megabyte;
int main(void) {
  megabyte = malloc(1<<20);
}
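The other ways of losing an address mentioned earlier (a local pointer going out of scope, or being overwritten) can be sketched the same way. The function names and the global kept below are hypothetical, chosen only for this illustration. I would expect a leak report for the first two functions and none for the third, but treat that as an assumption: an optimising compiler may remove an allocation whose result is never used, and the leak checker scans stacks and registers conservatively, so a stale copy of a pointer can occasionally hide a leak. Building without optimisation makes the behaviour easier to reproduce.

```c
#include <stdlib.h>

void *kept;                        /* global: anything stored here stays reachable */

/* Leak: the only pointer is a local variable that disappears on return. */
void lose_by_scope(void)
{
    void *p = malloc(32);
    (void)p;                       /* suppress unused variable warning */
}

/* Leak: the address of the first block is overwritten before it is freed. */
void lose_by_overwrite(void)
{
    void *p = malloc(32);
    p = malloc(64);                /* the 32-byte block can no longer be freed */
    free(p);                       /* only the 64-byte block is released */
}

/* Not a leak by the definition above: the address survives in a global. */
void keep_in_global(void)
{
    kept = malloc(32);
}
```

Calling each of these from main() in a program built with -fsanitize=address is one way to compare the reports.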

Note that the Valgrind memcheck tool can report on memory, like megabyte in the second example, which is never freed even though it is still reachable at what Valgrind considers the end of execution. But it doesn't do so by default. If you run Valgrind on the second program with the flags --show-leak-kinds=all --leak-check=full, it will report that a megabyte of memory is "still reachable". (To try Valgrind, you have to compile the program without AddressSanitizer, I believe. The two tools are not completely compatible.)
