Unexpected memory leak in multithread program

I'm working on a program that uses a large number of threads, each allocating a few megabytes of memory on the heap. When these threads end, a large part of that RAM is kept by the program.

Here is an example of code, allocating and freeing 1 MB in 500 threads, which shows this problem:

#include <future>
#include <iostream>
#include <vector>

// filling a 1 MB array with 0
void task() {
    const size_t S = 1000000;
    int * tab = new int[S];
    std::fill(tab, tab + S, 0);
    delete[] tab;
}

int main() {
    std::vector<std::future<void>> threads;
    const size_t N = 500;

    std::this_thread::sleep_for(std::chrono::seconds(5));
    std::cout << "Starting threads" << std::endl;

    for (size_t i = 0 ; i < N ; ++i) {
        threads.push_back(std::async(std::launch::async, [=]() { return task(); }));
    }

    for (size_t i = 0 ; i < N ; ++i) {
        threads[i].get();
    }

    std::cout << "Threads ended" << std::endl;
    std::this_thread::sleep_for(std::chrono::seconds(25));

    return 0;
}

On my computer, this code, built simply with g++ -o exe main.cpp -lpthread, uses 1976 kB before the message "Starting threads" and 419 MB after the message "Threads ended". These values are just examples: when I run the program multiple times, I get different values.

I have tried valgrind / memcheck, but it doesn't detect any leak.

I have noticed that protecting the "std::fill" operation with a mutex seems to solve this issue (or largely reduce it), but I don't think this is a race condition problem, as there is no shared memory here. I guess the mutex simply imposes an execution order on the threads which avoids (or reduces) the conditions in which the memory leaks.

I am using Ubuntu 18.04, with GCC 7.4.0.

Thank you for your help.

Aurélien

There is no memory leak at all, as Valgrind/memcheck has already confirmed to you.

[...] uses 1976 kB before the message "Starting threads", and 419 MB after the message "Threads ended".

Two things:

  • At the beginning, your vector is empty.
  • At the end, your vector contains 500 std::future<void> objects.

This is why your memory consumption increased: everything has a cost, and you cannot store something in memory for free.
Consequently, your program behaves as expected.


By the way, you don't need to use a lambda; you could pass your function directly :)
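With the task function from the question, that would look like this (a sketch; run_tasks is just a name I made up for the launch loop):

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

void task() {
    const std::size_t S = 1000000;
    int* tab = new int[S];
    std::fill(tab, tab + S, 0);
    delete[] tab;
}

// Launch n asynchronous tasks, wait for all of them, return how many ran.
std::size_t run_tasks(std::size_t n) {
    std::vector<std::future<void>> threads;
    for (std::size_t i = 0; i < n; ++i)
        threads.push_back(std::async(std::launch::async, task));  // no lambda needed
    for (auto& t : threads)
        t.get();
    return threads.size();
}
```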

Edit: For completeness, you should read @Marek R's answer, which mentions another side of the topic: memory released by the program (thread stacks, dynamic allocations, ...) may not be immediately returned to the OS.


Edit2:

Concerning your point about the reduced memory consumption when you use a mutex: the mutex forces all of your threads to execute sequentially (one at a time).

Knowing this, I guess the implementation may be able to optimize by using only one thread and reusing it 500 times.
Since creating a thread has a cost (each thread needs its own stack, for example), creating one thread instead of 500 can significantly reduce your memory consumption.

The whole mystery is hidden in the standard library, which is responsible for managing memory. Multithreading has an impact on memory consumption only because each thread needs quite a lot of memory on its own (for some reason, most beginners forget about that).
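To get a feeling for that cost, you can query the default stack size a new thread would be given (Linux/pthreads; on glibc the default is typically around 8 MB):

```cpp
#include <cstddef>
#include <pthread.h>

// Returns the default stack size, in bytes, that a newly created
// pthread would reserve (each std::thread gets one of these).
std::size_t default_stack_size() {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    std::size_t size = 0;
    pthread_attr_getstacksize(&attr, &size);
    pthread_attr_destroy(&attr);
    return size;
}
```

With 500 such stacks alive at once, the threads alone can reserve gigabytes of address space, even though only the pages actually touched consume RAM.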

When you call delete (or free in C), it doesn't mean that the memory returns to the system. It only means that the standard library marks this piece of memory as no longer needed.

Now, since requesting or releasing memory from/to the system is quite expensive and happens in chunks (at least a page, typically 4-16 kB depending on hardware, and often in much larger blocks), the standard library tries to optimize this and doesn't return all memory to the system immediately. It assumes the program may need this memory again soon.
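On Linux/glibc you can see both halves of this: sysconf(_SC_PAGESIZE) gives the granularity the kernel works in, and the glibc extension malloc_trim(0) explicitly asks the allocator to hand free heap pages back to the OS. A sketch (glibc-specific, not portable):

```cpp
#include <malloc.h>   // malloc_trim (glibc extension)
#include <unistd.h>   // sysconf

// Granularity in which the kernel hands out memory (typically 4096 bytes).
long page_size() { return sysconf(_SC_PAGESIZE); }

// Allocate and free a large block, then ask glibc to return unused heap
// pages to the OS. malloc_trim returns 1 if memory was actually
// released, 0 otherwise.
int free_and_trim() {
    int* big = new int[1000000];  // ~4 MB
    delete[] big;                 // glibc may keep these pages cached...
    return malloc_trim(0);        // ...until we explicitly trim the heap
}
```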

So the memory consumed by a process is not a good indicator of a memory leak. Only when a process runs for a long time, stays in the same state, and continuously gains memory can you suspect that it leaks.
In all other cases you should rely on tools like valgrind (I recommend using AddressSanitizer).

There is also another optimization which has an impact on what you are seeing. Spawning a thread is costly, so when a thread completes its job, it is not destroyed completely; its resources are kept in a kind of "thread pool" for future reuse.

I will assume you don't have 500 cores, so not all of the threads run at the same time; some threads finish before the last one starts, which is why you never reach

S * sizeof(int) * N = 1000000 * 4 * 500 = 2000000000 bytes (~2 GB)

What happens is that you allocate at most ~419 MB at any one time; the memory freed by the first threads is then reused by the last ones.

And the program doesn't return its maximum used memory to the OS before it quits.
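You can watch this from inside the process on Linux: /proc/self/status reports the current resident set (VmRSS) and the peak (VmHWM), and VmHWM stays at the high-water mark even after the threads are gone. A sketch (Linux-only; status_kb is a made-up helper name):

```cpp
#include <fstream>
#include <string>

// Read a "Key:   <value> kB" entry from /proc/self/status (Linux only).
// Returns the value in kB, or -1 if the key is not found.
long status_kb(const std::string& key) {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.rfind(key + ":", 0) == 0)  // line starts with "Key:"
            return std::stol(line.substr(key.size() + 1));
    }
    return -1;
}
```

status_kb("VmRSS") gives the current resident memory and status_kb("VmHWM") the peak; the 419 MB figure in the question corresponds to the latter.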
