
C++ program shows very different memory behaviour on different machines

I have written a computer simulation in C++ that needs a lot of memory. It runs in iterations, and in each iteration allocates a large amount of memory that should be freed at the end of the iteration. It also uses C++11's implementation of <thread> to run stuff in parallel.

When I test the program on my desktop machine, it behaves fine: it never exceeds the memory I allow it, and over time and iterations nothing stacks up. When I submit the program to our computation cluster, however, the used memory (which I can only see through the queuing software) grows with time and far exceeds the memory used on my machine.

Let me first show you very roughly how the software is structured:

std::vector<std::thread> threads;
for (int t = 0; t < n_threads; ++t) {
    threads.emplace_back([&] {                    // each worker thread runs its own simulation loop
        std::vector<Object> container;            // per-thread container
        for (int iteration = 0; iteration < n_iterations; ++iteration) {
            container.clear();
            container.resize(a_large_number);
            // ... do stuff ...
        }
    });
}
for (auto& t : threads) t.join();

Let's say that on my machine the container eats up 2GB of memory. I can see both in htop and in valgrind --tool=massif that these 2GB are never exceeded. Nothing piles up. On the cluster, however, I can see the memory grow and grow until it becomes much more than the 2GB (and the jobs are killed / the computation node freezes...). Note that I limit the number of threads on both machines and can be sure that they are equal.

What I do know is that the libc on the cluster is very old. To compile my program, I needed to compile a new version of g++ and update the libc on the front node of the cluster. The software does run fine on the computation nodes (except for this memory issue), but the libc there is much older. Could this be an issue, especially together with threading, for memory allocation? How could I investigate that?
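A minimal way to at least confirm which glibc version a compute node loads at run time (assuming glibc; this is only a diagnostic sketch, not part of the simulation) is to print it from inside a tiny test job:

#include <cstdio>
#include <gnu/libc-version.h>   // glibc-specific header providing gnu_get_libc_version()

int main() {
    // Prints the glibc version the binary is actually running against;
    // submitting this as a small job shows what the compute nodes use.
    std::printf("glibc version: %s\n", gnu_get_libc_version());
}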

Yes, depending on how old the GNU libc is, you might be missing some important memory allocation optimizations. Here are some things to try out (needless to say, risking performance penalties):

  1. You can try tweaking the malloc/free behaviour through mallopt(); use the M_MMAP_MAX and M_MMAP_THRESHOLD options to encourage more allocations to go through mmap(), so that the memory is guaranteed to be returned to the system after free() (see the sketch after this list).

  2. Try making your container's allocator be __gnu_cxx::malloc_allocator, to ensure the mallopt() tweaks affect the container.

  3. Try calling container.shrink_to_fit() after the resize, to make sure the vector is not withholding more memory than strictly needed.
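Here is a minimal sketch of how the three suggestions fit together, assuming glibc/libstdc++ and using placeholder names (Object, a_large_number, the iteration count) instead of the poster's real types and sizes:

#include <malloc.h>                    // mallopt(), M_MMAP_MAX, M_MMAP_THRESHOLD (glibc)
#include <cstddef>
#include <vector>
#include <ext/malloc_allocator.h>      // __gnu_cxx::malloc_allocator (libstdc++)

struct Object { double data[8]; };     // stand-in for the real simulation object

int main() {
    // 1. Encourage large allocations to go through mmap(), so that free()
    //    really hands the pages back to the operating system.
    mallopt(M_MMAP_THRESHOLD, 128 * 1024);  // mmap() anything >= 128 KiB
    mallopt(M_MMAP_MAX, 65536);             // allow plenty of mmap()'ed chunks

    // 2. Route the vector's storage through plain malloc()/free(), so the
    //    mallopt() settings above actually apply to it.
    std::vector<Object, __gnu_cxx::malloc_allocator<Object>> container;

    const std::size_t a_large_number = 1000000;
    for (int iteration = 0; iteration < 100; ++iteration) {
        container.clear();
        container.resize(a_large_number);
        // 3. Ask the vector to drop any excess capacity it may still hold.
        container.shrink_to_fit();
        // ... do stuff ...
    }
}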
