
Possible Memory Leak in C++ MPI Program?

I'm writing some C++ MPI code for a Parallel Computing class. My code works, and I've turned the assignment in, but the code is using a lot more memory than I anticipated. As I increase the number of processors, the memory requirements per node grow rapidly. This is the first real C/C++ or MPI program I've ever had to write, so I think I have a memory leak of some kind somewhere. Can someone take a look at this code and tell me where? Whenever I create a variable using new, I delete it, so I'm not sure what else I should be looking for. I suppose some of the problem could come from the objects that I'm creating, but shouldn't the destructors for these objects be called at the end of their scope, freeing any memory they have allocated on the heap? I come from a heavy Java background and most of my C/C++ is self-taught, so doing my own memory management is difficult to wrap my head around.
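On the destructor question: standard containers such as std::vector own their heap storage and release it automatically when they go out of scope. A minimal, self-contained illustration of this (not part of the assignment code below):

#include <vector>

void buildTemporary() {
    std::vector<double> scratch(1000, 0.0);  // heap allocation owned by the vector
    // ... use scratch ...
}  // scratch's destructor runs here and frees its heap memory automatically

int main() {
    buildTemporary();  // nothing allocated inside buildTemporary() outlives this call
    return 0;
}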

The problem is very simple. I have a matrix (stored as a single-dimensional vector) of size MSIZE * MSIZE. Each processor is responsible for some contiguous block of the data. Then I run 500 iterations where each non-edge element A[r][c] is set to the maximum of A[r][c], A[r+1][c], A[r-1][c], A[r][c+1], A[r][c-1]. The new value of A[r][c] is not stored until the entire update process for that iteration has finished. Processors have to communicate the values that lie on their boundaries to the other processors.
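For reference, here is a serial sketch of that update rule; the function and variable names are illustrative and not taken from the actual program:

#include <algorithm>
#include <vector>

// One iteration of the update rule: every non-edge element becomes the maximum
// of itself and its four neighbours. New values are staged in a copy so the
// whole update appears to happen simultaneously.
void serialIteration(std::vector<double>& A, int msize) {
    std::vector<double> next(A);
    for (int r = 1; r < msize - 1; ++r) {
        for (int c = 1; c < msize - 1; ++c) {
            double m = A[r * msize + c];
            m = std::max(m, A[(r + 1) * msize + c]);  // below
            m = std::max(m, A[(r - 1) * msize + c]);  // above
            m = std::max(m, A[r * msize + c + 1]);    // right
            m = std::max(m, A[r * msize + c - 1]);    // left
            next[r * msize + c] = m;
        }
    }
    A.swap(next);  // commit all updates at once
}

int main() {
    const int msize = 8;  // small size just to exercise the sketch
    std::vector<double> A(msize * msize, 1.0);
    A[3 * msize + 3] = 5.0;
    serialIteration(A, msize);
    return 0;
}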

Here's my code (I think the problem is occurring somewhere in here, but if you want to see the rest of the code (mostly helper & initialization functions), let me know and I'll post it):

#include <math.h> 
#include "mpi.h" 
#include <iostream>
#include <float.h>
#include <assert.h>
#include <algorithm>
#include <map>
#include <vector>
#include <set>
using namespace std;

#define MSIZE 4000
#define TOTAL_SIZE (MSIZE * MSIZE)
#define NUM_ITERATIONS 500

int myRank;
int numProcs;
int start, end;
int numIncomingMessages;

double startTime;

vector<double> a;

map<int, set<int> > neighborsToNotify;


/*
 * Send the indices that have other processors depending on them to those processors.
 * Once the messages have been sent, receive messages until we've received all the messages
 * we are expecting to receive.
 */
void doCommunication(){
    int messagesReceived = 0;
    map<int, set<int> >::iterator iter;
    for(iter = neighborsToNotify.begin(); iter != neighborsToNotify.end(); iter++){
        int destination = iter->first;
        set<int> indices = iter->second;

        set<int>::iterator setIter;
        for(setIter = indices.begin(); setIter != indices.end(); setIter++){
            double val = a.at(*setIter);
            MPI_Bsend(&val, 1, MPI_DOUBLE, destination, *setIter, MPI_COMM_WORLD);
        }

        MPI_Status s;
        int flag;
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &s);
        while(flag){
            double message;
            MPI_Recv(&message, 1, MPI_DOUBLE, s.MPI_SOURCE, s.MPI_TAG, MPI_COMM_WORLD, &s);
            a.at(s.MPI_TAG) = message;
            messagesReceived++;
            MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &s);
        }

    }

    while(messagesReceived < numIncomingMessages){
        MPI_Status s;
        MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &s);
        double message;
        MPI_Recv(&message, 1, MPI_DOUBLE, s.MPI_SOURCE, s.MPI_TAG, MPI_COMM_WORLD, &s);
        a.at(s.MPI_TAG) = message;
        messagesReceived++;
    }
}

/*
 * Perform one timestep of iteration.
 */
void doIteration(){
    int pos;
    vector<double> temp;
    temp.assign(end - start + 1, 0);
    for(pos = start; pos <= end; pos++){
        int i;
        double max;

        if(isEdgeNode(pos))
            continue;

        int dependents[4];
        getDependentsOfPosition(pos, dependents);

        max = a.at(pos);

        for(i = 0; i < 4; i++){
            if(isInvalidPos(dependents[i]))
                continue;

            max = std::max(max, a.at(dependents[i]));
        }

        temp.at(pos - start) = max;
    }

    for(pos = start; pos <= end; pos++){
        if(! isEdgeNode(pos)){
            a.at(pos) = temp.at(pos - start);
        }
    }
}

/*
 * Compute the checksum for this processor
 */
double computeCheck(){
    int pos;
    double sum = 0;
    for(pos = start; pos <= end; pos++){
        sum += a.at(pos) * a.at(pos);
    }
    return sum;
}

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    findStartAndEndPositions();

    initializeArray();

    findDependentElements();

    MPI_Barrier(MPI_COMM_WORLD);

    if(myRank == 0){
        startTime = MPI_Wtime();
    }

    int i;
    for(i = 0; i < NUM_ITERATIONS; i++){
        if(myRank == 0)
            cout << ".";
        doCommunication();
        MPI_Barrier(MPI_COMM_WORLD);
        doIteration();
    }

    double check = computeCheck();
    double receive = 0;

    MPI_Reduce(&check, &receive, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if(myRank == 0){
        cout << "n = " << MSIZE << " and p = " << numProcs << "\n";
        cout << "The total time was: " << MPI_Wtime() - startTime << " seconds \n";
        cout << "The checksum was: " << receive << " \n";
    }

    MPI_Finalize();
    return 0;
}

I do not think that you have a memory leak. But you can test this with valgrind. Be aware that the output looks terrifying.

 mpirun -n 8 valgrind ./yourProgram
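If the output is overwhelming, a full leak check makes it easier to separate leaks in your own code from reports that originate inside the MPI library itself (the option below is a standard valgrind flag, not something specific to this program):

 mpirun -n 8 valgrind --leak-check=full ./yourProgram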

I think the reason is MPI. You use buffered send, so each node allocates its own send buffer; the more nodes you have, the more buffer space is allocated. To check that your algorithm scales with respect to memory, use unbuffered send instead (only for debugging purposes, as it will kill your speedup). Alternatively, try increasing the matrix: at the moment you are using only about 128 MB (4000 × 4000 doubles at 8 bytes each), which is not really a problem worth parallelizing. Try to find a size such that nearly all of the memory of one node is used.
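As a minimal, self-contained sketch of what "unbuffered send" means here (the ring pattern and names below are illustrative, not a rewrite of doCommunication()), each rank pre-posts a non-blocking receive and then uses a synchronous send, so no send-side buffer memory is needed:

#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double sendVal = rank;   // stand-in for one boundary value
    double recvVal = -1.0;
    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    // Post the receive first so the matching synchronous send can complete.
    MPI_Request recvReq;
    MPI_Irecv(&recvVal, 1, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &recvReq);

    // MPI_Ssend completes only once the receiver has started the matching
    // receive, so unlike MPI_Bsend it does not copy the data into an extra
    // per-rank buffer.
    MPI_Ssend(&sendVal, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD);

    MPI_Wait(&recvReq, MPI_STATUS_IGNORE);
    std::cout << "rank " << rank << " got " << recvVal << " from rank " << left << "\n";

    MPI_Finalize();
    return 0;
}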
