
A parallel version of Kahn's algorithm

I am trying to write a parallelised version of Kahn's algorithm using OpenMP. Because I am quite new to OpenMP, I don't know whether I did the parallelisation correctly. The pseudocode I drew inspiration from is below:

L ← Empty list that will contain the sorted elements 
S ← Set of all nodes with no incoming edge 

while S is non-empty do     
    remove a node n from S     
    add n to tail of L
    for each node m with an edge e from n to m do
         remove edge e from the graph
         if m has no other incoming edges then
             insert m into S 

if graph has edges then
    return error   (graph has at least one cycle) 
else      
    return L       (a topologically sorted order) 

My problem is that my serial version, which is identical to the code below except for the pragma directives and the thread-related calls, is faster than my parallel version. I measure the time using the gettimeofday() function. Is there something wrong with my code? I also compiled the code with both -O0 and -O3 in both cases.

void TopSort(int rows, int columns, int matrix[][columns], list graph[], list S[], int L[]){
    int i, j, sum_S = 1, count = 0, k = 1;
    int start, end, threads, elements, id;

    while(sum_S != 0)   //While S is non - empty do
    {
        printf("\n\n--------- Iteration no. %d ---------", k);  //Print the iteration number
        k++;

        for(i = 1; i < rows; i++){  //With each iteration
            int sum = 0;        //we calculate the new degree of each node after deletion
            for(j = 1; j < columns; j++)
                sum = sum + matrix[j][i];

            if(graph[i - 1].id != 0)    //If the node hasn't been deleted
                graph[i - 1].degree = sum;  //We update the degree
        }


        printf("\nGraph: \n");  //and print the graph
        print(graph, rows);

        sum_S = 0;  //We set the sum_S to 0 to calculate it with the remaining nodes
        #pragma omp parallel num_threads(4) shared(rows, columns, S, L, graph, matrix, count) private(i, id, threads, elements, start, end)
        {
            id = omp_get_thread_num();
            threads = omp_get_num_threads();
            elements = (rows-1)/threads;
            start = elements*id;

            if(id != (threads - 1))
                end = start + elements;
            else
                end = (rows-1);

            for(i = start; i < end; i++)
                #pragma omp critical
                if(graph[i].degree == 0){   //If there is a node with no incoming edges
                    #pragma omp task
                    {
                        printf("thread %d of %d entering critical region\n", id, threads);
                        S[count].id = graph[i].id;  //add it's id to S
                        S[count].degree = graph[i].degree;//and the appropriate degree
                        L[count] = S[count].id; //and add the node from S to L
                        graph[i].id = 0;    //Delete the node from the graph list
                        graph[i].degree = INT_MAX;  //and set it's degree to infinity(INT_MAX)
                        for(j = 1; j < columns; j++)
                            matrix[i+1][j] = 0; //Also delete the node from the graph matrix
                        count+=1;
                        printf("exiting critical region\n");
                    }
                }
        }

        printf("\nS: \n"); //Print the S list
        print(S, rows);

        for(i = 0; i < (rows - 1); i++)//Recalcute the ID sum of nodes inside S list
            sum_S = S[i].id + sum_S;

        for(i = 0; i < (rows - 1); i++){    //Reset S
            S[i].id = 0;    //by setting the ID to zero
            S[i].degree = INT_MAX;  //and the degree to infinity
        }

        printf("\nTopological order: ");    //Print the current topological order

        for(i = 0; i < (rows - 1); i++)
            if(L[i] != 0)//by printing the nodes with non-zero IDs 
                printf("%d --> ", L[i]);
    }


    for(i = 0; i < (rows - 1); i++)
        if(L[i] == 0){  //If the L list has a node with a zero ID
            printf("\n\nError! Circle detected! No topological order!\n");  //there is a circle inside the graph
            break;  //and so there isn't a topological order
        }
        else if(i == (rows - 2))
            printf("END\n");
}

The main problem is that all the computational work is placed inside a critical section. Since a critical section serializes execution, you should not expect any speedup over the sequential version. In practice, it can even be slower because of the added synchronization overhead.
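By contrast, work whose iterations are independent needs no critical section at all. As a minimal sketch, the in-degree computation from the question could be written as below. The function name compute_in_degrees and the flat, 0-based, row-major matrix layout are assumptions made for this illustration and differ from the 1-based matrix in the question.

```c
#include <stddef.h>

/* Hypothetical helper (not from the question): computes the in-degree of
 * every node of an n x n adjacency matrix stored row-major, where
 * adj[u * n + v] != 0 means there is an edge u -> v. */
void compute_in_degrees(int n, const int *adj, int *in_degree)
{
    /* Each iteration writes only in_degree[v], so the iterations are
     * independent and no critical section or atomic is required. */
    #pragma omp parallel for schedule(static)
    for (int v = 0; v < n; v++) {
        int sum = 0;                      /* private to this iteration */
        for (int u = 0; u < n; u++)
            sum += (adj[(size_t)u * n + v] != 0);
        in_degree[v] = sum;
    }
}
```

The block uses only a pragma and no omp_* calls, so it should also compile and give the same result when OpenMP is disabled (the pragma is then ignored).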

Moreover, I am not sure this code is actually correct. You use #pragma omp task inside a critical section. The runtime may either execute the task immediately or defer its execution (see Section 2.10 of the OpenMP 5.0 specification). In the first case, your code seems correct, but it will not run faster than the sequential version. In the second case, the thread entering the critical section submits a task and then leaves the critical section, so other threads can submit further tasks and execute the tasks of other threads in parallel. In that case, there are data races in the executed tasks: count is shared and its increment is unprotected, as are the accesses to S[count].id and the other shared data.
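One way to remove the race on a shared counter such as count is to reserve a slot atomically and then write to that private slot. The sketch below is only an illustration of that idea, not a fix for the whole function; the name collect_ready and the plain int arrays are assumptions, not part of the question's code.

```c
/* Hypothetical helper: stores the indices of all nodes whose in-degree is
 * zero into out[] and returns how many were stored. With several threads
 * the order of the entries in out[] is not deterministic. */
int collect_ready(int n, const int *in_degree, int *out)
{
    int count = 0;                        /* shared across threads */

    #pragma omp parallel for
    for (int v = 0; v < n; v++) {
        if (in_degree[v] == 0) {
            int slot;
            /* Read-and-increment as one atomic step, so two threads can
             * never obtain the same slot. */
            #pragma omp atomic capture
            slot = count++;
            out[slot] = v;                /* slot is owned by this thread */
        }
    }
    return count;
}
```

Only the counter update is synchronized here; the write to out[slot] happens outside any lock, which is what keeps the protected region small.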

You need to redesign your parallel algorithm. Working on graphs in parallel is not simple, especially when the structure of the graph needs to be mutated in parallel. I do not expect this approach to scale well (or to be faster than an optimized sequential algorithm), even with careful use of fine-grained locks or atomic instructions. I advise you to look for parallel topological sorts in the state of the art. You can find some very interesting solutions here.
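To give a rough idea of what such a redesign can look like, here is a sketch of a level-by-level variant of Kahn's algorithm: all nodes of the current frontier are processed in parallel, and in-degrees are decremented atomically instead of mutating the graph. This is an assumption-laden illustration, not necessarily one of the solutions referred to above; the name topo_sort_levels and the flat 0-based adjacency matrix are hypothetical, and as noted it may well fail to beat a good sequential implementation.

```c
#include <stdlib.h>

/* Hypothetical sketch: writes a topological order of an n x n row-major
 * adjacency matrix (adj[u * n + v] != 0 means edge u -> v) into order[].
 * Returns the number of nodes placed; a value smaller than n means the
 * graph has a cycle. Returns -1 if memory allocation fails. */
int topo_sort_levels(int n, const int *adj, int *order)
{
    int *in_degree = calloc((size_t)n, sizeof *in_degree);
    if (in_degree == NULL)
        return -1;

    /* Independent column sums: no synchronization needed. */
    #pragma omp parallel for
    for (int v = 0; v < n; v++) {
        int sum = 0;
        for (int u = 0; u < n; u++)
            sum += (adj[(size_t)u * n + v] != 0);
        in_degree[v] = sum;
    }

    int done = 0;   /* order[0 .. done) is finished              */
    int end = 0;    /* order[done .. end) is the current frontier */
    for (int v = 0; v < n; v++)
        if (in_degree[v] == 0)
            order[end++] = v;

    while (done < end) {
        int next = end;   /* next frontier grows at order[end .. next) */

        #pragma omp parallel for
        for (int i = done; i < end; i++) {
            int u = order[i];
            for (int v = 0; v < n; v++) {
                if (adj[(size_t)u * n + v] == 0)
                    continue;
                int remaining;
                /* "Remove" edge u -> v by decrementing v's in-degree. */
                #pragma omp atomic capture
                remaining = --in_degree[v];
                if (remaining == 0) {     /* exactly one thread sees 0 */
                    int slot;
                    #pragma omp atomic capture
                    slot = next++;
                    order[slot] = v;
                }
            }
        }
        done = end;
        end = next;
    }

    free(in_degree);
    return end;
}
```

The order within one level depends on thread scheduling, so different runs may return different (but still valid) topological orders. The input matrix is never modified, which avoids the parallel structural mutation mentioned above.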
