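OpenMP C program runs slower than sequential code
I am new to OpenMP and am trying to parallelize Jarvis's algorithm. However, the parallel program turns out to take 2-3 times longer than the sequential code.
Is it that the problem itself cannot be parallelized? Or is there something wrong with how I parallelized it?
This is my OpenMP program for the problem, with two parts parallelized: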
#include <stdio.h>
#include <sys/time.h>
#include <omp.h>

typedef struct Point
{
    int x, y;
} Point;

// To find orientation of ordered triplet (p, i, q).
// The function returns
// 0 for colinear points
// 1 as Clockwise
// 2 as Counterclockwise
int orientation(Point p, Point i, Point q)
{
    int val = (i.y - p.y) * (q.x - i.x) -
              (i.x - p.x) * (q.y - i.y);
    if (val == 0) return 0; // colinear
    return (val > 0)? 1: 2; // clock or counterclock wise
}

// Prints convex hull of a set of n points.
void convexHull(Point points[], int n)
{
    // There must be at least 3 points
    if (n < 3) return;

    // Initialize array to store results
    Point results[n];
    int count = 0;

    // Find the leftmost point
    int l = 0,i;
    #pragma omp parallel shared (n,l) private (i)
    {
        #pragma omp for
        for (i = 1; i < n; i++)
        {
            #pragma omp critical
            {
                if (points[i].x < points[l].x)
                    l = i;
            }
        }
    }

    // Start from leftmost point, keep moving counterclockwise
    // until reach the start point again.
    int p = l, q;
    do
    {
        // Add current point to result
        results[count]= points[p];
        count++;

        q = (p+1)%n;
        int k;
        #pragma omp parallel shared (p) private (k)
        {
            #pragma omp for
            for (k = 0; k < n; k++)
            {
                // If k is more counterclockwise than current q, then
                // update q to k
                #pragma omp critical
                {
                    if (orientation(points[p], points[k], points[q]) == 2)
                        q = k;
                }
            }
        }

        // Now q is the most counterclockwise with respect to p
        // Set p as q for next iteration, to add q to result
        p = q;
    } while (p != l); // While algorithm does not return to first point

    // Print Result
    int j;
    for (j = 0; j < count; j++){
        printf("(%d,%d)\n", results[j].x,results[j].y);
    }
}

int main()
{
    //declaration for start time, end time
    //and total executions for the algorithm
    struct timeval start, end;
    int i, num_run = 100;

    gettimeofday(&start,NULL);

    Point points[] = {{0, 3}, {2, 2}, {1, 1}, {2, 1},
                      {3, 0}, {0, 0}, {3, 3}};
    int n = sizeof(points)/sizeof(points[0]);
    convexHull(points, n);

    gettimeofday(&end,NULL);

    // Elapsed wall-clock time in microseconds
    int cpu_time_used = (((end.tv_sec - start.tv_sec) * 1000000) + (end.tv_usec
                        - start.tv_usec));
    printf("\n\nExecution time: %d us\n", cpu_time_used);
    return 0;
}
I tried to make the input more realistic by adding the following lines:
Point points[3000];
int i;
for(i=0;i<3000;i++) {
    points[i].x = rand()%100;
    points[i].y = rand()%100;
    int j;
    for(j=i+1;j<3000;j++) {
        if(points[i].x==points[j].x) {
            if(points[i].y==points[j].y) {
                i--;
                break;
            }
        }
    }
}
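But it sometimes crashes.
In the following code of yours, the whole content of the parallel for loop is wrapped in a critical statement. This means that this part of the code is never entered by more than one thread at a time. Having multiple threads work one at a time is no faster than having a single thread go through all the iterations. On top of that, some time is lost to synchronization overhead: each thread must acquire a mutex before entering the critical section and release it afterwards.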
int l = 0,i;
#pragma omp parallel shared (n,l) private (i)
{
    #pragma omp for
    for (i = 1; i < n; i++)
    {
        #pragma omp critical
        {
            if (points[i].x < points[l].x)
                l = i;
        }
    }
}
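Some refactoring of the serial code is needed to parallelize it. A reduction is often a good approach for simple operations: have each thread compute a partial result over its share of the iterations (e.g. a partial minimum or a partial sum), then merge all partial results into a global one. For supported operations, the #pragma omp for reduction(op:var) syntax can be used. In this case, however, the reduction has to be done manually.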
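For contrast, here is a minimal sketch of the built-in form, assuming a compiler that supports OpenMP 3.1 or later (the version that added min and max reductions for C). The helper name min_x is hypothetical and not part of the question's code. It yields the smallest x value but not the index of the point holding it, which is one reason the index search is written by hand.

```c
#include <assert.h>

typedef struct Point { int x, y; } Point;

/* Hypothetical helper (not in the original post): returns the smallest
 * x coordinate among n points using OpenMP's built-in min reduction.
 * Without OpenMP enabled, the pragma is ignored and the loop runs serially. */
int min_x(const Point points[], int n)
{
    assert(n > 0);
    int best = points[0].x;
    int i;
    /* Each thread works on a private copy of best; the copies are
     * combined with min when the loop ends. */
    #pragma omp parallel for reduction(min:best)
    for (i = 1; i < n; i++) {
        if (points[i].x < best)
            best = points[i].x;
    }
    return best;
}
```

Called on the seven points from the question, min_x(points, 7) gives 0, the x coordinate shared by (0, 3) and (0, 0).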
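Note how the following code relies on a thread-local variable to track the index of the minimum x, then uses a single critical section to compute the global minimum index.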
int l = 0,i;
#pragma omp parallel shared (n,l) private (i)
{
    int l_local = 0; //This variable is private to the thread

    #pragma omp for nowait
    for (i = 1; i < n; i++)
    {
        // This part of the code can be executed in parallel
        // since all write operations are on thread-local variables
        if (points[i].x < points[l_local].x)
            l_local = i;
    }

    // The critical section is entered only once by each thread
    #pragma omp critical
    {
        if (points[l_local].x < points[l].x)
            l = l_local;
    }

    #pragma omp barrier
    // a barrier is needed in case some more code follows
    // otherwise there is an implicit barrier at the end of the parallel region
}
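The same principle should be applied to the second parallel loop, which likewise suffers from being completely serialized by the critical statement.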
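One way this could look for the second loop is sketched below. It is only an illustration under stated assumptions: the helper name next_hull_point and the variable q_local are hypothetical, the orientation function is copied from the question, and the points are assumed to contain no three collinear points (with collinear points the result can depend on the order in which threads merge their candidates).

```c
#include <assert.h>

typedef struct Point { int x, y; } Point;

/* Same orientation test as in the question:
 * 0 = colinear, 1 = clockwise, 2 = counterclockwise. */
static int orientation(Point p, Point i, Point q)
{
    int val = (i.y - p.y) * (q.x - i.x) -
              (i.x - p.x) * (q.y - i.y);
    if (val == 0) return 0;
    return (val > 0) ? 1 : 2;
}

/* Hypothetical helper (not in the original post): performs one step of
 * the do/while loop and returns the index of the hull point after p. */
int next_hull_point(const Point points[], int n, int p)
{
    assert(n >= 3 && p >= 0 && p < n);
    int q = (p + 1) % n;              /* shared result */

    #pragma omp parallel shared(q)
    {
        int q_local = (p + 1) % n;    /* private to each thread */
        int k;

        /* Runs in parallel: every write goes to a thread-local variable. */
        #pragma omp for nowait
        for (k = 0; k < n; k++) {
            if (orientation(points[p], points[k], points[q_local]) == 2)
                q_local = k;
        }

        /* Entered once per thread to merge its candidate into q. */
        #pragma omp critical
        {
            if (orientation(points[p], points[q_local], points[q]) == 2)
                q = q_local;
        }
    }   /* implicit barrier at the end of the parallel region */
    return q;
}
```

With such a helper, the body of the do/while loop would reduce to q = next_hull_point(points, n, p);. Opening a parallel region on every hull step has a cost of its own, so for an input as small as the seven points in the question the sequential loop may still be faster.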