openmp - while loop for text file reading and using a pipeline

Question

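I discovered that OpenMP doesn't support while loops (or at least doesn't like them much). It also doesn't like the '!=' operator.

I have this bit of code.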

int count = 1;
#pragma omp parallel for
    while ( fgets(buff, BUFF_SIZE, f) != NULL )
    {
        len = strlen(buff);
        int sequence_counter = segment_read(buff,len,count);
        if (sequence_counter == 1)
        {
            count_of_reads++;
            printf("\n Total No. of reads: %d \n",count_of_reads);
        }
    count++;
    }

Any clues as to how to handle this? I read somewhere (including another post on Stack Overflow) that I can use a pipeline. What is that, and how do I implement it?

Answer 1

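Too bad people are so quick to select the best answer. Here's my answer.
First, you should read the file into a buffer with something like fread. This is very fast. For an example of how to do this, see http://www.cplusplus.com/reference/cstdio/fread/

Then you can operate on the buffer in parallel with OpenMP. I have implemented most of it for you. The code is below. You did not provide the segment_read function, so I created a dummy one. I used a few things from C++, such as std::vector and std::sort, but with a bit more work you could do this in pure C as well.

Edit: I edited this code and was able to remove the sorting and the critical section.

I compiled with g++ foo.cpp -o foo -fopenmp -O3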

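#include <stdio.h>
#include <stdlib.h>   // exit, malloc, free (used in main)
#include <omp.h>
#include <vector>

// Dummy stand-in for the real segment_read, which was not provided.
int segment_read(char *buff, const int len, const int count) {
  return 1;
}

void foo(char* buffer, size_t size) {
    int count_of_reads = 0;
    std::vector<size_t> starts;        // offset at which each line starts
    std::vector<size_t> *posa = NULL;  // per-thread line-start offsets

    #pragma omp parallel
    {
        const int nthreads = omp_get_num_threads();
        const int ithread = omp_get_thread_num();
        #pragma omp single
        {
            posa = new std::vector<size_t>[nthreads];
            starts.push_back(0);
        }

        // Find the start of every line. schedule(static) without a chunk size
        // gives each thread one contiguous chunk, assigned in thread order, so
        // posa[t] is sorted and all of posa[t] precedes posa[t+1].
        #pragma omp for schedule(static)
        for(long i=0; i<(long)size; i++) {
            if(buffer[i] == '\n') {
                posa[ithread].push_back(i+1);
            }
        }

        // Concatenate the per-thread lists in thread order (no sort needed).
        #pragma omp single
        {
            for(int t=0; t<nthreads; t++)
                starts.insert(starts.end(), posa[t].begin(), posa[t].end());
            // A final line without a trailing newline still counts as a line.
            if(starts.back() < size) starts.push_back(size);
        }

        // Line i spans [starts[i-1], starts[i]), including its '\n' as with fgets.
        // Note that buff is not null-terminated; use len.
        #pragma omp for
        for(long i=1; i<(long)starts.size(); i++) {
            const int len = (int)(starts[i] - starts[i-1]);
            char* buff = &buffer[starts[i-1]];
            const int sequence_counter = segment_read(buff,len,(int)i);
            if (sequence_counter == 1) {
                int n;
                // Increment and read back in one atomic step for the printout.
                #pragma omp atomic capture
                n = ++count_of_reads;
                printf("\n Total No. of reads: %d \n",n);
            }
        }
    }
    delete[] posa;
}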

int main () {
  FILE * pFile;
  long lSize;
  char * buffer;
  size_t result;

  pFile = fopen ( "myfile.txt" , "rb" );
  if (pFile==NULL) {fputs ("File error",stderr); exit (1);}

  // obtain file size:
  fseek (pFile , 0 , SEEK_END);
  lSize = ftell (pFile);
  rewind (pFile);

  // allocate memory to contain the whole file:
  buffer = (char*) malloc (sizeof(char)*lSize);
  if (buffer == NULL) {fputs ("Memory error",stderr); exit (2);}

  // copy the file into the buffer:
  result = fread (buffer,1,lSize,pFile);
  if (result != lSize) {fputs ("Reading error",stderr); exit (3);}

  /* the whole file is now loaded in the memory buffer. */
  foo(buffer, result);
  // terminate


  fclose (pFile);
  free (buffer);
  return 0;
}

Answer 2

One way to implement a "parallel while" in OpenMP is to use a while loop that creates tasks. Here is a general sketch:

void foo() {
    while( Foo* f = get_next_thing() ) {
#pragma omp task firstprivate(f)
        bar(f);
    }
#pragma omp taskwait
}

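For your specific case of looping over fgets, note that fgets has inherently sequential semantics (it gets the "next" line), so it needs to be called before launching the task. It is also important for each task to operate on its own copy of the data returned by fgets, so that a later call to fgets does not overwrite the buffer a previous task is still working on.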
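
To make those two points concrete, here is a minimal sketch of the task approach applied to an fgets loop. It is only one way to do it: `process_line` is a hypothetical stand-in for `segment_read`, `count_reads_with_tasks` is a name invented for this example, and the 4096-byte line buffer is an arbitrary assumption (longer lines would be split by fgets). The task-creating loop runs inside a parallel region under `single`, so one thread reads while the rest of the team executes tasks.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for segment_read: returns 1 for non-blank lines.
static int process_line(const std::string& line, int line_no) {
    (void)line_no;
    return (!line.empty() && line[0] != '\n') ? 1 : 0;
}

// Reads lines sequentially with fgets and hands each one to an OpenMP task.
// Returns the number of lines for which process_line returned 1.
int count_reads_with_tasks(FILE* f) {
    assert(f != NULL);
    int count_of_reads = 0;

    #pragma omp parallel
    #pragma omp single
    {
        char buff[4096];  // assumed maximum line length for this sketch
        int line_no = 0;
        while (fgets(buff, sizeof buff, f) != NULL) {
            ++line_no;
            // Private copy: buff is overwritten by the next fgets call.
            std::string line(buff);
            #pragma omp task firstprivate(line, line_no) shared(count_of_reads)
            {
                if (process_line(line, line_no) == 1) {
                    #pragma omp atomic
                    count_of_reads++;
                }
            }
        }
        // Wait for all tasks created by this loop.
        #pragma omp taskwait
    }
    return count_of_reads;
}
```

Compiled without -fopenmp, the pragmas are ignored and the function simply processes the lines one after another, which is a convenient way to check the result against the parallel build.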

Answer 3

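First of all, even though it comes very close, OpenMP can't magically make your code parallel. It works with for because a for loop has lower and upper bounds that it can understand. OpenMP uses those bounds to divide the work among the different threads.

No such thing is possible with a while loop.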
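
To illustrate why the bounds matter, the following sketch computes by hand the kind of split a runtime can make once it knows a loop runs over [0, n). `chunk_bounds` is a hypothetical helper written for this example; the actual partition an OpenMP runtime chooses depends on the schedule and the implementation, so treat this only as an illustration of the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper: the half-open range [begin, end) that thread `tid`
// out of `nthreads` would get when a loop over [0, n) is split as evenly
// as possible into contiguous chunks.
std::pair<std::size_t, std::size_t>
chunk_bounds(std::size_t n, std::size_t nthreads, std::size_t tid) {
    assert(nthreads > 0 && tid < nthreads);
    const std::size_t base = n / nthreads;   // minimum iterations per thread
    const std::size_t extra = n % nthreads;  // the first `extra` threads get one more
    const std::size_t begin = tid * base + (tid < extra ? tid : extra);
    const std::size_t end = begin + base + (tid < extra ? 1 : 0);
    return std::make_pair(begin, end);
}
```

Every input to this computation comes from the loop header. A loop such as `while (fgets(...) != NULL)` has no `n` that is known before the loop runs, so there is nothing to divide up front.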

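Second, how do you expect your task to be parallelized? You are reading from a file, where sequential access will probably give you better performance than parallel access. You might be able to parallelize segment_read (depending on its implementation).

Alternatively, you may want to overlap the file reading with the processing. For that, you need to use lower-level functions, such as Unix's open and read. Then do asynchronous reads: you send a read request, process the last block that was read, and then wait for the read request to finish. Search for "Linux asynchronous IO", for example, to read more on this.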
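
The request, process, wait cycle can be sketched with two buffers. Note the substitution: this example does not use OS-level asynchronous I/O. It moves an ordinary blocking fread into an OpenMP task as a stand-in, which gives the same overlap pattern when another thread picks the task up. `process_block` and `count_lines_overlapped` are hypothetical names, and counting newlines is just a placeholder for real processing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical processing step: counts newline characters in one block.
static std::size_t process_block(const char* data, std::size_t n) {
    std::size_t lines = 0;
    for (std::size_t i = 0; i < n; i++)
        if (data[i] == '\n') lines++;
    return lines;
}

// Double buffering: while one block is processed, a task reads the next one.
std::size_t count_lines_overlapped(FILE* f, std::size_t block_size) {
    assert(f != NULL && block_size > 0);
    std::vector<char> storage0(block_size), storage1(block_size);
    char* buf[2] = { &storage0[0], &storage1[0] };
    std::size_t filled[2] = { 0, 0 };
    std::size_t total = 0;

    #pragma omp parallel
    #pragma omp single
    {
        int cur = 0;
        filled[cur] = fread(buf[cur], 1, block_size, f);
        while (filled[cur] > 0) {
            const int next = 1 - cur;
            // "Send the read request": another thread may run this task.
            #pragma omp task firstprivate(next) shared(buf, filled)
            filled[next] = fread(buf[next], 1, block_size, f);

            // Process the block that was read last.
            total += process_block(buf[cur], filled[cur]);

            // Wait for the read to finish before switching buffers.
            #pragma omp taskwait
            cur = next;
        }
    }
    return total;
}
```

Only the reading task touches the file while it runs, and only the thread inside `single` updates `total`, so no further synchronization is needed in this sketch.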

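Using a pipeline may not actually help much. That would depend on a lot of the internals of the pipeline, which I'm not very familiar with. However, if you have enough memory, you may also want to consider loading the whole data first and then processing it. That way, loading the data is done as fast as possible (sequentially), and the processing can then be done in parallel.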
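
A minimal sketch of this load-first idea, assuming the file fits in memory and that lines are shorter than the 4096-byte buffer chosen here: `looks_like_read` is a hypothetical stand-in for `segment_read`, and `count_reads_load_then_process` is a name invented for the example. Once the lines are in a vector, the loop has known bounds and a `<` condition, which is the form `parallel for` accepts.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for segment_read: returns 1 for non-blank lines.
static int looks_like_read(const std::string& line) {
    return (!line.empty() && line[0] != '\n') ? 1 : 0;
}

// Phase 1 (sequential): load every line. Phase 2 (parallel): process them.
int count_reads_load_then_process(FILE* f) {
    assert(f != NULL);
    std::vector<std::string> lines;
    char buff[4096];  // assumed maximum line length for this sketch
    while (fgets(buff, sizeof buff, f) != NULL)
        lines.push_back(buff);

    int count_of_reads = 0;
    const long n = static_cast<long>(lines.size());
    // Each thread sums into a private copy; the copies are added at the end.
    #pragma omp parallel for reduction(+: count_of_reads)
    for (long i = 0; i < n; i++)
        count_of_reads += looks_like_read(lines[i]);
    return count_of_reads;
}
```

The reduction replaces the shared counter from the question, so the threads do not contend on `count_of_reads` inside the loop.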

Answer 1
Score: 12

Answer 2
Score: 3, accepted, 2013-05-29 17:31:36

Answer 3
Score: 1, 2013-05-29 15:21:45
