Reading a specific number of lines from a file in C (scanf, fseek, fgets)

I have a master process that spawns N child processes, which communicate with the parent through unnamed pipes. I must be able to:

  • make the parent open the file and then send each child a struct telling it to read from line min to line max;
  • this is going to happen at the same time, so I don't know:
  • 1st, how to divide total_lines among the N maps, and
  • 2nd, how to make each child read just the lines it is supposed to.

My problem does not concern the OS concepts, only the file operations :S

Perhaps fseek? I can't mmap the log file (some are larger than 1 GB).

I would appreciate some ideas. Thank you in advance.

EDIT: I'm trying to make the children read their respective lines without using fseek and the chunk values. Could someone please tell me if this is valid?

//somewhere in the parent process:

    FILE* logFile = fopen(filename, "r");
    while (fgets(line, 1024, logFile) != NULL) {
        num_lines++;
    }
    rewind(logFile);
    int prev = 0;
    for (i = 0; i < maps_nr; i++) {
        struct send_to_Map request;
        request.fp = logFile;
        request.lower = lowLimit;
        request.upper = highLimit;

        if (i == 0)
            request.minLine = 0;
        else
            request.minLine = 1 + prev;
        if (i != maps_nr - 1)
            request.maxLine = (request.minLine + num_lines / maps_nr) - 1;
        else
            request.maxLine = (request.minLine + num_lines / maps_nr) + (num_lines % maps_nr);
        prev = request.maxLine;

        //write this structure to respective pipe
    }


//child process:

while (1) {
    ...
    //reads the structure from the pipe (and knows which lines to read)
    int n = 0, counter = 0;
    while (fgets(line, 1024, logFile) != NULL) {
        if (n >= minLine && n <= maxLine)
            counter += process(line); //returns 1 if an IP between the low and high limit was found in that line
        n++;
    }
    //(...)
}

I don't know if it's going to work; I just want to make it work! Even like this, is it possible to outperform a single process that reads the whole file and prints the total number of IPs found in the log file?

If you don't care about dividing the file exactly evenly, and line lengths are distributed fairly evenly over the entire file, you can avoid reading the entire file once in the parent.

  1. Get the file size.
  2. chunk_size = file_size / number_of_children
  3. When you spawn each child, do the following in the parent:
    • Seek to (child_num+1) * chunk_size.
    • Read forward until you find a newline.
    • Spawn the child, telling it to start at the end of the previous chunk (or 0 for the first child), and the actual length of the chunk.
  4. Each child seeks to start and reads the actual length of its chunk (roughly chunk_size bytes).

That's a rough sketch of the strategy.
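Steps 1 and 2 might look roughly like the following. This is only a sketch: the helper names file_size and nominal_chunk_size are made up for illustration, and it assumes the stream was opened in binary mode on a platform where seeking to SEEK_END is supported (true on POSIX systems). ftell returns a long, so where long is 32 bits, files over 2 GB would need the POSIX fseeko/ftello pair instead.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: size of an open file in bytes, or -1 on error.
 * The stream position is restored before returning. */
long file_size(FILE *f)
{
    long saved, size;
    assert(f != NULL);
    saved = ftell(f);
    if (saved < 0 || fseek(f, 0L, SEEK_END) != 0)
        return -1;
    size = ftell(f);
    if (fseek(f, saved, SEEK_SET) != 0)
        return -1;
    return size;
}

/* Hypothetical helper for step 2: the nominal chunk size. The real chunk
 * ends are moved forward to the next newline in step 3. */
long nominal_chunk_size(long size, int number_of_children)
{
    assert(number_of_children > 0);
    return size / number_of_children;
}
```

Because of the integer division, the last child should simply read up to the end of the file rather than stopping at number_of_children * chunk_size.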

Edited to simplify things a bit.

Edit: here's some untested code for step 3, with step 4 below. This is all untested, and I haven't been careful about off-by-one errors, but it gives you an idea of the usage of fseek and ftell, which sounds like what you are looking for.

// Assume FILE* f is open to the file, chunk_size is the average expected size,
// child_num is the id of the current child, spawn_child() is a function that
// handles the logic of spawning a child and telling it where to start reading,
// and how much to read. child_chunks[] is an array of structs to keep track of
// where the chunks start and where they end (end is exclusive).
if(fseek(f, (child_num + 1) * chunk_size, SEEK_SET) != 0) { handle_error(); }
int ch;
while((ch = fgetc(f)) != EOF && ch != '\n')
{/*empty*/}

// FIXME: needs to handle EOF properly.
child_chunks[child_num].end = ftell(f); // just past the newline; FIXME: needs error check.
child_chunks[child_num+1].start = child_chunks[child_num].end; // next chunk starts where this one ends
spawn_child(child_num);

Then in your child (step 4), assume the child has access to child_chunks[] and knows its child_num:

void this_is_the_child(int child_num)
{
    /* ... */

    fseek(f, child_chunks[child_num].start, SEEK_SET); // FIXME: handle error
    // Check the position before reading, so the last line of the chunk is not skipped.
    while(ftell(f) < child_chunks[child_num].end && fgets(...))
    {
    }
}
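
For reference, here is a small self-contained sketch of steps 3 and 4 with the boundaries treated as half-open byte ranges [start, end). The function names chunk_end_after and count_lines_in_chunk are hypothetical, and the sketch assumes that target never exceeds the file size and that no line is longer than the 1024-byte buffer (longer lines would be counted more than once). It only counts lines; a real child would call something like process(line) instead.

```c
#include <assert.h>
#include <stdio.h>

/* Parent side (step 3): seek to `target` and move forward to just past the
 * next newline. Returns that offset, usable as an exclusive chunk end, or
 * -1 on error. Assumes target is not beyond the end of the file. */
long chunk_end_after(FILE *f, long target)
{
    int ch;
    assert(f != NULL && target >= 0);
    if (fseek(f, target, SEEK_SET) != 0)
        return -1;
    while ((ch = fgetc(f)) != EOF && ch != '\n')
        ; /* skip the rest of the line that straddles the boundary */
    return ftell(f);
}

/* Child side (step 4): count the lines in the byte range [start, end).
 * Returns -1 if the seek fails. */
long count_lines_in_chunk(FILE *f, long start, long end)
{
    char line[1024];
    long count = 0;
    assert(f != NULL && start >= 0 && start <= end);
    if (fseek(f, start, SEEK_SET) != 0)
        return -1;
    /* Test the position first so the chunk's last line is still read. */
    while (ftell(f) < end && fgets(line, sizeof line, f) != NULL)
        count++;
    return count;
}
```

One design point worth keeping in mind: after fork(), a descriptor inherited from the parent shares its file offset with the parent and the other children, so it is probably safer for each child to fopen the file itself before seeking.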
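
#include <stdio.h>
#include <stdlib.h>

/* append one offset to the array; exits on allocation failure */
static void addLineBegin(long **begins, long *num, long pos)
{
  long *tmp = realloc(*begins, (size_t)(*num + 1) * sizeof **begins);
  if (tmp == NULL) { perror("realloc"); exit(EXIT_FAILURE); }
  tmp[(*num)++] = pos;
  *begins = tmp;
}

/* get an array with line-startpositions (file-offsets); returns the number of lines */
long readLineBegins(FILE *f, long **begins)
{
  long mark = 0, num = 0;
  int ch; /* int rather than fpos_t: fpos_t is an opaque type, not an integer */
  *begins = NULL;
  while ((ch = fgetc(f)) != EOF)
  {
    if (ch == '\n')
    {
      addLineBegin(begins, &num, mark);
      mark = ftell(f); /* the next line starts right after the newline */
    }
  }

  if (mark < ftell(f)) /* last line without a trailing newline */
    addLineBegin(begins, &num, mark);

  return num;
}

/* output linenumber beg...end (zero-based indexes, printed one-based) */
void workLineBlocks(FILE *f, const long *begins, long beg, long end)
{
  while (beg <= end)
  {
    int ch;
    fseek(f, begins[beg], SEEK_SET); /* set linestart-position */
    printf("%ld:", ++beg);
    while ((ch = fgetc(f)) != EOF && ch != '\n' && ch != '\r')
      putchar(ch);
    puts("");
  }
}

int main(void)
{
  FILE *f = fopen("file.txt", "rb");
  long *lineBegins; /* Array with line-startpositions */
  long lb;          /* number of lines */

  if (f == NULL) { perror("file.txt"); return EXIT_FAILURE; }
  lb = readLineBegins(f, &lineBegins);

  if (lb >= 2)
  {
    workLineBlocks(f, lineBegins, lb - 2, lb - 1); /* out last two lines */
    workLineBlocks(f, lineBegins, 0, 1);           /* out first two lines */
  }

  fclose(f);
  free(lineBegins);
  return 0;
}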
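
Once the number of lines is known, the first part of the question (how to divide total_lines among N children) could be handled along these lines. This is a sketch only: line_range is a hypothetical helper, not part of the code above, and it uses zero-based, inclusive line indexes, so each result could be passed as beg and end to workLineBlocks.

```c
#include <assert.h>

/* Hypothetical helper: compute the inclusive, zero-based line range
 * [*min_line, *max_line] for worker `idx` out of `parts`, giving one extra
 * line to each of the first (total_lines % parts) workers.
 * If a worker gets no lines, *max_line ends up smaller than *min_line. */
void line_range(long total_lines, int parts, int idx,
                long *min_line, long *max_line)
{
    long base, extra;
    assert(total_lines >= 0 && parts > 0 && idx >= 0 && idx < parts);
    base = total_lines / parts;   /* lines every worker gets */
    extra = total_lines % parts;  /* leftover lines to spread out */
    *min_line = idx * base + (idx < extra ? idx : extra);
    *max_line = *min_line + base + (idx < extra ? 1 : 0) - 1;
}
```

Spreading the remainder over the first workers keeps the chunk sizes within one line of each other, instead of giving the whole remainder to the last child.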

I think this might help you: Reading a specific range of lines from a text file
