
Read file line by line and store lines in array of strings in C


I'm having trouble reading a file in C and storing its lines in an array of strings:

char **aLineToMatch;
FILE *file2; 
int bufferLength = 255;
char buffer[bufferLength];
int i;

char *testFiles[] = { "L_2005149PL.01002201.xml.html",
                       "L_2007319PL.01000101.xml.html",
                       NULL};

char *testStrings[] = { "First",
                         "Second",
                         "Third.",
                          NULL};


file = fopen(testFiles[0], "r"); // loop will come later, that's not the problem

while(fgets(buffer, bufferLength, file2) != NULL) {
 printf("%s\n", buffer);
 // here should be adding to array of strings (testStrings declared above)
} 
 fclose(file);
}

Then I do some checks, some printing, etc.:

for(aLineToMatch=testStrings; *aLineToMatch != NULL; aLineToMatch++) {
    printf("String: %s\n", *aLineToMatch);
}

How do I correctly change the values of *testFiles[] so that it contains valid values read from the file, with a NULL at the end?

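I think the key issue here is that in C you have to manage your own memory, and you need to understand the differences between the kinds of storage available in C.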

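Put simply, there are:

  1. Static (global and static variables, string literals): fixed size, exists for the whole run of the program
  2. Stack (automatic local variables): released automatically when the function returns
  3. Heap (malloc(), realloc(), free()): lives until you free it, and can be resized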

Here are some related links with more details:

https://www.geeksforgeeks.org/memory-layout-of-c-program/

https://craftofcoding.wordpress.com/2015/12/07/memory-in-c-the-stack-the-heap-and-static/

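In higher-level languages, lists and strings are typically heap-allocated and grow automatically, so you can manipulate them pretty much however you like. Plain arrays and strings in C, however, have a fixed size. String literals such as "First" live in static storage (and must not be modified), and an ordinary array cannot change its size once it has been created, whether it is static or on the stack.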

The rest of this answer is in the code comments below.

I've modified your code and tried to explain why each change is needed.

// @Compile gcc read_line_by_line.c && ./a.out
// @Compile gcc read_line_by_line.c && valgrind ./a.out
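// getline() and strdup() are POSIX functions (strdup() is also standard since C23),
// so request them explicitly in case you compile with a strict flag such as -std=c11
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>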
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <stdbool.h>

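// An array with static storage (e.g. declared at file scope, like `files` below) must have a compile-time constant size,
// i.e. it cannot use a variable like this: int n = 3; int numbers[n];
// (inside a function, C99 allows that as a variable-length array, but its size still can't change afterwards)
#define BUFFER_SIZE_BYTES 255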

// Uses static program storage, size is fixed at time of compilation
char *files[] = {"file1.txt", "file2.txt"}; // The size of this symbol is (sizeof(char*) * 2)
// Hence this line of code is valid even outside the body of a function
// because it doesn't actually execute,
// it just declares some memory that the compiler is supposed to provision in the resulting binary executable

// Divide the total size, by the size of an element, to calculate the number of elements
const int num_files = sizeof(files) / sizeof(files[0]);

int main(void) {
  printf("Program start\n\n");

  printf("There are %d files to read.\n", num_files);

  // These lines are in the body of a function and they execute at runtime
  // This means we are now allocating memory 'on-the-fly' at runtime
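  int num_lines = 3;
  char **lines = malloc(sizeof(lines[0]) * num_lines);
  if (lines == NULL) {
    fprintf(stderr, "Failed to allocate memory at %s:%d\n", __FILE__, __LINE__);
    return EXIT_FAILURE;
  }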

  // lines[0] = "First"; // This would assign a pointer to some static storage containing the bytes { 'F', 'i', 'r', 's', 't', '\0' }
  lines[0] = strdup("First");  // Use strdup() instead to allocate a copy of the string on the heap
  lines[1] = strdup("Second"); // This is so that we don't end up with a mixture of strings
  lines[2] = strdup("Third");  // with different kinds of storage in the same array
  // because only the heap strings can be free()'d
  // and trying to free() static strings is an error
  // but you won't be able to tell them apart,
  // they will all just look like pointers
  // and you won't know which ones are safe to free()

  printf("There are %d lines in the array.\n", num_lines);

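  // Reading the file this way only works for lines of up to BUFFER_SIZE_BYTES - 2 characters (253 here),
  // because fgets() also needs room for the '\n' and the '\0'; longer lines get split across several array entries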
  /*
  printf("\nReading file...\n");
  FILE *fp = fopen(files[0], "r");
  char buffer[BUFFER_SIZE_BYTES];
  while (fgets(buffer, BUFFER_SIZE_BYTES, fp) != NULL) {
    printf("%s\n", buffer);

    // Resize the array we allocated on the heap
    void *ptr = realloc(lines, (num_lines + 1) * sizeof(lines[0]));
    // Note that this can fail if there isn't enough free memory available
    // This is also a comparatively expensive operation
    // so you wouldn't typically do a resize for every single line
    // Normally you would allocate extra space, wait for it to run out, then reallocate
    // Either growing by a fixed size, or even doubling the size, each time it gets full

    // Check if the allocation was successful
    if (ptr == NULL) {
      fprintf(stderr, "Failed to allocate memory at %s:%d\n", __FILE__, __LINE__);
      exit(EXIT_FAILURE); // unlike assert(false), this still runs when compiled with -DNDEBUG
    }
    // Overwrite `lines` with the pointer to the new memory region only if realloc() was successful
    lines = ptr;

    // We cannot simply lines[num_lines] = buffer
    // because we will end up with an array full of pointers
    // that are all pointing to `buffer`
    // and in the next iteration of the loop
    // we will overwrite the contents of `buffer`
    // so all appended strings will be the same: the last line of the file

    // So we strdup() to allocate a copy on the heap
    // we must remember to free() this later
    lines[num_lines] = strdup(buffer);

    // Keep track of the size of the array
    num_lines++;
  }
  fclose(fp);
  printf("Done.\n");
  */

  // I would recommend reading the file this way instead
  ///*
  printf("\nReading file...\n");
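  FILE *fp = fopen(files[0], "r");
  if (fp == NULL) {
    perror(files[0]); // e.g. "file1.txt: No such file or directory"
    return EXIT_FAILURE;
  }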
  char *new_line = NULL; // Buffer allocated (and grown) for us by getline(), so lines can be any length; we must free() it afterwards
  size_t str_len = 0;    // Size of the buffer getline() allocated (its capacity, not the length of the line)
  ssize_t bytes_read;    // Number of characters read, including the '\n' but excluding the null terminator; -1 at end-of-file or on error
  while ((bytes_read = getline(&new_line, &str_len, fp)) != -1) {
    printf("%s\n", new_line);

    // Resize the array we allocated on the heap
    void *ptr = realloc(lines, (num_lines + 1) * sizeof(lines[0]));
    // Note that this can fail if there isn't enough free memory available
    // This is also a comparatively expensive operation
    // so you wouldn't typically do a resize for every single line
    // Normally you would allocate extra space, wait for it to run out, then reallocate
    // Either growing by a fixed size, or even doubling the size, each time it gets full

    // Check if the allocation was successful
    if (ptr == NULL) {
      fprintf(stderr, "Failed to allocate memory at %s:%d\n", __FILE__, __LINE__);
      exit(EXIT_FAILURE); // unlike assert(false), this still runs when compiled with -DNDEBUG
    }
    // Overwrite `lines` with the pointer to the new memory region only if realloc() was successful
    lines = ptr;

    // Allocate a copy on the heap
    // so that the array elements don't all point to the same buffer
    // we must remember to free() this later
    lines[num_lines] = strdup(new_line);

    // Keep track of the size of the array
    num_lines++;
  }
  free(new_line); // Free the buffer that was allocated by getline()
  fclose(fp);     // Close the file since we're done with it
  printf("Done.\n");
  //*/

  printf("\nThere are %d lines in the array:\n", num_lines);
  for (int i = 0; i < num_lines; i++) {
    printf("%d: \"%s\"\n", i, lines[i]);
  }

  // Here you can do what you need to with the data...

  // free() each string
  // We know they're all allocated on the heap
  // because we made copies of the statically allocated strings
  for (int i = 0; i < num_lines; i++) {
    free(lines[i]);
  }

  // free() the array itself
  free(lines);

  printf("\nProgram end.\n");
  // At this point we should have free()'d everything that we allocated
  // If you run the program with Valgrind, you should get the magic words:
  // "All heap blocks were freed -- no leaks are possible"
  return 0;
}

If you want to add elements to an array, you have 3 options:

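  1. Determine the maximum number of elements at compile time and create an array of the right size
  2. Determine the number of elements at runtime and create a variable-length array (C99 and later; optional since C11)
  3. Allocate the array dynamically and grow it as needed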

Option 1 doesn't work here, because there is no way to know at compile time how many lines your file has.

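Option 2 means you first have to find the number of lines, which means iterating over the file twice. It also means the array is freed automatically when you return from the function that reads the file, so you can't hand it back to the caller.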

Option 3 is the best. Here is an example:

char **aLineToMatch;
FILE *file2; 
int bufferLength = 255;
char buffer[bufferLength];
int i = 0;

char *testFiles[] = { "L_2005149PL.01002201.xml.html",
                       "L_2007319PL.01000101.xml.html",
                       NULL};

char (*testStrings)[bufferLength] = NULL; // pointer to arrays of bufferLength chars, i.e. a growable block of fixed-length strings

// You probably meant file2 here (or file in the while condition)
file2 = fopen(testFiles[0], "r"); // loop will come later, that's not the problem

while(fgets(buffer, bufferLength, file2) != NULL) {
 printf("%s\n", buffer);
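 // realloc() into a temporary, so the old block isn't leaked if it fails
 char (*tmp)[bufferLength] = realloc(testStrings, (i + 1) * sizeof testStrings[0]);
 if (tmp == NULL) {
  break; // out of memory: keep the i lines read so far
 }
 testStrings = tmp;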
 strcpy(testStrings[i], buffer);
 i++;
} 
 fclose(file2);
}


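Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.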

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM