
Reading a text file into 2 separate arrays of characters (in C)

For a class I have to write a program to read in a text file in the format of:


TAEDQQ
ZHPNIU
CKEWDI
VUXOFC
BPIRGK
NRTBRB
EXIT
THE
QUICK
BROWN
FOX


I'm trying to get the characters into arrays of chars, with each line being its own array. I'm able to read from the file okay, and this is the code I use to parse the file:


char** getLinesInFile(char *filepath)  
{  
    FILE *file;  
    const char mode = 'r';  
    file = fopen(filepath, &mode);  
    char **textInFile;  

    /* Reads the number of lines in the file. */
    int numLines = 0;
    char charRead = fgetc(file);
    while (charRead != EOF)
    {
        if(charRead == '\n' || charRead == '\r')
        {
            numLines++;
        }
        charRead = fgetc(file);
    }

    fseek(file, 0L, SEEK_SET);
    textInFile = (char**) malloc(sizeof(char*) * numLines);

    /* Sizes the array of text lines. */
    int line = 0;
    int numChars = 1;
    charRead = fgetc(file);
    while (charRead != EOF)
    {
        if(charRead == '\n' || charRead == '\r')
        {
            textInFile[line] = (char*) malloc(sizeof(char) * numChars);
            line++;
            numChars = 0;
        }
        else if(charRead != ' ')
        {
            numChars++;
        }
        charRead = fgetc(file);
    }

    /* Fill the array with the characters */
    fseek(file, 0L, SEEK_SET);
    charRead = fgetc(file);
    line = 0;
    int charNumber = 0;
    while (charRead != EOF)
    {
        if(charRead == '\n' || charRead == '\r')
        {
            line++;
            charNumber = 0;
        }
        else if(charRead != ' ')
        {
            textInFile[line][charNumber] = charRead;
            charNumber++;
        }
        charRead = fgetc(file);
    }

    return textInFile;
}

This is a run of my program:


Welcome to Word search!

Enter the file you would like us to parse:testFile.txt
TAEDQQ!ZHPNIU!CKEWDI!VUXOFC!BPIRGK!NRTBRB!EXIT!THE!QUICK!BROWN!FOX
Segmentation fault


What's going on? A) Why are the exclamation marks there, and B) why do I get a seg fault at the end? The last thing I do in main is iterate through the array of pointers.

1) In the first part of your program, you are miscounting the number of lines in the file. The file has 11 lines, but your program counts 10, because it counts line terminators and the last line ends at EOF without one. Start counting from 1, since the file always contains at least one line. So change

int numLines = 0;

to

int numLines = 1;

2) In the second part of the program, you are miscounting the number of characters on each line. The counter must be reset to the same value it starts with. You initialize numChars to 1 at the start of the segment, so reset it to 1 after each line as well. Change:

numChars = 0;

to

numChars = 1;

This should provide enough space for all the non-space characters and for the terminating null character ('\0'). Keep in mind that C strings held in a char* must always be null terminated.

3) Your program also does not account for differences in line termination, but in my test environment that is not a problem: fgetc returns only one character for the line terminator, even though the file is saved with \r\n terminators. This is what text mode does on Windows; on other platforms both characters may be returned.

4) In the second part of your program, you are also not allocating memory for the very last line. This causes the segfault in the third part of your program, when you try to access the unallocated space.

Note how your code only saves a line if it ends in \r or \n. EOF, which is technically the line ending for the last line, does not qualify, so your second loop never allocates the last line in the array.

To fix this, add this after the second part:

textInFile[line] = (char*) malloc(sizeof(char) * numChars);

5) The weird exclamation points in your program output appear because you are not NULL terminating your strings. Add the line marked as NULL termination below:

if(charRead == '\n' || charRead == '\r')
{
    textInFile[line][charNumber] = 0; // NULL termination
    line++;
    charNumber = 0;
}

6) Because the loop stops at EOF, the third loop has the same problem for the last line, so you must add this before the return:

textInFile[line][charNumber] = 0; // NULL termination

7) The overall program structure also gives me a headache. You read the same file character by character 3 times! This is extremely slow and inefficient.

Fixed code follows below:

char** getLinesInFile(char *filepath)  
{  
    /* The mode must be a NUL-terminated string, not a pointer to a single char. */
    FILE *file = fopen(filepath, "r");
    if (file == NULL)
    {
        return NULL;
    }
    char **textInFile;

    /* Reads the number of lines in the file. */
    int numLines = 1;
    int charRead = fgetc(file); /* int, not char, so EOF can be told apart from real characters */
    while (charRead != EOF)
    {
        if(charRead == '\n' || charRead == '\r')
        {
            numLines++;
        }
        charRead = fgetc(file);
    }

    fseek(file, 0L, SEEK_SET);
    textInFile = (char**) malloc(sizeof(char*) * numLines);

    /* Sizes the array of text lines. */
    int line = 0;
    int numChars = 1;
    charRead = fgetc(file);
    while (charRead != EOF)
    {
        if(charRead == '\n' || charRead == '\r')
        {
            textInFile[line] = (char*) malloc(sizeof(char) * numChars);
            line++;
            numChars = 1;
        }
        else if(charRead != ' ')
        {
            numChars++;
        }
        charRead = fgetc(file);
    }
    textInFile[line] = (char*) malloc(sizeof(char) * numChars);

    /* Fill the array with the characters */
    fseek(file, 0L, SEEK_SET);
    charRead = fgetc(file);
    line = 0;
    int charNumber = 0;
    while (charRead != EOF)
    {
        if(charRead == '\n' || charRead == '\r')
        {
            textInFile[line][charNumber] = 0; // NULL termination
            line++;
            charNumber = 0;
        }
        else if(charRead != ' ')
        {
            textInFile[line][charNumber] = charRead;
            charNumber++;
        }
        charRead = fgetc(file);
    }
    textInFile[line][charNumber] = 0; // NULL termination

    fclose(file);
    return textInFile;
}
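
The allocation and termination fixes above come down to one rule: a buffer that holds n visible characters needs n + 1 bytes, and the extra byte must be set to '\0'. Here is a minimal sketch of that rule in isolation. The helper name copy_without_spaces is hypothetical and not part of the original program; it only mirrors the way the program skips spaces.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of src with spaces
 * removed, or NULL if allocation fails. The caller frees the result. */
char *copy_without_spaces(const char *src)
{
    assert(src != NULL);

    /* First pass: count the characters that will be kept. */
    size_t kept = 0;
    for (const char *p = src; *p != '\0'; p++) {
        if (*p != ' ') {
            kept++;
        }
    }

    /* One extra byte for the terminating '\0'. */
    char *dst = malloc(kept + 1);
    if (dst == NULL) {
        return NULL;
    }

    /* Second pass: copy the kept characters. */
    size_t i = 0;
    for (const char *p = src; *p != '\0'; p++) {
        if (*p != ' ') {
            dst[i++] = *p;
        }
    }
    dst[i] = '\0'; /* without this, printing with %s reads past the buffer */
    return dst;
}
```

Without the final assignment, anything that treats the buffer as a string keeps reading until it happens to find a zero byte, which would explain stray characters such as the exclamation marks in the output.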

You aren't null terminating your arrays. This probably explains both problems. Be sure to allocate an extra character for the null terminator.

Do this:

if(charRead == '\n')
{
    textInFile[line] = (char*) malloc(sizeof(char) * (numChars+1));
    line++;
    numChars = 0;
}

Then:

if(charRead == '\n')
{
    textInFile[line][charNumber]='\0';
    line++;
    charNumber = 0;
}

Also, you are reading the file 3 times! This thread has some good explanation on how to read a file efficiently.
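
As a rough illustration of reading the file only once, the sketch below uses fgets and grows the array of lines as needed. It is one possible approach under stated assumptions, not the method from the linked thread. The function name read_lines, the count out-parameter, and the MAX_LINE limit are all assumptions made for this example. Unlike the original code, it keeps spaces, and a line longer than MAX_LINE - 1 characters would be split across several entries.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 256 /* assumed upper bound on line length */

/* Hypothetical single-pass reader: returns a malloc'd array of
 * NUL-terminated lines and stores the number of lines in *count.
 * Returns NULL only if the initial allocation fails; if a later
 * allocation fails, the lines read so far are returned. */
char **read_lines(FILE *file, size_t *count)
{
    assert(file != NULL && count != NULL);

    size_t capacity = 8;
    size_t used = 0;
    *count = 0;

    char **lines = malloc(capacity * sizeof *lines);
    if (lines == NULL) {
        return NULL;
    }

    char buffer[MAX_LINE];
    while (fgets(buffer, sizeof buffer, file) != NULL) {
        /* Cut the line at the first '\r' or '\n', whichever comes first. */
        buffer[strcspn(buffer, "\r\n")] = '\0';

        /* Double the array when it is full. */
        if (used == capacity) {
            char **grown = realloc(lines, 2 * capacity * sizeof *lines);
            if (grown == NULL) {
                break;
            }
            lines = grown;
            capacity *= 2;
        }

        lines[used] = malloc(strlen(buffer) + 1); /* +1 for '\0' */
        if (lines[used] == NULL) {
            break;
        }
        strcpy(lines[used], buffer);
        used++;
    }

    *count = used;
    return lines;
}
```

A caller would open the file with fopen(path, "r"), pass the FILE pointer in, loop from 0 to count - 1 over the result, and then free each line followed by the array itself. Returning the count also lets main iterate without running off the end of the array.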
