
Read a text file into a 2D array in C

I'm trying to read an entire text file into a 2D array so I can limit how much is stored and know where each new line starts (if anyone has a better idea, I'm open to suggestions).

This is what I have so far:

int main(int argc, char** argv) {

    char texto[15][45];
    char ch;
    int count = 0;
    FILE *f = fopen("texto.txt", "r");

    if(f == NULL)
        printf("ERRO ao abrir o ficheiro para leitura");

    while((ch = fgetc(f) != EOF))
        count++;

    rewind(f);

    int tamanho = count;

    texto = malloc(tamanho *sizeof(char));

    fscanf(f, "%s", texto);

    fclose(f);

    printf("%s", texto);

    return (EXIT_SUCCESS);
}

And the text file looks like this:

lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip
lorem ipsum lorem ipsum lorem ipsum lorem ip

But I get this error:

error: assignment to expression with array type

here:

texto = malloc(tamanho *sizeof(char));

The problem you are tasked with forces you to understand the differences between, and the limitations of, character-oriented input, formatted input, and line-oriented input. You are setting your array limits as:

char texto[15][45];

The above declares an array of 15 1D arrays of 45 characters each, all sequential in memory (the definition of an array). That means at each index texto[0] - texto[14] you can store at most 45 characters (or a string of 44 characters followed by the nul-terminating character).
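To make the layout concrete, here is a minimal sketch that reports the sizes involved. The helper name texto_bytes is made up for illustration; the NROWS and NCOLS constants simply mirror the 15 and 45 in the declaration above.

```c
#include <stdio.h>

#define NROWS 15
#define NCOLS 45

/* Hypothetical helper: print the storage layout of the fixed 2D array
 * and return its total size in bytes. */
size_t texto_bytes (void)
{
    char texto[NROWS][NCOLS] = {""};

    printf ("whole array : %zu bytes\n", sizeof texto);
    printf ("one row     : %zu bytes\n", sizeof texto[0]);
    printf ("row count   : %zu\n", sizeof texto / sizeof texto[0]);

    /* texto = malloc (...); would not compile here: an array is not an
     * assignable lvalue, which is the error reported in the question. */
    return sizeof texto;
}
```

Because the storage already exists with a fixed size, there is nothing for malloc to hand back to texto.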

You are then given a file of seven lines of 45 characters each. But there are only 44 characters in each line? Wrong. Since (presumably, given "texto.txt") the information is held in a text file, there will be an additional '\n' (newline) character at the end of each line. You must account for its presence when reading the file. Each line in the file will look something like the following:

        10        20        30        40
123456789012345678901234567890123456789012345
lorem ipsum lorem ipsum lorem ipsum lorem ip\n

(where the numbers simply represent a scale showing how many characters are present in each line)

The ASCII '\n' character is a single character.

The formatted-input Approach

Can you read the input with fscanf using the "%s" conversion specifier? (Answer: no.) Why? The "%s" conversion specifier stops reading when it encounters the first whitespace character after reading non-whitespace characters. That means reading with fscanf (fp, "%s", ...) will stop after the 5th character of each line (the end of the first "lorem").
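The whitespace behavior is easy to confirm on a string with sscanf, which follows the same conversion rules as fscanf. This is only a sketch; first_word_len is a hypothetical helper name, and the 45-byte buffer mirrors one row of texto.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return how many characters a single "%s"
 * conversion stores from the start of line (0 if nothing matches). */
size_t first_word_len (const char *line)
{
    char word[45] = "";

    /* the field width 44 keeps the conversion inside word[] */
    if (sscanf (line, "%44s", word) != 1)
        return 0;

    return strlen (word);
}
```

Passing one line of the sample file yields 5: only "lorem" is stored, and the rest of the line stays unread.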

While you can remedy this by using the character-class conversion specifier of the form [...], where the brackets contain the characters to be included (or excluded if the first character in the class is '^'), you leave the '\n' character unread in your input stream.

While you can remedy that by using the '*' assignment-suppression character to read and discard the next character (the newline) with "%*c", if the line holds any additional characters, they too will remain unread in the input buffer (the input stream, e.g. your file).
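A small sketch can show how much of an over-long line is left behind. It works on a string with sscanf and "%n" rather than on a file, and it discards one following character by hand to mimic what "%*c" would do; unread_after_scanset is a hypothetical name used only here.

```c
#include <stdio.h>
#include <string.h>

#define NCOLS 45

/* Hypothetical helper: store up to 44 characters of line in row, skip
 * the one character "%*c" would discard, and return how many characters
 * of line remain unread (-1 if the scanset matched nothing). */
int unread_after_scanset (const char *line, char row[NCOLS])
{
    int used = 0;

    if (sscanf (line, "%44[^\n]%n", row, &used) != 1)
        return -1;

    if (line[used] != '\0')     /* the character "%*c" would consume */
        used++;

    return (int) strlen (line + used);
}
```

For a 44-character line ending in '\n' the result is 0. For a longer line the discarded character is not the newline at all, and everything after it is still waiting in the stream for the next read.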

Are you beginning to get the picture that doing file input with the scanf family of functions is inherently fragile? (You would be right.)

A naive implementation using fscanf could be:

#include <stdio.h>

#define NROWS 15    /* if you need a constant, #define one (or more) */
#define NCOLS 45

int main (int argc, char **argv) {

    char texto[NROWS][NCOLS] = {""};
    size_t n = 0;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    /* read up to NROWS lines of 44 char each with at most 1 trailing char */
    while (n < NROWS && fscanf (fp, "%44[^\n]%*c", texto[n]) == 1)
        n++;    /* increment line count */

    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    for (size_t i = 0; i < n; i++)  /* output lines stored */
        printf ("texto[%2zu]: '%s'\n", i, texto[i]);

    return 0;
}

(note: if you can guarantee that your input file format is fixed and never varies, then this can be an appropriate approach. However, a single additional stray character in the file can torpedo this approach)

Example Use/Output

$ ./bin/texto2dfscanf <dat/texto.txt
texto[ 0]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 1]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 2]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 3]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 4]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 5]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'
texto[ 6]: 'lorem ipsum lorem ipsum lorem ipsum lorem ip'

line-oriented Input

A better approach is always a line-oriented approach. Why? It allows you to separately validate the read of a line of data from your file (or from the user) and then validate parsing the necessary information from that line.

But there is an intentional catch in the sizing of texto that complicates a simplistic line-oriented approach. While you may be tempted to simply read each line of text into texto[0-14], you would only be reading the text into texto and leaving the '\n' unread. (What? I thought line-oriented input handles this? It does, if you provide sufficient space in the buffer you are trying to fill...)

Line-oriented input functions (fgets and POSIX getline) read and include the trailing '\n' in the buffer being filled, provided there is sufficient space. fgets will read no more characters than specified into the buffer (which protects your array bounds). Your task here has been designed to require reading 46 characters with a line-oriented function in order to read:

the text + '\n' + '\0'

(the text plus the newline plus the nul-terminating character)
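The one-byte difference can be checked directly. The sketch below writes a line to a temporary file (standard tmpfile), reads it back with fgets into a buffer of a chosen size, and reports whether the '\n' made it into the buffer. The function name newline_stored and the 128-byte scratch buffer are assumptions made for the demonstration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if fgets stored the trailing '\n' when
 * reading line into a buffer of size bytes, 0 if the newline was left
 * unread, and -1 on error. */
int newline_stored (const char *line, int size)
{
    char buf[128] = "";
    int stored = -1;
    FILE *fp = tmpfile ();

    if (!fp)
        return -1;

    if (size >= 2 && size <= (int) sizeof buf && fputs (line, fp) != EOF) {
        rewind (fp);
        if (fgets (buf, size, fp)) {    /* reads at most size-1 chars */
            size_t len = strlen (buf);
            stored = len && buf[len - 1] == '\n';
        }
    }
    fclose (fp);

    return stored;
}
```

With a 44-character line, a size of 45 reports 0 (the newline is still in the stream) while a size of 46 reports 1.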

This forces you to do line-oriented input properly. Read the information into a buffer of sufficient size to handle the largest anticipated input line (and don't skimp on buffer size). Validate that your read succeeded. Then parse the information you need from the line in any manner you choose (sscanf is fine in this case). By doing it in this two-step manner, you can read the line, determine the original length of the line read (including the '\n') and validate whether it all fit in your buffer. You can then parse the 44 characters (plus room for the nul-terminating character).

Further, if additional characters remain unread, you know that up-front and can then read and discard the remaining characters in preparation for your next read.
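Discarding the remainder of a line takes only a few lines of code. This is a sketch with a hypothetical helper name, discard_rest_of_line; it stops at the first '\n' so the following line is left intact.

```c
#include <stdio.h>

/* Hypothetical helper: read and discard characters up to and including
 * the next '\n' (or up to EOF). Returns the number of characters
 * discarded, not counting the newline itself. */
size_t discard_rest_of_line (FILE *fp)
{
    size_t n = 0;
    int c;      /* int, not char, so EOF can be told apart from data */

    while ((c = fgetc (fp)) != '\n' && c != EOF)
        n++;

    return n;
}
```

A nonzero return value also tells you how many characters the line exceeded your buffer by, which can be useful in an error message.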

A reasonable line-oriented approach could look something like the following:

#include <stdio.h>
#include <string.h>

#define NROWS 15    /* if you need a constant, #define one (or more) */
#define NCOLS 45
#define MAXC  1024

int main (int argc, char **argv) {

    char texto[NROWS][NCOLS] = {""},
        buffer[MAXC] = "";
    size_t n = 0;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    while (n < NROWS && fgets (buffer, MAXC, fp)) {
        size_t len = strlen (buffer);
        if (len && buffer[len-1] == '\n')
            buffer[--len] = 0;
        else
            if (len == MAXC-1) {
                fprintf (stderr, "error: line %zu too long.\n", n + 1);
                /* remove remaining chars in line before next read */
                for (int c = fgetc (fp); c != '\n' && c != EOF; c = fgetc (fp)) {}
            }
        if (sscanf (buffer, "%44[^\n]", texto[n]) == 1)
            n++;
    }
    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    for (size_t i = 0; i < n; i++)  /* output lines stored */
        printf ("texto[%2zu]: '%s'\n", i, texto[i]);

    return 0;
}

(the output is the same)

character-oriented Input

The only method left is a character-oriented approach (which can be a very effective way of reading the file character-by-character). The only challenge with a character-oriented approach is tracking the indexes on a character-by-character basis. The approach here is simple. Just repeatedly call fgetc, filling the available characters in texto and then discarding any additional characters in the line until the '\n' or EOF is reached. In the right circumstance it can actually provide a simpler but equally robust solution compared to a line-oriented approach. I'll leave investigating this approach to you.
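As a starting point for that investigation, one possible shape of the fgetc loop is sketched below. It is not the answer's own code: the function name read_rows and its parameters are made up, and unlike the sscanf-based version it stores empty lines as empty rows.

```c
#include <stdio.h>

#define NROWS 15
#define NCOLS 45

/* Hypothetical helper: fill texto from fp one character at a time.
 * Each row keeps at most NCOLS-1 characters; the excess on a line is
 * dropped. Returns the number of rows stored. */
size_t read_rows (FILE *fp, char texto[][NCOLS], size_t maxrows)
{
    size_t row = 0, col = 0;
    int c;

    while (row < maxrows && (c = fgetc (fp)) != EOF) {
        if (c == '\n') {                /* end of line: terminate row */
            texto[row++][col] = '\0';
            col = 0;
        }
        else if (col < NCOLS - 1)       /* room left in this row */
            texto[row][col++] = (char) c;
        /* otherwise the row is full and the character is discarded */
    }

    if (col && row < maxrows)           /* last line had no '\n' */
        texto[row++][col] = '\0';

    return row;
}
```

A caller would declare char texto[NROWS][NCOLS] and pass NROWS as maxrows. The two indexes row and col are the whole bookkeeping burden the paragraph above refers to.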

The key in any input task in C is matching the right set of tools to the job. If you are guaranteed that the input file has a fixed format that never deviates, then formatted input can be effective. For all other input (including user input), line-oriented input is generally recommended because of its ability to read a full line without leaving a '\n' dangling unread in the input buffer, provided you use an adequately sized buffer. Character-oriented input can always be used, but you have the added challenge of keeping track of indexing on a character-by-character basis. Using all three is the only way to develop an understanding of which is the best tool for the job.

Look things over and let me know if you have further questions.

You are assigning the result of malloc to a fixed-size array, which is impossible because the array already has a fixed size. You should declare texto as char* in order to use malloc. The purpose of malloc is to allocate memory, and memory cannot be allocated for a fixed array.
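For comparison, here is a rough sketch of the char* variant, following the count-then-rewind idea from the question. The function name read_all is hypothetical, and the stream is assumed to be a regular file that rewind can reposition (so not a pipe or terminal).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of fp into a newly allocated,
 * nul-terminated buffer. The caller must free() the result.
 * Returns NULL if the allocation fails. */
char *read_all (FILE *fp)
{
    size_t count = 0;
    int ch;     /* int, so EOF is distinguishable from any character */

    while ((ch = fgetc (fp)) != EOF)    /* note the parentheses */
        count++;

    rewind (fp);

    char *texto = malloc (count + 1);   /* +1 for the terminating '\0' */
    if (!texto)
        return NULL;

    size_t got = fread (texto, 1, count, fp);
    texto[got] = '\0';

    return texto;
}
```

This gives one flat string with the newlines embedded, so it does not by itself limit the line length or the number of rows the way the 2D array does.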

Here is an example of how to read the text file into a 2D array, one word per row:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    char texto[256][256]; // 256 - Big enough array, or use malloc for dynamic array size
    int count = 0;
    FILE *f = fopen("texto.txt", "r");

    if(f == NULL) {
        printf("ERRO ao abrir o ficheiro para leitura");
        return (EXIT_FAILURE);
    }

    // Read one whitespace-delimited word per row; %255s protects each row's bounds.
    while(count < 256 && fscanf(f, "%255s", texto[count]) == 1)
        count++;

    fclose(f);

    // Now lets print all in reverse way.
    for (int i = count - 1; i >= 0; i--) {
        printf("%s, ", texto[i]);
    }
    return (EXIT_SUCCESS);
}

output:

ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem, ip, lorem, ipsum, lorem, ipsum, lorem, ipsum, lorem,
