How do I create a 2D array to store a collection of words scanned from a .txt file in C?

Question

I am working on a program where I want to scan a .txt file that contains a poem. After scanning the poem, I want to be able to store each individual word as a single string and store those strings in a 2D array. For example, if my .txt file contains the following:

Haikus are easy.
But sometimes they don't make sense.
Refrigerator.

I want to be able to store each word in a single array, as follows:

H a i k u s \0
a r e \0
e a s y . \0
B u t \0
s o m e t i m e s \0
t h e y \0
d o n ' t \0
m a k e \0
s e n s e . \0
R e f r i g e r a t o r . \0

So far, this is the code I have. I am having difficulties understanding 2D arrays, so if someone could explain that to me as well in the context of this problem, that would be great. I am still learning the C language, so it takes time for me to understand some things. I have been scratching my head at this for a few hours now and am asking for help after trying everything I could think of!

The following is my function for getting the words and storing them into arrays (it also returns the number of words there are, which is used separately for a different part of the program):

int getWords(int maxSize, FILE* inFile, char strings[][COL_SIZE]){
    int numWords;
    for(int i = 0; i < maxSize; i++){
        fscanf(inFile, "%s", strings[i]);
        while(fscanf(inFile, "%s", strings[i] == 10){
            numWords++;
        }
    }
    return numWords;
}

Here's the code where I call the function in the main function (I am not sure what numbers to set COL_SIZE and MAX_LENGTH to; like I said, I am new to this and am trying my best to understand 2D arrays and how they work):

#define COL_SIZE 10
#define MAX_LENGTH 500

int main(){
    FILE* fp;
    char strArray[MAX_LENGTH][COL_SIZE];

    fp = fopen(FILE_NAME, "r");
    if(fp == NULL){
        printf("File could not be found!");
    }
    else{
        getWords(MAX_LENGTH, fp, strArray);
        fclose(fp);
    }
    return 0;
}

Answer 1

What you are missing is that COL_SIZE must be large enough to store the longest word, +1 for the nul-terminating character. Take:

R e f r i g e r a t o r . \0
----------------------------
1 2 3 4 5 6 7 8 9 0 1 2 3 4    - >  14 characters of storage required
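
The same count can be checked in code. Below is a minimal sketch of how a 2D char array is laid out: a table of ROWS rows, each COLS bytes wide, where each row holds one string of at most COLS - 1 characters plus the terminating nul. The names fits_in_row, store_word and row_offset, and the small ROWS value, are hypothetical and only for illustration; COLS is set to 10 to match the question's COL_SIZE.

```c
#include <stddef.h>
#include <string.h>

#define ROWS 4    /* hypothetical: number of words the table can hold */
#define COLS 10   /* bytes per row, matching the question's COL_SIZE */

/* Returns 1 if word, plus its terminating nul, fits in one row of COLS bytes. */
int fits_in_row(const char *word)
{
    return strlen(word) + 1 <= (size_t)COLS;
}

/* Copies word into row `row` of grid only if it fits; returns 1 on success. */
int store_word(char grid[][COLS], int row, const char *word)
{
    if (row < 0 || row >= ROWS || !fits_in_row(word))
        return 0;
    strcpy(grid[row], word);   /* safe: the length was checked above */
    return 1;
}

/* Rows are stored back to back, so row r starts r * COLS bytes into the array. */
size_t row_offset(char grid[][COLS], int row)
{
    return (size_t)row * sizeof grid[0];
}
```

With these definitions, store_word(grid, 0, "Haikus") succeeds, while fits_in_row("Refrigerator.") reports that the 14 bytes needed do not fit in a 10-byte row.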

You declare a 500 x 10 2D array of char:

char strArray[500][10]

"Refrigerator." cannot fit in a single row of strArray, so what happens is "Refrigerat" is stored at one row-index, and then "or.\0" overwrites the first 4 characters of the next.

There are a number of ways to handle the input, but if you want to use fscanf, then you need (1) to include a field-width modifier with the string conversion to limit the number of characters stored to the amount of storage available, and (2) to validate that the next character after those you have read is a whitespace character, e.g.

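#include <ctype.h>
#include <stdio.h>

/* Reads up to maxSize words; returns the number of words stored. */
int getWords(int maxSize, FILE* inFile, char strings[][COL_SIZE])
{
    char c;
    int n = 0;
    
    while (n < maxSize) {
        /* %9s stores at most 9 characters plus the nul (COL_SIZE is 10) */
        int rtn = fscanf (inFile, "%9s%c", strings[n], &c);
        if (rtn == 2 && isspace((unsigned char)c))
            n++;            /* whole word read, followed by whitespace */
        else if (rtn == 1) {
            n++;            /* last word, EOF reached right after it */
            break;
        }
        else
            break;          /* EOF, or the word was too long for the row */
    }
    
    return n;
}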

Note the format string contains a field-width modifier of one less than the total number of characters available, and then the character conversion stores the next character so it can be validated as whitespace (if it isn't, you have a word that is too long to fit in your array).

With any user-input function, you cannot use it correctly unless you check the return. Above, the return from fscanf() is saved in rtn. If you have a successful conversion of both your string, limited to COL_SIZE - 1 characters by your field-width modifier, and c, and c is whitespace, you have a successful read of the word and you are not yet at EOF. If the return is 1, you have a successful read of the word and you have reached EOF (the last line has no trailing newline, a non-POSIX line ending). Otherwise, you will either reach the limit of MAX_LENGTH and exit the loop, or you will reach EOF and fscanf() will return EOF, forcing an exit of the loop through the else clause.

Lastly, don't skimp on buffer size. The longest word in the non-medical unabridged dictionary is 29 characters, requiring a total of 30 characters of storage, so #define COL_SIZE 32 makes more sense than 10.
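
One practical wrinkle when changing the buffer size is that the width in the format string has to change with it. The sketch below is one way to keep the two in sync by deriving both from a single macro through preprocessor stringification. The names MAX_WORD, STR, XSTR and read_words are hypothetical, and COL_SIZE is redefined here in terms of MAX_WORD; otherwise the loop follows the same return-value checks described above.

```c
#include <ctype.h>
#include <stdio.h>

#define MAX_WORD 31                 /* hypothetical: longest word accepted */
#define COL_SIZE (MAX_WORD + 1)     /* one extra byte for the terminating nul */

/* Two-step stringification so MAX_WORD expands to 31 before becoming "31". */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* Reads up to maxSize whitespace-separated words from inFile.
   Stops early at end of input or at a word longer than MAX_WORD.
   Returns the number of words stored. */
int read_words(int maxSize, FILE *inFile, char strings[][COL_SIZE])
{
    int n = 0;

    while (n < maxSize) {
        char c;
        /* The format string expands to "%31s%c". */
        int rtn = fscanf(inFile, "%" XSTR(MAX_WORD) "s%c", strings[n], &c);

        if (rtn == 2 && isspace((unsigned char)c))
            n++;                    /* complete word followed by whitespace */
        else if (rtn == 1) {
            n++;                    /* last word, no trailing newline */
            break;
        }
        else
            break;                  /* EOF, or a word that did not fit */
    }
    return n;
}
```

MAX_WORD must be a plain integer literal for this to work, because the preprocessor pastes its text into the format string rather than evaluating an expression.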

Look things over and let me know if you have more questions.

stdio.h Only

If you are limited to stdio.h, then you can manually confirm that c contains a whitespace character:

        if (rtn == 2 && (c == ' ' || c == '\t' || c == '\n'))
            n++;

Answer 2

You probably don't want a traditional 2D array. Those are usually rectangular, which is not well suited to storing variable-length words. Instead, you would want an array of pointers to buffers, much like argv. Since the goal is to load from a file, I suggest using a contiguous buffer rather than allocating a separate one for each word.

The general idea is this:

First pass: get the total file size and read in the whole thing (+1 byte for the trailing NUL).
Second pass: count the words and split them with NULs.
Third pass: allocate a buffer for the word pointers and fill it in.

Here's how to load the entire file:

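#include <sys/stat.h>
#include <stdlib.h>
#include <stdio.h>

/* Reads the whole file into a malloc'd, NUL-terminated buffer.
   Stores the byte count in *n; returns NULL on any failure. */
char *load_file(const char *fname, int *n)
{
    struct stat st;
    if(stat(fname, &st) == -1 || st.st_size == 0) return NULL;
    char *buffer = malloc(st.st_size + 1);
    if(buffer == NULL) return NULL;
    FILE *file = fopen(fname, "rb");
    if(file == NULL) {
        free(buffer);
        return NULL;
    }
    size_t got = fread(buffer, 1, st.st_size, file);
    fclose(file);
    if(got != (size_t)st.st_size) {   /* short read: treat as failure */
        free(buffer);
        return NULL;
    }
    buffer[got] = '\0';               /* the +1 byte holds the trailing NUL */
    *n = (int)got;
    return buffer;
}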

You can count the words by just stepping through the file contents and marking the end of each word.

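#include <ctype.h>

/* Advances past anything that is not a letter. */
char *skip_nonword(char *text, char *end)
{
    while(text != end && !isalpha((unsigned char)*text)) text++;
    return text;
}

/* Advances past a run of letters. */
char *skip_word(char *text, char *end)
{
    while(text != end && isalpha((unsigned char)*text)) text++;
    return text;
}

/* Counts the words and NUL-terminates each one in place.
   Requires text[n] to be writable (the extra byte from load_file). */
int count_words(char *text, int n)
{
    char *end = text + n;
    int count = 0;
    while(text < end) {
        text = skip_nonword(text, end);
        if(text < end) {
            count++;
            text = skip_word(text, end);
            *text = '\0';
        }
    }
    return count;
}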

Now you are in a position to allocate the word buffer and fill it in:

/* Returns a malloc'd array of count pointers, one per word in text.
   text must already have been split by count_words(). */
char **list_words(char *text, int n, int count)
{
    char *end = text + n;
    char **words = malloc(count * sizeof(char *));
    if(words == NULL) return NULL;
    for(int i = 0; i < count; i++) {
        words[i] = skip_nonword(text, end);
        text = skip_word(words[i], end);
    }
    return words;
}
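
Note that the isalpha-based helpers above treat apostrophes and periods as separators, so "don't" comes out as "don" and "t", and "easy." loses its period. If the goal is the exact output shown in the question, one option is to split on whitespace instead. The following is a minimal sketch of that variation, not the code from this answer: split_words is a hypothetical name, and it works on a NUL-terminated buffer already in memory (such as one returned by load_file).

```c
#include <ctype.h>
#include <stdlib.h>

/* Splits text in place on whitespace and returns a malloc'd array of
   pointers to the words, all pointing into text. Stores the word count
   in *count. Returns NULL if allocation fails or no word is found.
   The caller frees the returned array (not the individual words). */
char **split_words(char *text, int *count)
{
    int n = 0;
    int in_word = 0;

    /* Pass 1: count the words. */
    for (char *p = text; *p != '\0'; p++) {
        if (isspace((unsigned char)*p))
            in_word = 0;
        else if (!in_word) {
            in_word = 1;
            n++;
        }
    }
    *count = n;
    if (n == 0)
        return NULL;

    char **words = malloc((size_t)n * sizeof *words);
    if (words == NULL)
        return NULL;

    /* Pass 2: record where each word starts and replace each
       whitespace character with a nul so every word is terminated. */
    int i = 0;
    in_word = 0;
    for (char *p = text; *p != '\0'; p++) {
        if (isspace((unsigned char)*p)) {
            *p = '\0';
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words[i++] = p;
        }
    }
    return words;
}
```

As with argv, words[i] can then be used like any other string, and because every pointer refers into the one buffer, cleanup is a single free for the pointer array plus one for the buffer itself.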

2 answers

Answer 1
2 2020-08-12 03:49:33

Answer 2
1 2020-08-12 04:50:36
