How can I create a 2D array to store a collection of words scanned from a .txt file in C?
I am working on a program where I want to scan a .txt file that contains a poem. After scanning the poem, I want to be able to store each individual word as a single string and store those strings in a 2D array. For example, if my .txt file contains the following:
Haikus are easy.
But sometimes they don't make sense.
Refrigerator.
I want to be able to store each word as the following in a single array:
H a i k u s \0
a r e \0
e a s y . \0
B u t \0
s o m e t i m e s \0
t h e y \0
d o n ' t \0
m a k e \0
s e n s e . \0
R e f r i g e r a t o r . \0
So far, this is the code I have. I am having difficulty understanding 2D arrays, so if someone could explain them to me in the context of this problem, that would be great. I am still learning the C language, so it takes time for me to understand some things. I have been scratching my head at this for a few hours now and am asking for help after trying everything I could think of!
The following is my function for getting the words and storing them in the array (it also returns the number of words, which is used separately in a different part of the program):
int getWords(int maxSize, FILE* inFile, char strings[][COL_SIZE]){
    int numWords;
    for(int i = 0; i < maxSize; i++){
        fscanf(inFile, "%s", strings[i]);
        while(fscanf(inFile, "%s", strings[i] == 10){
            numWords++;
        }
    }
    return numWords;
}
Here's the code where I call the function from main (I am not sure what numbers to set COL_SIZE and MAX_LENGTH to; like I said, I am new to this and am trying my best to understand 2D arrays and how they work):
#define COL_SIZE 10
#define MAX_LENGTH 500
int main(){
    FILE* fp;
    char strArray[MAX_LENGTH][COL_SIZE];
    fp = fopen(FILE_NAME, "r");
    if(fp == NULL){
        printf("File could not be found!");
    }
    else{
        getWords(MAX_LENGTH, fp, strArray);
        fclose(fp);
    }
    return 0;
}
What you are not understanding is that COL_SIZE must be large enough to store the longest word, plus 1 for the nul-terminating character. Take:
R e f r i g e r a t o r . \0
----------------------------
1 2 3 4 5 6 7 8 9 0 1 2 3 4 - > 14 characters of storage required
You declare a 500 x 10 2D array of char:
char strArray[500][10]
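"Refrigerator." cannot fit in a 10-character row of strArray, so what happens is "Refrigerat" is stored at one row-index, and then "or.\0" overwrites the first 4 characters of the next. (Strictly speaking, writing past the end of a row is undefined behavior; spilling into the following row is simply what tends to happen in practice, because the rows sit back to back in memory.)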
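To see why an overlong word lands in the next row, it may help to look at how a 2D char array is laid out. The following is only a minimal sketch: ROWS, COLS, row_offset and store_word are names made up for illustration and are not part of the question's code. It shows that row r starts exactly r * COLS bytes into the array, and one way to refuse a word that does not fit rather than let it spill over.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ROWS 3    /* illustrative sizes, not taken from the question */
#define COLS 10

/* Byte distance from the start of the array to the start of row r.
   Rows are stored back to back, so this is always r * COLS. */
size_t row_offset(char grid[][COLS], size_t r)
{
    return (size_t)((char *)(grid + r) - (char *)grid);
}

/* Copy word into row r only if it fits, '\0' included.
   Returns 1 on success, 0 if the word is too long for a row. */
int store_word(char grid[][COLS], size_t r, const char *word)
{
    assert(grid != NULL && word != NULL);
    if (strlen(word) + 1 > COLS)
        return 0;              /* refuse instead of spilling into row r + 1 */
    strcpy(grid[r], word);
    return 1;
}
```

With char grid[ROWS][COLS], store_word(grid, 0, "Haikus") succeeds, while store_word(grid, 1, "Refrigerator.") is rejected because 13 characters plus the terminator need 14 bytes and a row only has 10.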
There are a number of ways to handle the input, but if you want to use fscanf, then you need to (1) include a field-width modifier with the string conversion to limit the number of characters stored to the amount of storage available, and (2) validate that the next character after those you have read is a whitespace character, e.g.
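#include <ctype.h>
#include <stdio.h>

/* Read up to maxSize whitespace-separated words; returns the number stored. */
int getWords(int maxSize, FILE* inFile, char strings[][COL_SIZE])
{
    char c;     /* the character that follows each word */
    int n = 0;

    while (n < maxSize) {
        /* field width 9 is COL_SIZE - 1 for a COL_SIZE of 10; keep the two in sync */
        int rtn = fscanf (inFile, "%9s%c", strings[n], &c);
        if (rtn == 2 && isspace((unsigned char)c))  /* whole word read, more may follow */
            n++;
        else if (rtn == 1) {    /* word read, then EOF (no newline at end of file) */
            n++;
            break;
        }
        else    /* EOF, or a word too long to fit in one row */
            break;
    }
    return n;
}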
Note that the format string contains a field-width modifier of one less than the total number of characters available, and that the character conversion then stores the next character so you can validate it is whitespace (if it isn't, you have a word that is too long to fit in your array).
With any user-input function, you cannot use it correctly unless you check the return. Above, the return from fscanf() is saved in rtn. If both conversions succeed (the string, limited to COL_SIZE - 1 characters by your field-width modifier, and the character c) and c is whitespace, you have a successful read of the word and you are not yet at EOF. If the return is 1, you have a successful read of the word and you have reached EOF (a non-POSIX line end on the last line). Otherwise, you will either reach the limit of MAX_LENGTH and exit the loop, or you will reach EOF and fscanf() will return EOF, forcing an exit of the loop through the else clause.
Lastly, don't skimp on buffer size. The longest word in the non-medical unabridged dictionary is 29 characters, requiring a total of 30 characters of storage, so
#define COL_SIZE 32
makes more sense than 10. If you change COL_SIZE, change the field width in the fscanf format string to match (for example "%31s" for a COL_SIZE of 32).
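One way to keep the field width from drifting out of step with the row size is to build the format string from a macro. This is only a sketch under some assumptions: MAX_WIDTH, STR, STR_ and read_words are names introduced here for illustration, and the loop is the same return-checking logic as getWords above, just with the width pasted in at compile time.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define COL_SIZE  32
#define MAX_WIDTH 31              /* hypothetical helper: must equal COL_SIZE - 1 */
#define STR_(x) #x
#define STR(x)  STR_(x)           /* STR(MAX_WIDTH) expands to the string "31" */

/* Fail the build if the two constants ever disagree (C11). */
_Static_assert(MAX_WIDTH == COL_SIZE - 1, "MAX_WIDTH must be COL_SIZE - 1");

/* Read up to max_words whitespace-separated words from in.
   Returns the number of words stored. */
int read_words(FILE *in, char words[][COL_SIZE], int max_words)
{
    int n = 0;

    assert(in != NULL);
    while (n < max_words) {
        char c;
        /* the literal pieces concatenate to "%31s%c" */
        int rtn = fscanf(in, "%" STR(MAX_WIDTH) "s%c", words[n], &c);
        if (rtn == 2 && isspace((unsigned char)c))
            n++;                  /* complete word, keep going */
        else if (rtn == 1) {
            n++;                  /* last word, file ended without a newline */
            break;
        }
        else
            break;                /* EOF, or a word longer than MAX_WIDTH */
    }
    return n;
}
```

Called on the haiku file from the question with a char words[500][COL_SIZE] array, this should return 10 and leave "Refrigerator." intact in words[9]. The field width has to be a literal number inside the format string, which is why the sketch uses a second macro rather than COL_SIZE - 1 directly.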
Look things over and let me know if you have more questions.
stdio.h Only
If you are limited to stdio.h, then you can manually confirm that c contains a whitespace character:
if (rtn == 2 && (c == ' ' || c == '\t' || c == '\n'))
n++;
You probably don't want a traditional 2D array. Those are usually rectangular, which is not well suited to storing variable-length words. Instead, you would want an array of pointers to buffers, sort of like argv is. Since the goal is to load from a file, I suggest using a contiguous buffer rather than allocating a separate one for each word.
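The general idea is this:
1. Load the entire file into a single buffer.
2. Step through the buffer, counting the words and marking the end of each one.
3. Allocate an array of pointers, one per word, and point each at the start of its word.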
Here's how to load the entire file:
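#include <sys/stat.h>
#include <stdlib.h>
#include <stdio.h>

/* Read the whole file into one malloc'd buffer and store its size in *n.
   Returns NULL on any failure; the caller frees the buffer. */
char *load_file(const char *fname, int *n)
{
    struct stat st;
    if(stat(fname, &st) == -1 || st.st_size == 0) return NULL;

    char *buffer = malloc(st.st_size + 1);  /* +1 so the last word can be terminated */
    if(buffer == NULL) return NULL;

    FILE *file = fopen(fname, "rb");  /* binary mode so the byte count matches st_size */
    if(file == NULL) {
        free(buffer);
        return NULL;
    }
    if(fread(buffer, 1, st.st_size, file) != (size_t)st.st_size) {
        free(buffer);   /* short read: treat as failure */
        buffer = NULL;
    } else {
        buffer[st.st_size] = '\0';
        *n = (int)st.st_size;
    }
    fclose(file);
    return buffer;
}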
You can count the words by just stepping through the file contents and marking the end of each word.
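#include <ctype.h>

/* Advance past anything that is not a letter (spaces, punctuation, '\0'). */
char *skip_nonword(char *text, char *end)
{
    while(text != end && !isalpha((unsigned char)*text)) text++;
    return text;
}

/* Advance past a run of letters. Note that an apostrophe ends a word,
   so "don't" is split into "don" and "t". */
char *skip_word(char *text, char *end)
{
    while(text != end && isalpha((unsigned char)*text)) text++;
    return text;
}

/* Count the words in text[0..n) and terminate each one in place.
   The buffer must have n + 1 bytes, since the last '\0' may land on text[n]. */
int count_words(char *text, int n)
{
    char *end = text + n;
    int count = 0;
    while(text < end) {
        text = skip_nonword(text, end);
        if(text < end) {
            count++;
            text = skip_word(text, end);
            *text = '\0';
        }
    }
    return count;
}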
Now you are in a position to allocate the word buffer and fill it in:
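/* Build an array of pointers to the words already terminated by count_words().
   text is not const because the helpers take a plain char *. */
char **list_words(char *text, int n, int count)
{
    char *end = text + n;
    char **words = malloc(count * sizeof(char *));
    if(words == NULL) return NULL;
    for(int i = 0; i < count; i++) {
        words[i] = skip_nonword(text, end);   /* start of word i */
        text = skip_word(words[i], end);      /* stops at the '\0' after it */
    }
    return words;
}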
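To show how the pieces fit together, here is a compact sketch of the same two-pass idea that works on a buffer already in memory. It is not the code above: split_in_place is a hypothetical helper written for this example, and it splits on whitespace rather than on letters, so "easy." and "don't" stay whole, as in the output the question asks for.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Split buf[0..n) in place on whitespace and return a malloc'd array of
   pointers to the words; the word count is stored in *count.
   buf must have room for n + 1 bytes. Returns NULL if malloc fails. */
char **split_in_place(char *buf, size_t n, size_t *count)
{
    assert(buf != NULL && count != NULL);
    char *end = buf + n;
    size_t words = 0;

    /* Pass 1: count the words and terminate each one. */
    for (char *p = buf; p < end; ) {
        while (p < end && isspace((unsigned char)*p)) p++;
        if (p == end) break;
        words++;
        while (p < end && !isspace((unsigned char)*p)) p++;
        *p = '\0';            /* replaces the separator, or the spare byte buf[n] */
        if (p < end) p++;
    }

    /* Pass 2: record where each word starts. */
    char **list = malloc((words ? words : 1) * sizeof *list);
    if (list == NULL) return NULL;
    char *p = buf;
    for (size_t i = 0; i < words; i++) {
        while (p < end && (*p == '\0' || isspace((unsigned char)*p))) p++;
        list[i] = p;
        while (p < end && *p != '\0') p++;
    }
    *count = words;
    return list;
}
```

Given a writable copy of the poem, such as char poem[] = "Haikus are easy.\n...", calling split_in_place(poem, sizeof poem - 1, &count) yields one pointer per word, and each list[i] can be printed with "%s". Only the pointer array needs to be freed separately, because the words themselves live inside the original buffer.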