
C: How to read words from a text file into an array of strings

I need to write a program that generates a table mapping each word to the number of times it appears in a text file. So far my code looks like this:

#include <stdlib.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>
struct entry
{
  char* word;
  unsigned int n;
  struct entry *left;
  struct entry *right;
};

struct entry* 
insert(struct entry *table, char *str)
{
  if(table==NULL){
    table = (struct entry*)malloc(sizeof(struct entry));
    table->word = str;
    table->n = 1;
    table->left = NULL;
    table->right = NULL;
  }else if(strcmp(table->word,str)==0){
    table->n=(table->n)+1;
  }else if(strcmp(table->word,str)==1){
    table->left=insert(table->left,str);
  }else{
    table->right = insert(table->right,str);
  }
  return table;
}

void
print_table(struct entry *table)
{
  if(!(table==NULL)){
    print_table(table->left);
    fprintf(stdout,"%s\t %d\n",table->word,table->n);
    print_table(table->right);
  }
}

int
main(int argc, const char *argv[])
{

  struct entry* table = NULL;
  char *str = "foo";
  table =  insert(table,str);
  str = "foo";
  table = insert(table,str); 
  print_table(table);

  return 0;

}

which gives an output of:

foo 2

What I need to do is do this exact thing with an input file. My idea is to take every word of the text file, which will look like:

This is an example of 
what the text file
will look like.

I don't know the exact number of lines or the number of words per line. As I was saying, my idea was to take every word from the text file and put it into an array of strings, then run my insert function over every element of the array. I just have no idea how I should go about taking each word and putting it in the array. Any suggestions are welcome and appreciated.

If you want to store every word of the following paragraph:

This is an example of 
what the text file
will look like.

The following will work (note that it uses C++ stream extraction rather than C):

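while(true){
    // operator>> skips leading whitespace (newlines included), so this
    // inner loop keeps reading words until extraction fails
    while(inFile >> yourword){
        // store yourword here
    }
    getline(inFile, yourword); // intended to discard the rest of the line
    if(/*some_conditional_to_break*/)
        break;
}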

Portability bug

Note that this use of strcmp() is wrong:

}else if(strcmp(table->word,str)==1){

strcmp() is defined to return a value less than zero, equal to zero, or greater than zero. There is no mention of 1.

Always, but always, compare with 0:

  • if (strcmp(word, str) == 0): word equal to str
  • if (strcmp(word, str) != 0): word not equal to str
  • if (strcmp(word, str) <= 0): word less than or equal to str
  • if (strcmp(word, str) >= 0): word greater than or equal to str
  • if (strcmp(word, str) < 0): word less than str
  • if (strcmp(word, str) > 0): word greater than str

In many implementations, the return value from strcmp() is the numeric difference between the first pair of characters that differ, so its magnitude can be much larger than 1.
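As a rough sketch of how that rule applies to the insert() function from the question, the version below calls strcmp() once and branches on the sign of the result. The struct layout and left/right ordering are taken from the question; the malloc() failure handling and the local variable cmp are additions made for illustration, not part of the original code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct entry
{
  char *word;
  unsigned int n;
  struct entry *left;
  struct entry *right;
};

/* Same tree logic as the question's insert(), but the branches test the
   sign of strcmp() instead of comparing its result with 1. */
struct entry *
insert(struct entry *table, char *str)
{
  if (table == NULL) {
    table = malloc(sizeof(*table));
    if (table == NULL) {            /* assumption: give up on allocation failure */
      perror("malloc");
      exit(EXIT_FAILURE);
    }
    table->word = str;              /* stores the pointer, not a copy */
    table->n = 1;
    table->left = NULL;
    table->right = NULL;
    return table;
  }

  int cmp = strcmp(table->word, str);
  if (cmp == 0)
    table->n++;                     /* same word: bump the count */
  else if (cmp > 0)                 /* node's word sorts after str: go left */
    table->left = insert(table->left, str);
  else                              /* node's word sorts before str: go right */
    table->right = insert(table->right, str);
  return table;
}
```

Because the node keeps the pointer it is given, each word passed in needs its own storage that outlives the table; reusing one buffer for every word would make all nodes point at the same text.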

Reading words

If you're reasonably sure your input won't be wholly insane, you can use a variant on this loop to read the data:

char buffer[4096];
while (fscanf(fp, "%4095s", buffer) == 1)
{
    char *word = strdup(buffer);
    table = insert(table, word);
}

This reads words of up to 4095 characters and stores each one in your table using your function. A word that is 4096 characters (4 KiB) or longer will be split into pieces. It probably won't be a problem. Note that the scanf() family treats blanks, tabs and newlines as the separators between words. Writing "a-z" in a file gets treated as one word, double quotes, dash and all.
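If you do want the intermediate array of strings described in the question, one possible approach is to grow an array with realloc() while reading with the same fscanf() format. This is only a sketch: read_words() is a hypothetical helper name, the starting capacity of 16 and the doubling strategy are arbitrary choices, and malloc() plus memcpy() are used in place of strdup() so that the code relies only on standard C.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads whitespace-separated words from fp into a
   heap-allocated array of heap-allocated strings. Stores the number of
   words in *count and returns the array. Returns NULL with *count == 0
   if the input has no words or an allocation fails. */
char **
read_words(FILE *fp, size_t *count)
{
  char buffer[4096];
  char **words = NULL;
  size_t used = 0, cap = 0;

  while (fscanf(fp, "%4095s", buffer) == 1) {
    if (used == cap) {                      /* array full: double its capacity */
      size_t new_cap = cap ? cap * 2 : 16;
      char **tmp = realloc(words, new_cap * sizeof(*tmp));
      if (tmp == NULL)
        goto fail;
      words = tmp;
      cap = new_cap;
    }
    size_t len = strlen(buffer) + 1;        /* include the terminating '\0' */
    words[used] = malloc(len);
    if (words[used] == NULL)
      goto fail;
    memcpy(words[used], buffer, len);       /* private copy of this word */
    used++;
  }
  *count = used;
  return words;

fail:
  for (size_t i = 0; i < used; i++)         /* release everything read so far */
    free(words[i]);
  free(words);
  *count = 0;
  return NULL;
}
```

After the call, a loop such as for (i = 0; i < count; i++) table = insert(table, words[i]); feeds each element to the question's insert(). Since the tree nodes keep those pointers, the strings should be freed only after the table is no longer needed. For a simple word count the array is optional, and inserting each word directly as it is read uses less memory.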
