
How to read words from an unstructured .txt file and store each word in a char array in C?

I have a text file with random words stored in an unstructured way (unstructured meaning random spaces and blank lines). An example of the text file:

file.txt

word1 word2              word3 
         word4 
                        word5

     word6 

I want to read each of these words into a char array. I tried the following:

FILE *fp;

fp = fopen("file.txt","r");

int numWords = 0;
char *arr = malloc(sizeof(char *));
while(!feof(fp)){
    fscanf(fp, "%s", arr);
    numWords++;
}

fclose(fp);

For some reason, I can't access each word from the array, i.e. I'm expecting printf("%s", arr[0]) to print word1, etc. However, arr[0] stores a single character, in this case w.

There is also another problem. I put a printf statement in the while loop and it prints the last word, word6, twice, meaning the loop is executed an extra time at the end for some reason.

If someone could help me achieve this, that would be much appreciated. Thanks!

Your code simply has undefined behavior, so it's impossible to reason about until you remove that undefined behavior.

The allocation allocates room for a single char * pointer, which typically means 8 or 4 bytes. That's all. There's no room to save a lot of word data in there. C won't automatically append to the array or anything like that; you need to deal with the allocation of every byte of storage that you need. When you go ahead and write outside your allocated space, you get undefined behavior.

To store words like this, you might want to implement a dynamic pointer array. That will deal with storing any number of pointers; the pointers (words) themselves will need to be separately allocated on the heap before being added to the array. This is quite a lot of code.

If you're willing to live with some static limitations (on word length and word count), you can of course do:

char words[1000][30];

That'll give you space for 1000 words of at most 29 characters each (each 30-byte row also has to hold the terminating '\0'). You might want to think about de-duplicating the data, i.e. checking if a word is already stored before storing it again.

