How to read words from unstructured .txt file and store each word in a char array in C?

I have a text file containing random words stored in an unstructured way, meaning they are separated by random runs of spaces and blank lines. For example:

file.txt

word1 word2              word3 
         word4 
                        word5

     word6 

I want to read each of these words into a char array. I tried the following:

FILE *fp;

fp = fopen("file.txt","r");

int numWords =0;
char *arr = malloc(sizeof(char *));
while(!feof(fp)){
    fscanf(fp, "%s", arr);
    numWords++; 
}

fclose(fp);

For some reason, I can't access each word from the array, i.e. I'm expecting printf("%s", arr[0]) to print word1, and so on. However, arr[0] holds a single character, in this case w.

There is also another problem. I put a printf statement in the while loop, and it prints the last word, word6, twice, meaning the loop runs one extra time at the end for some reason.

If someone could help me achieve this, that would be much appreciated. Thanks!

Your code simply has undefined behavior, so it's impossible to reason about until you remove it.

The malloc() call allocates room for a single char * pointer, which is typically 8 or 4 bytes. That's all: there's no room to store much word data in there. C won't automatically grow the array or anything like that; you need to allocate every byte of storage you use yourself. As soon as fscanf() writes past the end of that allocation, you get undefined behavior. This is also why arr[0] is a single character: arr is a char *, so arr[0] is just the first char of the one buffer that every word gets written into.

To store words like this, you might want to implement a dynamic array of pointers, which can grow to hold any number of pointers. The words themselves will need to be allocated separately on the heap before their pointers are added to the array. This takes quite a bit of code.

If you're willing to live with some static limitations (on word length and word count), you can of course do:

char words[1000][30];

That'll give you space for 1000 words of at most 29 characters each (30 bytes, including the terminating '\0'). You might want to think about de-duplicating the data, i.e. checking whether a word is already stored before storing it again.
