
C reading a text file separated by spaces with unbounded word size

I have a text file that contains words (strings) separated by spaces. Neither the length of the strings nor the number of words is bounded. I need to put all the words from the file into a list (assume the list works fine). I cannot figure out how to overcome the unbounded word size problem. I have tried this:

FILE* f1;
f1 = fopen("file1.txt", "rt");
int a = 1;

char c = fgetc(f1);
while (c != ' '){
    c = fgetc(f1);
    a = a + 1;
}
char * word = " ";
fgets(word, a, f1);
printf("%s", word);
fclose(f1);
getchar();

My text file looks like this:

 this is sparta

Note that all I was able to get was the first word, and even that I did improperly, because I get this error:

Access violation writing location 0x00B36860.

Can someone please help me?

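The access violation in the question comes from fgets() writing into the string literal " ", which is read-only and only two bytes long. Taking suggestions from the commenters above, this version reallocates the buffer whenever it is too small, or appears to be only just large enough.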

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void fatal(const char *msg) {
    fprintf(stderr, "%s\n", msg);                       // report on stderr
    exit(1);
}

int main() {
    FILE* f1 = NULL;
    char *word = NULL;
    size_t size = 2;
    long fpos = 0;
    char format [32];

    if ((f1 = fopen("file1.txt", "rt")) == NULL)        // open file
        fatal("Failed to open file");
    if ((word = malloc(size)) == NULL)                  // word memory
        fatal("Failed to allocate memory");
    sprintf (format, "%%%us", (unsigned)size-1);        // format for fscanf

    while(fscanf(f1, format, word) == 1) {
        while (strlen(word) >= size-1) {                // is buffer full?
            size *= 2;                                  // double buff size
            printf ("** doubling to %u **\n", (unsigned)size);
            if ((word = realloc(word, size)) == NULL)
                fatal("Failed to reallocate memory");
            sprintf (format, "%%%us", (unsigned)size-1);// new format spec
            fseek(f1, fpos, SEEK_SET);                  // re-read the line
            if (fscanf(f1, format, word) != 1)          // also catches EOF
                fatal("Failed to re-read file");
        }
        printf ("%s\n", word);
        fpos = ftell(f1);                               // mark file pos
    }

    free(word);
    fclose(f1);
    return(0);
}

Program input:

this   is  sparta
help 30000000000000000000000000000000000000000
me

Program output:

** doubling to 4 **
** doubling to 8 **
this
is
sparta
help
** doubling to 16 **
** doubling to 32 **
** doubling to 64 **
30000000000000000000000000000000000000000
me
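
The same doubling idea can also be applied one character at a time, which avoids the fseek() and the re-read. This is a minimal sketch and not part of the program above; the helper name read_word and the starting capacity of 16 are assumptions made for illustration.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the next whitespace-delimited word from fp.
 * Returns a malloc'd string that the caller must free, or NULL at end of
 * file or on allocation failure. */
char *read_word(FILE *fp)
{
    int c;

    /* Skip any leading whitespace. */
    while ((c = fgetc(fp)) != EOF && isspace(c))
        ;
    if (c == EOF)
        return NULL;

    size_t cap = 16, len = 0;
    char *word = malloc(cap);
    if (word == NULL)
        return NULL;

    do {
        if (len + 1 >= cap) {           /* keep room for the terminator */
            char *tmp = realloc(word, cap * 2);
            if (tmp == NULL) {
                free(word);
                return NULL;
            }
            word = tmp;
            cap *= 2;
        }
        word[len++] = (char)c;
    } while ((c = fgetc(fp)) != EOF && !isspace(c));

    word[len] = '\0';
    return word;
}
```

Calling read_word() in a loop until it returns NULL yields each word in turn; the caller either frees the string or hands ownership to the list.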

Which platform are you on?

If you're using a POSIX-ish platform, consider using getline() to read lines of unbounded size, then one of strcspn(), strpbrk(), strtok_r(), or (if you are really determined to make your code non-reusable) strtok() to find the boundaries of the words, and finally strdup() to create copies of the words. The pointers returned by strdup() are then stored in an array of char * managed via realloc().
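
A rough sketch of that combination, assuming POSIX.1-2008 is available. The struct word_list type and the functions word_list_push, read_words and word_list_free are hypothetical names invented for this example; the asker's own list would take their place.

```c
#define _POSIX_C_SOURCE 200809L   /* for getline, strtok_r, strdup */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for this sketch: a growable array of words. */
struct word_list {
    char **items;
    size_t count;
    size_t cap;
};

/* Appends a copy of word; returns 0 on success, -1 on allocation failure. */
static int word_list_push(struct word_list *wl, const char *word)
{
    if (wl->count == wl->cap) {
        size_t new_cap = wl->cap ? wl->cap * 2 : 8;
        char **tmp = realloc(wl->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        wl->items = tmp;
        wl->cap = new_cap;
    }
    char *copy = strdup(word);
    if (copy == NULL)
        return -1;
    wl->items[wl->count++] = copy;
    return 0;
}

/* Reads every whitespace-separated word from fp into wl.
 * Returns 0 on success, -1 on allocation failure. */
int read_words(FILE *fp, struct word_list *wl)
{
    char *line = NULL;          /* getline allocates and grows this */
    size_t line_cap = 0;
    int rc = 0;

    while (rc == 0 && getline(&line, &line_cap, fp) != -1) {
        char *save = NULL;
        for (char *tok = strtok_r(line, " \t\r\n", &save);
             tok != NULL;
             tok = strtok_r(NULL, " \t\r\n", &save)) {
            if (word_list_push(wl, tok) != 0) {
                rc = -1;
                break;
            }
        }
    }
    free(line);
    return rc;
}

/* Frees every stored word and the array itself. */
void word_list_free(struct word_list *wl)
{
    for (size_t i = 0; i < wl->count; i++)
        free(wl->items[i]);
    free(wl->items);
    wl->items = NULL;
    wl->count = wl->cap = 0;
}
```

The list starts out zero-initialized (struct word_list wl = {0};), is filled by read_words(f1, &wl), and is released with word_list_free(&wl) once the words are no longer needed.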

If you're not on a sufficiently POSIX-ish platform, you'll need to use fgets() and check whether you actually read a whole line, using realloc() to allocate more space if your initial buffer isn't long enough. Once you've got a line, you can split it up as before.
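
One way the fgets() loop might look, using only standard C. The helper name read_line and the starting capacity of 64 are assumptions for this sketch, and it does not guard against lines longer than INT_MAX.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one full line of any length.
 * Returns a malloc'd string (including the trailing newline, if the file
 * had one) that the caller must free, or NULL at end of file or on
 * allocation failure. */
char *read_line(FILE *fp)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n')
            return buf;                 /* got the whole line */

        /* No newline yet: either the buffer was too small or this is
         * the last line of the file. Grow the buffer and keep reading. */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (len > 0)
        return buf;                     /* final line without a newline */
    free(buf);
    return NULL;
}
```

Each returned line can then be split with strtok() or a manual scan, and freed once its words have been copied.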

You could mess around with POSIX getdelim(), except that it takes only a single delimiter, and you probably want spaces and newlines (and possibly tabs too) to mark the ends of words, which it won't handle.

And, again, if you're on a sufficiently modern POSIX system, you can consider using the m modifier to scanf():

char *word = 0;

while (scanf("%ms", &word) == 1)
    …store word in your list…

This is even simpler when it is available. scanf() allocates each word itself, so every pointer it returns must eventually be passed to free().
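
For reference, a slightly fuller sketch of the m modifier reading from a FILE * rather than standard input. It assumes a C library that implements the POSIX.1-2008 m modifier (glibc does); the function name print_words is hypothetical, and printing stands in for storing the word in a list.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads every word from fp, prints it, and returns
 * the number of words read. */
size_t print_words(FILE *fp)
{
    char *word = NULL;
    size_t count = 0;

    /* With %ms, fscanf allocates a buffer large enough for each word. */
    while (fscanf(fp, "%ms", &word) == 1) {
        printf("%s\n", word);   /* a real program would store it instead */
        free(word);             /* omit if the list takes ownership */
        word = NULL;
        count++;
    }
    return count;
}
```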
