

Trying to read list of words from file and store into an array in C

My goal is to read my text file "dictionary.txt" line by line and save all the words in it to my array called words. For now I would like to print the array to make sure it contains everything, but I'm pretty new to C and not sure how to go about it. The program just crashes when I try to run it, and I'm not sure why.
(It is a rather large text file that contains 149256 words, the longest being 20 characters.)

Edit: I would like to have the array dynamically allocated; I'm not sure if this is how to do it.

#include <stdio.h>
#include <stdlib.h>
#define listlength 149256
#define wordslength 21


char** getwords(int rows, int col);
void freeArray(char** words, int rows);

int main(){

    int i,j, numCases;
    char** words = getwords(listlength, wordslength);
    //test to see if the words array is saving correctly

    for(i=0;i<20;i++){
        printf("%s", words[i]);
    }

    //Get number of cases.
    //printf("enter number of cases:\n");
    //scanf("%d", &numCases);
    //Process each case.

    freeArray(words, listlength);

}


char** getwords(int rows, int col){

    //allocate top level of pointers.

    char** words = malloc(sizeof(char*)*rows);
    int i;
    FILE *dictionary;

    //allocate each individual array
    for(i=0; i<rows; i++){
        words[i] = malloc(sizeof(char)*col);
    }
        //read dictionary.txt
    for(i=0; i<rows; i++){
        FILE *dictionary = fopen("dictionary.txt", "r");
        fgets(words[i],wordslength,dictionary);
    }

    fclose(dictionary);
    return words;
}

void freeArray(char** words, int rows){

    int i;
    for(i=0; i<rows; i++){
        free(words[i]);
    }
    free(words);
}

OK, I think that instead of listing all the errors here ;), I will just rewrite the getwords function for you and hopefully teach you along the way. Note that I am making some assumptions: I assume that the file has one word per line and that the maximum word length is the col parameter. To start, I would rename the parameter from col to maxWordLen (this is clearer) and getwords to getWords (this is convention), making the function signature like so:

char** getWords(int rows, int maxWordLen)

You can straight up get rid of these two lines (declare i in each for loop instead, which needs C99 or later, and declare dictionary where you open the file):

int i;
FILE *dictionary;

When allocating, you need to include space for the null character at the end of every string.

//   VVV put declaration here (on others as well)
for (int i = 0; i < rows; i++) {
    words[i] = malloc(sizeof(char) * (maxWordLen + 1));
    //                                          ^^^^
}

DO NOT OPEN THE FILE MULTIPLE TIMES!!! Your code:

for(i=0; i<rows; i++){
    FILE *dictionary = fopen("dictionary.txt", "r");
    fgets(words[i],wordslength,dictionary);
}

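Not only will it not work, since every fopen starts again at the top of the file (so every fgets reads the same first line), it is also bad practice and wasteful: each call opens a new FILE stream that is never closed. On most systems a process can only have a limited number of files open, so fopen eventually fails and returns NULL, and handing that NULL to fgets is a likely cause of your crash. Do this instead: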

FILE* dictionary = fopen("dictionary.txt", "r");

for (int i = 0; i < rows; i++) {
    fgets(words[i], maxWordLen + 1, dictionary);
}

The last two lines are fine: just finish up by closing the file and returning words. Whew! Here's a condensed code snippet of all that ;):

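char** getWords(int rows, int maxWordLen) {
    /* open the file ONCE, and before allocating, so a failure leaks nothing */
    FILE* dictionary = fopen("dictionary.txt", "r");
    if (!dictionary) {
        return NULL;
    }

    char** words = malloc(sizeof(char*) * rows);
    if (!words) {
        fclose(dictionary);
        return NULL;
    }

    for (int i = 0; i < rows; i++) {
        /* + 1 leaves room for the terminating null character */
        words[i] = malloc(sizeof(char) * (maxWordLen + 1));
    }

    for (int i = 0; i < rows; i++) {
        fgets(words[i], maxWordLen + 1, dictionary);
    }

    fclose(dictionary);

    return words;
}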

Now, I haven't tested this code, so it might have some typos, but hopefully this helps!

You were having a bit of difficulty determining what is important to pass to getwords. While you can embed/hardcode a filename in the function, that really defeats the purpose of creating a flexible, reusable routine. When you think about what the function needs, it needs (1) a FILE* stream to read from; (2) a way to return the number of words read into your pointer-to-pointer-to-char; and (3) it must return the pointer. That way you get back your newly allocated list of words and know how many there are.

Your use of fgets was a bit awkward. Since you have defined wordslength as 21, you can simply declare a fixed-size buffer (say buf) of wordslength + 1 chars to use with fgets, and then allocate/copy it to words[i]. This allows you to ensure you have a valid string in buf before you allocate memory.

Lastly, there is a realloc function that makes it unnecessary to allocate all 149256 pointers at once (if you know that is how many you will have, that's fine). As a general rule, start with some reasonable expected amount, then realloc additional pointers when your limit is reached, and keep going.

Here is a quick rewrite putting the pieces together:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define listlength 256
#define wordslength 21

char **getwords (FILE *fp, int *n);
void free_array (char** words, int rows);

int main (int argc, char **argv) {

    int i, nwords = 0;
    char **words = NULL;  /* file given as argv[1] (default dictionary.txt) */
    char *fname = argc > 1 ? argv[1] : "dictionary.txt";
    FILE *dictionary = fopen (fname, "r");

    if (!dictionary) { /* validate file open */
        fprintf (stderr, "error: file open failed.\n");
        return 1;
    }

    if (!(words = getwords (dictionary, &nwords))) {
        fprintf (stderr, "error: getwords returned NULL.\n");
        return 1;
    }
    fclose(dictionary);

    printf ("\n '%d' words read from '%s'\n\n", nwords, fname);

    for (i = 0; i < nwords; i++) {
        printf ("%s\n", words[i]);
    }

    free_array (words, nwords);

    return 0;
}

/* read all words 1 per-line, from 'fp', return
 * pointer-to-pointers of allocated strings on 
 * success, NULL otherwise, 'n' updated with 
 * number of words read.
 */
char **getwords (FILE *fp, int *n) {

    char **words = NULL;
    char buf[wordslength + 1] = {0};
    int maxlen = listlength > 0 ? listlength : 1;

    if (!(words = calloc (maxlen, sizeof *words))) {
        fprintf (stderr, "getwords() error: virtual memory exhausted.\n");
        return NULL;
    }

    while (fgets (buf, wordslength + 1, fp)) {

        size_t wordlen = strlen (buf);  /* get word length */

        if (wordlen && buf[wordlen - 1] == '\n')   /* strip '\n' (guard an empty buf) */
            buf[--wordlen] = 0;

        words[(*n)++] = strdup (buf);   /* allocate/copy */

        if (*n == maxlen) { /* realloc as required, update maxlen */
            void *tmp = realloc (words, maxlen * 2 * sizeof *words);
            if (!tmp) {
                fprintf (stderr, "getwords() realloc: memory exhausted.\n");
                return words; /* to return existing words before failure */
            }
            words = tmp;
            memset (words + maxlen, 0, maxlen * sizeof *words);
            maxlen *= 2;
        }
    }

    return words;
}

void free_array (char **words, int rows){

    int i;
    for (i = 0; i < rows; i++){
        free (words[i]);
    }
    free(words);
}

Example Use/Output

$ ./bin/dict ../dat/10int_nl.txt

 '10' words read from '../dat/10int_nl.txt'

8572
-2213
6434
16330
3034
12346
4855
16985
11250
1495

Memory Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so that (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to ensure you haven't written beyond/outside your allocated block of memory, attempted to read or base a jump on an uninitialized value, and finally to confirm that you have freed all the memory you have allocated.

For Linux, valgrind is the normal choice. There are many subtle ways to misuse a new block of memory. Using a memory error checker allows you to identify any problems and validate proper use of the memory you allocate, rather than finding out a problem exists through a segfault. There are similar memory checkers for every platform. They are all simple to use; just run your program through one.

$ valgrind ./bin/dict ../dat/10int_nl.txt
==10212== Memcheck, a memory error detector
==10212== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==10212== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==10212== Command: ./bin/dict ../dat/10int_nl.txt
==10212==

 '10' words read from '../dat/10int_nl.txt'

8572
-2213
<snip>
11250
1495
==10212==
==10212== HEAP SUMMARY:
==10212==     in use at exit: 0 bytes in 0 blocks
==10212==   total heap usage: 15 allocs, 15 frees, 863 bytes allocated
==10212==
==10212== All heap blocks were freed -- no leaks are possible
==10212==
==10212== For counts of detected and suppressed errors, rerun with: -v
==10212== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)

Always confirm "All heap blocks were freed -- no leaks are possible" and, equally important, "ERROR SUMMARY: 0 errors from 0 contexts".

Note on strdup

Since strdup allocates memory (as well as copies the given string), you should check the return just as you would with malloc or calloc to protect against memory exhaustion, e.g.:

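    /* in place of: words[(*n)++] = strdup (buf);
     * check before counting, so *n only ever counts valid strings */
    if (!(words[*n] = strdup (buf))) {
        fprintf (stderr, "getwords() error: virtual memory exhausted.\n");
        return words;   /* as on realloc failure: keep what was read so far */
    }
    (*n)++;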

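Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.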

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM