Unexpected output when storing into a 2D array in C

Question

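I am reading data from a number of files, each containing a list of words. I am trying to display the number of words in each file, but I have run into a problem. For example, when I run the code, I get the output shown below.

Almost all of the counts are displayed correctly, except for two files, each containing a word count in the thousands. Every other file has only a three-digit number of words, and those look fine.

I can only guess at what the problem might be (not enough space allocated somewhere?), and I don't know how to solve it. I apologize if this is poorly worded. My brain is fried and I am struggling. Any help would be appreciated.

I have tried to keep the example code as short as possible. I cut out a lot of error checking and other tasks related to the full program. I have also added comments where I could. Thanks.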

StopWords.c

#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

typedef struct
{
    char stopwords[2000][60];
    int wordcount;
} LangData;

typedef struct
{
    int languageCount;
    LangData languages[];
} AllData;


int main(int argc, char **argv)
{
    //Initialize data structures and open path directory
    int langCount = 0;
    DIR *d;
    struct dirent *ep;
    d = opendir(argv[1]);

    //Count the number of language files in the directory
    while(readdir(d))
        langCount++;

    //Account for "." and ".." in directory
    //langCount = langCount - 2 THIS MAKES SENSE RIGHT?
    langCount = langCount + 1; //The program crashes if I don't do this, which doesn't make sense to me.

    //Allocate space in AllData for languageCount
    AllData *data = malloc(sizeof(AllData) + sizeof(LangData)*langCount); //Unsure? Seems to work.

    //Reset the directory in preparation for reading data
    rewinddir(d);

    //Copy all words into respective arrays.
    char word[60];
    int i = 0;
    int k = 0;
    int j = 0;
    while((ep = readdir(d)) != NULL) //Probably could've used for loops to make this cleaner. Oh well.
    {
        if (!strcmp(ep->d_name, ".") || !strcmp(ep->d_name, ".."))
        {
            //Filtering "." and ".."
        }
        else
        {
            FILE *entry;

            //Get string for path (i should make this a function)
            char fullpath[100];
            strcpy(fullpath, argv[1]);
            strcat(fullpath, "\\");
            strcat(fullpath, ep->d_name);

            entry = fopen(fullpath, "r");

            //Read all words from file
            while(fgets(word, 60, entry) != NULL)
            {
                j = 0;

                //Store each word one character at a time (better way?) 
                while(word[j] != '\0') //Check for end of word
                {
                    data->languages[i].stopwords[k][j] = word[j];
                    j++; //Move onto next character
                }
                k++; //Move onto next word
                data->languages[i].wordcount++;
            }

            //Display number of words in file
            printf("%d\n", data->languages[i].wordcount);
            i++; //Increment index in preparation for next language file.

            fclose(entry);
        }
    }
}

Output

256 //czech.txt: Correct
101 //danish.txt: Correct
101 //dutch.txt: Correct
547 //english.txt: Correct
1835363006 //finnish.txt: Should be 1337. Of course it's 1337.
436 //french.txt: Correct
576 //german.txt: Correct
737 //hungarian.txt: Correct
683853 //icelandic.txt: Should be 1000.
399 //italian.txt: Correct
172 //norwegian.txt: Correct
269 //polish.txt: Correct
437 //portugese.txt: Correct
282 //romanian.txt: Correct
472 //spanish.txt: Correct
386 //swedish.txt: Correct
209 //turkish.txt: Correct

Answer 1

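Does any file contain more than 2000 words? You only allocated space for 2000 words, so once the program tries to copy word 2001, it will write outside the memory allocated for that array, possibly into the space allocated for `wordcount`.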
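One detail in the question's code makes this overflow reachable even when no single file holds 2000 words: the word index `k` is never reset between files. If the files are read in the order shown in the output, `k` is already 256 + 101 + 101 + 547 = 1005 when finnish.txt is opened, and its 1337 words push the index to 2342. The following is a minimal sketch of a bounded alternative, assuming one word per line. `MAX_WORDS`, `MAX_LEN`, and `load_words` are hypothetical names introduced here for illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORDS 2000 /* same capacity as stopwords[2000][60] */
#define MAX_LEN   60

typedef struct
{
    char stopwords[MAX_WORDS][MAX_LEN];
    int wordcount;
} LangData;

/* Hypothetical helper: reads one word per line from `entry` into `lang`.
   Returns the number of words stored, never more than MAX_WORDS. */
int load_words(FILE *entry, LangData *lang)
{
    char word[MAX_LEN];
    int k = 0; /* local index, so it starts at 0 for every file */

    while (k < MAX_WORDS && fgets(word, sizeof word, entry) != NULL)
    {
        word[strcspn(word, "\r\n")] = '\0'; /* drop the line ending */
        strcpy(lang->stopwords[k], word);   /* fits: both buffers are MAX_LEN bytes */
        k++;
    }

    lang->wordcount = k; /* set explicitly; malloc does not zero memory */
    return k;
}
```

Calling `load_words(entry, &data->languages[i])` once per file would replace the inner loops in the question. Words beyond `MAX_WORDS` are silently ignored in this sketch, so a real program may want to report that case.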

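I would also point out that fgets returns a string up to the end of the line or at most n - 1 characters (59 in your case, since n is 60), whichever comes first. This works if the file has only one word per line; otherwise you will have to locate the whitespace in the string and count the words from there.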
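If a file might hold several words per line, that whitespace handling could be sketched as below. `count_words_in_line` is a hypothetical helper, not part of the question's code; it treats any run of characters between `isspace` whitespace as one word.

```c
#include <ctype.h>

/* Hypothetical helper: counts whitespace-separated words in one line,
   such as a buffer filled by fgets. */
int count_words_in_line(const char *line)
{
    int count = 0;
    int in_word = 0; /* 1 while the scan is inside a word */

    for (; *line != '\0'; line++)
    {
        if (isspace((unsigned char)*line))
        {
            in_word = 0;
        }
        else if (!in_word)
        {
            in_word = 1; /* first character of a new word */
            count++;
        }
    }
    return count;
}
```

Summing the result over every line returned by fgets gives the word count for the file. Note that fgets returns a line longer than the buffer in pieces, so a word split across two pieces would be counted twice.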

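If you just want the word count, you don't need to store all the words in an array in the first place. Assuming one word per line, the following should work just as well: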

char word[60];
while(fgets(word, 60, entry) != NULL)
{
    data->languages[i].wordcount++;
}

For reference on fgets, see http://www.cplusplus.com/reference/cstdio/

Update: I took another look, and you may want to try allocating the data as follows:

typedef struct
{
    char stopwords[2000][60];
    int wordcount;
} LangData;

typedef struct
{
    int languageCount;
    LangData *languages;
} AllData;

AllData *data = malloc(sizeof(AllData));
data->languages = malloc(sizeof(LangData)*langCount);

This way, memory is allocated specifically for the languages array.
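A slightly fuller sketch of that allocation, with error checks added, might look like this. `alloc_data` and `free_data` are hypothetical names. `calloc` is swapped in for `malloc` on the languages array so that every `wordcount` starts at zero; that is an addition to the code above, not part of it.

```c
#include <stdlib.h>

typedef struct
{
    char stopwords[2000][60];
    int wordcount;
} LangData;

typedef struct
{
    int languageCount;
    LangData *languages;
} AllData;

/* Hypothetical helper: allocates AllData plus room for langCount languages.
   Returns NULL if langCount is not positive or an allocation fails. */
AllData *alloc_data(int langCount)
{
    AllData *data;

    if (langCount <= 0)
        return NULL;

    data = malloc(sizeof *data);
    if (data == NULL)
        return NULL;

    /* calloc zero-fills, so wordcount and the word buffers start cleared */
    data->languages = calloc((size_t)langCount, sizeof *data->languages);
    if (data->languages == NULL)
    {
        free(data);
        return NULL;
    }

    data->languageCount = langCount;
    return data;
}

/* Releases both allocations; safe to call with NULL. */
void free_data(AllData *data)
{
    if (data != NULL)
    {
        free(data->languages);
        free(data);
    }
}
```

Because there are two allocations, there are also two `free` calls, and the inner array has to be released before the struct that points to it.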

I agree that langCount = langCount - 2 makes sense. What error were you getting?
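Rather than subtracting 2 afterwards, the counting loop could skip the two special entries directly, using the same `strcmp` filter the question already applies when reading the files. This is a minimal sketch, assuming a POSIX-style `dirent.h` as in the question; `count_language_files` is a hypothetical name.

```c
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: counts entries in dirpath other than "." and "..".
   Returns -1 if the directory cannot be opened. */
int count_language_files(const char *dirpath)
{
    DIR *d = opendir(dirpath);
    struct dirent *ep;
    int count = 0;

    if (d == NULL)
        return -1;

    while ((ep = readdir(d)) != NULL)
    {
        if (strcmp(ep->d_name, ".") != 0 && strcmp(ep->d_name, "..") != 0)
            count++;
    }

    closedir(d);
    return count;
}
```

This counts every remaining entry, subdirectories included, so it assumes the directory holds only language files.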

Solution 1
0 2016-04-10 21:54:46