C program to count specific words in a file

Question

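The purpose of this program is to evaluate a person's resume. The program should open and read two .txt files. One file contains the keywords and the other is the resume itself. The program iterates through keywords.txt and then tries to find a matching word in resume.txt. I almost have it working, but the program seems to treat the first whitespace in the keywords file as the end of the file. This is what I have: (I tried switching the first word in the keywords file and the count seems to work. Ideally it would scan only the characters, ignoring symbols, and it needs to count the occurrences of each keyword.)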

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

int main(){

    FILE*  txtKey;
    FILE* txtResume;
    char keyWords[1000];
    char word[10000];
    int count;

    txtKey=fopen("keywords.txt", "r");
    if(txtKey == NULL){
        printf("Failed to open txtKey file \n");
        return 1;
    }

    txtResume=fopen("resume.txt", "r");
    if(txtResume == NULL){
        printf("Failed to open txtResume file \n");
        return 1;
    }

    while (fscanf(txtKey, "%s", keyWords) != EOF) 
    { 
        while (fscanf(txtResume, "%s", word) != EOF) 
        { 
            if (strstr(word, keyWords) != NULL) 
            { 
            count++; 
            } 
        } 
    } 
    printf("The keywords were found %d times in your resume!", count);

    fclose(txtResume);
    fclose(txtKey);

    return 0;
}//END MAIN

Answer 1

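Note: this was prefaced by my top comments.

I created a word-list struct that holds a list of words. It is used twice: once to store the keyword list, and a second time to parse the current line of the resume file.

I wrote it from scratch because it is somewhat different from what you had: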

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

// debug printf: compiled in only when built with -DDEBUG
// (standard C99 variadic macro form)
#ifdef DEBUG
#define dbgprt(...) \
    do { \
        printf(__VA_ARGS__); \
    } while (0)
#else
#define dbgprt(...) \
    do { \
    } while (0)
#endif

typedef struct {
    int list_max;
    int list_cnt;
    char **list_words;
} list_t;

list_t keywords;
list_t linewords;

char buf[10000];

int
wordsplit(FILE *xf,list_t *list,int storeflg)
{
    char *cp;
    char *bp;
    int valid;

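    if (! storeflg)
        list->list_cnt = 0;

    // make sure the list exists and is NULL-terminated, even when the
    // file is empty or the current line is blank (otherwise callers would
    // dereference NULL or walk stale entries from the previous line)
    if (list->list_words == NULL) {
        list->list_max = 100;
        list->list_words = malloc(sizeof(char *) * (list->list_max + 1));
        if (list->list_words == NULL) {
            perror("malloc");
            exit(1);
        }
    }
    list->list_words[list->list_cnt] = NULL;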

    do {
        cp = fgets(buf,sizeof(buf),xf);

        valid = (cp != NULL);
        if (! valid)
            break;

        bp = buf;
        while (1) {
            cp = strtok(bp," \t\n");
            bp = NULL;

            if (cp == NULL)
                break;

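            // grow the list in chunks of 100 so realloc is called rarely
            if (list->list_cnt >= list->list_max) {
                char **tmp = realloc(list->list_words,
                    sizeof(char *) * (list->list_max + 100 + 1));
                if (tmp == NULL) {
                    perror("realloc");
                    exit(1);
                }
                list->list_words = tmp;
                list->list_max += 100;
            }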

            if (storeflg)
                cp = strdup(cp);

            list->list_words[list->list_cnt++] = cp;
            list->list_words[list->list_cnt] = NULL;
        }
    } while (0);

    return valid;
}

void
listdump(list_t *list,const char *tag)
{
    char **cur;

    dbgprt("DUMP: %s",tag);

    for (cur = list->list_words;  *cur != NULL;  ++cur) {
        dbgprt(" '%s'",*cur);
    }

    dbgprt("\n");
}

int
main(void)
{
    FILE *xf;
    int count;

    xf = fopen("keywords.txt","r");
    if (xf == NULL)
        return 1;
    while (1) {
        if (! wordsplit(xf,&keywords,1))
            break;
    }
    fclose(xf);
    listdump(&keywords,"KEY");

    count = 0;

    xf = fopen("resume.txt","r");
    if (xf == NULL)
        return 2;
    while (1) {
        if (! wordsplit(xf,&linewords,0))
            break;
        listdump(&linewords,"CUR");

        for (char **str = linewords.list_words;  *str != NULL;  ++str) {
            dbgprt("TRYCUR: '%s'\n",*str);
            for (char **key = keywords.list_words;  *key != NULL;  ++key) {
                dbgprt("TRYKEY: '%s'\n",*key);
                if (strcmp(*str,*key) == 0) {
                    count += 1;
                    break;
                }
            }
        }
    }
    fclose(xf);

    printf("keywords found %d times\n",count);

    return 0;
}

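UPDATE:

Is there any way to make it simpler? I don't think I know all the concepts in this answer, although the result is perfect.

Yes. Based on your code, I realize that what I did is a bit advanced. But reusing the list the way I did actually saves some duplicated code (e.g. why have separate parsing code for the keywords and the resume data when they are so similar?).

All of the libc functions used (e.g. fgets, strtok, strcmp) have standard documentation.

If you know the [maximum] number of keywords beforehand [which is possible], you could use a fixed-size char ** array [similar to what you had].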
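As a rough illustration of the fixed-size approach, here is a minimal sketch. The names count_matches, MAX_KEYS, and MAX_LEN are made up for this example, and the limits are assumptions rather than anything from the question. It loads all keywords first and then reads the resume only once, which also avoids the situation in the question's code where the inner loop reads resume.txt to EOF for the first keyword and leaves nothing for the later ones. It also starts count at zero, which the question's code does not.

```c
#include <stdio.h>
#include <string.h>

#define MAX_KEYS 100   /* assumed upper bound on the number of keywords */
#define MAX_LEN  64    /* assumed upper bound on word length, including '\0' */

/* Count how many words in `resume` exactly match any keyword in `keys`. */
int count_matches(FILE *keys, FILE *resume)
{
    char keywords[MAX_KEYS][MAX_LEN];
    char word[MAX_LEN];
    int keycnt = 0;
    int count = 0;   /* must start at zero */

    /* Load every keyword first; %63s stops at whitespace and caps the length. */
    while (keycnt < MAX_KEYS && fscanf(keys, "%63s", keywords[keycnt]) == 1)
        keycnt++;

    /* Read the resume once, comparing each word against all stored keywords. */
    while (fscanf(resume, "%63s", word) == 1) {
        for (int i = 0; i < keycnt; i++) {
            if (strcmp(word, keywords[i]) == 0) {
                count++;
                break;   /* count each resume word at most once */
            }
        }
    }
    return count;
}
```

A caller would open keywords.txt and resume.txt with fopen, check both for NULL, and pass the two FILE pointers in. Words longer than 63 characters would be split by the %63s limit, so the bound would need adjusting for such input.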

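Or, you could do a realloc on a char **keywords array for each new keyword (e.g. cp), and maintain a separate count variable (e.g. int keycnt). This would be fine if we only needed one list (i.e. we could dispense with the list_t struct).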
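A sketch of that per-keyword realloc might look like the following. The helper name add_keyword is hypothetical, and it copies the word with malloc and memcpy (a stand-in for strdup) so that it only relies on standard C.

```c
#include <stdlib.h>
#include <string.h>

/* Append a copy of `word` to the array; returns 0 on success, -1 on failure.
   `keywords` and `keycnt` are the caller's array pointer and element count. */
int add_keyword(char ***keywords, int *keycnt, const char *word)
{
    /* Grow the pointer array by exactly one slot. */
    char **tmp = realloc(*keywords, sizeof(char *) * (*keycnt + 1));
    if (tmp == NULL)
        return -1;          /* the original array is still valid */
    *keywords = tmp;

    /* Copy the word so it outlives the caller's buffer. */
    size_t len = strlen(word) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, word, len);

    (*keywords)[(*keycnt)++] = copy;
    return 0;
}
```

The caller would start with char **keywords = NULL and int keycnt = 0, then call add_keyword(&keywords, &keycnt, cp) for each token. This calls realloc once per word, which is simple but less economical than growing in chunks.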

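We could duplicate some of the keyword code for the second loop in main, again using different variables for the array and its count.

But that is wasteful. list_t is an example of using realloc efficiently (i.e. calling it less often). This is a standard technique.

If you do a web search for dynamic resize array realloc, one of the entries you will find is: https://newton.ex.ac.uk/teaching/resources/jmr/appendix-growable.html

Note the use of strdup to preserve the word values in the keyword list beyond the next call to fgets.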
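To see why the copy matters, here is a small sketch. The function first_words and the helper dup_word (a stand-in for strdup) are invented for this example. It stores the first word of each line, either as a raw pointer into the shared line buffer or as a heap copy.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char linebuf[256];   /* shared line buffer, reused by every fgets call */

/* Stand-in for strdup: returns a heap copy of s, or NULL on failure. */
static char *dup_word(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);
    if (p != NULL)
        memcpy(p, s, len);
    return p;
}

/* Store the first word of each line in out[]; returns how many were stored.
   With copy == 0 the stored pointers point into linebuf and get overwritten
   by the next fgets. With copy != 0 each word is duplicated (an entry is
   NULL if that allocation failed). */
int first_words(FILE *fp, char **out, int max, int copy)
{
    int n = 0;
    while (n < max && fgets(linebuf, sizeof(linebuf), fp) != NULL) {
        char *tok = strtok(linebuf, " \t\n");
        if (tok == NULL)
            continue;               /* blank line */
        out[n++] = copy ? dup_word(tok) : tok;
    }
    return n;
}
```

For a file containing the lines alpha and beta, the uncopied version ends up with both entries pointing at the same buffer, so both read back as beta. The copied version keeps alpha and beta separate. That is the reason the keyword list is stored with storeflg set, while the per-line resume list can safely reuse the buffer.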

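Hopefully this covers enough that you can study it further. The whole "how do I implement a dynamically resizing array with realloc?" question comes up frequently on SO, so you can also search here for questions about it.

Also, if the keywords.txt list has words separated by ",", how would it pick up the words?

To parse on ",", just change the second argument to strtok to include it (e.g. " \t,\n"). This will work for abc def, abc,def, or abc, def.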
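The delimiter change can be tried in isolation with a sketch like this one. The function name split_words is hypothetical, and it works on a single line that has already been read.

```c
#include <string.h>

/* Split `line` in place on spaces, tabs, commas, and newlines.
   Stores up to `max` token pointers in out[] and returns the count. */
int split_words(char *line, char **out, int max)
{
    int n = 0;
    for (char *tok = strtok(line, " \t,\n");
         tok != NULL && n < max;
         tok = strtok(NULL, " \t,\n"))
        out[n++] = tok;
    return n;
}
```

Because strtok treats any run of delimiter characters as one separator, the inputs abc def, abc,def, and abc, def all yield the same two tokens. The line must be a writable buffer (not a string literal), since strtok overwrites the delimiters with '\0'.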
