簡體   English   中英

如何使用fgetc()計算唯一字數,然后在C中打印計數

[英]How to count unique number of words using fgetc() then printing the count in C

我問了一個與此程序有關的問題,但經過大量研究和反復討論,我再次陷入困境。

我正在嘗試編寫一個程序,該程序將接受用戶輸入並存儲它,然后打印出所有唯一單詞以及它們各自出現的次數

例如

Please enter something: Hello#@Hello# hs,,,he,,whywhyto[then the user hits enter] 

hello 2 
hs 1 
he 1 
whywhyto 1 

上面應該是輸出,當然,為什么不是單詞,但是在這種情況下並不重要,因為我假設任何字母的模式都由非字母分隔(空格,0-9,#$ (@等)被認為是一個單詞,我需要使用2D數組,因為我無法使用鏈表,也無法理解它們。

這是我到目前為止所擁有的

#include <stdio.h> 
#include <ctype.h> 

int main() 
{ 
char array[64]; 

int i=0, j, input; 

printf("Please enter an input:"); 


input=fgetc(stdin); 

while(input != '\n')
{ 
if(isalpha(input)) 
{ 


array[i]=input; 
i++; 
} 

input=fgetc(stdin); 
} 

for(j=0;j<i;j++) 
{ 
// printf("%c ",j,array[j]); 
printf("%c",array[j]); 
} 
printf("\n"); 
} 

我正在使用isalpha來僅獲取字母,但是所有這些操作是它擺脫了不是字母的任何東西,將其存儲然后打印回去,但是我不知道如何讓它第一次將單詞存儲一次出現,然后僅增加每個單詞的計數。 我只能使用至少對我來說很難的fgetc(),我只有大約3-4個月的C經驗,我知道我將不得不使用2維數組,已經在閱讀它們,但是我無法要了解我將如何實施它們,請給我一些幫助。

不知道這是否是家庭作業,所以我沒有為您做任何事情,我還整理了一點代碼。 但是,如果您不了解此人可以輸入多少個單詞,幾乎就需要一個動態數據結構,例如鏈表

#include <stdio.h>
#include <string.h>
#include <ctype.h> 

typedef struct linkedlist linkedlist;
struct linkedlist{
    char *word;
    int count;
    linkedlist *next;
};

int main() 
{ 
    //know your bounds, this will cause trouble if word is longer than 64 chars
    char array[64]; 
    int i=0, input;
    linkedlist *head = NULL;

    printf("Please enter an input:"); 

    while((input=fgetc(stdin)) != '\n')
    { 
        if(isalpha(input) && i!=63) //added this so that code does not brake (word is 64 chars)
        { 
            array[i]=input; 
        }
        else{
            array[i]='\0';
            char *word = malloc(strlen(array)+1);
            strcpy(word, array);
            add_word(word, &head);
            i=0; //need to restart i to keep reading words
        }

        i++;
    } 

    //print out final results
    for(linkedlist *temp = head; temp != NULL; temp = temp->next){
        printf("%s %d ", temp->word, temp->count);
    }
}

//adds word to end of list if does not exist
//increments word count if it exists
void add_word(char *word, linkedlist **ll){
    //implement this
}

//frees resources used by malloc (lookup how to free a linkedlist/destroy a linked list
//make sure to free both final and head in main
void destroy_list(linkedlist **ll){
    //implement this
}

對於add_word,您將需要一些類似於(PSEUDO-CODE)的內容:

list = *ll
if(list == NULL): //new list
    *ll = malloc(sizeof(linkedlist))
    ll->word = word
    ll->count = 1
    ll->next = NULL
    return

while list->next != null:
    if word = list->word:
        free(word)
        list->count++
        return
    list = list->next

if list->word = word: //last word in list
    free(word)
    list->count++
else: //word did not exist, add new word to end of list
    temp = malloc(sizeof(linkedlist))
    temp->word = word
    temp->count = 1
    list->next = temp

也許不是最有效的方法,但是您可以改進它。希望我不會進一步混淆您,祝您好運

這是似乎有效的代碼:

#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum { MAX_WORDS = 64, MAX_WORD_LEN = 20 };

int main(void)
{
    char words[MAX_WORDS][MAX_WORD_LEN];
    int  count[MAX_WORDS] = { 0 };
    int w = 0;
    char word[MAX_WORD_LEN];
    int c;
    int l = 0;

    while ((c = getchar()) != EOF)
    {
        if (isalpha(c))
        {
            if (l < MAX_WORD_LEN - 1)
               word[l++] = c;
            else
            {
                fprintf(stderr, "Word too long: %*s%c...\n", l, word, c);
                break;
            }
        }
        else if (l > 0)
        {
            word[l] = '\0';
            printf("Found word <<%s>>\n", word);
            assert(strlen(word) < MAX_WORD_LEN);
            int found = 0;
            for (int i = 0; i < w; i++)
            {
                if (strcmp(word, words[i]) == 0)
                {
                    count[i]++;
                    found = 1;
                    break;
                }
            }
            if (!found)
            {
                if (w >= MAX_WORDS)
                {
                    fprintf(stderr, "Too many distinct words (%s)\n", word);
                    break;
                }
                strcpy(words[w], word);
                count[w++] = 1;
            }
            l = 0;
        }
    }

    for (int i = 0; i < w; i++)
        printf("%3d: %s\n", count[i], words[i]);

    return 0;
}

樣本輸出:

$ ./wordfreq <<< "I think, therefore I am, I think, or maybe I do not think after all, and therefore I am not."
Found word <<I>>
Found word <<think>>
Found word <<therefore>>
Found word <<I>>
Found word <<am>>
Found word <<I>>
Found word <<think>>
Found word <<or>>
Found word <<maybe>>
Found word <<I>>
Found word <<do>>
Found word <<not>>
Found word <<think>>
Found word <<after>>
Found word <<all>>
Found word <<and>>
Found word <<therefore>>
Found word <<I>>
Found word <<am>>
Found word <<not>>
  5: I
  3: think
  2: therefore
  2: am
  1: or
  1: maybe
  1: do
  2: not
  1: after
  1: all
  1: and
$ ./wordfreq <<< "I think thereforeIamIthinkormaybeI do not think after all, and therefore I am not."
Found word <<I>>
Found word <<think>>
Word too long: thereforeIamIthinkor...
  1: I
  1: think
$ ./wordfreq <<< "a b c d e f g h i j k l m n o p q r s t u v w x y z
>                 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
>                 aa ab ac ad ae af ag ah ai aj ak al am
>                 an ao ap aq ar as at au av aw ax ay az
>                "
Found word <<a>>
Found word <<b>>
Found word <<c>>
Found word <<d>>
Found word <<e>>
Found word <<f>>
Found word <<g>>
Found word <<h>>
Found word <<i>>
Found word <<j>>
Found word <<k>>
Found word <<l>>
Found word <<m>>
Found word <<n>>
Found word <<o>>
Found word <<p>>
Found word <<q>>
Found word <<r>>
Found word <<s>>
Found word <<t>>
Found word <<u>>
Found word <<v>>
Found word <<w>>
Found word <<x>>
Found word <<y>>
Found word <<z>>
Found word <<A>>
Found word <<B>>
Found word <<C>>
Found word <<D>>
Found word <<E>>
Found word <<F>>
Found word <<G>>
Found word <<H>>
Found word <<I>>
Found word <<J>>
Found word <<K>>
Found word <<L>>
Found word <<M>>
Found word <<N>>
Found word <<O>>
Found word <<P>>
Found word <<Q>>
Found word <<R>>
Found word <<S>>
Found word <<T>>
Found word <<U>>
Found word <<V>>
Found word <<W>>
Found word <<X>>
Found word <<Y>>
Found word <<Z>>
Found word <<aa>>
Found word <<ab>>
Found word <<ac>>
Found word <<ad>>
Found word <<ae>>
Found word <<af>>
Found word <<ag>>
Found word <<ah>>
Found word <<ai>>
Found word <<aj>>
Found word <<ak>>
Found word <<al>>
Found word <<am>>
Too many distinct words (am)
  1: a
  1: b
  1: c
  1: d
  1: e
  1: f
  1: g
  1: h
  1: i
  1: j
  1: k
  1: l
  1: m
  1: n
  1: o
  1: p
  1: q
  1: r
  1: s
  1: t
  1: u
  1: v
  1: w
  1: x
  1: y
  1: z
  1: A
  1: B
  1: C
  1: D
  1: E
  1: F
  1: G
  1: H
  1: I
  1: J
  1: K
  1: L
  1: M
  1: N
  1: O
  1: P
  1: Q
  1: R
  1: S
  1: T
  1: U
  1: V
  1: W
  1: X
  1: Y
  1: Z
  1: aa
  1: ab
  1: ac
  1: ad
  1: ae
  1: af
  1: ag
  1: ah
  1: ai
  1: aj
  1: ak
  1: al
$

對“單詞太長”和“單詞太多”的測試有助於使我確信代碼是正確的。 設計這樣的測試是一種好習慣。

OP仍有大量工作要做。

這個技巧是1)讀取輸入2)識別定界符3)將單詞與整個緩沖區進行比較,以及4)僅將它們打印一次。

這種方法是高效的內存,因為它只使用了OP建議的64個char緩沖區。 搜索復雜度為O(n * n)

#include <ctype.h>
#include <stdio.h>
#include <string.h>

// Helper function to find word occurrences.
void Print_count(const char *word, const char *array, int i) {
  int count = 0;
  const char *found;
  for (int j = 0; j < i; j++) {
    if (isalpha((unsigned char ) array[j])) {
      if (strcmp(&array[j], word) == 0) {
        found = &array[j];
        count++;
      }
      // skip rest of word
      do {
        j++;
      } while (isalpha((unsigned char ) array[j]));
    }
  }
  if (found == word) {
    printf("%s %d\n", word, count);
  }
}

int main(void) {
  char array[64];
  int i = 0;
  int j;
  int input;
  printf("Please enter an input:");

  // get the input
  while ((input = fgetc(stdin)) != '\n' && input != EOF) {
    array[i] = input;
    if (i + 1 >= sizeof array)
      break;
    i++;
  }
  array[i] = '\0';

  // change all delimiters to \0
  for (j = 0; j < i; j++) {
    if (!isalpha((unsigned char ) array[j])) {
      array[j] = '\0';
    }
  }


  for (j = 0; j < i; j++) {
    // Use the beginning of each word ... 
    if (isalpha((unsigned char ) array[j])) {
      Print_count(&array[j], array, i);
      // skip test of word
      do {
        j++;
      } while (isalpha((unsigned char ) array[j]));
    }
  }
  return 0;
}

輸入Hello#@Hello# hs,,,he,,whywhyto

輸出:

Hello 2
hs 1
he 1
whywhyto 1

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM