Most common character in a file in C

I'm doing my C programming course homework and I need to find the most common character in a given file.

My tests with a test file, an empty file and other small text files work great (or at least I think so), but the last, long test file fails with the error message: "Should have returned 'e' (101) for file rfc791.txt. You returned 'b' (98)".

So my question is: what might be wrong with my code, given that the most common letter is suddenly not what it should be?

#include <errno.h>
#include <stdio.h>
#include <string.h>

int most_common_character(char *filename) {
    FILE *f;
    if ((f = fopen(filename, "r")) == NULL) {
        fprintf(stderr, "Not opened: %s\n", strerror(errno));
        return -1;
    }

    char frequency[26];
    int ch = fgetc(f);
    if (ch == EOF) {
        return 0;
    }

    for (ch = 0; ch < 26; ch++) {
        frequency[ch] = 0;
    }

    while (1) {
        ch = fgetc(f);
        if (ch == EOF) {
            break;
        }
        if ('a' <= ch && ch <= 'z') {
            frequency[ch - 'a']++;
        }
        else if ('A' <= ch && ch <= 'Z') {
            frequency[ch - 'A']++;
        }
    }
    int maxCount = 0;
    int maxChar = 0;
    for (int i = 0; i <= 26; ++i) {
        if (frequency[i] > maxCount) {
            maxCount = frequency[i];
            maxChar = i;
        }
    }
    fclose(f);
    return maxChar + 'a';
}

I would be very grateful if someone has any hints on how to fix my code :) I've searched many other related topics for a solution, but nothing seems to work.

You should use the < operator in the second for loop. With <=, the check frequency[i] > maxCount also reads frequency[26], which is one past the end of the array. That is undefined behaviour, meaning the value at that index may be less than or greater than the compared value.
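A minimal sketch of the bounded search, assuming the 26 counters are stored as `int` (the question uses `char`, but the comparison logic is the same). `index_of_max` is a hypothetical helper name, not from the original code; it returns the index of the largest count, or -1 when every count is zero.

```c
/* One counter per letter 'a'..'z' (layout assumed from the question). */
#define LETTER_COUNT 26

/* Hypothetical helper: returns the index of the largest count, or -1 if
   every count is zero. The bound is i < LETTER_COUNT, so the element one
   past the end of the array is never read. */
int index_of_max(const int counts[LETTER_COUNT]) {
    int max_index = -1;
    int max_count = 0;
    for (int i = 0; i < LETTER_COUNT; ++i) {
        /* Only a strictly greater count replaces the current maximum,
           so ties keep the earlier letter. */
        if (counts[i] > max_count) {
            max_count = counts[i];
            max_index = i;
        }
    }
    return max_index;
}
```

Adding 'a' to a non-negative result gives the letter, as the question's `return maxChar + 'a';` does.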

Your code does have some problems. However, they are small enough that the code still works on small tests.

  1. int ch = fgetc(f); drops the first character of the file (it is read but never counted)

  2. for (int i = 0; i <= 26; ++i) goes outside the array's range (valid indices are only 0 to 25)
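The first point can be avoided by taking every character from the `fgetc()` call in the loop condition, so nothing is read before the loop starts. A sketch, assuming ASCII letters as the question's code does; `count_letters` is a hypothetical helper, and the caller is assumed to pass a zeroed `int counts[26]` and to close the file itself.

```c
#include <stdio.h>

/* Hypothetical helper: adds the count of each ASCII letter in f to
   counts[0..25] ('a'/'A' -> 0, ..., 'z'/'Z' -> 25) and returns how many
   letters it counted. counts must be zeroed by the caller. */
long count_letters(FILE *f, int counts[26]) {
    long letters = 0;
    int ch; /* int, not char, so EOF stays distinct from every valid byte */
    /* Every character, including the first one, comes from this call. */
    while ((ch = fgetc(f)) != EOF) {
        if ('a' <= ch && ch <= 'z') {
            counts[ch - 'a']++;
            letters++;
        } else if ('A' <= ch && ch <= 'Z') {
            counts[ch - 'A']++;
            letters++;
        }
    }
    return letters;
}
```

Because the function returns the number of letters counted, a result of 0 tells the caller the file had no letters at all, which replaces the separate early `EOF` check.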

Besides these small mistakes, your code is fine. Well done #thumbsup

  1. Loop runs out-of-bounds. @Weather Vane

     // for (int i = 0; i <= 26; ++i) {
     for (int i = 0; i < 26; ++i) {
  2. Code throws away the result of the first fgetc() call. @BLUEPIXY

     int ch = fgetc(f);  // This value of ch is not subsequently used.
     if (ch == EOF) {
       return 0;
     }

Other fixes below:

#include <ctype.h>
#include <limits.h>
#include <stdio.h>

int most_common_character(char *filename) {
    ...

    // Use a more generous count @Weather Vane:
    // a char counter overflows once a letter occurs more than CHAR_MAX times (often 127)
    // char frequency[26];
    // Consider there may be more than 26 different letters
    // fgetc() returns EOF or a value in the unsigned char range
    int frequency[UCHAR_MAX + 1] = { 0 };

    // Not needed, as the array was initialized above
    // for (ch = 0; ch < 26; ch++) { frequency[ch] = 0; }

    // BTW, good job declaring ch as int; using char is a common rookie mistake
    int ch;

    // Code uses tolower() and islower(), as that is the portable way to
    // detect character types
    while ((ch = fgetc(f)) != EOF) {
      frequency[tolower(ch)]++;  // could add a check to ensure frequency[] does not overflow
    }

    int maxCount = 0;
    int maxChar = -1;
    for (int i = 0; i <= UCHAR_MAX; ++i) {
      if (islower(i) && frequency[i] > maxCount) {
        maxCount = frequency[i];
        maxChar = i;
      }
    }

    fclose(f);
    return maxChar;
}
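For reference, a self-contained sketch that puts the answer's pieces into one compilable function. Assumptions: the file-opening code (elided as `...` above) is borrowed from the question, the parameter is `const char *` rather than `char *`, the counters saturate at `INT_MAX` as one way to act on the "could add a check" comment, and `islower()`/`tolower()` run in the default "C" locale, where they cover only 'a' to 'z' and 'A' to 'Z'. It returns -1 if the file cannot be opened or contains no letters.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Sketch: returns the most common letter as a lowercase character code,
   or -1 if the file cannot be opened or contains no letters. */
int most_common_character(const char *filename) {
    FILE *f = fopen(filename, "r"); /* opening code assumed from the question */
    if (f == NULL) {
        fprintf(stderr, "Not opened: %s\n", strerror(errno));
        return -1;
    }

    /* One int counter per byte value: every non-EOF result of fgetc()
       is in 0..UCHAR_MAX, so it is a valid index, and so is tolower() of it. */
    int frequency[UCHAR_MAX + 1] = { 0 };

    int ch;
    while ((ch = fgetc(f)) != EOF) {
        int lc = tolower(ch);
        if (frequency[lc] < INT_MAX) { /* saturate instead of overflowing */
            frequency[lc]++;
        }
    }
    fclose(f); /* done reading; nothing below needs the file */

    int maxCount = 0;
    int maxChar = -1;
    for (int i = 0; i <= UCHAR_MAX; ++i) {
        if (islower(i) && frequency[i] > maxCount) {
            maxCount = frequency[i];
            maxChar = i;
        }
    }
    return maxChar;
}
```

Closing the file as soon as the read loop ends means the function cannot leak it, whichever value it returns afterwards.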

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.