
Finding the most frequent character in a file in C

I'm writing a function that finds the most common alphabetic character in a file. The function should ignore all non-alphabetic characters.

At the moment I have the following:

int most_common(const char *filename)
{
    char frequency[26];
    int ch = 0;

    FILE *fileHandle;
    if((fileHandle = fopen(filename, "r")) == NULL){
        return -1;
    }

    for (ch = 0; ch < 26; ch++)
        frequency[ch] = 0;

    while(1){
        ch = fgetc(fileHandle);
        if (ch == EOF) break;

        if ('a' <= ch && ch  <= 'z')
            frequency[ch - 'a']++;
        else if ('A' <= ch && ch <= 'Z')
            frequency[ch - 'A']++;
    }

    int max = 0;
    for (int i = 1; i < 26; ++i)
        if (frequency[i] > frequency[max])
            max = i;

    return max;
}

Now the function returns how many times the most frequent letter occurred, not the character itself. I'm a bit lost, and I'm not sure this is how the function should look at all. Does it make sense, and how can I fix the problem?

I would really appreciate your help.

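The frequency array is indexed by each letter's position in the alphabet (0 for 'a', 25 for 'z'), not by its raw character code. So frequency[0] is 5 if the file contains five 'a's, counting both upper and lower case.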
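To make that indexing concrete, here is a minimal sketch of the mapping in both directions. The helper names letter_index and index_letter are hypothetical (they do not appear in the question), and the arithmetic assumes an ASCII-compatible character set in which the letters are contiguous.

```c
/* Hypothetical helper: map a letter to a slot 0..25, or -1 for any
   non-letter. Assumes an ASCII-compatible character set. */
static int letter_index(int ch)
{
    if ('a' <= ch && ch <= 'z') {
        return ch - 'a';
    }
    if ('A' <= ch && ch <= 'Z') {
        return ch - 'A';
    }
    return -1;
}

/* Hypothetical helper: map a slot 0..25 back to its lowercase letter. */
static int index_letter(int index)
{
    return 'a' + index;
}
```

With these helpers, letter_index('Z') yields 25 and index_letter(4) yields 'e', so a function that returns the bare index 4 is reporting "the fifth letter", not a count and not yet a character.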

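In your code, max holds the index of the largest count (a number from 0 to 25), not the letter itself, so the function returns that index rather than a character. A return value of 4, for example, means 'e'; it is not a count.

One way to fix this is to track both the maximum count and the index it belongs to, then convert that index back to a character before returning.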

I would fix this with:

int maxCount = 0;
int maxChar = 0;
// i = index of each letter, A (0) to Z (25)
for (int i = 0; i < 26; i++)
{
  // if freq of this char is greater than the previous max freq
  if (frequency[i] > maxCount)
  {
      // store the value of the max freq
      maxCount = frequency[i];

      // store the index of the char that had the max freq
      maxChar = i;
  }
}

// maxChar is a zero-based alphabet index (0 = A, 25 = Z).
// Add the character code of 'A' to turn it back into a letter.
return maxChar + 'A';

Note that I changed int i = 1 to int i = 0 . Because maxCount starts at 0 rather than at frequency[0] , the loop now has to visit A as well; your original loop could start at 1 only because max = 0 already covered A . The loop condition must stay i < 26 : frequency has 26 elements, so the valid indices are 0 through 25, and index 25 is Z . Using <= 26 would read past the end of the array.

Note the braces. Your brace style (no braces for single-statement blocks) is widely discouraged.

Also, i++ is more common than ++i in cases like this. It makes no difference in this context, so I would advise i++ .
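Putting the pieces together, a complete version might look like the sketch below. It is one possible arrangement, not the only correct one, and it makes a few assumptions that go beyond the question: the counters are unsigned long instead of char (a char counter can overflow after as few as 127 occurrences), the file is closed before returning, the result is the lowercase letter (use 'A' + max for uppercase), ties go to the letter that comes first in the alphabet, and 0 is returned when the file contains no letters at all.

```c
#include <stdio.h>

/* Returns the most frequent letter in the file as a lowercase character,
   -1 if the file cannot be opened, or 0 if it contains no letters
   (the 0 convention is an assumption, not part of the question).
   Assumes an ASCII-compatible character set. */
int most_common(const char *filename)
{
    unsigned long frequency[26] = {0};  /* wide counters; char could overflow */
    int ch;

    FILE *fileHandle = fopen(filename, "r");
    if (fileHandle == NULL) {
        return -1;
    }

    /* Count letters case-insensitively; everything else is ignored. */
    while ((ch = fgetc(fileHandle)) != EOF) {
        if ('a' <= ch && ch <= 'z') {
            frequency[ch - 'a']++;
        } else if ('A' <= ch && ch <= 'Z') {
            frequency[ch - 'A']++;
        }
    }
    fclose(fileHandle);

    /* max starts at index 0, so the scan can begin at 1. */
    int max = 0;
    for (int i = 1; i < 26; i++) {
        if (frequency[i] > frequency[max]) {
            max = i;
        }
    }

    if (frequency[max] == 0) {
        return 0;  /* no letters found */
    }
    return 'a' + max;  /* convert the index back to a character */
}
```

A caller would check for -1 and 0 first and otherwise print the result with %c, for example printf("%c\n", most_common("input.txt")), where input.txt is a placeholder file name.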


 