I'm writing a function that finds the most common alphabetic character in a file. The function should ignore all non-alphabetic characters.
At the moment I have the following:
int most_common(const char *filename)
{
char frequency[26];
int ch = 0;
FILE *fileHandle;
if((fileHandle = fopen(filename, "r")) == NULL){
return -1;
}
for (ch = 0; ch < 26; ch++)
frequency[ch] = 0;
while(1){
ch = fgetc(fileHandle);
if (ch == EOF) break;
if ('a' <= ch && ch <= 'z')
frequency[ch - 'a']++;
else if ('A' <= ch && ch <= 'Z')
frequency[ch - 'A']++;
}
int max = 0;
for (int i = 1; i < 26; ++i)
if (frequency[i] > frequency[max])
max = i;
return max;
}
Now the function returns how many times the most frequent letter occurred, not the character itself. I'm a bit lost, as I'm not sure this is how the function should look at all. Does the approach make sense, and how can I fix the problem?
I would really appreciate your help.
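The variable frequency is indexed by letter position rather than by raw character code: frequency[0] counts 'a' and 'A', and frequency[25] counts 'z' and 'Z'. So frequency[0] is 5 if there have been 5 'a's.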
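To make that mapping concrete, here is a small sketch. The helper names letter_index and index_letter are made up for illustration and do not appear in the question; the code assumes an ASCII-compatible character set in which the letters are contiguous.

```c
/* Hypothetical helper: map a letter to its slot 0..25 in the frequency
   array, or -1 for anything that is not a letter.
   Assumes 'a'..'z' and 'A'..'Z' are contiguous (true for ASCII). */
int letter_index(int ch)
{
    if ('a' <= ch && ch <= 'z') {
        return ch - 'a';
    }
    if ('A' <= ch && ch <= 'Z') {
        return ch - 'A';
    }
    return -1;
}

/* Hypothetical helper: map a slot 0..25 back to an uppercase letter. */
int index_letter(int i)
{
    return 'A' + i;
}
```

Going from an index back to a character is just the reverse of the subtraction used when counting.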
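In your code, max holds the index of the most frequent letter (0 for 'a' up to 25 for 'z'), not the character code, so you're returning that index rather than the actual character.
To return the character, convert the index back to a character code. It is also convenient to keep track of both the maximum frequency count and the index it belongs to.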
I would fix this with:
int maxCount = 0;
int maxChar = 0;
// i = 0 ('A') to 25 ('Z'); frequency has exactly 26 elements
for (int i = 0; i < 26; ++i)
{
    // if freq of this char is greater than the previous max freq
    if (frequency[i] > maxCount)
    {
        // store the value of the max freq
        maxCount = frequency[i];
        // store the index of the char that had the max freq
        maxChar = i;
    }
}
// indices are positions in a zero-based alphabet.
// Add the character code of 'A' to turn the index back into an
// uppercase char code. If the file has no letters, this returns 'A'.
return maxChar + 'A';
Note that I changed int i = 1 to int i = 0. Your version could start at 1 because max = 0 already made 'a' the initial candidate; here maxCount starts at 0, so the loop has to visit index 0 too, otherwise 'a' could never win. The loop must stop at i < 26: frequency has 26 elements with indices 0 to 25, so i <= 26 would read past the end of the array.
Note the braces. Omitting braces around single-statement blocks, as your code does, is widely discouraged: it is easy to add a second statement later and not notice that it falls outside the loop or the if.
Also, i++ is more common than ++i in cases like this. In this context it makes no difference, so I would advise i++.
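Putting the pieces together, the whole function might look like the sketch below. A few things here go beyond what the question states and are assumptions: the counters are long rather than char so that large files cannot overflow them, the file is closed before returning, and 0 is returned when the file contains no letters at all.

```c
#include <stdio.h>

/* Returns the most common letter in the file as an uppercase char code.
   Returns -1 if the file cannot be opened.
   Returns 0 if the file contains no letters (an assumed convention;
   the question does not say what should happen in that case). */
int most_common(const char *filename)
{
    long frequency[26] = {0};   /* one counter per letter, all zero */

    FILE *fileHandle = fopen(filename, "r");
    if (fileHandle == NULL) {
        return -1;
    }

    int ch;
    while ((ch = fgetc(fileHandle)) != EOF) {
        if ('a' <= ch && ch <= 'z') {
            frequency[ch - 'a']++;
        } else if ('A' <= ch && ch <= 'Z') {
            frequency[ch - 'A']++;
        }
    }
    fclose(fileHandle);

    long maxCount = 0;
    int maxChar = -1;           /* -1 means "no letter seen yet" */
    for (int i = 0; i < 26; i++) {
        if (frequency[i] > maxCount) {
            maxCount = frequency[i];
            maxChar = i;
        }
    }

    /* Turn the index back into a character code. */
    return (maxChar < 0) ? 0 : maxChar + 'A';
}
```

When two letters are tied, the strict > comparison means the one that comes first in the alphabet wins.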