
Finding the most frequent character in a file in C

I'm writing a function that finds the most common alphabetic character in a file. The function should ignore all non-alphabetic characters.

At the moment I have the following:

int most_common(const char *filename)
{
    char frequency[26];
    int ch = 0;

    FILE *fileHandle;
    if ((fileHandle = fopen(filename, "r")) == NULL) {
        return -1;
    }

    for (ch = 0; ch < 26; ch++)
        frequency[ch] = 0;

    while (1) {
        ch = fgetc(fileHandle);
        if (ch == EOF) break;

        if ('a' <= ch && ch <= 'z')
            frequency[ch - 'a']++;
        else if ('A' <= ch && ch <= 'Z')
            frequency[ch - 'A']++;
    }

    int max = 0;
    for (int i = 1; i < 26; ++i)
        if (frequency[i] > frequency[max])
            max = i;

    return max;
}

Right now the function returns how many times the most frequent letter occurred, not the character itself. I'm a bit lost, as I'm not sure whether this is how the function should look at all. Does it make sense, and how can I fix the problem?

I would really appreciate your help.

The frequency array is indexed by the letter's position in the alphabet, not by its raw character code: index 0 counts a and A, and index 25 counts z and Z. So frequency[0] is 5 if the file contains five a's.
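To make that indexing concrete, here is a minimal sketch that tallies letters from an in-memory string instead of a file. The helper name count_letters and the unsigned long counters are my own choices, not part of the question. I used wider counters because a char counter can overflow after as few as 127 occurrences. The sketch also assumes an ASCII-compatible character set in which A to Z are contiguous.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define ALPHABET_SIZE 26

/* Hypothetical helper: tally the letters of `text` into freq[0..25],
 * where index 0 stands for 'a'/'A' and index 25 for 'z'/'Z'.
 * Assumes an ASCII-compatible character set. */
void count_letters(const char *text, unsigned long freq[ALPHABET_SIZE])
{
    assert(text != NULL && freq != NULL);

    for (size_t i = 0; i < ALPHABET_SIZE; i++) {
        freq[i] = 0;
    }

    for (; *text != '\0'; text++) {
        /* Convert through unsigned char: passing a negative value
         * to the <ctype.h> functions is undefined behaviour. */
        int lower = tolower((unsigned char)*text);
        if (lower >= 'a' && lower <= 'z') {
            freq[lower - 'a']++;   /* letter offset, not character code */
        }
    }
}
```

After calling count_letters("aAb z!", freq), freq[0] is 2, freq[1] is 1 and freq[25] is 1; the space and the exclamation mark are ignored.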

In your code you are assigning the count to max, not the index of the letter, so you return the count rather than the actual character.

You need to store both the maximum frequency count and the index of the letter it belongs to.

I would fix this with:

int maxCount = 0;
int maxChar = 0;
// i runs over the 26 letters: 0 = A, ..., 25 = Z
for (int i = 0; i < 26; ++i)
{
  // if the frequency of this letter exceeds the previous maximum
  if (frequency[i] > maxCount)
  {
      // store the new maximum frequency
      maxCount = frequency[i];

      // store the index of the letter that had it
      maxChar = i;
  }
}

// maxChar is a zero-based index into the alphabet.
// Add the character code of 'A' to turn it back into a letter.
return maxChar + 'A';

Note that I changed int i = 1 to int i = 0. Because maxCount starts at 0 rather than at the count for A, starting at 1 would skip A entirely, which is a subtle bug you might not notice. The loop condition must remain i < 26: the valid indices of frequency are 0 to 25, so i <= 26 would read frequency[26], one element past the end of the array.
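Putting the pieces together, a complete version might look like the sketch below. It is one possible arrangement, not the only one. The long counters, the fclose call, the use of toupper, and the choice to return -1 both when the file cannot be opened and when it contains no letters are my assumptions, not requirements from the question. On a tie, the letter that comes first in the alphabet wins.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define ALPHABET_SIZE 26

/* Returns the most frequent letter in the file as an uppercase
 * character code ('A'..'Z'), or -1 if the file cannot be opened or
 * contains no letters. Assumes an ASCII-compatible character set. */
int most_common(const char *filename)
{
    assert(filename != NULL);

    FILE *fileHandle = fopen(filename, "r");
    if (fileHandle == NULL) {
        return -1;
    }

    /* long counters: a char counter could overflow on larger files. */
    long frequency[ALPHABET_SIZE] = {0};

    int ch;
    while ((ch = fgetc(fileHandle)) != EOF) {
        /* fgetc yields an unsigned char value, which toupper accepts. */
        int upper = toupper(ch);
        if (upper >= 'A' && upper <= 'Z') {
            frequency[upper - 'A']++;
        }
    }
    fclose(fileHandle);

    long maxCount = 0;
    int maxChar = -1;   /* -1 means "no letter seen yet" */
    for (int i = 0; i < ALPHABET_SIZE; i++) {
        if (frequency[i] > maxCount) {
            maxCount = frequency[i];
            maxChar = i;
        }
    }

    return (maxChar < 0) ? -1 : 'A' + maxChar;
}
```

For a file containing the text Hello, World! this sketch returns 'L', since l occurs three times and no other letter occurs more than twice.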

Note the braces. Omitting braces around single-statement blocks, as your code does, is widely discouraged: a statement added later can silently end up outside the if or for it was meant to belong to.

Also, i++ is more common than ++i in loops like this. It makes no difference here, so I would advise i++.
