Average word length for a sentence

I want to calculate the average word length for a sentence.

For example, given the input abc def ghi, the average word length would be 3.0.

The program works, but I want to ignore extra spaces between the words. So, given the following sentence:

abc  def

(two spaces between the words), the average word length is calculated to be 2.0 instead of 3.0.

How can I take into account extra spaces between words? These are to be ignored, which would give an average word length of 3.0 in the example above, instead of the erroneously calculated 2.0.

#include <stdio.h>
#include <conio.h>

int main()
{
char ch,temp;
float avg;
int space = 1,alphbt = 0,k = 0;

printf("Enter a sentence: ");

while((ch = getchar()) != '\n')
{
    temp = ch;

    if( ch != ' ')
    {
       alphbt++;
       k++;         // To ignore spaces before first word!!!
    }  
    else if(ch == ' ' && k != 0)
       space++;

}

if (temp == ' ')    //To ignore spaces after last word!!!
   printf("Average word lenth: %.1f",avg = (float) alphbt/(space-1));
else
   printf("Average word lenth: %.1f",avg = (float) alphbt/space);

getch();
}               

The counting logic is awry. This code seems to work correctly with leading and trailing blanks, multiple blanks between words, and so on. Note the use of int ch; so that the code can check for EOF accurately (getchar() returns an int).

#include <stdio.h>
#include <stdbool.h>

int main(void)
{
    int ch;
    int numWords = 0;
    int numLetters = 0;
    bool prevWasASpace = true; //spaces at beginning are ignored

    printf("Enter a sentence: ");
    while ((ch = getchar()) != EOF && ch != '\n')
    {
        if (ch == ' ')
            prevWasASpace = true;
        else
        {
            if (prevWasASpace)
                numWords++;
            prevWasASpace = false;
            numLetters++;
        }
    }

    if (numWords > 0)
    {
        double avg = numLetters / (float)(numWords);
        printf("Average word length: %.1f (C = %d, N = %d)\n", avg, numLetters, numWords);
    }
    else
        printf("You didn't enter any words\n");
    return 0;
}

Various example runs, using # to indicate where Return was hit.

Enter a sentence: A human in Algiers#
Average word length: 3.8 (C = 15, N = 4)

Enter a sentence: A human in Algiers  #
Average word length: 3.8 (C = 15, N = 4)

Enter a sentence:   A human  in   Algiers  #
Average word length: 3.8 (C = 15, N = 4)

Enter a sentence: #
You didn't enter any words

Enter a sentence: A human in AlgiersAverage word length: 3.8 (C = 15, N = 4)

Enter a sentence: You didn't enter any words

In the last but one example, I typed Control-D twice (the first to flush the 'A human in Algiers' to the program, the second to give EOF), and once in the last example. Note that this code counts tabs as 'not space'; you'd need #include <ctype.h> and if (isspace(ch)) (or if (isblank(ch))) in place of if (ch == ' ') to handle tabs better.
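The isspace() variation just mentioned might look like the following minimal sketch. The helper name average_word_length and its string-based interface are assumptions made for illustration (the program above reads from standard input with getchar()); returning 0.0 when there are no words is likewise just one possible convention.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: average length of the whitespace-separated words
 * in s. Any character for which isspace() is true (space, tab, newline,
 * ...) separates words. Returns 0.0 if s contains no words. */
double average_word_length(const char *s)
{
    size_t numWords = 0;
    size_t numLetters = 0;
    bool prevWasASpace = true;   /* leading whitespace is ignored */

    for (; *s != '\0'; s++)
    {
        /* isspace() needs a value representable as unsigned char
         * (or EOF), hence the cast. */
        if (isspace((unsigned char)*s))
            prevWasASpace = true;
        else
        {
            if (prevWasASpace)
                numWords++;      /* whitespace -> non-whitespace: new word */
            prevWasASpace = false;
            numLetters++;
        }
    }
    return numWords > 0 ? (double)numLetters / (double)numWords : 0.0;
}
```

With this version, average_word_length("abc\tdef  ghi") gives 3.0, because the tab now separates words instead of being counted as a letter.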


getchar() returns an int

I am confused why you have used int ch and EOF!

There are several parts to this answer.

  1. The first reason for using int ch is that the getchar() function returns an int. It can return any valid character plus a separate value, EOF; therefore, its return value cannot be a char of any sort, because it has to return more values than can fit in a char. It actually returns an int.

  2. Why does it matter? Suppose the value from getchar() is assigned to char ch. Now, for most characters, most of the time, it works OK. However, one of two things will happen. If plain char is a signed type, a valid character (often ÿ, y-umlaut, 0xFF, formally Unicode U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS) is misrecognized as EOF. Alternatively, if plain char is an unsigned type, then you will never detect EOF.

  3. Why does detecting EOF matter? Because your input code can get EOF when you aren't expecting it to. If your loop is:

     int ch;
     while ((ch = getchar()) != '\n')
         ...

     and the input reaches EOF, the program is going to spend a long time doing nothing useful. The getchar() function will repeatedly return EOF, and EOF is not '\n', so the loop will try again. Always check for error conditions in input functions, whether the function is getchar(), scanf(), fread(), read() or any of their myriad relatives.
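Point 2 can be checked without any input at all. The sketch below simulates what happens when the byte 0xFF and the EOF indicator are each stored in a narrow character type. Both function names are made up for illustration. Note that converting 0xFF to signed char is implementation-defined; the first function returns true on the common platforms where that conversion yields -1 and EOF is also -1, but neither is guaranteed by the C standard.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: does the byte 0xFF, once stored in a signed char,
 * compare equal to EOF? True on typical platforms where EOF == -1; the
 * narrowing conversion itself is implementation-defined. */
bool ff_looks_like_eof_in_signed_char(void)
{
    int fromGetchar = 0xFF;            /* what getchar() returns for byte 0xFF */
    signed char ch = (signed char)fromGetchar;
    return ch == EOF;                  /* ch is promoted back to int here */
}

/* Hypothetical helper: can EOF, once stored in an unsigned char, still
 * compare equal to EOF? On ordinary platforms it cannot: the stored value
 * lies in 0..UCHAR_MAX, while EOF is negative. */
bool eof_survives_in_unsigned_char(void)
{
    unsigned char ch = (unsigned char)EOF;
    return ch == EOF;
}
```

Keeping the result of getchar() in an int avoids both problems, because an int can hold every unsigned char value and EOF as distinct values.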

Obviously counting non-space characters is easy; your problem is counting words. Why count spaces as words, as you're doing? Or more importantly, what defines a word?

IMO a word is marked by the transition from a space character to a non-space character. So, if you can detect that, you know how many words you have and your problem is solved.

I have an implementation. There are many possible ways to implement it, and I don't think you'll have trouble coming up with one. I may post my implementation later as an edit.

Edit: my implementation

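#include <stdio.h>

int main(void)
{
    int ch;             /* int, not char, so EOF can be detected */
    int words = 0;
    int letters = 0;
    int in_word = 0;    /* nonzero while inside a word */

    printf("Enter a sentence: ");

    while ((ch = getchar()) != EOF && ch != '\n')
    {
        if (ch != ' ') {
            if (!in_word) {     /* space -> non-space transition: new word */
                words++;
                in_word = 1;
            }
            letters++;
        }
        else {
            in_word = 0;
        }
    }

    if (words > 0)      /* avoid dividing by zero on empty input */
        printf("Average word length: %.1f\n", (float) letters / words);
    else
        printf("No words entered\n");
    return 0;
}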

Consider the following input (hyphens represent spaces):

--Hello---World--

You currently ignore the leading and trailing spaces, but you count each of the middle spaces, even though they are next to each other. With a slight change to your program, in particular to 'k', we can deal with this case.

#include <stdio.h>
#include <stdbool.h>

int main(void)
{
  int ch;  // int, not char, so EOF can be detected
  float avg;
  int numWords = 0;
  int numLetters = 0;
  bool prevWasASpace = true; // spaces at beginning are ignored

  printf("Enter a sentence: ");

  while ((ch = getchar()) != EOF && ch != '\n')
  {
      if (ch != ' ')
      {
         prevWasASpace = false;
         numLetters++;
      }
      else if (!prevWasASpace)
      {
         numWords++;  // the first space after a word ends that word
         prevWasASpace = true;
      }
  }

  if (!prevWasASpace)
     numWords++;  // the last word was not followed by a space

  if (numWords > 0)
  {
     avg = numLetters / (float)(numWords);
     printf("Average word length: %.1f\n", avg);
  }
  else
     printf("You didn't enter any words\n");

  return 0;
}

You may need to modify the preceding slightly (haven't tested it).

However, counting words in a sentence based only on the spaces between words might not be everything you want. Consider the following sentences:

John said, "Get the phone...Now!"

The TV announcer just offered a buy-1-get-1-free deal while saying they are open 24/7.

It wouldn't cost them more than $100.99/month (3,25 euro).

I'm calling (555) 555-5555 immediately on his/her phone.

A(n) = A(n-1) + A(n-2) -- in other words the sequence: 0,1,1,2,3,5, ...

You will need to decide what constitutes a word, and that is not an easy question (btw, y'all, none of the examples included all varieties of English). Counting spaces is a pretty good estimate in English, but it won't get you all of the way.
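To illustrate how much the definition matters, the sketch below treats a word as a maximal run of alphanumeric characters, so punctuation separates words just as spaces do. This is only one possible definition, picked here for simplicity; the names WordStats and count_alnum_words are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: counts for one string. */
typedef struct
{
    size_t words;   /* number of alphanumeric runs */
    size_t chars;   /* number of alphanumeric characters */
} WordStats;

/* Hypothetical helper: counts words under the "run of alphanumeric
 * characters" definition. Everything else (spaces, punctuation,
 * symbols) acts as a separator and is not counted. */
WordStats count_alnum_words(const char *s)
{
    WordStats stats = {0, 0};
    bool inWord = false;

    for (; *s != '\0'; s++)
    {
        if (isalnum((unsigned char)*s))
        {
            if (!inWord)
                stats.words++;   /* separator -> alphanumeric: new word */
            inWord = true;
            stats.chars++;
        }
        else
            inWord = false;
    }
    return stats;
}
```

Under this definition, "Get the phone...Now!" contains four words, whereas splitting on spaces alone finds three. On the other hand, "I'm" becomes two words, so this definition has its own problems.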

Take a look at the Wikipedia page on Text Segmentation. The article uses the phrase "non-trivial" four times.
