
Counting words in a string?

Hello. For this program I am supposed to count the number of words in a string. So far I have worked out how to find the number of characters in a string, but I can't figure out how to treat the letters that make up a word as a single unit and count it as 1 word.

My function is:

int wordcount( char word[MAX] ){

    int i, num, counter, j;

    num = strlen( word );
    counter = 0;

    for (i = 0; i < num; i++)
    {
        if (word[i] != ' ' || word[i] != '\t' || word[i] != '\v' || word[i] != '\f')
        {

        }

    }

    return counter;
}

I tried some variations, but the body of the if statement is where I am confused. How can I count the number of words in a string? The tests for this check strings with multiple spaces between words, like "Hello this is a string".

Hints only, since this is probably homework.

What you're looking to count is the number of transitions between 'word' characters and whitespace. That requires remembering the last character and comparing it to the current one.

If one is whitespace and the other is not, you have a transition.

In more detail: initialise lastchar to whitespace, then loop over every character in your input. Where lastchar was whitespace and the current character is not, increase the word count.

Don't forget to copy the current character to lastchar at the end of each loop iteration. And it should hopefully go without saying that the word count should be initialised to 0.
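As a rough illustration of the hints above, here is a minimal sketch of the lastchar approach. The function name count_words_by_transition is made up for this example, and using isspace() from <ctype.h> as the definition of whitespace is an assumption (it also treats '\n' and '\r' as whitespace, which the question does not mention).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts words as whitespace-to-non-whitespace
   transitions. "Whitespace" here means whatever isspace() accepts,
   which is an assumption, not something the question specifies. */
int count_words_by_transition(const char *text)
{
    int count = 0;
    char lastchar = ' ';   /* pretend the string is preceded by whitespace */

    for (size_t i = 0; text[i] != '\0'; i++) {
        char current = text[i];

        /* A word starts where the previous character was whitespace
           and the current one is not. */
        if (isspace((unsigned char)lastchar) && !isspace((unsigned char)current)) {
            count++;
        }

        lastchar = current;   /* remember it for the next iteration */
    }

    return count;
}
```

Because only the start of each word is counted, runs of several spaces, leading spaces, and trailing spaces do not change the result: count_words_by_transition("Hello   this is a string") gives 5.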

There is a Linux utility, wc, that can count words.

Have a look (it includes some explanation and a sample):

http://en.literateprograms.org/Word_count_(C)

and a link to the source:

http://en.literateprograms.org/index.php?title=Special:DownloadCode/Word_count_(C)&oldid=15634

When you're in the if part, it means you're inside a word. So you can set an inword flag there and watch for the change from out of word (which would be your else part) to inword and back.
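A possible way to fill in the question's function along these lines is sketched below. The name wordcount_flag and the const char * parameter are choices made for this example. Note one change to the condition: the tests are joined with && rather than ||, because a character always differs from at least one of the four separators, so the || version is true for every character and the else part would never run.

```c
#include <string.h>

/* Hypothetical variant of the question's wordcount(), using an inword flag.
   Separators are the four characters tested in the question; '\n' is
   therefore NOT treated as a separator here. */
int wordcount_flag(const char *word)
{
    size_t i, num = strlen(word);
    int counter = 0;
    int inword = 0;   /* 1 while inside a word, 0 while between words */

    for (i = 0; i < num; i++) {
        /* && rather than ||: the character must differ from every separator */
        if (word[i] != ' ' && word[i] != '\t' && word[i] != '\v' && word[i] != '\f') {
            if (!inword) {
                counter++;    /* just moved from outside a word to inside one */
                inword = 1;
            }
        } else {
            inword = 0;       /* separator: we are outside a word again */
        }
    }

    return counter;
}
```

With this sketch, wordcount_flag("Hello   this is a string") returns 5 regardless of how many spaces separate the words.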

This is a quick suggestion; there could be better ways, but I like this one.

First, be sure to "know" what a word is made of. Let us suppose it's made of letters only. Everything else, whether punctuation or "blanks", can be considered a separator.

Then, your "system" has two states: 1) completing a word, 2) skipping separator(s).

You begin with a free run of the skip-separators code. Then you enter the "completing a word" state, which you keep until the next separator or the end of the whole string (in which case you exit). When that happens, you have completed a word, so you increment your word counter by 1 and go into the "skipping separators" state. And the loop continues.

Pseudo C-like code:

char *str;

/* someone will assign str correctly */

word_count = 0;
state = SKIPPING;

for (; *str != '\0'; str++)
{
    c = *str;  /* read the current character on every iteration */
    if (state == SKIPPING && can_be_part_of_a_word(c)) {
        state = CONSUMING;
        /* if you need to accumulate the letters, 
           here you have to push c somewhere */
    }
    else if (state == SKIPPING) continue; // unneeded - just to show the logic
    else if (state == CONSUMING && can_be_part_of_a_word(c)) {
        /* continue accumulating pushing c somewhere 
           or, if you don't need, ... else if kept as placeholder */
    }
    else if (state == CONSUMING) {
        /* separator found while consuming a word: 
           the word ended. If you accumulated chars, you can ship
           them out as "the word" */
        word_count++;
        state = SKIPPING;
    }
}
// if the state on exit is CONSUMING you need to increment word_count:
// you can rearrange things to avoid this when the loop ends, 
// if you don't like it
if (state == CONSUMING) { word_count++; /* plus ship out last word */ }

The function can_be_part_of_a_word returns true if the character read is in [A-Za-z_], for example, and false otherwise.
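For reference, a compact compilable sketch of the same two-state idea could look like the following. The predicate body (letters plus underscore, via isalpha() in the default "C" locale), the enum, and the function name count_words_fsm are all assumptions made for this example; the pseudo code above leaves them open.

```c
#include <ctype.h>

/* Hypothetical predicate matching the [A-Za-z_] example.
   isalpha() means exactly A-Z and a-z only in the default "C" locale. */
int can_be_part_of_a_word(char c)
{
    return isalpha((unsigned char)c) || c == '_';
}

enum scan_state { SKIPPING, CONSUMING };

/* Hypothetical counter: a word is finished when a separator follows
   a run of word characters, or when the string ends inside a word. */
int count_words_fsm(const char *str)
{
    int word_count = 0;
    enum scan_state state = SKIPPING;

    for (; *str != '\0'; str++) {
        if (can_be_part_of_a_word(*str)) {
            state = CONSUMING;          /* start or continue a word */
        } else if (state == CONSUMING) {
            word_count++;               /* separator ends the current word */
            state = SKIPPING;
        }
    }

    if (state == CONSUMING) {
        word_count++;                   /* string ended in the middle of a word */
    }

    return word_count;
}
```

The definition of a word matters here: since digits and punctuation are separators under this predicate, "Hello, world!" counts as 2 words, "it's" also counts as 2, and "foo_bar 42" counts as 1.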

(It should work, provided tiredness has not led me into some gross error.)

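Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this post, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.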

 