Why is my character count incorrect?

The following code gets the number of words:

int count = 0;
for (int i = 0; chars[i] != EOF; i++)
{
    if (chars[i] == ' ')
    {
         count++;
    }
}

My problem is that it doesn't count the words correctly.

For example, if my file.txt has the following text in it:

spaced-out there's I'd like

It says I have 6 words, when according to MS Word I should have 4.

spaced-out and in

Gives me a word count of 4.

spaced out and in

Gives me a word count of 6.

I'm sorry if this question has been answered before; Google ignores special characters in searches, so it is hard to find answers to coding questions. Ideally, I'd identify words just by checking whether a character is a space or not.

I tried looking for answers, but no one seemed to have exactly the same problem. I know that lines in .txt files from Windows may end in \r\n, but then that should be part of one word. For example:

spaced out and in\r\n

I believe it should still give me 4 words. Also, when I add || chars[i] == '\n' to the loop condition, as in:

for (int i = 0; chars[i] != EOF || chars[i] == '\n'; i++)

I get even more words: 8 for the line

spaced out and in

I am doing this on a Linux-based server, but through an SSH client on Windows. The characters come from a .txt file.


Edit: Okay, here is the full code. I left out the #include lines when posting it.

#define BUF_SIZE 500            
#define OUTPUT_MODE 0700        

int main(int argc, char *argv[])
{
    int input, output;
    int readSize = 1, writeSize;            
    char chars[BUF_SIZE];   
    int count = 0;

    input = open(argv[1], O_RDONLY);                

    output = creat(argv[2], OUTPUT_MODE);   

    while (readSize > 0)                
    {
        readSize = read(input, chars, BUF_SIZE); 
        if (readSize < 0)
            exit(4);

        for (int i = 0; chars[i] != '\0'; i++)
        {
            if (chars[i] == ' ')
            {
                count++;
            }
        }

        writeSize = write(output, chars, readSize);     
        if (writeSize <= 0)             
        {
            close(input);       
            close(output);
            printf("%d words\n", count);
            exit(5);
        }
    }
}

I am writing this answer because I think I know what your confusion is. Note that you did not explain how you read the file, so I'll give an example and explain why we test != EOF, even though EOF is not a character that you read from a file.

It appears that you think EOF is a character that is stored in the file; it's not. If you just want to count words, you can do something like this:

int chr;
while ((chr = fgetc(file)) != EOF)
    count += (chr == ' ') ? 1 : 0;

Note that chr MUST be of type int: fgetc() returns each byte as an unsigned char value converted to int, and EOF is a negative int distinct from all of those values. But EOF is certainly not present in the file! It is returned by functions like fgetc() to indicate that there is nothing more to read, and it is only returned once a read has actually been attempted and found nothing left.

Oops, also note that my sample code will not count the last word. But that's for you to figure out.

Also, this would count multiple spaces as "words", something that you should also work out.
