
My Java "word count from file" program is inaccurate

I am using fairly basic logic to count the words in a ".txt" file in my Java program. I get an accurate count when the file has a single line, but for some reason the count falls short by 1 every time a new line is encountered in the file. Any help pinpointing where my logic falls short would be appreciated!

My code:

        int count = 1; // any alternative for this ... ?

        FileInputStream fis = new FileInputStream ("practice.txt");

        int c,f=0;          

        while((c = fis.read()) != -1)
        {
            if ( c != ' ')
            {
                f = 1;
            }
            if(f == 1 && c == ' ')
            {
                count++;
                f=0;
            }
        }
        System.out.println(count);

To give some perspective: when I compile and run this on a text file with a single line like "Welcome to java"

the result is:

  3 // accurate

but if the file gets a second line, i.e.

"Welcome to Java

this is line 2"

I get:

 6 

As the number of lines increases, the count falls short by one more for each added line.

(I am trying to keep this absolutely basic, so I am not using a tokenizer, split, or any other built-in method for this.)

Newlines are not space characters, so when you have this

"Welcome to Java

this is line 2"

your program reads it as

"Welcome to Java\nthis is line 2"

So you can add a check for the newline character \n to your program to handle that.
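As a minimal sketch of that suggestion, the loop from the question can delegate the separator test to a small helper. The class name NewlineAwareCount and the helper isSeparator are hypothetical names introduced here, and treating the carriage return \r as a separator is an added assumption for files with Windows line endings. The sketch reads from an in-memory stream so it runs without a file; a FileInputStream for "practice.txt" could be passed in instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class NewlineAwareCount {
    // Hypothetical helper: space, line feed and carriage return all end a word.
    static boolean isSeparator(int c) {
        return c == ' ' || c == '\n' || c == '\r';
    }

    // Same flag-based loop as in the question, reading from any InputStream.
    static int countWords(InputStream in) {
        int count = 1; // kept from the question; assumes the input does not end with a separator
        int f = 0;     // 1 while we are inside a word
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (!isSeparator(c)) {
                    f = 1;
                } else if (f == 1) {
                    count++; // a separator right after a word closes that word
                    f = 0;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] data = "Welcome to Java\nthis is line 2".getBytes(StandardCharsets.UTF_8);
        System.out.println(countWords(new ByteArrayInputStream(data))); // prints 7
    }
}
```

Because the flag is cleared at the first separator, a "\r\n" pair still closes only one word.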

You are only checking for spaces. You need to check for newlines ( \n ) as well. As it stands, you are counting Java and this as one word.

If you test your program with a hard-coded string, as in the code below, you can modify the test data pretty easily to check other cases. Then you can switch back to reading from the file.

You may also want to modify your code to check for tab ( \t ) characters, since they can separate words too.

        int count = 1; // any alternative for this ... ?

        String str = "Welcome to Java\nThis is a test";

        int f = 0; // 1 while we are inside a word

        for (char c : str.toCharArray()) {
            if (c != ' ' && c != '\n') {
                f = 1;
            }
            // a space or newline right after a word ends that word
            if (f == 1 && (c == ' ' || c == '\n')) {
                count++;
                f = 0;
            }
        }
        System.out.println(count);

The reason it worked with count = 1 is that when the for loop finishes cycling through the characters, the last word is not counted because the loop simply ends. The same thing happens when you read from a file. To see this, set count = 0 and run it: the result will be 6. Then put a space at the end of the string and the count will be 7. So you need something like this once you have finished reading the characters:

if (f == 1) {
   count++;
} 

Then you can initialize count = 0.
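Putting these pieces together, one possible sketch looks like the following. The names WordCountSketch, isSeparator and countWords are hypothetical, the boolean inWord stands in for the flag f, and the explicit checks for \t and \r are assumptions that go slightly beyond the code above. It reads from a StringReader so it can run without a file; a FileReader for "practice.txt" could be passed in the same way.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class WordCountSketch {
    // Hypothetical helper: space, tab, line feed and carriage return separate words.
    static boolean isSeparator(int c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    static int countWords(Reader in) {
        int count = 0;          // no head start needed any more
        boolean inWord = false; // plays the role of the flag f
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (!isSeparator(c)) {
                    inWord = true;
                } else if (inWord) {
                    count++;        // a separator right after a word closes that word
                    inWord = false;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (inWord) {
            count++; // the last word, when the input does not end with a separator
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords(new StringReader("Welcome to Java\nThis is a test")));   // prints 7
        System.out.println(countWords(new StringReader("Welcome to Java\nThis is a test\n"))); // prints 7
        System.out.println(countWords(new StringReader("")));                                  // prints 0
    }
}
```

With the final check in place, the result no longer depends on whether the text ends with a separator, and an empty input yields 0 rather than 1.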
