Word Count from a text file using Java

I am trying to write simple code that will give me the word count from a text file. The code is as follows:

import java.io.File; //to read file
import java.util.Scanner;

public class ReadTextFile {
   public static void main(String[] args) throws Exception { 
      String filename = "textfile.txt";
      File f = new File (filename);
      Scanner scan = new Scanner(f);
      int wordCnt = 1;

      while(scan.hasNextLine()) {
          String text = scan.nextLine();
          for (int i = 0; i < text.length(); i++) {
              if(text.charAt(i) == ' ' && text.charAt(i-1) != ' ') {
                  wordCnt++;
              }
          }
      }
      System.out.println("Word count is " + wordCnt);
   }

}

This code compiles but does not give the correct word count. What am I doing incorrectly?

Right now you only increment wordCnt if the current character is a space and the character before it is not. However, this misses several cases, such as a word that is followed by a newline character rather than a space. Consider a file that looks like this:

This is a text file\n
with a bunch of\n
words. 

Your method should return ten, but since there is no space after the words file and of, it will not count them as words.
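If you would rather keep the character-by-character loop, one way to repair it is to count the start of each word instead of the space after it, so a word at the end of a line needs no special handling. The following is only a sketch: the class name CharScanWordCount and the helper countWords are made up for illustration, and the three lines are hard-coded in place of the file. It also avoids calling charAt(i - 1) at i = 0, which in the original code throws a StringIndexOutOfBoundsException when a line begins with a space.

```java
// Sketch only: class and method names are illustrative, not from the question.
class CharScanWordCount {

    // Counts the words in one line. A word starts at any non-whitespace
    // character that is the first character or follows whitespace.
    static int countWords(String line) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < line.length(); i++) {
            if (Character.isWhitespace(line.charAt(i))) {
                inWord = false;
            } else if (!inWord) {
                inWord = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The three lines of the example file, as Scanner.nextLine() would return them.
        String[] lines = { "This is a text file", "with a bunch of", "words. " };
        int total = 0; // start at 0, since every word is counted explicitly
        for (String line : lines) {
            total += countWords(line);
        }
        System.out.println("Word count is " + total); // prints "Word count is 10"
    }
}
```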

If you just want the word count, you can do something along the lines of:

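int wordCnt = 0; // start from 0: split() already accounts for every word on the line
while (scan.hasNextLine()) {
   String text = scan.nextLine().trim(); // trim so leading whitespace yields no empty token
   if (!text.isEmpty()) {                // a blank line would otherwise count as one token
      wordCnt += text.split("\\s+").length;
   }
}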

This splits each line on runs of whitespace and adds the number of tokens in the resulting array to the count.
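Two edge cases of String.split are easy to trip over when counting tokens this way: an empty line still produces one (empty) token, and leading whitespace produces an empty first token. The sketch below shows both, together with a small guard around them. The class name SplitWordCount and the helper countWords are hypothetical names used only for this illustration.

```java
// Sketch only: class and method names are illustrative.
class SplitWordCount {

    // Counts whitespace-separated words in a single line.
    static int countWords(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return 0; // "".split("\\s+") would return one empty token
        }
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        // Raw split() behaviour on the two edge cases:
        System.out.println("".split("\\s+").length);     // prints 1 (a single empty token)
        System.out.println(" a b".split("\\s+").length); // prints 3 (leading empty token)

        // The guarded helper on the same inputs:
        System.out.println(countWords(""));     // prints 0
        System.out.println(countWords(" a b")); // prints 2
    }
}
```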

First of all, remember to close resources such as the Scanner.

Since Java 8 you can count words in this way:

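import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.stream.Stream;

String regex = "\\s+";
String filename = "textfile.txt";

long wordCnt = 0;
// java.util.Scanner has no lines() method, so the lines are read with Files.lines (Java 8).
// try-with-resources closes the stream and the underlying file automatically.
try (Stream<String> lines = Files.lines(Paths.get(filename))) {
        wordCnt = lines
                .flatMap(line -> Arrays.stream(line.trim().split(regex))) // one element per token
                .filter(word -> !word.isEmpty())                          // blank lines yield an empty token
                .count();
} catch (IOException e) {
        e.printStackTrace();
}

System.out.println("Word count is " + wordCnt);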
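The advice about closing resources applies just as well if you stay with Scanner. A minimal sketch, assuming the default whitespace delimiter is an acceptable definition of a word: Scanner.next() already returns one whitespace-separated token at a time, so the tokens can be counted directly, and try-with-resources closes the Scanner. The class name ScannerWordCount, the helper countWords and the optional command-line argument are assumptions made for this example.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Sketch only: class and method names are illustrative.
class ScannerWordCount {

    // Counts whitespace-delimited tokens from any Scanner source.
    static long countWords(Scanner scanner) {
        long count = 0;
        while (scanner.hasNext()) {
            scanner.next(); // consume one token; line breaks count as whitespace
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Falls back to the file name used in the question.
        String filename = args.length > 0 ? args[0] : "textfile.txt";
        // try-with-resources closes the Scanner (and the file) even on failure.
        try (Scanner scanner = new Scanner(new File(filename))) {
            System.out.println("Word count is " + countWords(scanner));
        } catch (FileNotFoundException e) {
            System.err.println("Could not open " + filename + ": " + e.getMessage());
        }
    }
}
```

Because the Scanner is passed in, the same helper can be tried on a String source, for example new Scanner("a b\nc"), without touching the file system.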
