Counting the number of words in a file

Question

I'm having trouble counting the words in a file. My approach is to count one word whenever I see a space or a newline.

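The problem is that if there are several blank lines between paragraphs, I end up counting those as words too. If you look at my readFile() method, you can see what I'm doing.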

Can you help me fix this?

Sample input file (including one blank line):

word word word
word word

word word word
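Since readFile() isn't shown, the following is only a hypothetical reconstruction of the approach described above (one word per space or newline character), applied to the sample input on the assumption that the file ends with a newline. The extra newline of the blank line is counted as a word, so the result comes out one too high:

```java
class NaiveSeparatorCount {

    // Hypothetical reconstruction of the question's approach:
    // every space or newline character is counted as one word.
    static int countSeparators(String text) {
        int count = 0;
        for (char c : text.toCharArray()) {
            if (c == ' ' || c == '\n') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // the sample input, assuming the file ends with a newline
        String sample = "word word word\nword word\n\nword word word\n";
        System.out.println(countSeparators(sample)); // 9, although there are only 8 words
    }
}
```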

Answer 1

Instead of using a BufferedReader with a FileReader, you can use a Scanner with a FileInputStream. For example:

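import java.io.File;
import java.io.FileInputStream;
import java.util.Scanner;

File file = new File("sample.txt");
// the enclosing method must handle FileNotFoundException
try (Scanner sc = new Scanner(new FileInputStream(file))) {
    int count = 0;
    // the default delimiter is any run of whitespace, so blank lines add no words
    while (sc.hasNext()) {
        sc.next();
        count++;
    }
    System.out.println("Number of words: " + count);
}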
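A minimal sketch of the same Scanner loop wrapped in a method that accepts any java.io.Reader, so it can be tried on a string as well as on a file (for a file, pass a FileReader or Files.newBufferedReader(...)). The class and method names are illustrative only. Because Scanner's default delimiter is a run of whitespace, blank lines and repeated spaces never produce extra tokens:

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Scanner;

class ScannerWordCount {

    // Counts whitespace-separated tokens; closing the Scanner also closes the Reader.
    static int countWords(Reader in) {
        int count = 0;
        try (Scanner sc = new Scanner(in)) {
            while (sc.hasNext()) {
                sc.next();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "word word word\nword word\n\nword word word\n";
        System.out.println(countWords(new StringReader(sample))); // 8
    }
}
```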

Answer 2

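I would change your approach slightly. First, I would use a BufferedReader to read the file line by line with readLine(). Then split each line on whitespace with line.trim().split("\\s+") and use the length of the resulting array as the number of words on that line; skip blank lines, because splitting an empty string still returns an array with one (empty) element. To get the character count, you can look at the length of each line or of each split word (depending on whether you want to count spaces as characters).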
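A minimal sketch of the line-by-line approach that counts words and characters together. Here "characters" means the characters inside words (whitespace excluded), which is only one of the two options mentioned. The names LineStats and countWordsAndChars are hypothetical; for a real file, pass Files.newBufferedReader(Paths.get(...)) instead of the StringReader:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LineStats {

    // Returns {wordCount, charCount}; charCount counts only the characters of the words.
    static int[] countWordsAndChars(BufferedReader reader) {
        int words = 0;
        int chars = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // "".split(...) would yield one empty token
                }
                for (String word : trimmed.split("\\s+")) {
                    words++;
                    chars += word.length();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] {words, chars};
    }

    public static void main(String[] args) {
        String sample = "word word word\nword word\n\nword word word\n";
        int[] stats = countWordsAndChars(new BufferedReader(new StringReader(sample)));
        System.out.println(stats[0] + " words, " + stats[1] + " chars"); // 8 words, 32 chars
    }
}
```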

Answer 3

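This is just an idea, but there is a very simple way to do it. If you only need the number of words and not the words themselves, just use Apache Commons Lang's WordUtils: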

import org.apache.commons.lang.WordUtils; // Commons Lang 2.x; Commons Text has org.apache.commons.text.WordUtils

public class CountWord {

    public static void main(String[] args) {
        String str = "Just keep a boolean flag around that lets you know if the previous character was whitespace or not pseudocode follows";

        // initials() collects the first character of every whitespace-separated word
        String initials = WordUtils.initials(str);

        System.out.println(initials);
        // so the number of words in your file will be
        System.out.println(initials.length());
    }
}

Answer 4

import java.io.BufferedReader;
import java.io.FileReader;

public class CountWords {

    public static void main(String[] args) throws Exception {

        System.out.println("Counting Words");
        int count = 0;
        // try-with-resources closes the reader when done
        try (BufferedReader br = new BufferedReader(new FileReader("c:\\Customer1.txt"))) {
            String line = br.readLine();
            while (line != null) {
                String trimmed = line.trim();
                // skip blank lines: "".split(" ") would still return one empty element
                if (!trimmed.isEmpty()) {
                    // split on runs of whitespace so repeated spaces don't create empty "words"
                    count += trimmed.split("\\s+").length;
                }
                line = br.readLine();
            }
        }
        System.out.println(count);
    }
}

Answer 5

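Just keep a boolean flag that tells you whether the previous character was whitespace, and count a word each time a non-whitespace character follows whitespace (pseudocode follows):

boolean prevWhitespace = true;  // the start of the input counts as whitespace
int wordCount = 0;
while ((ch = getNextChar(input)) != EOF) {
  if (isWhitespace(ch)) {
    prevWhitespace = true;
  } else {
    if (prevWhitespace) {
      wordCount++;  // first character of a new word
    }
    prevWhitespace = false;
  }
}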
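A Java sketch of the flag idea, reading characters from a java.io.Reader (the names FlagWordCount and countWords are made up). It counts a word when a non-whitespace character follows whitespace or the start of the input, so leading whitespace, blank lines and a last word without a trailing newline are all handled. For a file, wrap the FileReader in a BufferedReader so that each read() call stays cheap:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class FlagWordCount {

    static int countWords(Reader input) {
        boolean prevWhitespace = true; // start of input behaves like whitespace
        int wordCount = 0;
        try {
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    prevWhitespace = true;
                } else {
                    if (prevWhitespace) {
                        wordCount++; // first character of a new word
                    }
                    prevWhitespace = false;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return wordCount;
    }

    public static void main(String[] args) {
        String sample = "word word word\nword word\n\nword word word\n";
        System.out.println(countWords(new StringReader(sample))); // 8
    }
}
```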

Answer 6

I think the right way is a regular expression:

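import java.util.regex.Pattern;
String fileContent = <text from file>;
String trimmed = fileContent.trim(); // leading whitespace would otherwise yield an empty first element
String[] words = Pattern.compile("\\s+").split(trimmed);
int count = trimmed.isEmpty() ? 0 : words.length; // an empty string splits into [""]
System.out.println("File has " + count + " words");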

Hope it helps. "\\s+" means "one or more whitespace characters"; see the Pattern javadoc for details.
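A related variant, swapped in here rather than taken from the answer: instead of splitting on \s+, count the matches of \S+ (one or more non-whitespace characters) with a Matcher. This sidesteps the empty first element produced by leading whitespace and the single empty element produced by an empty input. A minimal sketch with hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexWordCount {

    // One word = one run of non-whitespace characters.
    private static final Pattern WORD = Pattern.compile("\\S+");

    static int countWords(CharSequence text) {
        Matcher m = WORD.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("  word word word\nword word\n\nword word word\n")); // 8
    }
}
```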

Answer 7

A hacky solution

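You can read the text file into a string variable, then split the string into an array using a single space as the delimiter, e.g. text.split(" ").

The array length will equal the number of "words" in the file. Of course, this won't give you the line count. The count is also only approximate: a single space does not split on newlines or tabs, and consecutive spaces produce empty entries.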
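A short sketch (Java 11+ for String.lines(); class and method names are hypothetical) comparing the single-space split with a split on runs of any whitespace, and counting lines separately. On the sample input the single-space split returns 6, because "word\nword" and "word\n\nword" each stay glued together as one token:

```java
class WordAndLineCount {

    // The hacky version: split on a single space only.
    static int naiveCount(String text) {
        return text.split(" ").length;
    }

    // Split on runs of any whitespace (spaces, tabs, newlines) instead.
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // String.lines() splits on \n, \r and \r\n.
    static long countLines(String text) {
        return text.lines().count();
    }

    public static void main(String[] args) {
        String sample = "word word word\nword word\n\nword word word\n";
        System.out.println("naive: " + naiveCount(sample)); // 6
        System.out.println("words: " + countWords(sample)); // 8
        System.out.println("lines: " + countLines(sample)); // 4
    }
}
```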

Answer 8

Word count for a file

If the words are separated by certain symbols, you can split on those symbols and count the pieces.

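import java.io.File;
import java.io.FileInputStream;
import java.util.Scanner;

// the enclosing method must handle FileNotFoundException
try (Scanner sc = new Scanner(new FileInputStream(new File("Input.txt")))) {
    int count = 0;
    while (sc.hasNext()) {
        // split each whitespace-separated token further on the symbols . @ : = # -
        String[] s = sc.next().split("[.@:=#-]+");
        for (int i = 0; i < s.length; i++) {
            // a leading symbol produces an empty piece, which is skipped
            if (!s[i].isEmpty()) {
                System.out.println(s[i]);
                count++;
            }
        }
    }
    System.out.println("Word-Count : " + count);
}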
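A variation that puts whitespace and the symbols into one delimiter class, so a single split does all the work on a whole string. The delimiter set below is an assumption mirroring the symbols in the answer; extend it for your data. Names are illustrative:

```java
import java.util.regex.Pattern;

class SymbolSplitCount {

    // Assumed delimiter set: whitespace plus the symbols . @ : = # -
    private static final Pattern DELIMITERS = Pattern.compile("[\\s.@:=#-]+");

    static int countWords(String text) {
        int count = 0;
        for (String piece : DELIMITERS.split(text)) {
            if (!piece.isEmpty()) { // a leading delimiter yields one empty piece
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("mail: user@example.com\nkey=value #tag")); // 7
    }
}
```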

Answer 9

Three steps: consume all whitespace, check whether you hit a line break, consume all non-whitespace.

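// inFile is a Reader; numberLines, numberWords and numberChars are ints starting at 0
int c = inFile.read();
while (c != -1) {
    // 1. consume all whitespace, 2. counting each line break along the way
    while (c != -1 && Character.isWhitespace(c)) {
        if (c == '\n') { numberLines++; }
        c = inFile.read();
    }
    if (c == -1) { break; }
    // 3. consume all non-whitespace: that run is one word
    while (c != -1 && !Character.isWhitespace(c)) {
        numberChars++;
        c = inFile.read();
    }
    numberWords++;
}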
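A self-contained sketch of the three steps over a java.io.Reader, returning the line, word and character counts together (the class name WcCounter is hypothetical). "Lines" here means the number of '\n' characters and "chars" means non-whitespace characters, so the character count differs from what wc -c reports:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class WcCounter {

    // Returns {lines, words, chars}.
    static int[] count(Reader in) {
        int lines = 0, words = 0, chars = 0;
        try {
            int c = in.read();
            while (c != -1) {
                // steps 1 and 2: skip whitespace, counting line breaks on the way
                while (c != -1 && Character.isWhitespace(c)) {
                    if (c == '\n') {
                        lines++;
                    }
                    c = in.read();
                }
                if (c == -1) {
                    break; // only trailing whitespace was left
                }
                // step 3: one run of non-whitespace characters is one word
                while (c != -1 && !Character.isWhitespace(c)) {
                    chars++;
                    c = in.read();
                }
                words++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] {lines, words, chars};
    }

    public static void main(String[] args) {
        int[] r = count(new StringReader("word word word\nword word\n\nword word word\n"));
        System.out.println(r[0] + " lines, " + r[1] + " words, " + r[2] + " chars"); // 4 lines, 8 words, 32 chars
    }
}
```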

Answer 10

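Check out my solution below; it should work. The idea is to strip all unwanted symbols from the words, split the text into words, and store them in another variable (I use an ArrayList). By adjusting the excludedSymbols variable, you can add more symbols that you want to exclude from the words.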

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public static void countWords() {
    String textFileLocation = "c:\\yourFileLocation";
    List<String> extractOnlyWordsFromTextFile = new ArrayList<>();
    // excludedSymbols can be extended with whatever you want to exclude from the file;
    // \r and \t make Windows line endings and tabs separate words too
    String excludedSymbols = " ,./:;<>\n\r\t";
    StringBuilder readWord = new StringBuilder();

    try (InputStream inputStream = new FileInputStream(textFileLocation)) {
        // keep read() as an int: casting to byte would turn the byte 0xFF into -1 (end of file);
        // each byte is treated as one character, so this only suits single-byte text such as ASCII
        int b = inputStream.read();
        while (b != -1) {
            char ch = (char) b;
            if (excludedSymbols.indexOf(ch) >= 0) {
                // an excluded symbol ends the current word, if there is one
                if (readWord.length() > 0) {
                    extractOnlyWordsFromTextFile.add(readWord.toString());
                    readWord.setLength(0);
                }
            } else {
                readWord.append(ch);
            }
            b = inputStream.read();
        }
        // the file may end without a trailing separator
        if (readWord.length() > 0) {
            extractOnlyWordsFromTextFile.add(readWord.toString());
        }
    } catch (IOException ioException) {
        ioException.printStackTrace();
    }
    System.out.println(extractOnlyWordsFromTextFile);
    System.out.println("The number of words in the chosen text file is: " + extractOnlyWordsFromTextFile.size());
}
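The same idea, sketched as a character-based method that reads from any java.io.Reader and takes the excluded symbols as a Set<Character> (Java 9+ for Set.of). Reading chars instead of bytes keeps non-ASCII letters intact. The names are hypothetical:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SymbolFilteringTokenizer {

    // Splits the input into words, treating every character in `excluded` as a separator.
    static List<String> extractWords(Reader in, Set<Character> excluded) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                char ch = (char) c;
                if (excluded.contains(ch)) {
                    if (current.length() > 0) {
                        words.add(current.toString());
                        current.setLength(0);
                    }
                } else {
                    current.append(ch);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current.length() > 0) {
            words.add(current.toString()); // input may not end with a separator
        }
        return words;
    }

    public static void main(String[] args) {
        Set<Character> excluded = Set.of(' ', ',', '.', '/', ':', ';', '<', '>', '\n', '\r', '\t');
        List<String> words = extractWords(new StringReader("Hello, world.\nfoo/bar <baz>"), excluded);
        System.out.println(words + " -> " + words.size()); // [Hello, world, foo, bar, baz] -> 5
    }
}
```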

Answer 11

This can be done very concisely with Java 8 streams:

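try (Stream<String> lines = Files.lines(Paths.get(file))) { // closes the file; Files.lines throws IOException
    long count = lines.flatMap(str -> Stream.of(str.split("[\\s,.!?]")))
        .filter(s -> s.length() > 0).count(); // drop empty tokens from blank lines and adjacent delimiters
    System.out.println(count);
}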
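A self-contained version of the flatMap/filter/count pipeline that works on any BufferedReader, which makes it easy to test on a string; for a file, use Files.lines(path) or Files.newBufferedReader(path) inside try-with-resources. The delimiter class [\s,.!?]+ and the names are assumptions:

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.stream.Stream;

class StreamWordCount {

    static long countWords(BufferedReader reader) {
        return reader.lines()
                .flatMap(line -> Stream.of(line.split("[\\s,.!?]+")))
                .filter(s -> !s.isEmpty()) // a blank line splits into one empty string
                .count();
    }

    public static void main(String[] args) {
        String text = "Hello, world! How are you?";
        System.out.println(countWords(new BufferedReader(new StringReader(text)))); // 5
    }
}
```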

Answer 12

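try (BufferedReader bf = new BufferedReader(new FileReader("G://Sample.txt"))) {
    String line = bf.readLine();
    while (line != null) {
        String trimmed = line.trim();
        // a blank line has 0 words; "".split(" ") would report 1
        int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        System.out.println("this line contains " + words + " words");
        line = bf.readLine();
    }
}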

Answer 13

The following code works in Java 8:

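// read the file into a string
String fileContent = new String(Files.readAllBytes(Paths.get("MyFile.txt")), StandardCharsets.UTF_8);

// split on runs of non-letter characters (\PL+) and keep the pieces in a list of strings
List<String> words = Arrays.asList(fileContent.split("\\PL+"));

int count = 0;
for (String x : words) {
    // skip the empty string produced by leading non-letters, but keep one-letter words such as "a"
    if (!x.isEmpty()) count++;
}

System.out.println(count);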
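A compact stream-based sketch of the \PL+ approach (names are illustrative). Note what it treats as a word: any maximal run of Unicode letters, so digits don't count and an apostrophe splits "It's" into two words:

```java
import java.util.Arrays;

class LetterRunCount {

    // \PL+ matches one or more characters that are NOT Unicode letters.
    static long countWords(String text) {
        return Arrays.stream(text.split("\\PL+"))
                .filter(s -> !s.isEmpty()) // a leading non-letter yields one empty string
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countWords("It's 2 o'clock, a nice day.")); // 7
        System.out.println(countWords("na\u00efve caf\u00e9")); // 2
    }
}
```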

Answer 14

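It's that simple: once we have the file's contents as a string (obtained here with getText()), we can count the words with the method below: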

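public class Main {

    static int countOfWords(String str) {
        // check for null first: str.equals("") would throw a NullPointerException on null
        if (str == null || str.isEmpty()) {
            return 0;
        }
        int numberWords = 0;
        boolean inWord = false;
        for (char c : str.toCharArray()) {
            if (Character.isWhitespace(c)) {
                inWord = false;
            } else if (!inWord) {
                // first character of a word; counting spaces + 1 would overcount
                // repeated spaces and miss words separated only by newlines
                inWord = true;
                numberWords++;
            }
        }
        return numberWords;
    }
}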

14 solutions

Solution 1: score 13, 2010-11-04 05:46:34
Solution 2: score 11, accepted, 2010-11-04 05:43:36
Solution 3: score 4, 2010-11-04 07:00:09
Solution 4: score 3, 2012-04-20 17:52:03
Solution 5: score 3, 2010-11-04 05:40:07
Solution 6: score 3, 2010-11-08 18:31:05
Solution 7: score 2, 2010-11-04 05:45:47
Solution 8: score 0, 2015-07-09 09:54:22
Solution 9: score 0, 2010-11-04 05:55:11
Solution 10: score 0, 2017-10-28 16:37:08
Solution 11: score 0, 2017-12-02 05:20:26
Solution 12: score 0, 2018-02-21 06:32:04
Solution 13: score 0, 2018-05-09 00:05:58
Solution 14: score 0, 2019-12-23 11:58:03
