Count characters, words and lines in a file

This should count the number of lines, words and characters in a file.

But it doesn't work. The output shows only 0.

Code:

public static void main(String[] args) throws IOException {
    int ch;
    boolean prev = true;        
    //counters
    int charsCount = 0;
    int wordsCount = 0;
    int linesCount = 0;

    Scanner in = null;
    File selectedFile = null;
    JFileChooser chooser = new JFileChooser();
    // choose file 
    if (chooser.showOpenDialog(null) == JFileChooser.APPROVE_OPTION) {
        selectedFile = chooser.getSelectedFile();
        in = new Scanner(selectedFile);         
    }

    // count the characters of the file till the end
    while(in.hasNext()) {
        ch = in.next().charAt(0);
        if (ch != ' ') ++charsCount;
        if (!prev && ch == ' ') ++wordsCount;
        // don't count if previous char is space
        if (ch == ' ') 
            prev = true;
        else 
            prev = false;

        if (ch == '\n') ++linesCount;
    }

    //display the count of characters, words, and lines
    charsCount -= linesCount * 2;
    wordsCount += linesCount;
    System.out.println("# of chars: " + charsCount);
    System.out.println("# of words: " + wordsCount);
    System.out.println("# of lines: " + linesCount);

    in.close();
}

I can't understand what's going on. Any suggestions?

Your code is looking at only the first character of each default token (word) in the file.

When you call ch = in.next().charAt(0), you get the first character of a token (word), and the scanner moves forward to the next token, skipping the rest of the current one.
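To see this in isolation, here is a minimal sketch that runs the same call over an in-memory string instead of a file. The class name FirstCharDemo and the sample text are made up for illustration.

```java
import java.util.Scanner;

class FirstCharDemo {
    // Collects what in.next().charAt(0) yields for each token of the input.
    static String firstChars(String text) {
        StringBuilder seen = new StringBuilder();
        Scanner in = new Scanner(text);
        while (in.hasNext()) {
            // next() returns a whole whitespace-delimited token;
            // charAt(0) keeps only its first character
            seen.append(in.next().charAt(0));
        }
        in.close();
        return seen.toString();
    }

    public static void main(String[] args) {
        // prints "ttghah": one character per word, never a space or newline
        System.out.println(firstChars("test text goes here\nand here\n"));
    }
}
```

Because the default delimiter is whitespace, ch can never be ' ' or '\n' in the original loop, so the word and line tests there never fire.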

A different approach, using strings to find the line, word and character counts:

public static void main(String[] args) throws IOException {
        //counters
        int charsCount = 0;
        int wordsCount = 0;
        int linesCount = 0;

        Scanner in = null;
        File selectedFile = null;
        JFileChooser chooser = new JFileChooser();
        // choose file 
        if (chooser.showOpenDialog(null) == JFileChooser.APPROVE_OPTION) {
            selectedFile = chooser.getSelectedFile();
            in = new Scanner(selectedFile);
        }

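        while (in.hasNextLine()) {
            String tmpStr = in.nextLine();
            String trimmed = tmpStr.trim();
            if (!trimmed.isEmpty()) {
                // characters are counted without whitespace
                charsCount += trimmed.replaceAll("\\s+", "").length();
                // split on runs of whitespace so tabs and repeated spaces do not inflate the count
                wordsCount += trimmed.split("\\s+").length;
            }
            ++linesCount;
        }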

        //display the count of characters, words, and lines
        System.out.println("# of chars: " + charsCount);
        System.out.println("# of words: " + wordsCount);
        System.out.println("# of lines: " + linesCount);

        in.close();
    }


Note:
For other encodings, use new Scanner(selectedFile, "###"); in place of new Scanner(selectedFile);.

### is the name of the character set needed. Refer to this and the wiki.
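As a rough illustration of the note above, the sketch below writes a small temporary file in ISO-8859-1 and reads it back through the two-argument Scanner constructor. The class name CharsetDemo, the helper firstLine and the sample word are assumptions made for the example, not part of the original answer.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Scanner;

class CharsetDemo {
    // Reads the first line of a file, decoding it with the given charset name.
    static String firstLine(File file, String charsetName) throws IOException {
        try (Scanner in = new Scanner(file, charsetName)) {
            return in.hasNextLine() ? in.nextLine() : "";
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: a temporary file written as ISO-8859-1
        File tmp = File.createTempFile("charset-demo", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1));
        // true when the accented character survives the round trip
        System.out.println(firstLine(tmp, "ISO-8859-1").equals("caf\u00e9"));
    }
}
```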

You have a couple of issues in here.

First, the test for the end of line is going to cause problems, since the end of a line is not always denoted by a single character (Windows files, for example, use \r\n). Read http://en.wikipedia.org/wiki/End-of-line for more detail on this issue.

The whitespace between words can be more than just the ASCII 32 (space) character. Consider tabs as one case. More than likely you want to check with Character.isWhitespace().
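The two points above can be sketched as a character-by-character loop. This is only one possible shape, not the asker's code: the class name CharLoopCount and its count helper are hypothetical, and it works on a String for brevity (the same loop body applies to characters returned by Reader.read()).

```java
class CharLoopCount {
    // Returns {lines, words, nonWhitespaceChars} for the given text.
    static int[] count(String text) {
        int lines = 0, words = 0, chars = 0;
        boolean inWord = false; // true while the previous char was part of a word
        char prev = 0;          // previous character, 0 before the first one
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (Character.isWhitespace(ch)) {
                inWord = false;
                // count "\n", "\r" and "\r\n" as one line break each
                if (ch == '\r' || (ch == '\n' && prev != '\r')) lines++;
            } else {
                if (!inWord) words++; // first char of a new word
                inWord = true;
                chars++;
            }
            prev = ch;
        }
        // a last line without a trailing line break still counts
        if (!text.isEmpty() && prev != '\n' && prev != '\r') lines++;
        return new int[] {lines, words, chars};
    }

    public static void main(String[] args) {
        // mixes a tab, a Windows line ending and a Unix line ending
        int[] r = count("test text\tgoes here\r\nand here\n");
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "2 6 23"
    }
}
```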

You could also solve the end-of-line issue with two scanners, as described in "How to check the end of line using Scanner?"

Here is a quick hack on the code you provided, along with input and output.

import java.io.*;
import java.util.Scanner;
import javax.swing.JFileChooser;

public final class TextApp {

    public static void main(String[] args) throws IOException {
        //counters
        int charsCount = 0;
        int wordsCount = 0;
        int linesCount = 0;

        Scanner fileScanner = null;
        File selectedFile = null;
        JFileChooser chooser = new JFileChooser();
        // choose file
        if (chooser.showOpenDialog(null) == JFileChooser.APPROVE_OPTION) {
            selectedFile = chooser.getSelectedFile();
            fileScanner = new Scanner(selectedFile);
        }

        while (fileScanner.hasNextLine()) {
            linesCount++;
            String line = fileScanner.nextLine();
            Scanner lineScanner = new Scanner(line);
            // count the words on this line and the characters in each word
            while (lineScanner.hasNext()) {
                wordsCount++;
                String word = lineScanner.next();
                charsCount += word.length();
            }
            lineScanner.close();
        }

        //display the count of characters, words, and lines
        System.out.println("# of chars: " + charsCount);
        System.out.println("# of words: " + wordsCount);
        System.out.println("# of lines: " + linesCount);

        fileScanner.close();
    }
}

Here is the test file input:

$ cat ../test.txt 
test text goes here
and here

Here is the output:

$ javac TextApp.java
$ java TextApp 
# of chars: 23
# of words: 6
# of lines: 2
$ wc test.txt 
 2  6 29 test.txt

The difference between the character counts comes from not counting whitespace characters, which appears to be what you were trying to do in the original code.
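The gap can be checked directly. The sketch below assumes the test file shown above uses plain "\n" line endings and holds its content in a string; the class name WhitespaceDiff is made up for the example.

```java
class WhitespaceDiff {
    // Number of characters that are not whitespace.
    static int nonWhitespace(String text) {
        int n = 0;
        for (int i = 0; i < text.length(); i++) {
            if (!Character.isWhitespace(text.charAt(i))) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // same content as the test file, "\n" line endings assumed
        String text = "test text goes here\nand here\n";
        System.out.println("all chars:      " + text.length());       // 29, as wc reports
        System.out.println("non-whitespace: " + nonWhitespace(text)); // 23, as TextApp reports
    }
}
```

The difference of 6 is the four spaces plus the two newlines.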

I hope that helps out.

You could store every line in a List<String> called lines, and then linesCount = lines.size().

Calculating charsCount:

for(final String line : lines)
    charsCount += line.length();

Calculating wordsCount:

for(final String line : lines)
    wordsCount += line.split(" +").length;

It would probably be wise to combine these calculations in one loop rather than doing them separately.
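A combined loop could look roughly like this. It is a sketch rather than the answerer's code: the class name CombinedCount is hypothetical, and the trim and empty-line check are additions, since split(" +") on an empty line would otherwise report one word.

```java
import java.util.Arrays;
import java.util.List;

class CombinedCount {
    // Returns {lines, words, chars}; chars includes spaces but not line breaks.
    static int[] count(List<String> lines) {
        int charsCount = 0;
        int wordsCount = 0;
        for (final String line : lines) {
            charsCount += line.length();
            String trimmed = line.trim();
            // an empty line would otherwise split into one empty "word"
            if (!trimmed.isEmpty()) {
                wordsCount += trimmed.split(" +").length;
            }
        }
        return new int[] {lines.size(), wordsCount, charsCount};
    }

    public static void main(String[] args) {
        int[] r = count(Arrays.asList("test text goes here", "", "and here"));
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "3 6 27"
    }
}
```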

Use Scanner methods:

int lines = 0;
int words = 0;
int chars = 0;
while(in.hasNextLine()) {
    lines++;
    Scanner lineScanner = new Scanner(in.nextLine());
    lineScanner.useDelimiter(" ");
    while(lineScanner.hasNext()) {
        words++;
        chars += lineScanner.next().length();
    }
}

Looks like everyone is suggesting an alternative.

The flaw in your logic is that you are not looping through all the characters of the file. You are only looking at the first character of every token (word).

 ch = in.next().charAt(0);

Also, what does the 2 in charsCount -= linesCount * 2; represent?

You might also want to include a try-catch block when accessing a file.

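try {
    in = new Scanner(selectedFile);
} catch (FileNotFoundException e) {
    // report the problem instead of silently ignoring it
    System.err.println("File not found: " + selectedFile);
}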
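A slightly fuller sketch of the same idea uses try-with-resources, so the Scanner is closed even when something goes wrong. The class name SafeOpen, the countLines helper and the -1 return convention are assumptions made for this example.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class SafeOpen {
    // Returns the number of lines, or -1 if the file cannot be opened.
    static int countLines(File file) {
        // try-with-resources closes the Scanner even if an exception is thrown
        try (Scanner in = new Scanner(file)) {
            int lines = 0;
            while (in.hasNextLine()) {
                in.nextLine();
                lines++;
            }
            return lines;
        } catch (FileNotFoundException e) {
            System.err.println("Cannot open " + file + ": " + e.getMessage());
            return -1;
        }
    }

    public static void main(String[] args) {
        // hypothetical file name; -1 is printed if it does not exist
        System.out.println(countLines(new File("no-such-file.txt")));
    }
}
```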

Maybe my code will help you. Everything works correctly.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;
import java.util.StringTokenizer;

public class LineWordChar {
    public static void main(String[] args) throws IOException {
        // Read the whole text file into a single string
        String text = new Scanner(new File("way to your file"), "UTF-8").useDelimiter("\\A").next();
        BufferedReader bf = new BufferedReader(new FileReader("way to your file"));
        String lines = "";
        int linesi = 0;
        int words = 0;
        int chars = 0;
        String s = "";
        // Add 1 to linesi for every line the reader returns
        while ((lines = bf.readLine()) != null) {
            linesi++;
        }
        bf.close();
        // The tokenizer splits the big string "text" into words, which we count
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            s = st.nextToken();
            words++;
            // Add the number of characters in each word
            chars += s.length();
        }
        System.out.println("Number of lines: " + linesi);
        System.out.println("Number of words: " + words);
        System.out.print("Number of chars: " + chars);
    }
}

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.StringTokenizer;

public class WordCount {

    /**
     * Prints the word, character and sentence counts of a single line of file.txt.
     *
     * @throws FileNotFoundException if file.txt does not exist
     */
    public static void main(String[] args) throws FileNotFoundException {
        int lineNumber = 2; // the line to analyse, change as needed
        File f = new File("file.txt");
        Scanner sc = new Scanner(f);
        int totalLines = 0;
        int totalWords = 0;
        int totalChars = 0;
        int totalSentences = 0;
        while (sc.hasNextLine()) {
            totalLines++;
            String line = sc.nextLine();
            if (totalLines == lineNumber) {
                totalChars += line.length();
                totalWords += new StringTokenizer(line, " ,").countTokens();
                totalSentences += line.split("\\.").length;
                break;
            }
        }
        sc.close();

        System.out.println(lineNumber + ";" + totalWords + ";" + totalChars + ";" + totalSentences);
    }
}
