
Java program to count lines, words, and characters in a given text file

I am practicing by writing a program that takes a text file from the user and reports statistics such as the number of characters, words, and lines in the text.

I have searched and looked over similar questions but cannot find a way to make my code work.

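import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Scanner;
import java.util.StringTokenizer;

public class DocStats {
private Scanner sc;

// Opens a Scanner on the file named by the user
public DocStats(String documentName) throws FileNotFoundException {
    File inputFile = new File(documentName);
    try {
        sc = new Scanner(inputFile);

    } catch (IOException exception) {
        System.out.println("File does not exist");
    }
}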


public int getChar() {
    int Char= 0;

    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        Char += line.length() + 1;

    }
    return Char;
}

// Gets the number of words in a text
public int getWords() {
    int Words = 0;

    while (sc.hasNext()) {
        String line = sc.next();
        Words += new StringTokenizer(line, " ,").countTokens();

    }

    return Words;
}

public int getLines() {
    int Lines= 0;

    while (sc.hasNextLine()) {
        Lines++;
    }

    return Lines;
}
}

Main method:

public class Main {

    public static void main(String[] args) throws FileNotFoundException {
        DocStats doc = new DocStats("someText.txt");

        // outputs 1451, should be 1450
        System.out.println("Number of characters: "
            + doc.getChar()); 

        // outputs 0, should be 257
        System.out.println("Number of words: " + doc.getWords());
        // outputs 0, should be 49
        System.out.println("Number of lines: " + doc.getLines()); 

    }

}

I know exactly why I get 1451 instead of 1450: the last sentence does not end with '\n', but my method always adds Char += line.length() + 1;

However, I cannot figure out why I get 0 for words and lines. My text includes characters such as: ? , - '

Could anyone help me make this work?

So far, the problem that concerns me most is how to get the correct number of characters when the last sentence does not end with '\n'. Could I fix that with an if statement?

Thank you!

After doc.getChar() returns, the scanner has reached the end of the file, so there is nothing more for getWords() and getLines() to read!
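The effect is easy to reproduce in isolation. The following is a minimal sketch, not the asker's code: the class name `ScannerExhaustionDemo`, the helper `countRemainingLines`, and the sample text are made up for illustration, and it scans a string instead of a file. It shows that a second loop over the same Scanner finds nothing, while a fresh Scanner starts from the beginning again:

```java
import java.util.Scanner;

class ScannerExhaustionDemo {

    // Counts the lines still unread in the given scanner, consuming them.
    static int countRemainingLines(Scanner sc) {
        int lines = 0;
        while (sc.hasNextLine()) {
            sc.nextLine(); // must consume the line, otherwise the loop never ends
            lines++;
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line\nthird line"; // hypothetical sample
        Scanner sc = new Scanner(text);
        System.out.println("first pass:  " + countRemainingLines(sc));               // 3
        System.out.println("second pass: " + countRemainingLines(sc));               // 0
        System.out.println("new scanner: " + countRemainingLines(new Scanner(text))); // 3
    }
}
```

Note also that getLines() in the question never calls nextLine() inside its loop. It returns 0 only because the scanner is already exhausted; on a fresh scanner over a non-empty file that loop would never terminate.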

You should reset your scanner at the start of your getChar/Words/Lines methods, for example:

public int getChar() throws FileNotFoundException {
    // inputFile must be kept as a field so the file can be reopened here
    sc = new Scanner(inputFile);
...
    // solving your problem with the last '\n'
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        if (sc.hasNextLine())
            Char += line.length() + 1;
        else
            Char += line.length();
    }
    return Char;
}

Please note that a line ending is not always \n! It might also be \r\n (especially on Windows)!
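To make the line-ending caveat concrete, here is a small sketch (the class name `LineEndingDemo` and the two sample strings are assumptions for illustration). `Scanner.nextLine()` strips the whole line separator, whether it is `\n` or `\r\n`, so adding 1 per line break undercounts on files with Windows line endings:

```java
import java.util.Scanner;

class LineEndingDemo {

    // Character count as computed by the loop above: line lengths plus one
    // per line break between lines (assumes every break is a single char).
    static int countViaScanner(String text) {
        int chars = 0;
        Scanner sc = new Scanner(text);
        while (sc.hasNextLine()) {
            String line = sc.nextLine();
            chars += sc.hasNextLine() ? line.length() + 1 : line.length();
        }
        return chars;
    }

    public static void main(String[] args) {
        String unix = "ab\ncd";      // 5 chars in total
        String windows = "ab\r\ncd"; // 6 chars in total
        System.out.println(countViaScanner(unix) + " vs " + unix.length());       // 5 vs 5
        System.out.println(countViaScanner(windows) + " vs " + windows.length()); // 5 vs 6
    }
}
```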

public int getWords() throws FileNotFoundException {
    sc = new Scanner(inputFile);
...


public int getLines() throws FileNotFoundException {
    sc = new Scanner(inputFile);
...
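Putting the pieces together, one possible shape for the whole class is sketched below. This is an illustration under assumptions rather than the asker's final code: the class name `DocStatsSketch` and the method name `getChars` are invented here, each method opens its own Scanner in a try-with-resources block instead of sharing one field, and the word rule keeps the question's `StringTokenizer` with the delimiters `" ,"`.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.StringTokenizer;

class DocStatsSketch {
    private final File inputFile; // kept as a field so each method can reopen it

    public DocStatsSketch(String documentName) {
        this.inputFile = new File(documentName);
    }

    public int getChars() throws FileNotFoundException {
        int chars = 0;
        try (Scanner sc = new Scanner(inputFile)) {
            while (sc.hasNextLine()) {
                String line = sc.nextLine();
                // one char per line break, but none after the last line
                chars += sc.hasNextLine() ? line.length() + 1 : line.length();
            }
        }
        return chars;
    }

    public int getWords() throws FileNotFoundException {
        int words = 0;
        try (Scanner sc = new Scanner(inputFile)) {
            while (sc.hasNext()) {
                // next() returns a whitespace-delimited token; split it further on commas
                words += new StringTokenizer(sc.next(), " ,").countTokens();
            }
        }
        return words;
    }

    public int getLines() throws FileNotFoundException {
        int lines = 0;
        try (Scanner sc = new Scanner(inputFile)) {
            while (sc.hasNextLine()) {
                sc.nextLine(); // consume the line so the loop terminates
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws FileNotFoundException {
        if (args.length < 1) {
            System.out.println("usage: java DocStatsSketch <file>");
            return;
        }
        DocStatsSketch doc = new DocStatsSketch(args[0]);
        System.out.println("Number of characters: " + doc.getChars());
        System.out.println("Number of words: " + doc.getWords());
        System.out.println("Number of lines: " + doc.getLines());
    }
}
```

Because every method starts from a fresh Scanner, the three getters can be called in any order and any number of times.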

I would use a single sweep to calculate all three, with separate counters: just loop over each char, check whether it starts a new word, etc., and increment the counts. Use Character.isWhitespace.

import java.io.*;
/** Counts lines, characters and words. Assumes every run of non-whitespace is a word, so even () counts as a word. */
public class ChrCounts{

    String data;
    int chrCnt;
    int lineCnt;
    int wordCnt;
    public static void main(String args[]){
        ChrCounts c = new ChrCounts();
        try{
            InputStream data = null;
            if(args == null || args.length < 1){
                data = new ByteArrayInputStream("quick brown foxes\n\r new toy\'s a fun game.\nblah blah.la la ga-ma".getBytes("utf-8"));
            }else{
                data = new BufferedInputStream( new FileInputStream(args[0]));
            }
            c.process(data);
            c.print();
        }catch(Exception e){
            System.out.println("ee " + e);
            e.printStackTrace();
        }
    }

    public void print(){
        System.out.println("line cnt " + lineCnt + "\nword cnt " + wordCnt + "\n chrs " + chrCnt);
    }

    public void process(InputStream data) throws Exception{
        int chrCnt = 0;
        int lineCnt = 0;
        int wordCnt = 0;
        boolean inWord = false;
        boolean inNewline = false;
        //char prev = ' ';
        int j;
        // read() returns -1 at end of stream; available() may return 0 before EOF
        while((j = data.read()) >= 0){
            chrCnt++;
            final char c = (char)j;
            //prev = c;
            if(c == '\n' || c == '\r'){
                chrCnt--; // some editors do not count line separators as characters
                inWord = false;
                if(!inNewline){
                    inNewline = true;
                    lineCnt++;
                }else{
                    // adjacent line separators (e.g. \r\n) count as a single line break
                }
            }else{
                inNewline = false;
                if(Character.isWhitespace(c)){
                    inWord = false;
                }else{
                    if(!inWord){
                        inWord = true;
                        wordCnt++;
                    }
                }
            }
        }
        //we had some data and last char was not in new line, count last line
        if(chrCnt > 0 && !inNewline){
            lineCnt++;
        }
        this.chrCnt = chrCnt;
        this.lineCnt = lineCnt;
        this.wordCnt = wordCnt;
    }
}
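One caveat with the program above: it reads raw bytes from an InputStream, so a character that takes several bytes in UTF-8 is counted once per byte. A variant of the same single-pass idea that reads decoded characters from a Reader might look like the following sketch. The class name `CharCountsSketch` and its field names are assumptions for illustration; the counting rules are meant to mirror the code above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class CharCountsSketch {
    int chars;
    int words;
    int lines;

    // Single pass over decoded characters. Line separators are not counted
    // as characters, and a run of adjacent separators is one line break.
    void process(Reader in) throws IOException {
        boolean inWord = false;
        boolean inNewline = false;
        int j;
        while ((j = in.read()) >= 0) { // read() returns -1 at end of input
            char c = (char) j;
            if (c == '\n' || c == '\r') {
                inWord = false;
                if (!inNewline) {
                    inNewline = true;
                    lines++;
                }
            } else {
                chars++;
                inNewline = false;
                if (Character.isWhitespace(c)) {
                    inWord = false;
                } else if (!inWord) {
                    inWord = true; // first non-whitespace char starts a new word
                    words++;
                }
            }
        }
        // count the last line if it did not end with a separator
        if (chars > 0 && !inNewline) {
            lines++;
        }
    }

    public static void main(String[] args) throws IOException {
        CharCountsSketch c = new CharCountsSketch();
        // \u00e9 and \u00f6 each take two bytes in UTF-8 but are one char each
        c.process(new StringReader("h\u00e9llo w\u00f6rld\nbye"));
        System.out.println("line cnt " + c.lines + "\nword cnt " + c.words + "\nchrs " + c.chars);
    }
}
```

For a file, a Reader with an explicit charset can be obtained with `Files.newBufferedReader(path, StandardCharsets.UTF_8)`.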

