Counting words from a text-file in Java

I'm writing a program that scans a text file and counts the number of words in it. The assignment defines a word as follows: 'A word is a non-empty string consisting only of letters (a, ..., z, A, ..., Z), surrounded by blanks, punctuation, hyphenation, line start, or line end.'

I'm a novice at Java programming, and so far I've managed to write this instance method, which presumably should work. But it doesn't.

public int wordCount() {
    int countWord = 0;
    String line = "";
    try {
        File file = new File("testtext01.txt");
        Scanner input = new Scanner(file);

        while (input.hasNext()) {
            line = line + input.next()+" ";
            input.next();
        }
        input.close();
        String[] tokens = line.split("[^a-zA-Z]+");
        for (int i=0; i<tokens.length; i++){
            countWord++;
        }
        return countWord;

    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return -1;
}

Quoting from Counting words in text file?

    int wordCount = 0;

    while (input.hasNextLine()){

       String nextLine = input.nextLine();
       Scanner word = new Scanner(nextLine);

       while(word.hasNext()){
          wordCount++;    
          word.next();
       }
       word.close();
    }
    input.close();

The only usable word separators in your file are spaces and hyphens. You can use regex and the split() method.

int num_words = line.split("[\\s\\-]").length; //stores number of words
System.out.print("Number of words in file is "+num_words);

REGEX (Regular Expression):

\\s splits the string at whitespace (including line breaks) and \\- splits it at hyphens. So wherever there is a space, line break, or hyphen, the sentence is split. The resulting pieces are returned as an array, and its length is taken as the number of words in your file.
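
As a rough illustration of the split approach above, the sketch below compares the single-character separator class with a variant that treats a run of separators as one and skips empty pieces. The class and method names (SplitCountDemo, naiveCount, countTokens) are made up for this example and are not from the original answer. Note that neither version checks that a piece consists only of letters, so a token such as "fact." would still be counted.

```java
class SplitCountDemo {
    // Splits on every single whitespace or hyphen character, as in the answer above.
    // Two adjacent separators produce an empty string that is counted as a word.
    static int naiveCount(String line) {
        return line.split("[\\s\\-]").length;
    }

    // Hypothetical variant: treats a run of separators as one and ignores empty pieces.
    static int countTokens(String line) {
        int count = 0;
        for (String token : line.split("[\\s\\-]+")) {
            if (!token.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "well-known  fact"; // note the two spaces
        System.out.println(naiveCount(line));  // prints 4
        System.out.println(countTokens(line)); // prints 3
        System.out.println(naiveCount(""));    // prints 1
        System.out.println(countTokens(""));   // prints 0
    }
}
```

The empty-string case matters because split() returns a one-element array containing the original string when nothing matches, so an empty line would otherwise be counted as one word.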

You can use Java regular expressions.
See http://docs.oracle.com/javase/tutorial/essential/regex/groups.html to learn about groups.



    public int wordCount() {
        // [a-zA-Z] matches letters only; the range A-z would also match [ \ ] ^ _ and `
        String patternToMatch = "([a-zA-Z]+)";
        int countWord = 0;
        try {
            Pattern pattern = Pattern.compile(patternToMatch);
            File file = new File("abc.txt");
            Scanner sc = new Scanner(file);
            while (sc.hasNextLine()) {
                Matcher matcher = pattern.matcher(sc.nextLine());
                // Each successful find() is one run of letters, i.e. one word
                while (matcher.find()) {
                    countWord++;
                }
            }
            sc.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return countWord > 0 ? countWord : -1;
    }

void run(String path)
throws Exception
{
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(path), "UTF-8")))
    {
        int result = 0;

        while (true)
        {
            String line = reader.readLine();

            if (line == null)
            {
                break;
            }

            result += countWords(line);
        }

        System.out.println("Words in text: " + result);
    }
}

final Pattern pattern = Pattern.compile("[A-Za-z]+");

int countWords(String text)
{
    Matcher matcher = pattern.matcher(text);

    int result = 0;

    while (matcher.find())
    {
        ++result;

        System.out.println("Matcher found [" + matcher.group() + "]");
    }

    System.out.println("Words in line: " + result);

    return result;
}
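
For reference, the matcher-based counting above can be put into a self-contained form roughly as follows. This is only a sketch under a few assumptions: the class name WordCounter and the method countWordsInFile are invented for the example, the file is assumed to be UTF-8, and digits are treated as separators, which the assignment's definition does not state explicitly.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class WordCounter {
    // A word is a maximal run of ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Counts the letter runs in one piece of text.
    static int countWords(String text) {
        Matcher matcher = WORD.matcher(text);
        int result = 0;
        while (matcher.find()) {
            result++;
        }
        return result;
    }

    // Sums the per-line counts; the stream is closed by try-with-resources.
    static int countWordsInFile(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.mapToInt(WordCounter::countWords).sum();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countWords("Hello, world-wide web!")); // prints 4
        if (args.length > 0) {
            System.out.println("Words in text: " + countWordsInFile(Paths.get(args[0])));
        }
    }
}
```

Counting line by line is safe here because, under the assignment's definition, a line start or line end always ends a word, so no word can span two lines.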
