
How to calculate the frequency of a word from a txt file - Java

I need some help with this code. I want my program to calculate the frequency of each word that matches the pattern below.

public class Project {
    public static void main(String[] args) throws FileNotFoundException{
    Scanner INPUT_TEXT = new Scanner(new File("moviereview.txt")).useDelimiter(" ");

    String pattern = "[a-zA-Z'-]+";
    Pattern r = Pattern.compile(pattern);

    int occurences=0;

    while(INPUT_TEXT.hasNext()){
        //read next word
        String Stringcandidate=INPUT_TEXT.next();   

        //see if pattern matches (boolean find)
        if(r.matcher(Stringcandidate).find()) {
            occurences++; //increment occurences if pattern is found
            String moviereview = m.group(0); //retrieve found string
            String moviereview2 = moviereview.toLowerCase(); // ???

            System.out.println(moviereview2 + " appears " + occurences);
            if(occurences>1){
                 System.out.println(" times\n");
            }
            else{
                System.out.println(" time\n");
            }
        }
        INPUT_TEXT.close();//Close your Scanner.     
    }

}

As described in my earlier comment, you can use a Map implementation such as HashMap to store the matched words and their occurrences/frequencies.
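The counting idea on its own can be sketched in a few lines. This is only a minimal illustration, not the full program: the class name WordCountSketch and the helper countWords are hypothetical, and lower-casing the words is an assumption taken from the toLowerCase() call in the question.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCountSketch {

    // Same character class as in the question: letters, apostrophes, hyphens.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z'-]+");

    // Hypothetical helper: counts every match in the text, ignoring case.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            String word = matcher.group().toLowerCase(Locale.ROOT);
            // merge() stores 1 for a new word, otherwise adds 1 to the old count
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countWords("Auto bush trumped her tomato in the petunia auto");
        System.out.println("auto: " + counts.get("auto"));
        System.out.println("bush: " + counts.get("bush"));
    }
}
```

Map.merge (Java 8 and later) combines the "is the word already known?" check and the increment into one call.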

I recommend encapsulating the functionality of the program in smaller methods/classes, so that each method/class does only one small task. That makes the code easier to read.

I assumed your file contained the String "auto bush trumped her tomato in the petunia auto"

Here is the code:

package how_to_calculate_the_frequency;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Project {

    HashMap<String, Integer> map = new HashMap<String, Integer>();

    public static void main(String[] args){

        Project project = new Project();

        Scanner INPUT_TEXT = project.readFile();

        project.analyse(INPUT_TEXT);

        project.showResults();

    }

    /**
     * logic to count the occurrences of words matched by the regex in a scanner
     * that loaded some text
     * 
     * @param scanner
     *            the scanner holding the text
     */
    public void analyse(Scanner scanner) {

        String pattern = "[a-zA-Z'-]+";
        Pattern r = Pattern.compile(pattern);

        while (scanner.hasNext()) {
            // read next word
            String Stringcandidate = scanner.next();

            // see if pattern matches (boolean find)
            Matcher matcher = r.matcher(Stringcandidate);
            if (matcher.find()) {
                String matchedWord = matcher.group();
                //System.out.println(matchedWord); //check what is matched
                this.addWord(matchedWord);

            }

        }
        scanner.close();// Close your Scanner.
    }

    /**
     * adds a word to the <word,count> Map if the word is new, a new entry is
     * created, otherwise the count of this word is incremented
     */
    public void addWord(String matchedWord) {

        if (map.containsKey(matchedWord)) {
            // increment occurrence
            int occurrence = map.get(matchedWord);
            occurrence++;
            map.put(matchedWord, occurrence);
        } else {
            // add word and set occurrence to 1
            map.put(matchedWord, 1);
        }

    }

    /**
     * reads a file from disk and returns a scanner to analyse it
     * 
     * @return the file from disk as scanner
     */
    public Scanner readFile() {

        Scanner scanner = null;

        /* use this to read the file from disk instead:
         * try {
         *     scanner = new Scanner(new File("moviereview.txt")).useDelimiter(" ");
         * } catch (Exception e) {
         *     e.printStackTrace();
         * }
         */

        scanner = new Scanner("auto bush trumped her tomato in the petunia auto");

        return scanner;
    }

    /**
     * prints the matched words and their occurrences
     * in a readable way
     */
    public void showResults() {

        for (HashMap.Entry<String, Integer> matchedWord : map.entrySet()) {
            int occurrence = matchedWord.getValue();
            System.out.print("\"" + matchedWord.getKey() + "\" appears " + occurrence);
            if (occurrence > 1) {
                System.out.print(" times\n");
            } else {
                System.out.print(" time\n");
            }
        }

        // or, as a Java 8 lambda expression:
        // map.forEach((word, occurrence) -> System.out.println(
        //         "\"" + word + "\" appears " + occurrence + " times"));
    }
}

// DONE: separate reading the file, analysing the file and the
// word-frequency-counting logic into different methods
// DONE: implement the <word,count> Map and the logic to add new and
// already known (to the map) words

This yields:

"the" appears 1 time
"auto" appears 2 times
"her" appears 1 time
"in" appears 1 time
"bush" appears 1 time
"trumped" appears 1 time
"tomato" appears 1 time
"petunia" appears 1 time

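Note that a HashMap makes no guarantee about iteration order, so the lines may come out in a different order. If alphabetical output is preferred, one option is to copy the counts into a TreeMap before printing. The following is only a sketch: the class SortedResultsSketch and the helper format are hypothetical names, not part of the code above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class SortedResultsSketch {

    // Hypothetical helper: builds one output line per word, in alphabetical order.
    static String format(Map<String, Integer> counts) {
        // TreeMap iterates its keys in their natural (alphabetical) order
        Map<String, Integer> sorted = new TreeMap<>(counts);
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> entry : sorted.entrySet()) {
            int occurrence = entry.getValue();
            out.append('"').append(entry.getKey()).append("\" appears ")
               .append(occurrence)
               .append(occurrence > 1 ? " times" : " time")
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("tomato", 1);
        counts.put("auto", 2);
        counts.put("bush", 1);
        System.out.print(format(counts));
    }
}
```
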
regards
