How to calculate the frequency of a word from a txt file - Java

I need some help with this code. I want my program to count the frequency of each word that matches the pattern described below.

public class Project {
    public static void main(String[] args) throws FileNotFoundException{
    Scanner INPUT_TEXT = new Scanner(new File("moviereview.txt")).useDelimiter(" ");

    String pattern = "[a-zA-Z'-]+";
    Pattern r = Pattern.compile(pattern);

    int occurences=0;

    while(INPUT_TEXT.hasNext()){
        //read next word
        String Stringcandidate=INPUT_TEXT.next();   

        //see if pattern matches (boolean find)
        if(r.matcher(Stringcandidate).find()) {
            occurences++; //increment occurences if pattern is found
            String moviereview = m.group(0); //retrieve found string
            String moviereview2 = moviereview.toLowerCase(); // ???

            System.out.println(moviereview2 + " appears " + occurences);
            if(occurences>1){
                 System.out.println(" times\n");
            }
            else{
                System.out.println(" time\n");
            }
        }
        INPUT_TEXT.close();//Close your Scanner.     
    }

}

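As stated in my earlier comment, you can use a Map implementation (such as HashMap) to store the matched words together with their number of occurrences (their frequency).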
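As a minimal sketch of that idea, the counting step can be reduced to a single call to Map.merge (available since Java 8). The class name MergeCountSketch, the method count, and the sample words are illustrative assumptions, not part of the original code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, used only to illustrate the Map-based counting idea.
class MergeCountSketch {

    // Counts how often each word occurs. Map.merge stores 1 for a new key
    // and otherwise adds 1 to the count that is already stored.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample input chosen for illustration only.
        String[] words = {"auto", "bush", "auto"};
        Map<String, Integer> counts = count(words);
        System.out.println("auto -> " + counts.get("auto"));
        System.out.println("bush -> " + counts.get("bush"));
    }
}
```

The explicit containsKey/get/put sequence does the same thing and may be easier to follow when you are new to maps.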

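I suggest encapsulating the program's functionality in smaller methods/classes, so that each method/class performs only one small task. That makes the code easier to read.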

I assume your file contains the string "auto bush trumped her tomato in the petunia auto".

Here is the code:

package how_to_calculate_the_frequency;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Project {

    HashMap<String, Integer> map = new HashMap<String, Integer>();

    public static void main(String[] args){

        Project project = new Project();

        Scanner INPUT_TEXT = project.readFile();

        project.analyse(INPUT_TEXT);

        project.showResults();

    }

    /**
     * logic to count the occurrences of words matched by the regex in a scanner that
     * loaded some text
     * 
     * @param scanner
     *            the scanner holding the text
     */
    public void analyse(Scanner scanner) {

        String pattern = "[a-zA-Z'-]+";
        Pattern r = Pattern.compile(pattern);

        while (scanner.hasNext()) {
            // read next word
            String Stringcandidate = scanner.next();

            // see if pattern matches (boolean find)
            Matcher matcher = r.matcher(Stringcandidate);
            if (matcher.find()) {
                String matchedWord = matcher.group();
                //System.out.println(matchedWord); //check what is matched
                this.addWord(matchedWord);

            }

        }
        scanner.close();// Close your Scanner.
    }

    /**
     * adds a word to the <word,count> map: if the word is new, a new entry is
     * created; otherwise the count of this word is incremented
     */
    public void addWord(String matchedWord) {

        if (map.containsKey(matchedWord)) {
            // increment occurrence
            int occurrence = map.get(matchedWord);
            occurrence++;
            map.put(matchedWord, occurrence);
        } else {
            // add word and set occurrence to 1
            map.put(matchedWord, 1);
        }

    }

    /**
     * reads a file from disk and returns a scanner to analyse it
     * 
     * @return the file from disk as scanner
     */
    public Scanner readFile() {

        Scanner scanner = null;

        /* use that for reading a file from disk
         * try { scanner = new Scanner(new
         * File("moviereview.txt")).useDelimiter(" "); } catch (Exception e) {
         * e.printStackTrace(); }
         */

        scanner = new Scanner("auto bush trumped her tomato in the petunia auto");

        return scanner;
    }

    /**
     * prints the matched words and their occurrences
     * in a readable way
     */
    public void showResults() {

        for (HashMap.Entry<String, Integer> matchedWord : map.entrySet()) {
            int occurrence = matchedWord.getValue();
            System.out.print("\"" + matchedWord.getKey() + "\" appears " + occurrence);
            if (occurrence > 1) {
                System.out.print(" times\n");
            } else {
                System.out.print(" time\n");
            }
        }

        // or, with a Java 8 lambda expression:
        // map.forEach((word, occurrence) ->
        //         System.out.println("\"" + word + "\" appears " + occurrence + " times"));
    }
}

// DONE separate reading a file, analysing the file and
// word-frequency-counting-logic into different
// methods
// DONE implement <word,count> Map and logic to add new and known (to the map)
// words
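Two details of the code in the question are worth a note. It calls close() inside the while loop, so the next hasNext() call throws an IllegalStateException. And useDelimiter(" ") splits on single spaces only, so words separated by a line break end up in the same token, and a single find() then picks up just the first of them. The following is a sketch under those observations, not a replacement for the code above: it keeps the same regex, uses the Scanner's default whitespace delimiter, loops over every match in a token, lower-cases the words as the question does, and closes the Scanner once via try-with-resources. The class name ScannerCountSketch and the sample text are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; only the regex is taken from the original code.
class ScannerCountSketch {

    private static final Pattern WORD = Pattern.compile("[a-zA-Z'-]+");

    // Counts every regex match in every whitespace-separated token.
    // The Scanner is closed exactly once, after the loop has finished.
    static Map<String, Integer> count(Scanner source) {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner scanner = source) {
            while (scanner.hasNext()) {
                Matcher matcher = WORD.matcher(scanner.next());
                while (matcher.find()) {
                    String word = matcher.group().toLowerCase(Locale.ROOT);
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample text with a line break and punctuation, for illustration only.
        Map<String, Integer> counts = count(new Scanner("Auto bush\nauto, bush!"));
        System.out.println("auto -> " + counts.get("auto"));
        System.out.println("bush -> " + counts.get("bush"));
    }
}
```

To read from disk instead, pass new Scanner(new File("moviereview.txt")) and handle the FileNotFoundException that this constructor declares.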

This produces:

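"the" appears 1 time
"auto" appears 2 times
"her" appears 1 time
"in" appears 1 time
"bush" appears 1 time
"trumped" appears 1 time
"tomato" appears 1 time
"petunia" appears 1 time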
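The words do not come out in the order in which they appear in the text, because HashMap makes no guarantee about its iteration order. If a predictable order matters, one option is to copy the counts into a TreeMap, which iterates over its keys in sorted order. The sketch below assumes that alphabetical output is what you want; the class name SortedResultsSketch and the method format are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class, shown only to illustrate sorted output.
class SortedResultsSketch {

    // Builds one line per word, sorted alphabetically by the word.
    static String format(Map<String, Integer> counts) {
        // A TreeMap iterates over its keys in their natural (alphabetical) order.
        Map<String, Integer> sorted = new TreeMap<>(counts);
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> entry : sorted.entrySet()) {
            int occurrence = entry.getValue();
            out.append('"').append(entry.getKey()).append("\" appears ")
               .append(occurrence)
               .append(occurrence > 1 ? " times" : " time")
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample counts chosen for illustration only.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("bush", 1);
        counts.put("auto", 2);
        System.out.print(format(counts));
    }
}
```

If you would rather keep the order of first appearance, declaring the map as a LinkedHashMap is an alternative.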

Regards

