How to calculate the frequency of a word from a txt file - Java
I need some help with this code. I want my program to count the frequency of every word that matches the described pattern.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Project {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner inputText = new Scanner(new File("moviereview.txt")).useDelimiter(" ");
        String pattern = "[a-zA-Z'-]+";
        Pattern r = Pattern.compile(pattern);
        int occurrences = 0;
        while (inputText.hasNext()) {
            // read next word
            String candidate = inputText.next();
            // see if pattern matches (boolean find)
            Matcher m = r.matcher(candidate);
            if (m.find()) {
                occurrences++; // increment occurrences if pattern is found
                String moviereview = m.group(0); // retrieve found string
                String moviereview2 = moviereview.toLowerCase(); // ???
                System.out.println(moviereview2 + " appears " + occurrences);
                if (occurrences > 1) {
                    System.out.println(" times\n");
                } else {
                    System.out.println(" time\n");
                }
            }
        }
        inputText.close(); // close the Scanner once, after the loop
    }
}
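As I said in my earlier comment, you can use a Map implementation (such as HashMap) to store the matched words together with their number of occurrences (their frequency).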
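To make the Map idea concrete before the full program, here is a minimal sketch of just the counting step. It uses Map.merge (available since Java 8) instead of a containsKey check. The class name MergeCountSketch and the method name count are made up for this illustration and do not appear in the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class MergeCountSketch {

    // Counts how often each whitespace-separated token occurs in the text.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                // Insert 1 for a new word, otherwise add 1 to the stored count.
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("auto bush trumped her tomato in the petunia auto");
        System.out.println(counts.get("auto")); // prints 2
        System.out.println(counts.get("bush")); // prints 1
    }
}
```

The key point is that every word gets its own counter, rather than one counter being shared by all words.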
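I also suggest encapsulating the program's functionality in smaller methods/classes, so that each method/class does only one small task. That makes the code easier to read.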
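As one possible illustration of "one small task per method", the sketch below isolates the file reading from everything else: it only turns a file into a list of tokens. It is an assumption on my part that the file is UTF-8 text; the names FileWordsSketch and readWords are hypothetical. Note that it splits on any whitespace, whereas useDelimiter(" ") splits only on single spaces, so a token there can still contain a line break.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class FileWordsSketch {

    // Reads the file (assumed to be UTF-8) and returns its whitespace-separated tokens.
    static List<String> readWords(Path file) throws IOException {
        List<String> words = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    words.add(token);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Demo with a temporary file so the sketch runs without moviereview.txt.
        Path tmp = Files.createTempFile("moviereview", ".txt");
        Files.write(tmp, "auto bush trumped her\ntomato in the petunia auto".getBytes(StandardCharsets.UTF_8));
        System.out.println(readWords(tmp));
        Files.delete(tmp);
    }
}
```

A method like this can be tested on its own, and the counting code never needs to know where the words came from.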
I assume your file contains the string "auto bush trumped her tomato in the petunia auto".
Here is the code:
package how_to_calculate_the_frequency;

import java.io.File; // only needed by the commented-out file-reading variant in readFile()
import java.util.HashMap;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Project {

    HashMap<String, Integer> map = new HashMap<String, Integer>();

    public static void main(String[] args) {
        Project project = new Project();
        Scanner inputText = project.readFile();
        project.analyse(inputText);
        project.showResults();
    }

    /**
     * logic to count the occurrences of words matched by REGEX in a scanner that
     * loaded some text
     *
     * @param scanner
     *            the scanner holding the text
     */
    public void analyse(Scanner scanner) {
        String pattern = "[a-zA-Z'-]+";
        Pattern r = Pattern.compile(pattern);
        while (scanner.hasNext()) {
            // read next word
            String candidate = scanner.next();
            // see if pattern matches (boolean find)
            Matcher matcher = r.matcher(candidate);
            if (matcher.find()) {
                String matchedWord = matcher.group();
                // System.out.println(matchedWord); // check what is matched
                this.addWord(matchedWord);
            }
        }
        scanner.close(); // close the Scanner
    }

    /**
     * adds a word to the <word,count> Map: if the word is new, a new entry is
     * created, otherwise the count of this word is incremented
     */
    public void addWord(String matchedWord) {
        if (map.containsKey(matchedWord)) {
            // increment occurrence
            int occurrence = map.get(matchedWord);
            occurrence++;
            map.put(matchedWord, occurrence);
        } else {
            // add word and set occurrence to 1
            map.put(matchedWord, 1);
        }
    }

    /**
     * reads a file from disk and returns a scanner to analyse it
     *
     * @return the file from disk as scanner
     */
    public Scanner readFile() {
        Scanner scanner = null;
        /* use this for reading a file from disk:
         * try { scanner = new Scanner(new
         * File("moviereview.txt")).useDelimiter(" "); } catch (Exception e) {
         * e.printStackTrace(); }
         */
        scanner = new Scanner("auto bush trumped her tomato in the petunia auto");
        return scanner;
    }

    /**
     * prints the matched words and their occurrences
     * in a readable way
     */
    public void showResults() {
        for (HashMap.Entry<String, Integer> matchedWord : map.entrySet()) {
            int occurrence = matchedWord.getValue();
            System.out.print("\"" + matchedWord.getKey() + "\" appears " + occurrence);
            if (occurrence > 1) {
                System.out.print(" times\n");
            } else {
                System.out.print(" time\n");
            }
        }
        // or, with a Java 8 lambda expression:
        // map.forEach((word, occurrence) ->
        //         System.out.println("\"" + word + "\" appears " + occurrence + " times"));
    }
}
// DONE separate reading a file, analysing the file and the
// word-frequency-counting logic into different methods
// DONE implement <word,count> Map and logic to add new and known (to the map)
// words
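One thing to be aware of: analyse() calls matcher.find() once per token, so for a token such as "good,bad" only the first match is counted, and "Auto" and "auto" are treated as different words. If that is not what you want, a possible variation (a sketch, not part of the code above) is to loop over all matches and lowercase them, as your original toLowerCase() call seemed to intend. The class name AllMatchesSketch and the method name count are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variation, for illustration only.
class AllMatchesSketch {

    // Same pattern as in the question.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z'-]+");

    // Counts every match of the pattern in the text, ignoring case.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) { // keeps going until no further match is found
            String word = matcher.group().toLowerCase(Locale.ROOT);
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("Good movie, good plot. Don't miss it!");
        System.out.println(counts.get("good")); // prints 2
    }
}
```

Because the matcher runs over the whole text, this variation does not depend on the Scanner delimiter at all.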
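This produces:
"the" appears 1 time
"auto" appears 2 times
"her" appears 1 time
"in" appears 1 time
"bush" appears 1 time
"trumped" appears 1 time
"tomato" appears 1 time
"petunia" appears 1 time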
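The order of the lines above is simply whatever order the HashMap iterates in, which is not guaranteed. If you want a stable, alphabetical listing, one option (a sketch with the hypothetical names SortedReportSketch and report) is to copy the counts into a TreeMap before printing:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only.
class SortedReportSketch {

    // Builds one report line per word, sorted alphabetically by word.
    static List<String> report(Map<String, Integer> counts) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : new TreeMap<>(counts).entrySet()) {
            int occurrence = entry.getValue();
            String unit = occurrence > 1 ? " times" : " time";
            lines.add("\"" + entry.getKey() + "\" appears " + occurrence + unit);
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("bush", 1);
        counts.put("auto", 2);
        for (String line : report(counts)) {
            System.out.println(line);
        }
        // prints:
        // "auto" appears 2 times
        // "bush" appears 1 time
    }
}
```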
Regards