I need some help with this code. I want my program to calculate the frequency of each word that matches the pattern below.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Project {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner INPUT_TEXT = new Scanner(new File("moviereview.txt")).useDelimiter(" ");
        String pattern = "[a-zA-Z'-]+";
        Pattern r = Pattern.compile(pattern);
        int occurences = 0;
        while (INPUT_TEXT.hasNext()) {
            // read next word
            String Stringcandidate = INPUT_TEXT.next();
            // see if pattern matches (boolean find)
            Matcher m = r.matcher(Stringcandidate);
            if (m.find()) {
                occurences++; // increment occurences if pattern is found
                String moviereview = m.group(0); // retrieve found string
                String moviereview2 = moviereview.toLowerCase(); // ???
                System.out.println(moviereview2 + " appears " + occurences);
                if (occurences > 1) {
                    System.out.println(" times\n");
                } else {
                    System.out.println(" time\n");
                }
            }
        }
        INPUT_TEXT.close(); // close the Scanner once, after the loop
    }
}
As described in my earlier comment, you can use a Map implementation, such as HashMap, to store the matched words and their occurrences/frequencies.
I recommend encapsulating the program's functionality in smaller methods or classes, so that each one does only a single small task. That makes the code easier to read.
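To show the counting idea on its own, here is a minimal sketch. It is an illustration rather than the program itself: the class name MergeCountSketch and the helper count are made up for this example, it splits on whitespace instead of applying the regex, and it uses Map.merge (available since Java 8) to create or increment an entry in one call.

```java
import java.util.HashMap;
import java.util.Map;

public class MergeCountSketch {

    // Hypothetical helper: counts how often each whitespace-separated token occurs.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // splitting an empty input yields a single empty token
            }
            // Store 1 for a new word, otherwise add 1 to the existing count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("auto bush trumped her tomato in the petunia auto");
        System.out.println("auto -> " + counts.get("auto")); // auto -> 2
        System.out.println("bush -> " + counts.get("bush")); // bush -> 1
    }
}
```

The merge call does the same job as a containsKey check followed by get and put, just in one step.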
I assumed your file contained the String "auto bush trumped her tomato in the petunia auto"
Here is the code:
package how_to_calculate_the_frequency;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Project {

    HashMap<String, Integer> map = new HashMap<String, Integer>();

    public static void main(String[] args) {
        Project project = new Project();
        Scanner INPUT_TEXT = project.readFile();
        project.analyse(INPUT_TEXT);
        project.showResults();
    }

    /**
     * logic to count the occurrences of words matched by the regex in a
     * scanner that loaded some text
     *
     * @param scanner
     *            the scanner holding the text
     */
    public void analyse(Scanner scanner) {
        String pattern = "[a-zA-Z'-]+";
        Pattern r = Pattern.compile(pattern);
        while (scanner.hasNext()) {
            // read next word
            String Stringcandidate = scanner.next();
            // see if pattern matches (boolean find)
            Matcher matcher = r.matcher(Stringcandidate);
            if (matcher.find()) {
                String matchedWord = matcher.group();
                // System.out.println(matchedWord); // check what is matched
                this.addWord(matchedWord);
            }
        }
        scanner.close(); // Close your Scanner.
    }

    /**
     * adds a word to the <word,count> Map: if the word is new, a new entry is
     * created, otherwise the count of this word is incremented
     */
    public void addWord(String matchedWord) {
        if (map.containsKey(matchedWord)) {
            // increment occurrence
            int occurrence = map.get(matchedWord);
            occurrence++;
            map.put(matchedWord, occurrence);
        } else {
            // add word and set occurrence to 1
            map.put(matchedWord, 1);
        }
    }

    /**
     * reads a file from disk and returns a scanner to analyse it
     *
     * @return the file from disk as scanner
     */
    public Scanner readFile() {
        Scanner scanner = null;
        /* use this for reading a file from disk
         * try { scanner = new Scanner(new
         * File("moviereview.txt")).useDelimiter(" "); } catch (Exception e) {
         * e.printStackTrace(); }
         */
        scanner = new Scanner("auto bush trumped her tomato in the petunia auto");
        return scanner;
    }

    /**
     * prints the matched words and their occurrences
     * in a readable way
     */
    public void showResults() {
        for (HashMap.Entry<String, Integer> matchedWord : map.entrySet()) {
            int occurrence = matchedWord.getValue();
            System.out.print("\"" + matchedWord.getKey() + "\" appears " + occurrence);
            if (occurrence > 1) {
                System.out.print(" times\n");
            } else {
                System.out.print(" time\n");
            }
        }
        // or, using a Java 8 lambda expression:
        // map.forEach((word, occurrence) ->
        //         System.out.println("\"" + word + "\" appears " + occurrence + " times"));
    }
}
// DONE: separate reading the file, analysing the file and the
// word-frequency counting logic into different methods
// DONE: implement the <word,count> Map and the logic to add new and
// already known (to the map) words
This yields:
"the" appears 1 time
"auto" appears 2 times
"her" appears 1 time
"in" appears 1 time
"bush" appears 1 time
"trumped" appears 1 time
"tomato" appears 1 time
"petunia" appears 1 time
regards
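One more note on the output: a HashMap makes no promise about iteration order, so the lines can come out in a different order than shown. If you want an alphabetical listing and case-insensitive counting (which the toLowerCase call in the question seems to aim for), one option is a TreeMap with lowercased keys. The following is only a sketch under those assumptions. The class name SortedWordCountSketch and the helper count are hypothetical, and unlike the code above it runs the matcher over the whole text, so every match is counted and not just the first one per token.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SortedWordCountSketch {

    // Same pattern as in the question: letters, apostrophes and hyphens.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z'-]+");

    // Hypothetical helper: counts every regex match in text, ignoring case.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // keeps keys sorted
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            String word = matcher.group().toLowerCase(Locale.ROOT);
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("Auto bush trumped her tomato in the petunia, auto!");
        // Keys are printed in alphabetical order, starting with "auto" (2 times).
        counts.forEach((word, n) ->
                System.out.println("\"" + word + "\" appears " + n + (n > 1 ? " times" : " time")));
    }
}
```

With this input, "Auto" and "auto!" both end up under the key "auto", and the trailing comma after "petunia" is not part of the matched word.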