How do you find words in a text file and print the most frequent word using an array?
I'm having trouble figuring out how to find the most frequent word and the most frequent case-insensitive word for a program. I have a scanner that reads through the text file and a while loop, but I still don't know how to implement what I'm trying to find. Do I use a different string function to read and print the word out?
Here is my code as of now:
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class letters {
    public static void main(String[] args) throws FileNotFoundException {
        FileInputStream fis = new FileInputStream("input.txt");
        Scanner scanner = new Scanner(fis);
        String[] word = new String[500];
        while (scanner.hasNextLine()) {
            String s = scanner.nextLine();
            for (int i = 0; i < s.length(); i++) {
                char ch = s.charAt(i);
            }
            // s is only in scope inside the while loop, so the split must happen here
            String[] roll = s.split("\\s");
            for (int i = 0; i < roll.length; i++) {
                String lin = roll[i];
                //System.out.println(lin);
            }
        }
        scanner.close();
    }
}
This is what I have so far. I need the output to say:
Word:
6 roll
Case-insensitive word:
18 roll
And here is my input file:
@
roll tide roll!
Roll Tide Roll!
ROLL TIDE ROLL!
ROll tIDE ROll!
roll tide roll!
Roll Tide Roll!
ROLL TIDE ROLL!
roll tide roll!
Roll Tide Roll !
@
65-43+21= 43
65.0-43.0+21.0= 43.0
65 -43 +21 = 43
65.0 -43.0 +21.0 = 43.0
65 - 43 + 21 = 43
65.00 - 43.0 + 21.000 = +0043.0000
65 - 43 + 21 = 43
I just need it to find the most frequently occurring word (a word being a maximal consecutive sequence of letters), which is "roll", and print how many times it occurs (which is 6). If anybody can help me with this, that would be really great! Thanks
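Since a word is defined here as a maximal consecutive sequence of letters, splitting on whitespace alone would leave tokens such as "roll!" that are not pure words. A minimal sketch, assuming only ASCII letters count (the class and method names are hypothetical), that extracts letter runs with `java.util.regex`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LetterWords {
    // A "word" is a maximal run of ASCII letters, as defined in the question.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Returns every maximal letter sequence in the line, in order of appearance.
    static List<String> extract(String line) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(extract("Roll Tide Roll !"));     // [Roll, Tide, Roll]
        System.out.println(extract("65.0-43.0+21.0= 43.0")); // []
    }
}
```

Lines made only of digits and operators, like the equations in the input, simply yield no words.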
Consider using a Map<String,Integer> for the words; then you can use it to count words, and it will work for any number of words. See the documentation for Map.
Like this (it would require modification for case-insensitive counting):
public Map<String,Integer> words_count = new HashMap<String,Integer>();
//read your line (you will have to determine whether this line should be split or is one of the equations)
//also just noticed that the trailing '!' would need to be removed
String[] words = line.split("\\s+");
for(int i=0;i<words.length;i++)
{
    String s = words[i];
    if(words_count.containsKey(s))
    {
        Integer count = words_count.get(s) + 1;
        words_count.put(s, count);
    }
    else
        words_count.put(s, 1);
}
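On Java 8 or later, the if/else in the counting loop above can be collapsed into a single `Map.merge` call, which stores 1 for a new key or adds 1 to the existing count. A minimal sketch (class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordCounts {
    // Counts how many times each word occurs; merge inserts 1 for a new key
    // or combines the old count with 1 using Integer::sum.
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = count(List.of("roll", "tide", "roll"));
        System.out.println(c.get("roll") + " " + c.get("tide")); // 2 1
    }
}
```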
Then you have the number of occurrences of each word in the string. To get the most frequent one, do something like:
Integer frequency = null;
String mostFrequent = null;
for(String s : words_count.keySet())
{
    Integer i = words_count.get(s);
    // The first word seen becomes the initial candidate; later words replace it only if more frequent
    if(frequency == null || i > frequency)
    {
        frequency = i;
        mostFrequent = s;
    }
}
Then to print:
System.out.println("The word "+ mostFrequent +" occurred "+ frequency +" times");
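On Java 8 or later, the search loop can also be replaced by `Collections.max` with `Map.Entry.comparingByValue()`. A minimal sketch (class and method names are hypothetical); note that which of several tied entries it returns is not something to rely on, and it throws `NoSuchElementException` on an empty map:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class MostFrequent {
    // Returns the entry with the highest count.
    static Map.Entry<String, Integer> max(Map<String, Integer> counts) {
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("roll", 6);
        counts.put("tide", 3);
        Map.Entry<String, Integer> best = max(counts);
        // prints "The word roll occurred 6 times"
        System.out.println("The word " + best.getKey() + " occurred " + best.getValue() + " times");
    }
}
```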
Start by accumulating all the words into a Map as follows:
...
String[] roll = s.split("\\s+");
for (final String word : roll) {
Integer qty = words.get(word);
if (qty == null) {
qty = 1;
} else {
qty = qty + 1;
}
words.put(word, qty);
}
...
Then you need to figure out which has the biggest score:
String bestWord = null;
int maxQty = 0;
for(final String word : words.keySet()) {
if(words.get(word) > maxQty) {
maxQty = words.get(word);
bestWord = word;
}
}
System.out.println("Word:");
System.out.println(Integer.toString(maxQty) + " " + bestWord);
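With the question's input the case-sensitive counts actually tie: taking words as letter runs, "roll" and "Roll" each occur 6 times. Over a HashMap, which of them the strict `>` comparison keeps depends on the map's iteration order. A minimal sketch, assuming words are maximal runs of ASCII letters and that the earliest-seen word should win a tie, counts into a LinkedHashMap so iteration follows first appearance (class and method names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TieBreak {
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Counts letter-run words; LinkedHashMap remembers the order of first appearance.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = WORD.matcher(line);
            while (m.find()) {
                counts.merge(m.group(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Strict '>' keeps the earliest-seen word when several share the top count.
    static String mostFrequent(Map<String, Integer> counts) {
        String best = null;
        int max = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("@",
                "roll tide roll!", "Roll Tide Roll!", "ROLL TIDE ROLL!",
                "ROll tIDE ROll!", "roll tide roll!", "Roll Tide Roll!",
                "ROLL TIDE ROLL!", "roll tide roll!", "Roll Tide Roll !");
        Map<String, Integer> counts = count(lines);
        String best = mostFrequent(counts);
        System.out.println(counts.get(best) + " " + best); // 6 roll
    }
}
```

On the question's "roll" lines this picks roll with a count of 6, matching the expected "Word: 6 roll".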
Finally, you need to merge all forms of the same word together:
Map<String, Integer> wordsNoCase = new HashMap<String, Integer>();
for(final String word : words.keySet()) {
Integer qty = wordsNoCase.get(word.toLowerCase());
if(qty == null) {
qty = words.get(word);
} else {
qty += words.get(word);
}
wordsNoCase.put(word.toLowerCase(), qty);
}
words = wordsNoCase;
Then re-run the previous code snippet to find the word with the biggest score.
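Instead of merging afterwards, the key can be normalized when it is first counted, so only one map is needed. A minimal sketch, assuming lower-casing with `Locale.ROOT` (which avoids locale-specific case rules such as the Turkish dotted and dotless i); the class and method names are hypothetical:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class CaseInsensitiveCounts {
    // Lower-cases each word before counting, so "Roll", "ROLL" and "roll" share one entry.
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(List.of("roll", "Roll", "ROLL", "ROll", "tide"));
        System.out.println(counts.get("roll")); // 4
    }
}
```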
Try using a HashMap for better results. You need to use BufferedReader and FileReader to read the input file as follows:
FileReader text = new FileReader("file.txt");
BufferedReader textFile = new BufferedReader(text);
The BufferedReader object textFile needs to be passed as a parameter to the method below:
public HashMap<String, Integer> countWordFrequency(BufferedReader textFile) throws IOException
{
    /* This method finds the frequency of words in a text file
     * and saves each word and its corresponding frequency in
     * a HashMap.
     */
    HashMap<String, Integer> mapper = new HashMap<String, Integer>();
    String line = null;
    try
    {
        while((line = textFile.readLine()) != null)
        {
            // Non-letters become spaces, then lower-case, then split on spaces
            String[] words = line.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" ");
            for(String word : words)
            {
                // Skip the empty strings left by consecutive separators
                if(!word.isEmpty())
                {
                    Integer freq = mapper.get(word);
                    if(freq == null)
                    {
                        mapper.put(word, 1);
                    }
                    else
                    {
                        mapper.put(word, freq+1);
                    }
                }
            }
        }
    }
    finally
    {
        // Close the reader even if readLine() throws
        textFile.close();
    }
    return mapper;
}
The line line.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" "); replaces every character other than a letter with a space, then converts the whole line to lower case (which solves your case-insensitivity problem), and finally splits it into words separated by spaces.
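Because each non-letter becomes its own space, two separators in a row (such as ", ") leave an empty string between the words, which is why the method skips empty words. A small demonstration of the same expression (the class and method names are hypothetical):

```java
import java.util.Arrays;

class SplitDemo {
    // Same expression as in the answer: non-letters become spaces, the line is
    // lower-cased, and the result is split on single spaces.
    static String[] tokens(String line) {
        return line.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" ");
    }

    public static void main(String[] args) {
        // "Roll, Tide!" becomes "roll  tide " (two spaces in the middle), so an
        // empty string appears between the words; split drops trailing empties.
        System.out.println(Arrays.toString(tokens("Roll, Tide!"))); // [roll, , tide]
    }
}
```

A line containing no letters at all, like the equations in the input, produces an empty array.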
/*This method finds the highest value in HashMap
* and returns the same.
*/
public int maxFrequency(HashMap<String, Integer> mapper)
{
int maxValue = Integer.MIN_VALUE;
for(int value : mapper.values())
{
if(value > maxValue)
{
maxValue = value;
}
}
return maxValue;
}
The above code returns the highest value in the HashMap.
/*This method prints the HashMap Key with a particular Value.
*/
public void printWithValue(HashMap<String, Integer> mapper, Integer value)
{
for (Entry<String, Integer> entry : mapper.entrySet())
{
if (entry.getValue().equals(value))
{
System.out.println("Word : " + entry.getKey() + " \nFrequency : " + entry.getValue());
}
}
}
Now you can print the most frequent word along with its frequency using the methods above.
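The FileReader/BufferedReader pair used in this answer can also be opened in a try-with-resources statement (Java 7 or later), which closes the file automatically even if reading throws. A minimal sketch that just collects the lines; the class and method names are hypothetical, and "input.txt" is the file name from the question:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ReadLines {
    // Reads every line of the file; the reader is closed when the try block exits.
    static List<String> read(String fileName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to "input.txt", the file name used in the question.
        String fileName = args.length > 0 ? args[0] : "input.txt";
        for (String line : read(fileName)) {
            System.out.println(line);
        }
    }
}
```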
/* I have declared a LinkedHashMap with each word as the key and its number of occurrences as the value.
 * Create a BufferedReader object.
 * Read the first line into currentLine.
 * In a while loop, split currentLine into words and
 * iterate over them with a for loop. Inside the for loop there is an if-else statement:
 * if the word is already in the map, increment its count by 1, otherwise put it with a count of 1.
 * Read the next line into currentLine.
 */
public static void main(String[] args) {
    Map<String, Integer> map = new LinkedHashMap<String, Integer>();
    BufferedReader reader = null;
    try {
        reader = new BufferedReader(new FileReader("F:\\chidanand\\javaIO\\Student.txt"));
        String currentLine = reader.readLine();
        while (currentLine != null) {
            String[] input = currentLine.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" ");
            for (int i = 0; i < input.length; i++) {
                // Skip empty strings produced by consecutive non-letter characters
                if (input[i].isEmpty()) {
                    continue;
                }
                if (map.containsKey(input[i])) {
                    int count = map.get(input[i]);
                    map.put(input[i], count + 1);
                } else {
                    map.put(input[i], 1);
                }
            }
            currentLine = reader.readLine();
        }
        String mostRepeatedWord = null;
        int count = 0;
        for (Entry<String, Integer> m : map.entrySet()) {
            if (m.getValue() > count) {
                mostRepeatedWord = m.getKey();
                count = m.getValue();
            }
        }
        System.out.println("The most repeated word in input file is : " + mostRepeatedWord);
        System.out.println("Number Of Occurrences : " + count);
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // reader is still null if the file could not be opened
        if (reader != null) {
            try {
                reader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
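For comparison, the same case-insensitive counting can be expressed with the Java 8 Streams API (a different technique from the loops above). A minimal sketch, assuming words are runs of ASCII letters; `Pattern.splitAsStream` splits on runs of non-letters, and `Collectors.groupingBy` with a LinkedHashMap keeps first-seen order. The class and method names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StreamCounts {
    private static final Pattern NON_LETTERS = Pattern.compile("[^A-Za-z]+");

    // Splits each line on runs of non-letters, drops empty tokens,
    // lower-cases, and counts occurrences in first-seen order.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(NON_LETTERS::splitAsStream)
                .filter(w -> !w.isEmpty())
                .map(w -> w.toLowerCase(Locale.ROOT))
                .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("roll tide roll!", "ROll tIDE ROll!", "65-43+21= 43");
        System.out.println(count(lines)); // {roll=4, tide=2}
    }
}
```

The counts are `Long` because `Collectors.counting()` produces a `Long`; the lines themselves could come from any of the file-reading approaches above.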
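Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.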