
How do you find the words in a text file and print the most frequent word using an array?

I'm having trouble figuring out how to find the most frequent word and the most frequent case-insensitive word for a program. I have a Scanner that reads through the text file in a while loop, but I still don't know how to implement what I'm trying to find. Do I use a different string function to read and print the word out?

Here is my code as of now:

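import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class letters {
    public static void main(String[] args) throws FileNotFoundException {
        FileInputStream fis = new FileInputStream("input.txt");
        Scanner scanner = new Scanner(fis);
        String word[] = new String[500]; // meant to hold the words (not used yet)
        while (scanner.hasNextLine()) {
            String s = scanner.nextLine();
            for (int i = 0; i < s.length(); i++) {
                char ch = s.charAt(i);
            }
            // the split has to happen inside the loop: s is out of scope after it
            String[] roll = s.split("\\s");
            for (int i = 0; i < roll.length; i++) {
                String lin = roll[i];
                //System.out.println(lin);
            }
        }
        scanner.close();
    }
}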

This is what I have so far. I need the output to say:

   Word:
   6 roll

  Case-insensitive word:
  18 roll

And here is my input file:

@
roll tide roll!
Roll Tide Roll!
ROLL TIDE ROLL!
ROll tIDE ROll!
 roll  tide  roll! 
 Roll  Tide  Roll! 
 ROLL  TIDE  ROLL! 
   roll    tide    roll!   
    Roll Tide Roll  !   
@
65-43+21= 43
65.0-43.0+21.0= 43.0
 65 -43 +21 = 43 
 65.0 -43.0 +21.0 = 43.0 
 65 - 43 + 21 = 43 
 65.00 - 43.0 + 21.000 = +0043.0000 
    65   -  43  +   21  =   43  

I just need it to find the most frequently occurring word (a word being a maximal consecutive sequence of letters), which is roll, and print how many times it occurs, which is 6. If anybody can help me with this, that would be really great! Thanks
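The question's definition of a word (a maximal consecutive sequence of letters) maps directly onto a regular expression. A minimal sketch, assuming only ASCII letters count as letters; the class name LetterWords and method words are hypothetical. Each match of [A-Za-z]+ is one word, so the trailing ! and the digits in the arithmetic lines are skipped automatically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts "words" as maximal runs of ASCII letters.
public class LetterWords {
    // One or more consecutive letters; the greedy + makes each match maximal.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    public static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words(" roll  tide  roll! "));  // prints [roll, tide, roll]
        System.out.println(words("65.0-43.0+21.0= 43.0"));  // prints []
    }
}
```

Because the pattern only matches letters, no separate step is needed to strip punctuation or skip empty tokens.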

Consider using a Map<String,Integer> for the words; you can then use it to count occurrences, and it will work for any number of words. See the documentation for Map.

Like this (it would need modification to be case-insensitive):

Map<String,Integer> words_count = new HashMap<String,Integer>();

// read your line (you will have to determine whether this line should be split or is one of the equations)
// also note that the trailing '!' would need to be removed

String[] words = line.split("\\s+");
for(int i=0;i<words.length;i++)
{
     String s = words[i];
     if(words_count.containsKey(s))
     {
          Integer count = words_count.get(s) + 1;
          words_count.put(s, count);
     }
     else
          words_count.put(s, 1);

}
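Since Java 8, the check/get/put sequence above can be collapsed into a single Map.merge call, which stores 1 for a new key and otherwise combines the old count with 1 using Integer::sum. A minimal sketch; the class name WordCount and method count are hypothetical, and the text is split on whitespace as in the snippet above, so punctuation such as the trailing ! still stays attached to the word.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: counts whitespace-separated tokens with Map.merge.
public class WordCount {
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // trim() first so leading spaces don't produce an empty first token
        for (String s : text.trim().split("\\s+")) {
            if (!s.isEmpty()) {
                // inserts 1 if s is absent, otherwise replaces the value with old + 1
                counts.merge(s, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("roll tide roll");
        System.out.println(counts.get("roll")); // prints 2
        System.out.println(counts.get("tide")); // prints 1
    }
}
```

The isEmpty() check covers a blank line, where trim().split("\\s+") returns a single empty string.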

Then you have the number of occurrences of each word, and to get the most frequent one do something like:

Integer frequency = null;
String mostFrequent = null;
for(String s : words_count.keySet())
{
    Integer i = words_count.get(s);
    // the first word seen, or any word with a higher count, becomes the current best
    if(frequency == null || i > frequency)
    {
         frequency = i;
         mostFrequent = s;
    }
}

Then to print:

System.out.println("The word "+ mostFrequent +" occurred "+ frequency +" times");
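The manual search for the largest count can also be done with the standard library: Collections.max with Map.Entry.comparingByValue() returns an entry with the highest value. A minimal sketch; the class name MaxEntry and method mostFrequent are hypothetical. If several words tie, which one is returned depends on the map's iteration order, and Collections.max throws NoSuchElementException on an empty map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: picks the entry with the largest count.
public class MaxEntry {
    public static Map.Entry<String, Integer> mostFrequent(Map<String, Integer> counts) {
        // Throws NoSuchElementException if counts is empty.
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("roll", 6);
        counts.put("tide", 3);
        Map.Entry<String, Integer> best = mostFrequent(counts);
        System.out.println("The word " + best.getKey() + " occurred " + best.getValue() + " times");
        // prints: The word roll occurred 6 times
    }
}
```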

Start by accumulating all the words into a Map as follows:

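...
// words is a Map<String, Integer> (e.g. a HashMap) declared before the read loop
String[] roll = s.split("\\s+");
for (final String word : roll) {
    // note: punctuation stays attached, so "roll" and "roll!" are counted as different words
    Integer qty = words.get(word);
    if (qty == null) {
        qty = 1;
    } else {
        qty = qty + 1;
    }
    words.put(word, qty);
}
...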

Then you need to figure out which word has the biggest count:

String bestWord = null;  // must be initialized, or the println below won't compile
int maxQty = 0;
for(final String word : words.keySet()) {
    if(words.get(word) > maxQty) {
        maxQty = words.get(word);
        bestWord = word;
    }
}
System.out.println("Word:");
System.out.println(Integer.toString(maxQty) + " " + bestWord);

Finally, you need to merge all forms of the same word together:

Map<String, Integer> wordsNoCase = new HashMap<String, Integer>();
for(final String word : words.keySet()) {
    Integer qty = wordsNoCase.get(word.toLowerCase());
    if(qty == null) {
        qty = words.get(word);
    } else {
        qty += words.get(word);
    }
    wordsNoCase.put(word.toLowerCase(), qty);
}
words = wordsNoCase;

Then re-run the previous code snippet (printing "Case-insensitive word:" as the header this time) to find the word with the biggest count.
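Instead of building the case-sensitive map first and merging it afterwards, both maps can be filled in the same pass, keying the second one on the lower-cased word. A minimal sketch; the class name TwoCounts and its fields are hypothetical. It uses toLowerCase(Locale.ROOT) so the result does not depend on the default locale (for example, Turkish lower-cases I differently).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: fills a case-sensitive and a case-insensitive map in one pass.
public class TwoCounts {
    public final Map<String, Integer> exact = new HashMap<>();
    public final Map<String, Integer> ignoreCase = new HashMap<>();

    public void add(String word) {
        exact.merge(word, 1, Integer::sum);
        // Locale.ROOT avoids locale-specific lower-casing rules
        ignoreCase.merge(word.toLowerCase(Locale.ROOT), 1, Integer::sum);
    }

    public static void main(String[] args) {
        TwoCounts c = new TwoCounts();
        for (String w : new String[] {"roll", "Roll", "ROLL", "roll"}) {
            c.add(w);
        }
        System.out.println(c.exact.get("roll"));      // prints 2
        System.out.println(c.ignoreCase.get("roll")); // prints 4
    }
}
```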

Try using a HashMap to store each word together with its count. To read the input file, you can wrap a FileReader in a BufferedReader as follows:

FileReader text = new FileReader("file.txt");
BufferedReader textFile = new BufferedReader(text);
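A reader opened this way should be closed when you are done; a try-with-resources statement (Java 7+) does that even if an exception is thrown. A minimal sketch, assuming the file path is passed in; the class name LineReader and method readLines are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads all lines of a text file, closing the reader automatically.
public class LineReader {
    public static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        // the reader is closed when the try block exits, normally or via an exception
        try (BufferedReader textFile = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = textFile.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : readLines("file.txt")) {
            System.out.println(line);
        }
    }
}
```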

The BufferedReader object textFile is then passed as a parameter to the method below:

public HashMap<String, Integer> countWordFrequency(BufferedReader textFile) throws IOException
{
/*This method finds the frequency of words in a text file
 * and saves each word and its corresponding frequency in
 * a HashMap. The reader is closed before returning.
 */
    HashMap<String, Integer> mapper = new HashMap<String, Integer>();
    String line = null;
    try
    {
        while((line = textFile.readLine()) != null)
        {
            // keep letters only, lower-case them, then split on spaces
            String[] words = line.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" ");
            for(String word : words)
            {
                // adjacent spaces produce empty strings; skip them
                if(!word.isEmpty())
                {
                    Integer freq = mapper.get(word);
                    if(freq == null)
                    {
                        mapper.put(word, 1);
                    }
                    else
                    {
                        mapper.put(word, freq+1);
                    }
                }
            }
        }
    }
    finally
    {
        // close the reader even if readLine() throws
        textFile.close();
    }
    return mapper;
}

The expression line.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" ") replaces every character that is not a letter with a space, then converts everything to lower case (which solves your case-insensitivity problem), and finally splits the result into words separated by spaces. For the case-sensitive count, drop the toLowerCase() call.
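Splitting on a single space yields an empty string wherever two separators are adjacent (and one for a leading separator), which is why the method above checks word.isEmpty(). A small demonstration on one line of the question's input; the class name SplitDemo and its methods are hypothetical. Trimming and splitting on runs of whitespace avoids the empty strings, except that a line with no letters at all still gives a single empty string.

```java
import java.util.Arrays;

// Hypothetical demo: why splitting on " " needs the isEmpty() check.
public class SplitDemo {
    // Same cleanup as above, split on single spaces.
    public static String[] bySpace(String line) {
        return line.replaceAll("[^a-zA-Z]", " ").split(" ");
    }

    // Same cleanup, but trimmed and split on runs of whitespace.
    public static String[] byRuns(String line) {
        return line.replaceAll("[^a-zA-Z]", " ").trim().split("\\s+");
    }

    public static void main(String[] args) {
        String line = " roll  tide  roll! ";
        System.out.println(Arrays.toString(bySpace(line))); // prints [, roll, , tide, , roll]
        System.out.println(Arrays.toString(byRuns(line)));  // prints [roll, tide, roll]
    }
}
```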

/*This method finds the highest value in the HashMap
 * and returns it.
 */
public int maxFrequency(HashMap<String, Integer> mapper)
{
    int maxValue = Integer.MIN_VALUE;
    for(int value : mapper.values())
    {
        if(value > maxValue)
        {
            maxValue = value;
        }
    }
    return maxValue;
}

The code above returns the highest value in the HashMap.

/*This method prints every HashMap key that has the given value
 * (so all tied words are printed).
 */
public void printWithValue(HashMap<String, Integer> mapper, Integer value)
{
    for (Entry<String, Integer> entry : mapper.entrySet()) 
    {
        if (entry.getValue().equals(value)) 
        {
            System.out.println("Word : " + entry.getKey() + " \nFrequency : " + entry.getValue());
        }
    }
}

Now you can print the most frequent word along with its frequency by combining the two methods, for example printWithValue(mapper, maxFrequency(mapper)).
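printWithValue prints every word whose count equals the given value, so ties are all reported. If you need those words as data rather than printed output, a minimal sketch that collects them into a sorted list; the class name TopWords and method topWords are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: returns every word that has the highest count, sorted alphabetically.
public class TopWords {
    public static List<String> topWords(Map<String, Integer> counts) {
        List<String> result = new ArrayList<>();
        if (counts.isEmpty()) {
            return result;
        }
        int max = Collections.max(counts.values());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == max) { // Integer is unboxed for the comparison
                result.add(e.getKey());
            }
        }
        Collections.sort(result); // HashMap order is unspecified, so sort for stable output
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("roll", 18);
        counts.put("tide", 9);
        counts.put("ride", 18);
        System.out.println(topWords(counts)); // prints [ride, roll]
    }
}
```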

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Map.Entry;

// class name is illustrative; the original snippet showed only main()
public class MostRepeatedWord {

    /* I have declared a LinkedHashMap with the word as the key and its occurrences as the value.
     * Create a BufferedReader object.
     * Read the first line into currentLine.
     * In a while loop, split currentLine into words and
     * iterate over them with a for loop. Inside the for loop there is an if-else statement:
     * if the word is already in the map, increment its count by 1, otherwise set its value to 1.
     * Read the next line into currentLine.
     */
    public static void main(String[] args) {

        Map<String, Integer> map = new LinkedHashMap<String, Integer>();

        // try-with-resources closes the reader even on error, and avoids the
        // NullPointerException in finally when the file cannot be opened
        try (BufferedReader reader = new BufferedReader(new FileReader("F:\\chidanand\\javaIO\\Student.txt"))) {
            String currentLine = reader.readLine();
            while (currentLine != null) {
                String[] input = currentLine.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" ");
                for (int i = 0; i < input.length; i++) {
                    // runs of spaces produce empty strings; without this check ""
                    // can become the "most repeated word"
                    if (input[i].isEmpty()) {
                        continue;
                    }
                    if (map.containsKey(input[i])) {
                        int count = map.get(input[i]);
                        map.put(input[i], count + 1);
                    } else {
                        map.put(input[i], 1);
                    }
                }
                currentLine = reader.readLine();
            }

            String mostRepeatedWord = null;
            int count = 0;
            for (Entry<String, Integer> m : map.entrySet()) {
                if (m.getValue() > count) {
                    mostRepeatedWord = m.getKey();
                    count = m.getValue();
                }
            }

            System.out.println("The most repeated word in input file is : " + mostRepeatedWord);
            System.out.println("Number Of Occurrences : " + count);

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
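Putting the pieces together for the input in the question: a minimal end-to-end sketch, assuming input.txt is in the working directory and a word is a maximal run of ASCII letters. It counts case-sensitively and case-insensitively in one pass and prints in the format the question asks for. The class name WordFrequency and its methods are hypothetical. Note that in the question's input roll and Roll both occur 6 times; using a LinkedHashMap with a strict > comparison reports the word seen first in the file, which matches the expected "6 roll".

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical end-to-end sketch: most frequent word, case-sensitive and case-insensitive.
public class WordFrequency {
    // A word is a maximal run of ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Entry with the highest count; on a tie the word inserted first wins, because
    // LinkedHashMap iterates in insertion order and only a strictly larger count replaces it.
    static Map.Entry<String, Integer> best(Map<String, Integer> counts) {
        Map.Entry<String, Integer> best = null;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (best == null || e.getValue() > best.getValue()) {
                best = e;
            }
        }
        return best;
    }

    public static String report(BufferedReader reader) throws IOException {
        Map<String, Integer> exact = new LinkedHashMap<>();
        Map<String, Integer> ignoreCase = new LinkedHashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            Matcher m = WORD.matcher(line);
            while (m.find()) {
                String w = m.group();
                exact.merge(w, 1, Integer::sum);
                ignoreCase.merge(w.toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        if (exact.isEmpty()) {
            return "No words found.";
        }
        Map.Entry<String, Integer> w = best(exact);
        Map.Entry<String, Integer> ci = best(ignoreCase);
        return "Word:\n" + w.getValue() + " " + w.getKey()
                + "\nCase-insensitive word:\n" + ci.getValue() + " " + ci.getKey();
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"))) {
            System.out.println(report(reader));
        }
    }
}
```

Taking a BufferedReader in report() rather than a file name keeps the counting logic testable with an in-memory StringReader.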

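Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.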

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM