Find how many times a word or phrase occurs in a document

I am working on a GUI that reads in a file and counts how many times a word or phrase occurs in it. I got the code working when searching for individual words, but not phrases. I have posted the specific method for doing this below. Can anyone help me?

public void run() {
    File f = new File("ARI Test.txt");
    try {
        Scanner scanner = new Scanner(f);
        while (scanner.hasNext())
        {
            String str = scanner.next();
            if (str.equals(word))
                count++;
        }
        SwingUtilities.invokeLater(new Runnable() {
            @Override
            public void run() {
                textArea.append(word + " appears: " + count + " time(s)\n");
            }
        });
        scanner.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}

There might be something wrong with the scanner logic. When you call scanner.next(), it returns only the next word, not a whole line.

Consider that your text file contains 'Java is good, java is good' and you are searching for 'Java is good'. scanner.next() returns 'Java', and you then ask whether that equals 'Java is good'. Obviously that comparison returns false.
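To make the token behavior concrete, here is a minimal sketch that collects everything Scanner.next() returns for that sample text. The class and method names (TokenDemo, tokens) are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, used only to illustrate how next() tokenizes.
final class TokenDemo {
    // Collects every token that Scanner.next() returns for the given text,
    // using the default whitespace delimiter.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [Java, is, good,, java, is, good]
        System.out.println(tokens("Java is good, java is good"));
    }
}
```

No single token ever contains a space, so equals can never succeed against a multi-word phrase. Note also that punctuation stays attached to the token ("good,").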

@Mikkel Andersen is on the right path. The JavaDoc for Scanner states that next() works off of a delimiter, and that the default delimiter is whitespace. Scanner does provide methods to change its delimiter, and it also has hasNext(String) and next(String), but those still test one delimiter-separated token at a time, so with the default delimiter they cannot match a phrase that contains spaces. I believe findWithinHorizon(Pattern, int), which ignores the delimiter, will be of greater use in this case. To use it, you will need to modify your while loop as follows (this needs import java.util.regex.Pattern).

 // Pattern.quote makes the phrase match literally; a horizon of 0 means no search limit
 Pattern phrase = Pattern.compile(Pattern.quote(word));
 while (scanner.findWithinHorizon(phrase, 0) != null)
 {
     count++;
 }

Edit: It is also worth mentioning that you may still encounter problems with line breaks, since the Scanner may see "Java is\ngood" rather than "Java is good". To combat this, you will need to use regular expressions when entering your phrases.
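One way to handle the line-break case, sketched here under the assumption that the whole text is available as a String and that the phrase is not empty: quote each word of the phrase and join the words with \s+, so any run of whitespace (including a line break) is accepted between them. PhrasePattern and its methods are hypothetical names, not something from the original post.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class; assumes a non-empty phrase.
final class PhrasePattern {
    // Builds a pattern that matches the phrase's words literally,
    // separated by any run of whitespace (spaces, tabs, line breaks).
    static Pattern forPhrase(String phrase) {
        String regex = Arrays.stream(phrase.trim().split("\\s+"))
                .map(Pattern::quote)
                .collect(Collectors.joining("\\s+"));
        return Pattern.compile(regex);
    }

    // Counts non-overlapping matches of the phrase in the text.
    static int count(String text, String phrase) {
        Matcher matcher = forPhrase(phrase).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The first occurrence is split across a line break.
        System.out.println(count("Java is\ngood. Java is good.", "Java is good")); // prints 2
    }
}
```

Quoting each word with Pattern.quote keeps characters such as "." or "+" in the phrase from being interpreted as regex syntax.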

The behavior you want is critical to the solution.

@FrankPuffer asked a great question: "If your text is "xxxx", how many times does the phrase "xx" occur? Two times or three times?"

Fundamental to this question is how the matches are consumed. If you responded "three" to his question, the scan consumes a single character per match: after you match at position 0, you resume searching from position 1. This contrasts with a non-overlapping search, which advances the starting search point by word.length().
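The two consumption strategies can be sketched with String.indexOf. This is only an illustration; OverlapDemo and the overlapping flag are made-up names, and treating an empty search term as zero matches is an assumption.

```java
// Hypothetical demo class contrasting the two ways of consuming matches.
final class OverlapDemo {
    // Counts occurrences of word in text. After each match the search
    // resumes 1 character later (overlapping) or word.length() characters
    // later (non-overlapping).
    static int count(String text, String word, boolean overlapping) {
        if (word.isEmpty()) {
            return 0; // assumption: an empty search term counts as no match
        }
        int step = overlapping ? 1 : word.length();
        int count = 0;
        int index = text.indexOf(word);
        while (index >= 0) {
            count++;
            index = text.indexOf(word, index + step);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(count("xxxx", "xx", true));  // prints 3
        System.out.println(count("xxxx", "xx", false)); // prints 2
    }
}
```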

You said this:

Yeah, if I want to find "Java is good" from "Java is good, but ___ is better", the result should be 0 times.

This tells me you want neither of these solutions. It sounds like you want "the number of times a search parameter matches a line in a list." If that is the case, this is easy.

Code

public void run() {
    File f = new File("ARI Test.txt");
    try {
        Scanner scanner = new Scanner(f);
        while (scanner.hasNextLine())
        {
            String line = scanner.nextLine();
            if (line.equals(word))
                count++; 
        }
        SwingUtilities.invokeLater(new Runnable() {
            @Override
            public void run() {
                textArea.append(word + " appears: " + count + " time(s)\n");
            }
        });
        scanner.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}

If all you need is the occurrence count, then my solution will be simpler:

public class SentenceCounter
{    
  public static void main(String[] args)
  {
    //The sentence for which you need to find the occurrence count
    String sentence = "Game of Thrones is";

    //Find the length of the sentence
    int sentenceLength = sentence.length();

    //This is the original text in which you are going to search
    String text = "Game of Thrones is a wonderful series. Game of Thrones is also a most famous series. Game of Thrones is and always will be the best HBO series";

    //Calculate the length of the entire text
    int initialLength = text.length();

    //Perform String 'replace' operation to remove the sentence from original text
    //('replace' matches literally; 'replaceAll' would treat the sentence as a regular expression)
    text = text.replace(sentence, "");

    //Calculate the new length of the 'text'
    int newLength = text.length();

    //Below formula should give you the No. of times the 'sentence' has occurred in the 'text'
    System.out.println((initialLength - newLength) / sentenceLength);
  } 
}

If you understand the logic then I think you can edit your code accordingly. Hope this helps!
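As a rough sketch of how the length-difference idea might be applied to a file rather than a hard-coded string, assuming the file is small enough to read into memory and is UTF-8 encoded: FilePhraseCounter, count, and countInFile are hypothetical names, and the temporary file in main exists only to keep the demo self-contained.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; assumes the whole file fits in memory.
final class FilePhraseCounter {
    // Counts non-overlapping, literal occurrences of phrase in text by
    // removing them and dividing the length difference by the phrase length.
    static int count(String text, String phrase) {
        if (phrase.isEmpty()) {
            return 0; // assumption: avoids division by zero for an empty phrase
        }
        String remaining = text.replace(phrase, "");
        return (text.length() - remaining.length()) / phrase.length();
    }

    // Reads the file as UTF-8 (an assumption about its encoding) and counts.
    static int countInFile(Path file, String phrase) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return count(text, phrase);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real input file.
        Path file = Files.createTempFile("phrase-demo", ".txt");
        try {
            Files.write(file, "Java is good, java is good. Java is good"
                    .getBytes(StandardCharsets.UTF_8));
            System.out.println(countInFile(file, "Java is good")); // prints 2
        } finally {
            Files.delete(file);
        }
    }
}
```

The match is case-sensitive, which is why the lowercase "java is good" in the sample is not counted.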
