Find how many times a word or phrase occurs in a document
I am working on a GUI that reads in a file and counts how many times a word or phrase occurs in it. I got the code working when searching for individual words, but not for phrases. I have posted the specific method for doing this below. Can anyone help me?
public void run() {
File f = new File("ARI Test.txt");
try {
Scanner scanner = new Scanner(f);
while (scanner.hasNext())
{
String str = scanner.next();
if (str.equals(word))
count++;
}
SwingUtilities.invokeLater(new Runnable() {
@Override
public void run() {
textArea.append(word + " appears: " + count + " time(s)\n");
}
});
scanner.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
There might be something wrong with the scanner logic. When you call scanner.next() it returns only the next word, not a whole line.
Consider that your text file contains 'Java is good, java is good' and you're searching for 'Java is good'. scanner.next() returns 'Java', and you then ask whether that equals 'Java is good'. Obviously that returns false.
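To make the tokenization visible, here is a minimal sketch that lists what Scanner.next() actually returns for that sample text. The class name TokenDemo and the helper tokens are made up for illustration, and the Scanner reads from a String rather than from the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not part of the original question.
final class TokenDemo {
    // Collects every token that Scanner.next() returns for the given text,
    // using the default whitespace delimiter.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String phrase = "Java is good";
        // Each token is a single word (punctuation included), so none equals the phrase.
        for (String token : tokens("Java is good, java is good")) {
            System.out.println("\"" + token + "\" equals phrase? " + token.equals(phrase));
        }
    }
}
```

Note that the third token is "good," with the comma attached, which is another reason a plain equals check on tokens can miss matches.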
@Mikkel Andersen is on the right path. The JavaDoc for Scanner states that next() works off of a delimiter, and that the default delimiter is whitespace. While Scanner does provide methods to change its delimiter, I believe that hasNext(String) and next(String) will be of greater use in this case. Keep in mind that both methods test the pattern against the next delimited token only, so a phrase containing spaces can match only if the delimiter no longer splits on those spaces. To use these methods, you will need to modify your while loop as follows.
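while (scanner.hasNext())
{
// hasNext(word) only tests the next token and returns false on the first
// token that does not match, so it cannot be the loop condition itself.
if (scanner.hasNext(word))
count++;
scanner.next(); // consume the token whether or not it matched
}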
Edit: It is also worth mentioning that you may still encounter problems with line breaks, since Scanner may see "Java is\ngood" rather than "Java is good". To combat this you will need to use regular expressions when entering your phrases.
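One way to sketch the regular-expression idea is to build a pattern in which any run of whitespace, including a line break, may separate the words of the phrase. The class name PhraseRegexCounter and the method count are made up for illustration, and the sketch assumes the file contents have already been read into a String. It is case-sensitive and also counts the phrase when it is part of a longer word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class PhraseRegexCounter {
    // Counts non-overlapping occurrences of the phrase, letting any run of
    // whitespace (spaces, tabs, line breaks) stand between its words.
    static int count(String text, String phrase) {
        if (phrase.trim().isEmpty()) {
            return 0; // an empty phrase would otherwise match everywhere
        }
        String[] words = phrase.trim().split("\\s+");
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                regex.append("\\s+");
            }
            // Pattern.quote keeps characters such as '.' or '?' literal.
            regex.append(Pattern.quote(words[i]));
        }
        Matcher matcher = Pattern.compile(regex.toString()).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Java is\ngood, and Java   is good too.";
        System.out.println(count(text, "Java is good"));
    }
}
```

Quoting each word separately means the user can type a plain phrase and does not have to write the regular expression by hand.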
@FrankPuffer asked a great question: "If your text is "xxxx", how many times does the phrase "xx" occur? Two times or three times?"
Fundamental to this question is how the matches are consumed. If you responded "three" to his question, the scan would consume a single character per match: after matching at position 0, the search resumes at position 1. This contrasts with a non-overlapping search, which advances the starting search point by word.length().
You said this:
Yeah, if I want to find "Java is good" from "Java is good, but ___ is better", the result should be 0 times.
This tells me you want neither of these solutions. It sounds like you want "the number of times a search parameter matches a line in a list." If that is the case, this is easy.
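The difference between the two consumption strategies can be sketched with String.indexOf. The class name OccurrenceCounter and the allowOverlap flag are made up for illustration.

```java
// Hypothetical helper, not part of the original answer.
final class OccurrenceCounter {
    // Counts occurrences of phrase in text. With allowOverlap the search resumes
    // one character after each match; otherwise it skips past the whole match.
    static int count(String text, String phrase, boolean allowOverlap) {
        if (phrase.isEmpty()) {
            return 0; // avoid matching the empty string at every position
        }
        int step = allowOverlap ? 1 : phrase.length();
        int count = 0;
        int index = text.indexOf(phrase);
        while (index >= 0) {
            count++;
            index = text.indexOf(phrase, index + step);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(count("xxxx", "xx", true));  // overlapping: 3
        System.out.println(count("xxxx", "xx", false)); // non-overlapping: 2
    }
}
```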
public void run() {
File f = new File("ARI Test.txt");
try {
Scanner scanner = new Scanner(f);
while (scanner.hasNextLine())
{
String line = scanner.nextLine();
if (line.equals(word))
count++;
}
SwingUtilities.invokeLater(new Runnable() {
@Override
public void run() {
textArea.append(word + " appears: " + count + " time(s)\n");
}
});
scanner.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
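To check the whole-line behaviour without the GUI or the file, the same loop can be sketched as a standalone method over a String. The class name LineMatchCounter and the method countMatchingLines are made up for illustration.

```java
import java.util.Scanner;

// Hypothetical helper, not part of the original answer.
final class LineMatchCounter {
    // Counts the lines of contents that are exactly equal to phrase.
    static int countMatchingLines(String contents, String phrase) {
        int count = 0;
        try (Scanner scanner = new Scanner(contents)) {
            while (scanner.hasNextLine()) {
                if (scanner.nextLine().equals(phrase)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String contents = "Java is good\nJava is good, but ___ is better\nJava is good";
        // The middle line only starts with the phrase, so it is not counted.
        System.out.println(countMatchingLines(contents, "Java is good"));
    }
}
```

Because equals is used, a line with leading or trailing spaces would not be counted; trimming each line first is one possible adjustment if that matters.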
If all you need is the occurrence count, then my solution is simpler:
public class SentenceCounter
{
public static void main(String[] args)
{
//The sentence for which you need to find the occurrence count
String sentence = "Game of Thrones is";
//Find the length of the sentence
int sentenceLength = sentence.length();
//This is the original text in which you are going to search
String text = "Game of Thrones is a wonderful series. Game of Thrones is also a most famous series. Game of Thrones is and always will be the best HBO series";
//Calculate the length of the entire text
int initialLength = text.length();
//Perform String 'replace' operation to remove the sentence from the original text
//('replace' treats the sentence literally; 'replaceAll' would interpret it as a regular expression)
text = text.replace(sentence, "");
//Calculate the new length of the 'text'
int newLength = text.length();
//Below formula should give you the No. of times the 'sentence' has occurred in the 'text'
System.out.println((initialLength - newLength) / sentenceLength);
}
}
If you understand the logic, then I think you can edit your code accordingly. Hope this helps!