
Find how many times a word or phrase occurs in a document

I am working on a GUI that reads in a file and counts how many times a word or a phrase occurs in it. I got the code working when searching for individual words, but not for phrases. I have posted the specific method for doing this below. Can anyone help me?

public void run() {
    File f = new File("ARI Test.txt");
    try {
        Scanner scanner = new Scanner(f);
        while (scanner.hasNext())
        {
            String str = scanner.next();
            if (str.equals(word))
                count++;
        }
        SwingUtilities.invokeLater(new Runnable() {
            @Override
            public void run() {
                textArea.append(word + " appears: " + count + " time(s)\n");
            }
        });
        scanner.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}

The problem is most likely in the scanner logic. A call to scanner.next() returns only the next whitespace-delimited token (a single word), not a whole line or phrase.

Suppose your text file contains 'Java is good, java is good' and you are searching for 'Java is good'. scanner.next() returns 'Java', and you then ask whether that equals 'Java is good'. That comparison is always false.
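To see what the loop in the question is actually comparing, here is a minimal sketch that lists the tokens Scanner.next() returns for that sample text. The class name TokenDemo and the helper tokens are made up for illustration, and the Scanner reads from a String here rather than from the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class TokenDemo {
    // Collects every token that Scanner.next() returns for the given text,
    // using the default whitespace delimiter.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [Java, is, good,, java, is, good]
        System.out.println(tokens("Java is good, java is good"));
    }
}
```

None of the six tokens can ever equal the three-word phrase, and the comma stays attached to the third token.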

@Mikkel Andersen is on the right path. The JavaDoc for Scanner states that next() splits its input on a delimiter, and that the default delimiter is whitespace. While Scanner does provide methods to change its delimiter, I believe that hasNext(String) and next(String) will be of greater use in this case. Both treat their argument as a regular expression and test it against the next token. To use these methods, you will need to modify your while loop as follows.

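 while(scanner.hasNext())
 {
     // hasNext(String) treats word as a regular expression and tests only the next token
     if (scanner.hasNext(word))
         count++;
     scanner.next();   // always advance, so a non-matching token does not end the loop
 }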

Edit: It is also worth mentioning that you may still encounter problems with line breaks, since Scanner may see "Java is\ngood" rather than "Java is good". To combat this, you will need to use regular expressions when entering your phrases.
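One way to handle the line-break problem is sketched below. It is not the Scanner token approach from this answer: it assumes the file content has already been read into a String and uses java.util.regex directly. The class and method names (PhraseRegexCounter, phrasePattern, count) are hypothetical. Each word of the phrase is quoted literally and the words are joined with \s+, so any run of whitespace, including a newline, is accepted between them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhraseRegexCounter {
    // Builds a pattern in which every word of the phrase is matched literally
    // and consecutive words may be separated by any whitespace (spaces, tabs, newlines).
    static Pattern phrasePattern(String phrase) {
        String[] words = phrase.trim().split("\\s+");
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                regex.append("\\s+");
            }
            regex.append(Pattern.quote(words[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Counts non-overlapping, case-sensitive matches of the phrase in the text.
    static int count(String text, String phrase) {
        Matcher matcher = phrasePattern(phrase).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The line break inside the first occurrence does not prevent a match.
        System.out.println(count("Java is\ngood, java is good", "Java is good")); // prints 1
    }
}
```

The second occurrence in the sample is not counted because the match is case-sensitive; passing Pattern.CASE_INSENSITIVE to Pattern.compile would change that.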

The behavior you want is critical to the solution.

@FrankPuffer asked a great question: "If your text is "xxxx", how many times does the phrase "xx" occur? Two times or three times?"

Fundamental to this question is how the matches are consumed. If you answered "three" to his question, the scan consumes a single character per match: after you match at position 0, you resume searching at position 1. This is contrasted with a non-overlapping search, which advances the starting search point by word.length() after each match and would answer "two".
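The two consumption strategies can be sketched with String.indexOf as follows. This is only an illustration of the distinction, with hypothetical names (OccurrenceCounter, countOverlapping, countNonOverlapping); the only difference between the two methods is how far the search start moves after a match.

```java
class OccurrenceCounter {
    // Overlapping search: after a match at index i, resume at i + 1.
    static int countOverlapping(String text, String word) {
        if (word.isEmpty()) {
            return 0; // an empty word would match at every position
        }
        int count = 0;
        int index = text.indexOf(word);
        while (index != -1) {
            count++;
            index = text.indexOf(word, index + 1);
        }
        return count;
    }

    // Non-overlapping search: after a match at index i, resume at i + word.length().
    static int countNonOverlapping(String text, String word) {
        if (word.isEmpty()) {
            return 0;
        }
        int count = 0;
        int index = text.indexOf(word);
        while (index != -1) {
            count++;
            index = text.indexOf(word, index + word.length());
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOverlapping("xxxx", "xx"));    // prints 3
        System.out.println(countNonOverlapping("xxxx", "xx")); // prints 2
    }
}
```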

You said this:

Yeah, if I want to find "Java is good" from "Java is good, but ___ is better", the result should be 0 times.

This tells me you want neither of these solutions. It sounds like you want "the number of times a search parameter matches a line in a list." If that is the case, this is easy.

Code

public void run() {
    File f = new File("ARI Test.txt");
    try {
        Scanner scanner = new Scanner(f);
        while (scanner.hasNextLine())
        {
            String line = scanner.nextLine();
            if (line.equals(word))
                count++; 
        }
        SwingUtilities.invokeLater(new Runnable() {
            @Override
            public void run() {
                textArea.append(word + " appears: " + count + " time(s)\n");
            }
        });
        scanner.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}

If all you need is the occurrence count, then my solution is simpler:

public class SentenceCounter
{    
  public static void main(String[] args)
  {
    //The sentence for which you need to find the occurrence count
    String sentence = "Game of Thrones is";

    //Find the length of the sentence
    int sentenceLength = sentence.length();

    //This is the original text in which you are going to search
    String text = "Game of Thrones is a wonderful series. Game of Thrones is also a most famous series. Game of Thrones is and always will be the best HBO series";

    //Calculate the length of the entire text
    int initialLength = text.length();

    //Perform String 'replace' operation to remove the sentence from original text
    //('replaceAll' would interpret the sentence as a regular expression)
    text = text.replace(sentence, "");

    //Calculate the new length of the 'text'
    int newLength = text.length();

    //Below formula should give you the No. of times the 'sentence' has occurred in the 'text'
    System.out.println((initialLength - newLength) / sentenceLength);
  } 
}

If you understand the logic then I think you can edit your code accordingly. Hope this helps!
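As a rough sketch of how the same length-difference idea might be wrapped in a reusable method, consider the following. The names LengthDifferenceCounter and count are made up for illustration. It uses String.replace so the phrase is matched literally, and it counts non-overlapping occurrences only.

```java
class LengthDifferenceCounter {
    // Counts non-overlapping occurrences of phrase in text by comparing
    // the text length before and after removing every occurrence.
    static int count(String text, String phrase) {
        if (phrase.isEmpty()) {
            return 0; // avoids dividing by zero below
        }
        String remaining = text.replace(phrase, "");
        return (text.length() - remaining.length()) / phrase.length();
    }

    public static void main(String[] args) {
        String text = "Game of Thrones is a wonderful series. "
                + "Game of Thrones is also a most famous series. "
                + "Game of Thrones is and always will be the best HBO series";
        System.out.println(count(text, "Game of Thrones is")); // prints 3
    }
}
```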
