
Java: Finding the number of word matches in a given string

I am trying to find the number of word matches for a given string and keyword combination, like this:

public int matches(String keyword, String text){
 // ...
}

Example:

Given the following calls:

System.out.println(matches("t", "Today is really great, isn't that GREAT?"));
System.out.println(matches("great", "Today is really great, isn't that GREAT?"));

The result should be:

0
2

So far I have found this question: "Find a complete word in a string java".

That only tells me whether the given keyword exists, not how many times it occurs. Also, I am not sure whether it ignores case (which is important to me).

Remember that substrings should be ignored! I only want full words to be found.


UPDATE

I forgot to mention that I also want keywords that are separated via whitespace to match.

E.g.

matches("today is", "Today is really great, isn't that GREAT?")

should return 1

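How about taking advantage of indexOf?

// s1 is the text, s2 is the keyword.
// Note: this counts substring occurrences, not only whole words.
s1 = s1.toLowerCase(Locale.US);
s2 = s2.toLowerCase(Locale.US);
int count = 0;
int y = s2.length();
if (y == 0) return 0; // an empty keyword would otherwise loop forever
int x = 0;
while ((x = s1.indexOf(s2, x)) != -1) {
   count++;
   x += y; // continue searching after the end of this match
}
return count;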

Efficient version

    int count = 0;
    int y = s2.length();
    for (int i = 0; y > 0 && i <= s1.length() - y; i++) {
       int j = 0;
       // compare the keyword against s1, starting at position i
       while (j < y && s1.charAt(i + j) == s2.charAt(j)) {
           j++;
       }
       // every character of the keyword matched
       if (j == y) count++;
    }
    return count;

For a more efficient solution, you would have to modify the KMP (Knuth-Morris-Pratt) algorithm a little. Just google it; it's simple.
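As a rough sketch of what such a modification could look like: the code below is a standard KMP scan that counts a hit only when the characters on both sides of it are not word characters. The class name KmpCounter, the helper isWordChar, and the decision to treat an apostrophe as part of a word (so that "t" does not match inside "isn't") are assumptions made for illustration, not something specified above.

```java
import java.util.Locale;

final class KmpCounter {

    // failure[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] buildFailureTable(String pattern) {
        int[] failure = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = failure[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            failure[i] = k;
        }
        return failure;
    }

    // Assumption: letters, digits and the apostrophe belong to a word.
    static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '\'';
    }

    static int matches(String keyword, String text) {
        String p = keyword.toLowerCase(Locale.ROOT);
        String t = text.toLowerCase(Locale.ROOT);
        if (p.isEmpty()) {
            return 0;
        }
        int[] failure = buildFailureTable(p);
        int count = 0;
        int k = 0; // number of keyword characters matched so far
        for (int i = 0; i < t.length(); i++) {
            while (k > 0 && t.charAt(i) != p.charAt(k)) {
                k = failure[k - 1];
            }
            if (t.charAt(i) == p.charAt(k)) {
                k++;
            }
            if (k == p.length()) {
                // Full match ending at i: accept it only on word boundaries.
                int start = i - k + 1;
                boolean leftOk = start == 0 || !isWordChar(t.charAt(start - 1));
                boolean rightOk = i + 1 == t.length() || !isWordChar(t.charAt(i + 1));
                if (leftOk && rightOk) {
                    count++;
                }
                k = failure[k - 1];
            }
        }
        return count;
    }
}
```

With the text from the question, KmpCounter.matches("great", text) gives 2 and KmpCounter.matches("t", text) gives 0. A multi-word keyword such as "today is" only matches when the whitespace in the text is exactly the same as in the keyword.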

Use a regular expression with word boundaries. It's by far the easiest choice.

  int matches = 0;  
  Matcher matcher = Pattern.compile("\\bgreat\\b", Pattern.CASE_INSENSITIVE).matcher(text);
  while (matcher.find()) matches++;

Your mileage may vary with non-English text, though: by default, CASE_INSENSITIVE only folds US-ASCII letters unless Pattern.UNICODE_CASE is also set.
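To turn the snippet into the matches(keyword, text) method from the question, something like the following might work. It is only a sketch: the class name WordBoundaryCounter is made up, Pattern.quote is used so that regex metacharacters in the keyword are taken literally, and whitespace inside the keyword is assumed to match any run of whitespace in the text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordBoundaryCounter {

    // Counts case-insensitive occurrences of keyword delimited by \b.
    static int matches(String keyword, String text) {
        String trimmed = keyword.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        // Quote each whitespace-separated part of the keyword and allow
        // any run of whitespace between the parts.
        StringBuilder regex = new StringBuilder("\\b");
        String[] parts = trimmed.split("\\s+");
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append("\\s+");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        regex.append("\\b");

        Pattern pattern = Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Today is really great, isn't that GREAT?";
        System.out.println(matches("great", text));    // 2
        System.out.println(matches("today is", text)); // 1
        System.out.println(matches("t", text));        // 1, the "t" in "isn't"
    }
}
```

Note the last call: \b sees a boundary between the apostrophe and the "t" in "isn't", so this version returns 1 for "t" rather than the 0 the question asks for. If that matters, the boundary test has to treat the apostrophe as part of a word.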

Well, you can use "split" to separate the text into words and then check which of those words match the keyword exactly. Hope that helps!
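A minimal sketch of the split idea, under a few assumptions that are not stated above: words are separated by whitespace, punctuation at the start or end of a word is stripped before comparing, and a keyword with several words is matched against consecutive words of the text. The names SplitCounter and normalize are hypothetical.

```java
import java.util.Locale;

final class SplitCounter {

    // Lower-cases a token and strips leading/trailing characters that are
    // neither letters nor digits, so "great," and "GREAT?" become "great".
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT)
                .replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
    }

    static int matches(String keyword, String text) {
        String[] key = keyword.trim().split("\\s+");
        String[] words = text.trim().split("\\s+");
        for (int i = 0; i < key.length; i++) {
            key[i] = normalize(key[i]);
            if (key[i].isEmpty()) {
                return 0; // empty or punctuation-only keyword
            }
        }
        for (int i = 0; i < words.length; i++) {
            words[i] = normalize(words[i]);
        }
        int count = 0;
        // Slide a window of key.length words over the text.
        for (int i = 0; i + key.length <= words.length; i++) {
            boolean allEqual = true;
            for (int j = 0; j < key.length; j++) {
                if (!words[i + j].equals(key[j])) {
                    allEqual = false;
                    break;
                }
            }
            if (allEqual) {
                count++;
            }
        }
        return count;
    }
}
```

Because "isn't" stays a single token, SplitCounter.matches("t", text) gives 0 for the text in the question, while "great" gives 2 and "today is" gives 1.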

One option would be a regular expression. Basically, it sounds like you want to match a word that may have whitespace or punctuation on its left or right, so:

" great." " great!" " great " " great," "Great"

would all match, but

"greatest"

wouldn't
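One way this could be written is with lookarounds instead of \b, so that you decide yourself which characters count as part of a word. In this sketch the class name PunctuationAwareCounter and the WORD_CHAR definition (letters, digits and the apostrophe) are assumptions chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PunctuationAwareCounter {

    // Assumption: letters, digits and the apostrophe are part of a word;
    // everything else (spaces, commas, "?", "!", ...) delimits words.
    private static final String WORD_CHAR = "[\\p{L}\\p{N}']";

    static int matches(String keyword, String text) {
        String trimmed = keyword.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        // The keyword must not be preceded or followed by a word character.
        String regex = "(?<!" + WORD_CHAR + ")"
                + Pattern.quote(trimmed)
                + "(?!" + WORD_CHAR + ")";
        Matcher matcher = Pattern
                .compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(matches("great", " great."));  // 1
        System.out.println(matches("great", "Great"));    // 1
        System.out.println(matches("great", "greatest")); // 0
    }
}
```

Since the apostrophe counts as a word character here, the "t" in "isn't" is not reported as a separate word.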
