Java: String.contains matches exact word

In Java:

String term = "search engines";
String subterm_1 = "engine";
String subterm_2 = "engines";

If I do term.contains(subterm_1), it returns true. I don't want that. I want the subterm to exactly match one of the words in term.

So I want something like term.contains(subterm_1) to return false and term.contains(subterm_2) to return true.

\\b matches a word boundary, where a word character is [a-zA-Z0-9_].

This should work for you, and you could easily reuse this method.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestMatcher {
    public static void main(String[] args) {
        String source1 = "search engines";
        String source2 = "search engine";
        String subterm_1 = "engines";
        String subterm_2 = "engine";

        System.out.println(isContain(source1, subterm_1));
        System.out.println(isContain(source2, subterm_1));
        System.out.println(isContain(source1, subterm_2));
        System.out.println(isContain(source2, subterm_2));
    }

    // Returns true if subItem occurs in source as a whole word.
    // Pattern.quote makes regex metacharacters in subItem (e.g. '.', '+') match literally.
    private static boolean isContain(String source, String subItem) {
        String pattern = "\\b" + Pattern.quote(subItem) + "\\b";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(source);
        return m.find();
    }
}

Output:

true
false
false
true
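To see what counts as a word character in practice, here is a small sketch built on the same \b technique (the class WordCharDemo and method hasWholeWord are illustrative names, not from the answer). It shows that a hyphen creates a boundary, while an underscore or a digit does not:

```java
import java.util.regex.Pattern;

public class WordCharDemo {
    // True if `word` appears in `text` with a word boundary (\b) on both sides.
    public static boolean hasWholeWord(String text, String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();
    }

    public static void main(String[] args) {
        // '-' is not a word character, so there is a boundary before "engines".
        System.out.println(hasWholeWord("search-engines", "engines")); // true
        // '_' IS a word character, so there is no boundary between '_' and 'e'.
        System.out.println(hasWholeWord("search_engines", "engines")); // false
        // Digits are word characters too, so "engines2" is one word.
        System.out.println(hasWholeWord("engines2", "engines"));       // false
    }
}
```

If hyphenated terms should not count as containing the word, \b is the wrong tool and a whitespace-based check fits better.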

If the words are always separated by spaces, this is one way to go:

String string = "search engines";
String[] parts = string.split(" ");
for (int i = 0; i < parts.length; i++) {
    if (parts[i].equals("engine")) {
        // do whatever you want
    }
}
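Since Java 8 the same loop can also be written with a stream. This is a sketch that keeps the answer's assumption of single-space separators; SplitStreamDemo and containsWord are illustrative names:

```java
import java.util.Arrays;

public class SplitStreamDemo {
    // True if any space-separated word in `text` equals `word` exactly (case-sensitive).
    public static boolean containsWord(String text, String word) {
        return Arrays.stream(text.split(" ")).anyMatch(word::equals);
    }

    public static void main(String[] args) {
        String term = "search engines";
        System.out.println(containsWord(term, "engine"));  // false
        System.out.println(containsWord(term, "engines")); // true
    }
}
```

anyMatch stops at the first equal word, just like a loop with a break.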

I would suggest using word boundaries. If you compile a pattern like \\bengines\\b, your regular expression will only match on complete words.

Here is an explanation of word boundaries, as well as some examples. http://www.regular-expressions.info/wordboundaries.html

Also, here is the Java API documentation for Pattern, which covers word boundaries: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

Here is an example using your requirements above:

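import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\bengines\\b");
        Matcher m = p.matcher("search engines");
        System.out.println("matches: " + m.find());

        p = Pattern.compile("\\bengine\\b");
        m = p.matcher("search engines");
        System.out.println("matches: " + m.find());
    }
}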

And here is the output:

matches: true
matches: false

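Use indexOf instead, then check that the character at position index + subterm.length() is a space (` `) or that this position is the end of the string. Also check that the character just before index is a space or that index is 0. If either check fails, search again starting from index + 1.

I am sure there is a regex way as well.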
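The indexOf idea can be sketched as follows, assuming words are separated by single spaces (IndexOfWordDemo and containsWord are illustrative names). It checks both sides of each match and keeps searching, because the first occurrence may be part of a longer word:

```java
public class IndexOfWordDemo {
    // True if `word` occurs in `text` bounded by spaces or by the start/end of the string.
    // Assumes words are separated by ' '; punctuation counts as part of a word.
    public static boolean containsWord(String text, String word) {
        if (word.isEmpty()) {
            return false; // an empty subterm is never a word
        }
        int index = text.indexOf(word);
        while (index != -1) {
            int end = index + word.length(); // position just past the match
            boolean startOk = index == 0 || text.charAt(index - 1) == ' ';
            boolean endOk = end == text.length() || text.charAt(end) == ' ';
            if (startOk && endOk) {
                return true;
            }
            // A later occurrence may still be a whole word.
            index = text.indexOf(word, index + 1);
        }
        return false;
    }

    public static void main(String[] args) {
        String term = "search engines";
        System.out.println(containsWord(term, "engine"));             // false
        System.out.println(containsWord(term, "engines"));            // true
        System.out.println(containsWord("engines engine", "engine")); // true
    }
}
```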

I want the subterm to exactly match one of the words in term

Then you can't use contains(). You could split the term into words and check equality (with or without case sensitivity).

boolean hasTerm = false;
for (String word : term.split("\\s+")) {
  if (word.equals("engine")) {
    hasTerm = true;
    break;
  }
}
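To cover the "with or without case sensitivity" part, the loop can take a flag that switches between equals and equalsIgnoreCase. A minimal sketch; SplitWordsDemo, hasTerm and the ignoreCase parameter are illustrative names:

```java
public class SplitWordsDemo {
    // True if one of the whitespace-separated words in `term` equals `word`.
    // `ignoreCase` selects equalsIgnoreCase instead of equals.
    public static boolean hasTerm(String term, String word, boolean ignoreCase) {
        // trim() avoids an empty first element when `term` starts with whitespace.
        for (String w : term.trim().split("\\s+")) {
            if (ignoreCase ? w.equalsIgnoreCase(word) : w.equals(word)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String term = "Search Engines";
        System.out.println(hasTerm(term, "engines", false)); // false: case differs
        System.out.println(hasTerm(term, "engines", true));  // true
        System.out.println(hasTerm(term, "engine", true));   // false: not a whole word
    }
}
```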

Since the contains method only checks whether that sequence of characters appears anywhere in the string, it will always return true here; you will have to use a regex for this validation.

If the words are always separated by whitespace it is easier: you can use the \\s regex to match the separators.
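One way to use \\s is to require whitespace or the start/end of the input on each side of the word. This is a sketch, not the answerer's code (WhitespaceRegexDemo and containsWord are illustrative names). Unlike \b, it treats adjacent punctuation such as a comma as part of the word:

```java
import java.util.regex.Pattern;

public class WhitespaceRegexDemo {
    // True if `word` is preceded by start-of-input or whitespace
    // and followed by whitespace or end-of-input.
    public static boolean containsWord(String text, String word) {
        String regex = "(?:^|\\s)" + Pattern.quote(word) + "(?:\\s|$)";
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("search engines", "engine"));       // false
        System.out.println(containsWord("search engines", "engines"));      // true
        System.out.println(containsWord("search engines, etc", "engines")); // false: ',' is not whitespace
    }
}
```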

Here is a good tutorial: http://www.vogella.com/tutorials/JavaRegularExpressions/article.html

One approach could be to split the string by spaces, convert it to a list, and then use the contains method to check for exact matches, like so:

String[] results = term.split("\\s+");
boolean matchFound = Arrays.asList(results).contains(subterm_1);
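Here is a self-contained, runnable version of this approach using the values from the question (the class ListContainsDemo and method matchFound are illustrative names). List.contains compares elements with equals(), so only whole words match:

```java
import java.util.Arrays;
import java.util.List;

public class ListContainsDemo {
    // Splits `term` on runs of whitespace and checks for an exact element match.
    public static boolean matchFound(String term, String subterm) {
        List<String> words = Arrays.asList(term.split("\\s+"));
        return words.contains(subterm);
    }

    public static void main(String[] args) {
        String term = "search engines";
        String subterm_1 = "engine";
        String subterm_2 = "engines";
        System.out.println(matchFound(term, subterm_1)); // false
        System.out.println(matchFound(term, subterm_2)); // true
    }
}
```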

