
Java regex to match the start of the word?

Objective: for a given term, I want to check whether that term occurs at the start of any word. For example, if the term is 't', then in the sentence:

"This is the difficult one Thats it"

I want it to return true because of:

This, the, Thats

so consider:

public class HelloWorld{

 public static void main(String []args){

    String term = "t";
    String regex = "/\\b"+term+"[^\\b]*?\\b/gi";
    String str = "This is the difficult one Thats it";
    System.out.println(str.matches(regex));

 }
}

I am getting the following exception:

Exception in thread "main" java.util.regex.PatternSyntaxException:
Illegal/unsupported escape sequence near index 7                                         
/\bt[^\b]*?\b/gi                                                              
       ^                                                                      
        at java.util.regex.Pattern.error(Pattern.java:1924)                   
        at java.util.regex.Pattern.escape(Pattern.java:2416)                  
        at java.util.regex.Pattern.range(Pattern.java:2577)                   
        at java.util.regex.Pattern.clazz(Pattern.java:2507)                   
        at java.util.regex.Pattern.sequence(Pattern.java:2030)                
        at java.util.regex.Pattern.expr(Pattern.java:1964)                    
        at java.util.regex.Pattern.compile(Pattern.java:1665)                 
        at java.util.regex.Pattern.<init>(Pattern.java:1337)                  
        at java.util.regex.Pattern.compile(Pattern.java:1022)                 
        at java.util.regex.Pattern.matches(Pattern.java:1128)                 
        at java.lang.String.matches(String.java:2063)                         
        at HelloWorld.main(HelloWorld.java:8)

Also the following does not work:

import java.util.regex.*;
public class HelloWorld{

 public static void main(String []args){

    String term = "t";
    String regex = "\\b"+term+"gi";
    //String regex = ".";
    System.out.println(regex);
    String str = "This is the difficult one Thats it";
    System.out.println(str.matches(regex));


     Pattern p = Pattern.compile(regex);
     Matcher m = p.matcher(str);
     System.out.println(m.find());
 }
}

Example: for the words { This, one, Two, Those, Thanks }, the result should be true because of This, Two, Those and Thanks.

Thanks

Since you're using the Java regex engine, you need to write the expressions in a way Java understands. That means removing the leading and trailing slashes and adding flags as (?flags), for example (?i), at the beginning of the expression.

Thus you'd need this instead:

String regex = "(?i)\\b"+term+".*?\\b";

Have a look at regular-expressions.info/java.html for more information. A comparison of supported features can be found here (just as an entry point): regular-expressions.info/refbasic.html
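The exception at index 7 points at the \b inside the character class [^\b]: java.util.regex does not accept \b inside brackets. A minimal sketch that illustrates this and checks the corrected expression against the sentence from the question; the class name BoundaryInClassDemo and the helper compiles are made up for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class BoundaryInClassDemo {

    // Hypothetical helper: reports whether Java can compile the given regex.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String term = "t";
        String str = "This is the difficult one Thats it";

        // \b inside [...] is rejected, even without the surrounding slashes
        System.out.println(compiles("\\b" + term + "[^\\b]*?\\b")); // false

        // The corrected expression compiles and is found in the sentence
        String regex = "(?i)\\b" + term + ".*?\\b";
        System.out.println(compiles(regex));                            // true
        System.out.println(Pattern.compile(regex).matcher(str).find()); // true
    }
}
```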

In Java we don't surround a regex with /, so instead of "/regex/flags" we just write regex. If you want to add flags, you can use the (?flags) syntax and place it in the regex at the position from which the flag should apply. For instance, a(?i)a will find aa and aA but not Aa, because the flag was added after the first a.
You can also compile your regex into a Pattern like this:

Pattern pattern = Pattern.compile(regex, flags);

where regex is a String (again, not enclosed in /) and flags is an integer built from constants in Pattern, such as Pattern.DOTALL. When you need more than one flag, combine them, for example Pattern.CASE_INSENSITIVE|Pattern.MULTILINE.
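A short sketch of the two ways to set flags described above. The class name FlagPlacementDemo, the helper found and the test strings are only illustrative.

```java
import java.util.regex.Pattern;

public class FlagPlacementDemo {

    // Hypothetical helper: true if the regex occurs anywhere in the input.
    static boolean found(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        // Inline flag: only the part after (?i) is case-insensitive
        Pattern inline = Pattern.compile("a(?i)a");
        System.out.println(found(inline, "aa")); // true
        System.out.println(found(inline, "aA")); // true
        System.out.println(found(inline, "Aa")); // false

        // Flag passed to compile(): applies to the whole expression
        Pattern whole = Pattern.compile("aa", Pattern.CASE_INSENSITIVE);
        System.out.println(found(whole, "Aa"));  // true

        // Several flags are combined with bitwise OR
        Pattern multi = Pattern.compile("^t", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        System.out.println(found(multi, "one\nTwo")); // true
    }
}
```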

The next thing that may confuse you is the matches method. Its name misleads many people: they assume it checks whether the string contains some part that the regex can match, but in reality it checks whether the entire string can be matched by the regex.

What you seem to want is a way to test whether the regex can be found at least once in the string. In that case you can either

  • add .* at the start and end of your regex so that the characters which are not part of the element you are looking for can also be matched, but this way matches has to process the entire string
  • use a Matcher object built from a Pattern (representing your regex) and call its find() method, which scans until it finds a match for the regex or reaches the end of the string. I prefer this approach because it does not need to process the entire string: it stops as soon as a match is found.
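The difference between the two options can be sketched as follows. The class name MatchesVsFindDemo and the helpers matchesWhole and occursIn are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFindDemo {

    // Hypothetical helper: the regex must cover the entire string.
    static boolean matchesWhole(String str, String regex) {
        return str.matches(regex);
    }

    // Hypothetical helper: the regex may occur anywhere in the string.
    static boolean occursIn(String str, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(str);
        return matcher.find();
    }

    public static void main(String[] args) {
        String str = "This is the difficult one Thats it";
        String regex = "(?i)\\bt";

        // matches() fails because the regex only describes one character
        System.out.println(matchesWhole(str, regex));               // false

        // Padding with .* lets the rest of the string be consumed
        System.out.println(matchesWhole(str, ".*" + regex + ".*")); // true

        // find() searches for the regex and stops at the first hit
        System.out.println(occursIn(str, regex));                   // true
    }
}
```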

So your code could look like

String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.find());

If your term could contain regex special characters but you want the regex engine to treat them as normal characters, you need to make sure they are escaped. To do this you can use the Pattern.quote method, which adds all the necessary escapes for you. So instead of

Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);

for safety you should use

Pattern pattern = Pattern.compile("\\b"+Pattern.quote(term), Pattern.CASE_INSENSITIVE);
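To see what the quoting changes, here is a small sketch with a term that contains a regex metacharacter. The term "t.", the class name QuoteTermDemo and the helper startsAnyWord are assumptions made for the example.

```java
import java.util.regex.Pattern;

public class QuoteTermDemo {

    // Hypothetical helper: does any word in str start with term, taken literally?
    static boolean startsAnyWord(String str, String term) {
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        return pattern.matcher(str).find();
    }

    public static void main(String[] args) {
        String str = "This is the difficult one Thats it";

        // Unquoted, the dot in "t." is a wildcard, so "Th" is accepted
        Pattern unquoted = Pattern.compile("\\bt.", Pattern.CASE_INSENSITIVE);
        System.out.println(unquoted.matcher(str).find()); // true

        // Quoted, "t." only matches a literal t followed by a literal dot
        System.out.println(startsAnyWord(str, "t."));     // false
        System.out.println(startsAnyWord(str, "th"));     // true
    }
}
```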

String regex = "(?i)\\b"+term;

In Java, the modifiers must be inserted between "(?" and ")" and there is a variant for turning them off again: "(?-" and ")".

For finding all words beginning with "T" or "t", you may want to use Matcher's find method repeatedly. If you just need the offset, Matcher's start method returns the offset.

If you need to match the full word, use

String regex = "(?i)\\b"+term + "\\w*";
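A possible way to combine the full-word regex with repeated calls to find() and start(). The class name WordsStartingWithDemo and the helper wordsStartingWith are hypothetical, and Pattern.quote is added here on the assumption that the term should be taken literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordsStartingWithDemo {

    // Hypothetical helper: collects every word that starts with term.
    static List<String> wordsStartingWith(String str, String term) {
        Pattern pattern = Pattern.compile("(?i)\\b" + Pattern.quote(term) + "\\w*");
        Matcher matcher = pattern.matcher(str);
        List<String> words = new ArrayList<>();
        while (matcher.find()) {          // each call continues after the previous match
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String str = "This is the difficult one Thats it";
        System.out.println(wordsStartingWith(str, "t")); // [This, the, Thats]

        // start() gives the offset of the current match
        Matcher matcher = Pattern.compile("(?i)\\bt\\w*").matcher(str);
        while (matcher.find()) {
            System.out.println(matcher.group() + " @ " + matcher.start()); // offsets 0, 8, 26
        }
    }
}
```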

    String str = "This is the difficult one Thats it";
    String term = "t";
    // Pattern.quote keeps the term literal, so multi-character terms work too
    Pattern pattern = Pattern.compile("^"+Pattern.quote(term)+".*",Pattern.CASE_INSENSITIVE);

    // Split the sentence into words and test each word separately
    String[] strings = str.split(" ");
    for (String s : strings) {
        System.out.println(s+"-->"+pattern.matcher(s).matches());
    }


 