Match word in String in Java

I'm trying to match Strings that contain the word "#SP" (sans quotes, case insensitive) in Java. However, I'm finding using Regexes very difficult!

Strings I need to match: "This is a sample #sp string" , "#SP string text..." , "String text #Sp"

Strings I do not want to match: "Anything with #Spider" , "#Spin #Spoon #SPORK"

Here's what I have so far: http://ideone.com/B7hHkR. Could someone guide me through building my regexp?

I've also tried: "\\w*\\s*#sp\\w*\\s*" to no avail.

Edit: Here's the code from IDEone:

java.util.regex.Pattern p = 
    java.util.regex.Pattern.compile("\\b#SP\\b", 
        java.util.regex.Pattern.CASE_INSENSITIVE);

java.util.regex.Matcher m = p.matcher("s #SP s");

if (m.find()) {
    System.out.println("Match!");
}

(edit: positive lookbehind not needed, only matching is done, not replacement)

You are yet another victim of Java's misnamed regex matching methods.

.matches() unfortunately tries to match the whole input, which goes against the usual meaning of "regex matching" (a regex can match anywhere in the input). The method you need to use is .find().

This is a braindead API, and unfortunately Java is not the only language with such misleading method names. Python also pleads guilty: its re.match only matches at the start of the string.
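To make the difference concrete, here is a minimal sketch comparing the two methods. It uses a simplified pattern (just the tag, with no boundary handling yet), and the class and method names are made up for the illustration:

```java
import java.util.regex.Pattern;

public class FindVsMatches {
    // Simplified pattern for illustration: the tag only, case-insensitive.
    static final Pattern TAG = Pattern.compile("#SP", Pattern.CASE_INSENSITIVE);

    // matches() succeeds only if the pattern covers the entire input.
    static boolean wholeInputMatches(String input) {
        return TAG.matcher(input).matches();
    }

    // find() succeeds if the pattern occurs anywhere in the input.
    static boolean foundAnywhere(String input) {
        return TAG.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeInputMatches("s #SP s")); // false
        System.out.println(foundAnywhere("s #SP s"));     // true
        System.out.println(wholeInputMatches("#sp"));     // true
    }
}
```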

Also, you have the problem that \\b only matches at word boundaries, and # is not a word character, so there is no boundary between a space and #. You need an alternation that detects either the beginning of input or a whitespace character.

Your code would need to look like this (class names not fully qualified):

Pattern p = Pattern.compile("(^|\\s)#SP\\b", Pattern.CASE_INSENSITIVE);

Matcher m = p.matcher("s #SP s");

if (m.find()) {
    System.out.println("Match!");
}
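As a rough check of the word-boundary point, the sketch below runs the original pattern and the corrected one against a few inputs. The class name and the helper method are hypothetical, chosen only for this example:

```java
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Original attempt: \b in front of '#', which is not a word character.
    static final Pattern WITH_BOUNDARY =
            Pattern.compile("\\b#SP\\b", Pattern.CASE_INSENSITIVE);

    // Corrected pattern: start of input or whitespace before '#'.
    static final Pattern WITH_ALTERNATION =
            Pattern.compile("(^|\\s)#SP\\b", Pattern.CASE_INSENSITIVE);

    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        // No boundary between ' ' and '#': both are non-word characters.
        System.out.println(found(WITH_BOUNDARY, "s #SP s"));    // false
        // A boundary exists between 's' and '#', which is not what we want.
        System.out.println(found(WITH_BOUNDARY, "s#SP s"));     // true
        System.out.println(found(WITH_ALTERNATION, "s #SP s")); // true
        System.out.println(found(WITH_ALTERNATION, "#SP s"));   // true
        System.out.println(found(WITH_ALTERNATION, "s#SP s"));  // false
    }
}
```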

You're doing fine, but the \\b in front of the # is the problem. \\b is a word boundary, and # is not a word character (i.e. it isn't in the set [0-9A-Za-z_]). A word boundary only exists between a word character and a non-word character, so the position between the space and the # isn't one. Change to:

java.util.regex.Pattern p = 
    java.util.regex.Pattern.compile("(^|\\s)#SP\\b", 
        java.util.regex.Pattern.CASE_INSENSITIVE);

The (^|\\s) means: match either ^ OR \\s, where ^ means the beginning of your string (e.g. "#SP String"), and \\s means a whitespace character.
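For what it's worth, here is a small sketch that runs this pattern over the sample strings from the question. The class and method names are invented for the example:

```java
import java.util.regex.Pattern;

public class SampleCheck {
    static final Pattern TAG =
            Pattern.compile("(^|\\s)#SP\\b", Pattern.CASE_INSENSITIVE);

    // True if the input contains #SP as a standalone tag.
    static boolean containsTag(String input) {
        return TAG.matcher(input).find();
    }

    public static void main(String[] args) {
        // Strings from the question that should match.
        System.out.println(containsTag("This is a sample #sp string")); // true
        System.out.println(containsTag("#SP string text..."));          // true
        System.out.println(containsTag("String text #Sp"));             // true
        // Strings from the question that should not match.
        System.out.println(containsTag("Anything with #Spider"));       // false
        System.out.println(containsTag("#Spin #Spoon #SPORK"));         // false
    }
}
```

The trailing \\b is what rejects #Spider: after "#Sp" comes another word character, so there is no boundary at that position.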

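The regular expression "\\w*\\s*#sp\\w*\\s*" will match 0 or more word characters, followed by 0 or more whitespace characters, followed by #sp, followed by 0 or more word characters, followed by 0 or more whitespace characters. Since the \\w* after #sp happily consumes the rest of #Spider, it does not exclude the strings you want to reject. My suggestion is to not use \\s* to break words up in your expression; use \\b after the tag instead. \\b will not work in front of the #, because # is not a word character, so anchor that side with the start of input or whitespace:

"(^|\\s)#sp\\b"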
