
Spaces in Java Regular Expressions

I am new to both Java and regular expressions.

I want to detect a pattern like "Section <Number>:".

I have this code snippet:

    String line = "Section 12: sadfdggfgfgf";
    Pattern ptn = Pattern.compile("Section [0-9+]:");
    Matcher mtch = ptn.matcher(line);

With the pattern "Section [0-9+]:", mtch reports false.

With the pattern "Section [0-9+]" (no colon), the match succeeds and mtch reports true.

Is there something I am missing about spaces in the string? I have to assume there may or may not be spaces between Section and <Number>.

Put the + outside the character class so that it matches one or more digits. [0-9+] matches only a single character from the given list (a digit in the range 0-9, or a literal +).

    Pattern ptn = Pattern.compile("Section [0-9]+:");

When you run the regex "Section [0-9+]:", it returns false because your original string does not contain Section followed by exactly one character (a digit or a literal +) and then a colon. Your original string, Section 12: sadfdggfgfgf, has two digits before the colon.

But "Section [0-9+]" returns true because there is a string Section followed by a single digit.
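To make the difference concrete, here is a minimal sketch that runs the three patterns discussed above against the string from the question. The class name CharacterClassDemo and the helper found are made up for illustration, and the sketch assumes the original code checked the result with Matcher.find(), which the question does not show.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    // Hypothetical helper: true if the regex matches anywhere in the input
    // (Matcher.find() semantics, assumed to be what the question used).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String line = "Section 12: sadfdggfgfgf";

        // [0-9+] consumes exactly one character, so "1" is matched and the
        // next character would have to be ':' but is '2'.
        System.out.println(found("Section [0-9+]:", line)); // false

        // Without the colon, a single digit after "Section " is enough.
        System.out.println(found("Section [0-9+]", line));  // true

        // With the quantifier outside the class, both digits are consumed
        // before the colon.
        System.out.println(found("Section [0-9]+:", line)); // true
    }
}
```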

You need to place the quantifier after your character class. A character class defines a set of characters, any one of which can occur for a match to succeed. Currently, [0-9+] matches exactly one character: either a digit from 0 to 9 or a literal +.

The match returns false for your pattern with a colon because the regex engine tries to match a colon immediately after a single digit, but your string has two digits before the colon. It returns true for the pattern without a colon because the engine only needs to match a single digit following "Section ".

The correct syntax would be:

    Section [0-9]+:

This matches "Section" followed by a space character, then a digit from 0 to 9 "one or more" times, and then a colon.

If you want to accept any number of whitespace characters (including none) between Section and the number, try this regex:

    Pattern.compile("Section[\\s]*[\\d]+");

For at least one space, use this:

    Pattern.compile("Section[\\s]+[\\d]+");

In Java regular expressions, \s matches a whitespace character and \d matches a digit. However, because a backslash also starts an escape sequence in a Java string literal, you must escape the backslash itself, which is why the patterns are written as \\s and \\d in source code.

You can read more about Java regular expressions and the Pattern class here: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
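As a rough illustration of the difference between \\s* and \\s+, the sketch below tries both against lines with zero, one, and several spaces. The class name, constants, and helper are hypothetical, and a trailing colon has been added to the patterns to match the "Section <Number>:" format from the question.

```java
import java.util.regex.Pattern;

class OptionalWhitespaceDemo {
    // "\\s*" in Java source is the regex \s*: zero or more whitespace characters.
    static final Pattern OPTIONAL_SPACE = Pattern.compile("Section\\s*\\d+:");
    // "\\s+" is the regex \s+: at least one whitespace character is required.
    static final Pattern REQUIRED_SPACE = Pattern.compile("Section\\s+\\d+:");

    // Hypothetical helper: true if the pattern matches anywhere in the input.
    static boolean found(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] lines = { "Section12: a", "Section 12: b", "Section   12: c" };
        for (String line : lines) {
            System.out.println(line
                    + " | optional space: " + found(OPTIONAL_SPACE, line)
                    + " | required space: " + found(REQUIRED_SPACE, line));
        }
    }
}
```

Only the first line, which has no space at all, should be rejected by the \\s+ variant. The \\s* variant accepts all three.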

    Section\s*[0-9]+:

You can use this to make sure the pattern matches whether or not there is a space between Section and the number.
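If the number itself is needed once the pattern is detected, one possible sketch wraps the digits in a capture group. This goes slightly beyond what the question asks. The class name SectionNumberDemo, the method sectionNumber, and the -1 "not found" convention are assumptions made for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionNumberDemo {
    // Same regex as above, with parentheses added around the digits so they
    // can be read back as group 1.
    private static final Pattern SECTION = Pattern.compile("Section\\s*([0-9]+):");

    // Returns the section number, or -1 if the line contains no
    // "Section <Number>:" header. Assumes the number fits in an int;
    // Integer.parseInt throws NumberFormatException otherwise.
    static int sectionNumber(String line) {
        Matcher matcher = SECTION.matcher(line);
        return matcher.find() ? Integer.parseInt(matcher.group(1)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(sectionNumber("Section 12: sadfdggfgfgf")); // 12
        System.out.println(sectionNumber("Section7: no space"));       // 7
        System.out.println(sectionNumber("Section : no number"));      // -1
    }
}
```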
