
Including comments in Java regular expressions

I have some complex regular expressions that I need to comment for readability and maintenance. The Java spec is rather terse, and I struggled for a long time to get this working. I finally caught my bug and will post it as an answer, but I'd be grateful for any other advice on maintaining regexes.

As an example I want to comment the subcomponents (of patternS) in a simple name parser:

    String testTarget = "Waldorf T. Flywheel";
    String patternS = "([A-Za-z]+)\\s+([A-Z]\\.)?\\s+([A-Za-z]+)";
    Pattern pattern = Pattern.compile(patternS, Pattern.COMMENTS);
    Assert.assertTrue(pattern.matcher(testTarget).matches());

EDIT: I would be grateful for examples of the (?x) format as well.

EDIT: @geowa4 has a good suggestion which avoids embedded comments. Since Java and other languages provide for embedded comments, what are the cases where they are useful? (I think I have a case, but I'd be interested to see others.)

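EDIT: As @mikej notes below, the regex does not handle the optional initial well: when the initial is absent it still demands two separate runs of whitespace, so a name with a single space between first and last name fails to match. It would be better as: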

        String patternS = "([A-Za-z]+)\\s+([A-Z]\\.\\s+)?([A-Za-z]+)";

but that would end up capturing the trailing whitespace as part of the initial group.
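One possible way around the captured whitespace is to wrap the initial and its trailing whitespace in a non-capturing group, (?: ... ), and keep a capturing group around the initial alone. The following is a minimal sketch of that idea, not something taken from the answers; the class name NameParserSketch and the helper parse are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NameParserSketch {
    // (?: ... )? makes "initial + whitespace" optional as a unit without
    // capturing it; the inner ( ... ) captures only the initial as group 2.
    static final Pattern NAME =
        Pattern.compile("([A-Za-z]+)\\s+(?:([A-Z]\\.)\\s+)?([A-Za-z]+)");

    /** Returns {first, initial-or-null, last}, or null if the input does not match. */
    static String[] parse(String input) {
        Matcher m = NAME.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // prints [Waldorf, T., Flywheel]
        System.out.println(Arrays.toString(parse("Waldorf T. Flywheel")));
        // prints [Waldorf, null, Flywheel]
        System.out.println(Arrays.toString(parse("Waldorf Flywheel")));
    }
}
```

With this shape, group 2 is either the bare initial or null, so callers do not need to trim it.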

See Martin Fowler's post on ComposedRegex for more ideas on improving regex readability. In summary, he advocates breaking a complex regex into smaller parts that can be given meaningful variable names. For example:

    String mandatoryName = "([A-Za-z]+)";
    String mandatoryWhiteSpace = "\\s+";
    String optionalInitial = "([A-Z]\\.)?";
    String pattern = mandatoryName + mandatoryWhiteSpace + optionalInitial +
        mandatoryWhiteSpace + mandatoryName;

Why don't you just do this:

    String pattern2S =
        "([A-Za-z]+)" + //    mandatory firstName
        "\\s+" +        //    mandatory whitespace
        ...;

CONTINUATION:

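If you want to keep the comments with the pattern and need to read it from a properties file, use the form below. Each comment ends with \n, which the properties loader converts to a real newline (a doubled \\n would load as a literal backslash followed by n and would not end the comment), and each line ends with a continuation backslash. The loaded value must then be compiled with Pattern.COMMENTS:

    pattern=\
    #comment1\n\
    ([A-Za-z])\
    #comment2\n\
    ([0-9])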
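A quick way to check how such a properties entry behaves is to load it and compile the value with Pattern.COMMENTS. The sketch below is only an illustration under a few assumptions: the file contents are simulated with a string, the entry uses a single-backslash \n so that java.util.Properties turns it into a real newline that ends each # comment, and the names PropertiesPatternSketch, FILE_CONTENTS and loadPattern are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.regex.Pattern;

final class PropertiesPatternSketch {
    // Stand-in for a .properties file (hypothetical contents). Backslashes are
    // doubled only because this is a Java string literal. In the file itself,
    // each comment line ends with a single-backslash n escape followed by a
    // continuation backslash, and each pattern line ends with a continuation
    // backslash.
    static final String FILE_CONTENTS =
          "pattern=\\\n"
        + "# letters\\n\\\n"
        + "([A-Za-z]+)\\\n"
        + "# digits\\n\\\n"
        + "([0-9]+)\n";

    static Pattern loadPattern(String fileContents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // COMMENTS makes the regex engine skip whitespace and
        // everything from '#' to the end of the line.
        return Pattern.compile(props.getProperty("pattern"), Pattern.COMMENTS);
    }

    public static void main(String[] args) {
        Pattern p = loadPattern(FILE_CONTENTS);
        System.out.println(p.matcher("abc123").matches()); // true
        System.out.println(p.matcher("abc").matches());    // false
    }
}
```

Note that a # at the start of a continuation line is part of the property value, not a properties-file comment, which is what lets the regex comments travel with the pattern.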

I found that the following worked when compiled with Pattern.COMMENTS:

        String pattern2S = 
            "([A-Za-z]+)      # mandatory firstName\n" +
            "\\s+             # mandatory whitespace\n " +
            "([A-Z]\\.)?      # optional initial\n" +
            "\\s+             # whitespace\n " +
            "([A-Za-z]+)      # mandatory lastName\n"; 

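The key thing was to include the newline character \n explicitly in each string. In comments mode a # comment runs to the end of the line, so without the newline the first comment swallows the rest of the pattern.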
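For the (?x) form asked about in the question, the same commented pattern can be compiled either with the Pattern.COMMENTS flag or by prefixing the pattern with the embedded flag (?x). The sketch below is a simplified illustration (the optional initial is left out); the class and constant names are hypothetical. It also shows what happens when the newlines are missing.

```java
import java.util.regex.Pattern;

final class CommentedRegexSketch {
    // Each line ends with \n so that the '#' comment stops there.
    static final String WITH_NEWLINES =
        "([A-Za-z]+)  # mandatory first name\n" +
        "\\s+         # mandatory whitespace\n" +
        "([A-Za-z]+)  # mandatory last name\n";

    // Same text without newlines: the first '#' comments out everything
    // after it, leaving only ([A-Za-z]+) as the effective pattern.
    static final String WITHOUT_NEWLINES =
        "([A-Za-z]+)  # mandatory first name" +
        "\\s+         # mandatory whitespace" +
        "([A-Za-z]+)  # mandatory last name";

    // Variant 1: pass the flag to compile().
    static final Pattern BY_FLAG = Pattern.compile(WITH_NEWLINES, Pattern.COMMENTS);

    // Variant 2: switch comments mode on inside the pattern with (?x).
    static final Pattern BY_EMBEDDED_FLAG = Pattern.compile("(?x)" + WITH_NEWLINES);

    static final Pattern BROKEN = Pattern.compile(WITHOUT_NEWLINES, Pattern.COMMENTS);

    public static void main(String[] args) {
        String name = "Waldorf Flywheel";
        System.out.println(BY_FLAG.matcher(name).matches());          // true
        System.out.println(BY_EMBEDDED_FLAG.matcher(name).matches()); // true
        System.out.println(BROKEN.matcher(name).matches());           // false
        System.out.println(BROKEN.matcher("Waldorf").matches());      // true
    }
}
```

Because comments mode ignores whitespace in the pattern, whitespace that should be matched has to be written as an escape such as \\s, as in the examples above.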


 