
Java regex that searches multi-line text excluding some string

I have some code:

String test = "int measure; \n" +
              "void introduce() { \n" +
              "while (line != null) { \n" +
              "if(measure > 0) { \n" +
              "System.out.println(smile); \n" +
                "} \n" +
              "}";  

String functions = "^.*(?!(catch|if|while|try|return|finally|new|throw)).*$";
Pattern patternFunctions = Pattern.compile(functions);
Matcher matcherFunctions = patternFunctions.matcher(test);
while(matcherFunctions.find()) 
          System.out.println(matcherFunctions.group());

This was supposed to print every line except the third and fourth, because they contain the words "while" and "if". In reality it prints nothing. Any help would be appreciated. Thank you.

Update:

Thank you for your answers! Your examples work. I have one additional question: after the negative lookahead I want to add the condition .*\\(.*\\).*\\{ , meaning the line must have the form .*<negation>.*(.*).*{ . Put simply, it should print only the second line of my String test . I tried the regex (?m)^.*(?!(catch|if|while|try|return|finally|new|throw).\\(.*\\).*\\{)*$ , but it does not work properly. What would you suggest?

Try enabling multiline mode, as demonstrated here: https://stackoverflow.com/a/6143347/584663

Also, move the dot into the group that holds the negative lookahead, so the lookahead is checked before every character: https://stackoverflow.com/a/2387072/584663

which yields: (?m)^((?!(catch|if|while|try|return|finally|new|throw)).)*$
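As a quick check, here is a minimal sketch that runs this pattern against the test string from the question. The class name KeywordLineFilter and the helper linesWithoutKeywords are made up for illustration; only the regex itself comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordLineFilter {
    // Pattern from the answer above; (?m) makes ^ and $ match per line.
    static final Pattern NO_KEYWORD_LINE = Pattern.compile(
            "(?m)^((?!(catch|if|while|try|return|finally|new|throw)).)*$");

    // Hypothetical helper: collects every line in which none of the keywords occurs.
    static List<String> linesWithoutKeywords(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = NO_KEYWORD_LINE.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String test = "int measure; \n" +
                      "void introduce() { \n" +
                      "while (line != null) { \n" +
                      "if(measure > 0) { \n" +
                      "System.out.println(smile); \n" +
                      "} \n" +
                      "}";
        // Prints the five lines that contain neither "while" nor "if".
        for (String line : linesWithoutKeywords(test)) {
            System.out.println(line);
        }
    }
}
```

Note that the keywords are matched as plain substrings, so a line containing an identifier such as notify (which contains "if") would be excluded as well.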

It gives you no output because your regular expression is incorrect.

Remove the leading .* and wrap the negative lookahead in a group (capturing or non-capturing). Then restructure the trailing .* : put the dot . inside that group, right after the lookahead, and put the quantifier * after the group's closing parenthesis, just before the $ anchor. That way the lookahead is checked before every character of the line.

You also need the m (multiline) modifier so that the ^ and $ anchors match at the beginning and end of each line. In addition, I added the i modifier for case-insensitive matching.

String functions = "(?im)^(?:(?!(?:catch|if|while|try|return|finally|new|throw)).)*$";

Regular expression:

(?im)           set flags for this block (case-insensitive) 
                (with ^ and $ matching start and end of line)
 ^              the beginning of a "line"
 (?:            group, but do not capture (0 or more times)
  (?!           look ahead to see if there is not:
  (?:           group, but do not capture:
    catch       'catch'
   |            OR
    if          'if'
   |            OR
    while       'while'
   |            OR
    try         'try'
   |            OR
    return      'return'
   |            OR
    finally     'finally'
   |            OR
    new         'new'
   |            OR
    throw       'throw'
  )             end of grouping
  )             end of look-ahead
  .             any character except \n
 )*             end of grouping
 $              before an optional \n, and the end of a "line"
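For the follow-up in the question's update (keep only keyword-free lines that also look like .*(.*).*{ ), one possible approach is to leave the pattern above unchanged and add a positive lookahead right after ^ that requires the parentheses and the opening brace somewhere on the same line. This is only a sketch under that assumption; the class name FunctionHeaderFinder and the helper findHeaders are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FunctionHeaderFinder {
    // (?=.*\(.*\).*\{) requires "( ... ) ... {" later on the same line;
    // the rest is the keyword-excluding pattern explained above.
    static final Pattern HEADER = Pattern.compile(
            "(?im)^(?=.*\\(.*\\).*\\{)"
            + "(?:(?!(?:catch|if|while|try|return|finally|new|throw)).)*$");

    // Hypothetical helper: returns the lines that satisfy both conditions.
    static List<String> findHeaders(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = HEADER.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String test = "int measure; \n" +
                      "void introduce() { \n" +
                      "while (line != null) { \n" +
                      "if(measure > 0) { \n" +
                      "System.out.println(smile); \n" +
                      "} \n" +
                      "}";
        // Only the "void introduce() { " line satisfies both conditions.
        for (String line : findHeaders(test)) {
            System.out.println(line);
        }
    }
}
```

Because . does not match a line terminator by default, the added lookahead cannot reach into the next line, so each line is judged on its own.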


  1. Remove the leading .* from the pattern "^.*(?!(catch..." : it consumes the whole line before the lookahead is tested, so lines containing if and while still match.
  2. Compile your regular expression with the multiline flag.
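The first point can be checked directly. The sketch below compiles the question's original pattern with the multiline flag and collects what it matches; the class name LeadingDotStarDemo and the helper matches are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingDotStarDemo {
    // Hypothetical helper: all matches of regex in text, compiled with MULTILINE.
    static List<String> matches(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String test = "int measure; \n" +
                      "void introduce() { \n" +
                      "while (line != null) { \n" +
                      "if(measure > 0) { \n" +
                      "System.out.println(smile); \n" +
                      "} \n" +
                      "}";
        // The question's pattern: the greedy leading .* swallows the whole line,
        // so the lookahead is only tested at the end of the line and always succeeds.
        String original = "^.*(?!(catch|if|while|try|return|finally|new|throw)).*$";
        System.out.println(matches(original, test).size());
    }
}
```

With the multiline flag alone, the original pattern matches all seven lines, including the while and if lines, which is why the leading .* has to go as well.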

Working code:

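// The dot sits inside the group, so the lookahead is re-checked before every character, not only at the start of the line.
String functions = "^((?!(catch|if|while|try|return|finally|new|throw)).)*$";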
Pattern patternFunctions = Pattern.compile(functions, Pattern.MULTILINE);
Matcher matcherFunctions = patternFunctions.matcher(test);

More about java.util.regex.Pattern.MULTILINE:

In multiline mode the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence.
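A small sketch of that difference, using a deliberately simple pattern and input. The class name MultilineFlagDemo and the helper countMatches are illustrative only; Pattern.compile(String, int) and Pattern.MULTILINE are the standard java.util.regex API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineFlagDemo {
    // Hypothetical helper: counts matches of "^.*$" in text with the given flags.
    static int countMatches(String text, int flags) {
        Matcher m = Pattern.compile("^.*$", flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "a\nb\nc";
        // Default mode: ^ and $ only anchor to the whole input, and . cannot
        // cross a line break, so nothing matches.
        System.out.println(countMatches(text, 0));                 // prints 0
        // Multiline mode: ^ and $ anchor to every line.
        System.out.println(countMatches(text, Pattern.MULTILINE)); // prints 3
    }
}
```

This is the same reason the question's code prints nothing: without the flag, the pattern would have to cover the entire multi-line input in one match.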


 