
Java Regular Expression Negative Look Ahead Finding Wrong Match

Assume I have the following string.

create or replace package test as
-- begin null; end;/
end;
/

I want a regular expression that finds each semicolon that is not preceded by a double dash ("--") on the same line. I'm using the pattern "(?!--.*);" but I'm still getting matches for the two semicolons on the 2nd line.

I feel like I'm missing something about negative lookaheads, but I can't figure out what.

First of all, what you need is a negative lookbehind (?<!) and not a negative lookahead (?!) since you want to check what's behind your potential match.

Even with that, you won't be able to use a lookbehind such as (?<!--.*) as written, since Java's regex engine has traditionally not supported lookbehind of unbounded length. The engine needs to be able to work out a maximum number of characters to look behind your potential match, and on many Java versions an unbounded lookbehind is rejected with a PatternSyntaxException.
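If you do want to stay with a lookbehind, one possible workaround is to give it an upper bound, which Java accepts. The sketch below is only an illustration: the class name, the helper semicolonOffsets and the limit of 200 characters are my own assumptions, and a semicolon more than 200 characters after the "--" would slip through.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedLookbehindDemo {
    // Assumption: at most 200 characters separate "--" from a semicolon on one line.
    // "." does not match line breaks by default, so the check stays on the same line.
    private static final Pattern UNCOMMENTED_SEMICOLON =
            Pattern.compile("(?<!--.{0,200});");

    /** Returns the offsets of semicolons with no "--" earlier on the same line. */
    static List<Integer> semicolonOffsets(String sql) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = UNCOMMENTED_SEMICOLON.matcher(sql);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        String sql = "create or replace package test as\n"
                + "-- begin null; end;/\n"
                + "end;\n"
                + "/";
        // Only the semicolon on the third line should be reported.
        System.out.println(semicolonOffsets(sql));
    }
}
```

The bound is a trade-off: a larger value covers longer lines but makes the engine do more work at every semicolon.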

With that said, wouldn't it be simpler in your case to just split your String on line breaks and then remove the lines that start with "--"?
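A minimal sketch of that idea, assuming comment lines begin with "--" in the first column (the class and method names are made up for the example):

```java
import java.util.ArrayList;
import java.util.List;

public class LineFilterDemo {
    /** Returns the lines of sql that do not start with "--". */
    static List<String> nonCommentLines(String sql) {
        List<String> kept = new ArrayList<>();
        // \R matches any line break sequence (\n, \r\n, \r, ...) since Java 8.
        for (String line : sql.split("\\R")) {
            if (!line.startsWith("--")) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String sql = "create or replace package test as\n"
                + "-- begin null; end;/\n"
                + "end;\n"
                + "/";
        for (String line : nonCommentLines(sql)) {
            System.out.println(line);
        }
    }
}
```

Once the comment lines are gone, a plain search for ";" on the remaining lines is enough. Note that this does not handle a "--" that appears in the middle of a line.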

If you want to match semicolons only on the lines which do not start with -- , this regex should do the trick:

^(?!--).*(;)

Example

I only made a few changes from your regex:

  1. Multi-line mode, so we can use ^ and $ and search by line

  2. ^ at the beginning to indicate start of a line

  3. .* between the negative lookahead and the semicolon, because otherwise, with the ^ anchor in place, the pattern would only match a semicolon at the very start of a line (something like ^; ), which is wrong

(I also added parentheses around the semicolon so the demo page displays the result more clearly, but this is not necessary and you can change to whatever is most convenient for your program.)
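In Java, the changes listed above could look roughly like the following. This is a sketch rather than a drop-in solution: the class name and the helper semicolonOffsets are invented for the example, and Pattern.MULTILINE is what turns on multi-line mode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineLookaheadDemo {
    // MULTILINE makes ^ match at the start of every line, not just the input.
    private static final Pattern LINE_SEMICOLON =
            Pattern.compile("^(?!--).*(;)", Pattern.MULTILINE);

    /** Returns the offset of the captured semicolon for each matching line. */
    static List<Integer> semicolonOffsets(String sql) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = LINE_SEMICOLON.matcher(sql);
        while (m.find()) {
            offsets.add(m.start(1)); // group 1 is the semicolon itself
        }
        return offsets;
    }

    public static void main(String[] args) {
        String sql = "create or replace package test as\n"
                + "-- begin null; end;/\n"
                + "end;\n"
                + "/";
        System.out.println(semicolonOffsets(sql));
    }
}
```

Because .* is greedy, group 1 captures the last semicolon on each matching line, so a line with several semicolons yields only one match.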

The reason "(?!--.*);" isn't working is that the negative lookahead is evaluated at the position just before the ; and asserts that the next two characters are not -- . That assertion succeeds every time, because the next character is always the ; itself.
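To see this for yourself, you can count the matches of the original pattern on the string from the question. The class and method names below are just for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadPitfallDemo {
    /** Counts how many times regex matches in input. */
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String sql = "create or replace package test as\n"
                + "-- begin null; end;/\n"
                + "end;\n"
                + "/";
        // The lookahead looks forward from the semicolon, so it never sees
        // the "--" earlier on the line and every semicolon is matched.
        System.out.println(countMatches("(?!--.*);", sql));
        // Without the lookahead the count is the same.
        System.out.println(countMatches(";", sql));
    }
}
```

Both calls report the same number of matches, which shows that the lookahead is not filtering anything here.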

In Java, to match a ; that doesn't have -- anywhere before it:

"\\G(((?<!--)[^;])*);"

To see this in action using a replaceAll() call:

String s = "foo; -- begin null; end;";
s = s.replaceAll("\\G(((?<!--)[^;])*);", "$1!");
System.out.println(s);

Output:

foo! -- begin null; end;

This shows that only semicolons that appear before the double dash are matched.
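One caveat, as far as I can tell: [^;] also matches line breaks, so on a multi-line string the scan stops at the first -- anywhere in the input, and semicolons on later lines are not matched either. If you need the check per line, one option is to apply the same replacement to each line separately. The sketch below assumes \n line endings, and the names are hypothetical:

```java
public class PerLineReplaceDemo {
    // Same pattern as above: a semicolon with no "--" earlier in the scanned text.
    private static final String BEFORE_COMMENT = "\\G(((?<!--)[^;])*);";

    /** Replaces each uncommented semicolon with "!" so the matches are visible. */
    static String markSemicolons(String sql) {
        // Limit -1 keeps trailing empty strings, so trailing newlines survive.
        String[] lines = sql.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            // \G restarts at offset 0 for every line, since each line is a new input.
            lines[i] = lines[i].replaceAll(BEFORE_COMMENT, "$1!");
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        String sql = "create or replace package test as\n"
                + "-- begin null; end;/\n"
                + "end;\n"
                + "/";
        System.out.println(markSemicolons(sql));
    }
}
```

With the string from the question, only the semicolon on the third line should be replaced.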
