How can I write a regex in Java that will perform a .replaceFirst on a group that is not in a comment?

I need to return a modified String in which the first instance of a token is replaced with another token, skipping any instance that sits inside a comment. Here's an example of what I'm talking about:

This whole quote is one big String
-- I don't want to replace this @@
But I want to replace this @@!

Being a former .NET developer, I thought this was easy. I'd just do a negative lookbehind like this:

(?<!--.*)@@

But then I learned that Java rejects an unbounded quantifier like .* inside a lookbehind. After learning that bounded curly-brace quantifiers are okay, I tried this:

(?<!--.{0,9001})@@

That didn't throw an exception, but it did match the @@ in the comment.

When I test this regex with a Java regex tester, it works as expected. About the only thing I can think of is that I'm using Java 1.5. Is it possible that Java 1.5 has a bug in its regex engine? Assuming it does, how do I get Java 1.5 to do what I want it to do without breaking up my string and reassembling it?

EDIT: I changed the comment marker from # to -- since it looks like the regex will be more complex with two chars instead of one. I originally did not reveal that I was modifying a query, in order to avoid off-topic discussion along the lines of "Well, you shouldn't modify queries that way!" I have a very good reason for doing this. Please don't discuss good practices for query modification. Thanks.

You don't really need a negative look-behind here; you can do it without one.

It would be like this:

String str = "I don't want to replace this @@";
// "$1" puts back the text captured before @@, so only the @@ itself is dropped
str = str.replaceAll("^([^#].*?)@@", "$1");

In a string that does not start with #, this replaces everything up to and including the first @@ with the text that precedes the @@ (group 1), so only the @@ is removed. replaceAll behaves like replaceFirst here because ^ anchors the match at the start of the string, so at most one match is possible, and the reluctant quantifier .*? makes that match stop at the first @@ .
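
To see what the reluctant quantifier contributes, here is a small comparison against a greedy version of the same pattern. The class and method names are made up for illustration, and the greedy pattern is only there as a contrast:

```java
class ReluctantVsGreedy {
    // Pattern from the answer: .*? stops at the first @@
    static String reluctant(String s) {
        return s.replaceAll("^([^#].*?)@@", "$1");
    }

    // Contrast only: .* runs to the end and backtracks to the last @@
    static String greedy(String s) {
        return s.replaceAll("^([^#].*)@@", "$1");
    }

    public static void main(String[] args) {
        String s = "a @@ b @@";
        System.out.println("[" + reluctant(s) + "]"); // [a  b @@]
        System.out.println("[" + greedy(s) + "]");    // [a @@ b ]
        // A string that starts with # is left alone by both patterns
        System.out.println("[" + reluctant("# a @@") + "]"); // [# a @@]
    }
}
```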


As @nhahtdh correctly pointed out in the comments, this can fail if the comment comes at the end of a line, after other text. In that case you can use this one instead:

String str = "I don't want to # replace this @@";
str = str.replaceAll("^([^#]*?)@@", "$1");

This one works whether the # comment starts at the beginning of the line or partway through it, because [^#]*? can never step over a # . In the example above it won't replace the @@ , as it is part of the comment.
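
The difference between the two patterns shows up on a line with a trailing comment. A quick check, with hypothetical class and method names, could look like this:

```java
class TrailingCommentCheck {
    // First pattern: the dot also matches '#', so it can run into a trailing comment
    static String dotPattern(String s) {
        return s.replaceAll("^([^#].*?)@@", "$1");
    }

    // Second pattern: [^#]*? cannot cross a '#', so text after it is never touched
    static String negatedClassPattern(String s) {
        return s.replaceAll("^([^#]*?)@@", "$1");
    }

    public static void main(String[] args) {
        String s = "I don't want to # replace this @@";
        // The first pattern removes the commented @@ (note the trailing space)
        System.out.println("[" + dotPattern(s) + "]");
        // The second pattern leaves the line unchanged
        System.out.println("[" + negatedClassPattern(s) + "]");
        // An @@ before the comment is still removed
        System.out.println("[" + negatedClassPattern("replace @@ here # not @@") + "]");
    }
}
```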


If the start of a comment is denoted by two characters, a negated character class won't work, because it can only exclude single characters. You would need to use a negative look-ahead instead, like this:

String str = "This whole quote @@  is one big String -- asdf @@\n" +
             "-- I don't want to replace this @@\n" + 
             "But I want to replace this @@!";
str = str.replaceAll("(?m)^(((?!--).)*?)@@", "$1");

System.out.println(str);

Output:

This whole quote   is one big String -- asdf @@
-- I don't want to replace this @@
But I want to replace this !

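(?m) at the beginning of the pattern enables MULTILINE mode, so ^ matches at the start of each line rather than only at the start of the entire input. As the output shows, replaceAll therefore removes the first uncommented @@ on every line; replaceFirst with the same pattern would change only the first such line.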
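
Since the question asks for a replaceFirst that swaps in another token, here is a minimal sketch built on the same look-ahead pattern. The class name, method name and parameters are hypothetical, and it assumes that every -- starts a comment running to the end of the line (a -- inside a quoted string would be treated as a comment too):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstTokenReplace {
    // Hypothetical helper: replaces only the first occurrence of token that is
    // not preceded by "--" on its own line.
    static String replaceFirstOutsideComment(String input, String token, String replacement) {
        // (?m) lets ^ match at every line start; (?:(?!--).)*? walks forward one
        // character at a time and refuses to step onto a "--".
        String regex = "(?m)^((?:(?!--).)*?)" + Pattern.quote(token);
        // quoteReplacement keeps '$' and '\' in the replacement text literal.
        return input.replaceFirst(regex, "$1" + Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String str = "This whole quote @@  is one big String -- asdf @@\n"
                   + "-- I don't want to replace this @@\n"
                   + "But I want to replace this @@!";
        // Only the first @@ on the first line becomes FOO
        System.out.println(replaceFirstOutsideComment(str, "@@", "FOO"));
    }
}
```

Pattern.quote and Matcher.quoteReplacement are both available in Java 1.5, so this should not need a newer runtime.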

You can use something like this:

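import java.util.regex.Matcher;
import java.util.regex.Pattern;

String string = "This whole quote is one big String\n" +
                "# I don't want to replace this @@\n" +
                "And I also # don't want to replace this @@\n" +
                "But I want to replace this @@!\n" +
                "But not this @@!";

// Group 1 consumes everything before the first uncommented @@:
// any char except @ or #, a single @, or a whole # comment up to end of line.
// The possessive *+ keeps the engine from backtracking into a comment
// when the input has no uncommented @@ at all.
Matcher m =
    Pattern.compile (
        "^((?:[^@#]|@[^@]|#[^\n]*+)*)@@", Pattern.MULTILINE).
            matcher (string);

// StringBuffer, because the StringBuilder overload of appendReplacement only exists since Java 9
StringBuffer result = new StringBuffer ();
if (m.find ())    // a single find() means only the first match is replaced
    m.appendReplacement (result, "$1FOO");
m.appendTail (result);

System.out.println (result.toString ());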
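
The same alternation idea can be adapted to the two-character -- marker from the question. What follows is only a sketch under a few assumptions: the class and method names are invented, the token is fixed to @@ , and every -- is taken to start a comment that runs to the end of the line. It uses possessive quantifiers ( *+ ) so that, once a comment has been consumed, the engine cannot back up into it and match an @@ there:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DashCommentReplace {
    // Group 1 consumes, in order of preference:
    //   [^@-]        any character other than '@' or '-'
    //   @(?!@)       a lone '@'
    //   -(?!-)       a lone '-'
    //   --[^\n]*+    a whole "--" comment up to the end of the line
    private static final Pattern FIRST_UNCOMMENTED = Pattern.compile(
        "^((?:[^@-]|@(?!@)|-(?!-)|--[^\\n]*+)*+)@@", Pattern.MULTILINE);

    // Hypothetical helper: replaces the first @@ that is outside a "--" comment.
    static String replaceFirstUncommented(String input, String replacement) {
        Matcher m = FIRST_UNCOMMENTED.matcher(input);
        if (!m.find()) {
            return input; // no uncommented @@ found
        }
        StringBuffer result = new StringBuffer();
        m.appendReplacement(result, "$1" + Matcher.quoteReplacement(replacement));
        m.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        String input = "This whole quote is one big String\n"
                     + "-- I don't want to replace this @@\n"
                     + "But I want to replace this @@!";
        // The @@ on the last line becomes FOO; the commented one is untouched
        System.out.println(replaceFirstUncommented(input, "FOO"));
    }
}
```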
