
regex to match string content until comment

I'm trying to match expressions of the form [%___%] in a string when they appear before a // comment, ignoring any // that sits inside quotation marks (that is, inside a string literal).
For example, in
[%tag%] = "a" + "//" + [%tag2%]; //[%tag3%]
the regex should match [%tag%] and [%tag2%].

The closest I can get is ^(?:(?:\[%([^%\]\[]*)%\])|[^"]|"[^"]*")*?(?://)

The first problem is that this doesn't match any line that doesn't contain //.
In fact, it keeps consuming lines until it reaches one that does contain //.
I tried to fix this by appending ?.*?$ to make the // optional and stop at the first end of line, but that doesn't really work.

Second, it only captures the second tag. This isn't because of the "//": even with [%1%] [%2%] it doesn't capture the first.

I'm using C# and Regex.Matches with the RegexOptions.Multiline option, and this is my escaped string:

"^(?:(?:\\[%([^%\\]\\[]*)%\\])|[^\"]|\"[^\"]*\")*?(?://)"

First off, let me just say that I love regexes. I read Friedl's Mastering Regular Expressions years ago and never looked back. That being said, do not use one giant regex to solve this problem. Use your programming language. You'll end up with more readable and maintainable code. It looks like you're trying to parse a language here where different rules apply in different contexts. Your pattern could appear in a quoted string. Quoted strings might have quotes inside them which need to be escaped. Capturing all the subtleties in one regex would be a nightmare.

I recommend iterating through the string character by character, building tokens along the way, looking for the quotes, and keeping track of whether or not you're in a quoted string. When you encounter a token that matches your criteria (you can use a regex for this part), and you're not within a string, add it to your list. When you hit the end of a statement and encounter the beginning of a comment, discard the remaining characters until the end of the comment.
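The character-by-character approach described above could be sketched roughly as follows. This is only an illustration under some assumptions: the names TagScanner and FindTags are made up for this example, the input is a single line, comments are // to end of line, and a backslash inside a quoted string escapes the next character (the question doesn't say how quotes are escaped).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagScanner
{
    // Matches one tag such as [%tag%]; \G anchors it at the start position given to Match().
    private static readonly Regex TagAtPosition = new Regex(@"\G\[%([^%\[\]]*)%\]");

    // Returns the tags in a single line that appear outside quoted strings
    // and before the first // comment.
    public static List<string> FindTags(string line)
    {
        var tags = new List<string>();
        bool inString = false;
        int i = 0;
        while (i < line.Length)
        {
            char c = line[i];
            if (inString)
            {
                // Assumption: a backslash escapes the next character inside a string.
                if (c == '\\') { i += 2; continue; }
                if (c == '"') inString = false;
                i++;
            }
            else if (c == '"')
            {
                inString = true;
                i++;
            }
            else if (c == '/' && i + 1 < line.Length && line[i + 1] == '/')
            {
                break; // everything after // is a comment, so stop scanning
            }
            else
            {
                Match m = TagAtPosition.Match(line, i);
                if (m.Success)
                {
                    tags.Add(m.Value);
                    i += m.Length; // skip past the whole tag
                }
                else
                {
                    i++;
                }
            }
        }
        return tags;
    }
}
```

For the line from the question, TagScanner.FindTags("[%tag%] = \"a\" + \"//\" + [%tag2%]; //[%tag3%]") returns [%tag%] and [%tag2%]. For multi-line input you would call it once per line.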

I think doing this in one shot is a little difficult, because pairing up double quotes is hard to check inside the same pattern. You can do it in two phases:

1. Remove all double-quoted strings
2. Find your pattern in what is left

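using System.Text.RegularExpressions;

// Phase 1: matches a double-quoted string (escaped quotes are not handled).
Regex re1 = new Regex(@"""[^""]*""", RegexOptions.Multiline);
// Phase 2: matches a tag not preceded by // on the same line (.NET allows variable-length lookbehind).
Regex re2 = new Regex(@"(?<!//.*)\[%\w+%\]", RegexOptions.Multiline);
string input = @"[%tag%] = ""a"" + ""//"" + [%tag2%]; //[%tag3%]
[%tag%] = ""a"" + ""ii//"" + [%tag2%]; //[%tag3%]";

// Strip the quoted strings first, then collect the tags that remain before any comment.
MatchCollection ms = re2.Matches(re1.Replace(input, ""));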
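If the quoted strings can contain escaped quotes, the re1 pattern above would end a string too early. A possible variant of the same two-phase idea is sketched below. It assumes backslash escaping (\" inside a string), which the question doesn't specify, and the names TwoPhaseTags and FindTags are invented for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TwoPhaseTags
{
    // Phase 1: a quoted string whose body is made of characters other than
    // quote and backslash, or a backslash followed by any character
    // (assumed escape rule).
    private static readonly Regex QuotedString = new Regex(@"""(?:[^""\\]|\\.)*""");

    // Phase 2: a tag that is not preceded by // on the same line.
    private static readonly Regex Tag = new Regex(@"(?<!//.*)\[%\w+%\]");

    public static string[] FindTags(string input)
    {
        // Remove the quoted strings, then search what is left.
        string withoutStrings = QuotedString.Replace(input, "");
        return Tag.Matches(withoutStrings)
                  .Cast<Match>()
                  .Select(m => m.Value)
                  .ToArray();
    }
}
```

With the input [%a%] = "x \" // [%b%]" + [%c%]; // [%d%] this returns [%a%] and [%c%], whereas the simpler "[^"]*" pattern would stop at the escaped quote and treat the rest of the string as a comment.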

