
Regex for nested values

I want a regex that matches an outer pair of delimiters while ignoring any matches nested inside it.

For example, given this input:

/*asdasdasd /* asdasdsa */ qweqweqwe */

I want to match from the first "/*" to the last "*/", without stopping at the first "*/".

Thanks...

Regular expression quantifiers are greedy by default, so you can just use:

\/\*.*\*\/

If you wanted the behavior you're afraid of, where the match is lazy and stops at the first "*/", you would have to add a ? after the quantifier:

\/\*.*?\*\/
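To see the difference between the two patterns on the string from the question, here is a small sketch (the variable names are only illustrative):

```javascript
// Sample string from the question.
var input = "/*asdasdasd /* asdasdsa */ qweqweqwe */";

// Greedy: .* takes as much as it can, so the match ends at the last "*/".
var greedy = input.match(/\/\*.*\*\//)[0];

// Lazy: .*? takes as little as it can, so the match ends at the first "*/".
var lazy = input.match(/\/\*.*?\*\//)[0];

console.log(greedy); // /*asdasdasd /* asdasdsa */ qweqweqwe */
console.log(lazy);   // /*asdasdasd /* asdasdsa */
```

Note that in JavaScript . does not match line breaks unless the s flag is set, so both patterns only work when the whole comment sits on one line.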

Regular expressions can't count nested items by definition (though real-world implementations do go further than the computer-science definition).

See http://en.wikipedia.org/wiki/Regular_expression#Expressive_power_and_compactness
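If counting is what the regex lacks, one option is to do the counting by hand. The following is a minimal sketch of that idea, not something taken from the answers here: stripByDepth is a hypothetical helper that walks the string once and tracks the nesting depth in a plain variable. It assumes that a stray "*/" outside any comment should be kept as ordinary text, and that an unterminated comment swallows the rest of the input.

```javascript
// Hypothetical helper: removes (possibly nested) /* ... */ comments
// by tracking the nesting depth explicitly instead of using a regex.
function stripByDepth(text) {
    var depth = 0; // how many comments are currently open
    var out = "";  // text found outside all comments
    var i = 0;
    while (i < text.length) {
        var pair = text.slice(i, i + 2);
        if (pair === "/*") {
            depth++;           // entering a (possibly nested) comment
            i += 2;
        } else if (pair === "*/" && depth > 0) {
            depth--;           // leaving the innermost open comment
            i += 2;
        } else {
            // Keep the character only when no comment is open.
            if (depth === 0) out += text.charAt(i);
            i++;
        }
    }
    return out;
}

console.log(stripByDepth("keep /* a /* b */ c */ keep")); // "keep  keep"
```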

The solutions presented so far work OK if the text contains only one comment, nested or not. However, as LHMathies noted, if the text has more than one comment with stuff you want to keep between them, then these solutions fail. For example, here is some test data to verify that an algorithm works correctly:

/* one */
Stuff one
/* two /* three */ two */
Stuff two
/* four */

A correct solution will preserve the two lines with stuff in them. To handle this case correctly in JavaScript, you need a regex which matches an innermost comment (this is the hard part), and then you apply it repeatedly until all the comments are gone. Here is a tested function which does precisely that:

function strip_nested_C_comments(text)
{ // Regex to match innermost "C" style comment.
    var re = /\/\*[^*\/]*(?:(?!\/\*|\*\/)[*\/][^*\/]*)*\*\//i;
    // Iterate stripping comments from inside out.
    while (text.search(re) != -1) {
        text = text.replace(re, '');
    }
    return text;
}
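To make the inside-out order visible, the sketch below runs the same innermost-comment regex over the test data and records what each pass removes. The regex is repeated so the snippet runs on its own; the names testData, passes and remaining are only illustrative.

```javascript
// Same innermost-comment regex as in strip_nested_C_comments.
var re = /\/\*[^*\/]*(?:(?!\/\*|\*\/)[*\/][^*\/]*)*\*\//;

var testData = "/* one */\nStuff one\n/* two /* three */ two */\nStuff two\n/* four */";

var passes = [];          // the comment removed on each pass
var remaining = testData;
while (re.test(remaining)) {
    passes.push(remaining.match(re)[0]);
    remaining = remaining.replace(re, '');
}

console.log(passes);
// [ '/* one */', '/* three */', '/* two  two */', '/* four */' ]
console.log(JSON.stringify(remaining));
// "\nStuff one\n\nStuff two\n"
```

The nested "/* three */" is removed before its enclosing comment, and both "Stuff" lines survive. Because the character classes are negated sets rather than ., the regex also matches comments that span several lines.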

Edit: Improved regex efficiency for non-match cases (i.e. changed the "special" from [\S\s] to [*\/]).

Regular expressions aren't good at dealing with nested values, since what you're describing is not a "regular language".

But regular expressions are naturally greedy. That means the * and + quantifiers match as much as possible by default, which is exactly what you're asking for:

var data = "/*asdasdasd /* asdasdsa */ qweqweqwe */";
data = data.replace( /\/\*.*\*\//, '' );
alert( 'Data: ' + data );

I'm guessing that you're really after something that will remove or process properly nested comments from a string, even if there's more than one. The answers giving 'greedy' regexes will go from the first /* to the last */, so in strings like keep /* comment */ keep /* comment */ keep they will treat the middle keep as part of the comment.
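A quick sketch of that failure, using the example string from the paragraph above (the variable names are only illustrative):

```javascript
var sample = "keep /* comment */ keep /* comment */ keep";

// Greedy .* runs from the first "/*" to the last "*/",
// so the middle "keep" is swallowed along with both comments.
var stripped = sample.replace(/\/\*.*\*\//, '');

console.log(JSON.stringify(stripped)); // "keep  keep"
```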

The short answer is that JavaScript RegExps aren't powerful enough to do that; you would need recursive patterns. (This is what people mean by "regexps can't count".)

But, if you just want to remove the comments, you can use a loop and remove the innermost ones first (using the non-greedy RegExp from @mVChr, modified to match the last possible starting delimiter instead of the first):

var re = /(.*)\/\*.*?\*\//;
// replace() returns a new string, so the result must be assigned back
while (re.test(string)) string = string.replace(re, '$1');

This moves the counting (of nesting levels) out of the regexp and into the loop, so to speak. (I didn't put a g flag on the regexp because I'm unsure of the side effects when using such a regexp in two places in a loop. And the loop takes care of finding all occurrences anyway.)
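Here is a runnable sketch of that loop wrapped in a function. Two things are assumptions on my part rather than part of the answer: the function name stripInsideOut is made up, and . is swapped for [\s\S] so that comments spanning several lines are matched too (in JavaScript . does not match line breaks by default). The result of replace is assigned back on each pass, since replace returns a new string instead of changing the original.

```javascript
// Hypothetical wrapper around the inside-out loop.
function stripInsideOut(text) {
    // ([\s\S]*) is greedy, so the "/*" that follows it is the LAST opener
    // that still has a closer after it; [\s\S]*? then stops at the first
    // "*/" after that opener. Together they match an innermost comment.
    var re = /([\s\S]*)\/\*[\s\S]*?\*\//;
    while (re.test(text)) {
        // Keep everything before the comment ($1) and drop the comment.
        text = text.replace(re, '$1');
    }
    return text;
}

console.log(JSON.stringify(
    stripInsideOut("keep /* comment */ keep /* comment */ keep")
)); // "keep  keep  keep"
```

Because the greedy prefix has to backtrack from the end of the string on every pass, this is fine for short inputs but may get slow on large ones.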
