Regex to capture Ids from text

I have the following regex, which I am using to capture the Id of each Start comment. For some reason it only captures the first one: it won't grab the Id of the nested comment, so only 1000 is printed to the console. I want it to capture both 1000 and 2000. Can anyone spot the error in my regex?

<script type="text/javascript">

    function ExtractText() {
        var regex = /\<!--Start([0-9]{4})-->([\s\S]*?)<!--End[0-9]{4}-->/gm;
        var match;
        while (match = regex.exec($("#myHtml").html())) {
            console.log(match[1]);
        }
    }

</script>

<div id="myHtml">
   <!--Start1000-->Text on<!--Start2000-->the left<!--End1000-->Text on the right<!--End2000-->
</div> 
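The failure can be reproduced without jQuery or a page. The sketch below runs the question's pattern (minus the redundant \< escape and the unused m flag) over the same markup held in a plain string, and records regex.lastIndex after each successful exec, which is where the next search resumes. extractFirstPassIds is a hypothetical name, not part of any library.

```javascript
// Same markup as #myHtml, held in a string so this runs in Node or a browser console.
const html =
  "<!--Start1000-->Text on<!--Start2000-->the left<!--End1000-->" +
  "Text on the right<!--End2000-->";

// Hypothetical helper: the question's regex and exec loop, without the DOM.
function extractFirstPassIds(text) {
  const regex = /<!--Start([0-9]{4})-->([\s\S]*?)<!--End[0-9]{4}-->/g;
  const ids = [];
  const resumeAt = []; // regex.lastIndex after each successful exec
  let match;
  while ((match = regex.exec(text)) !== null) {
    ids.push(match[1]);
    resumeAt.push(regex.lastIndex);
  }
  return { ids, resumeAt };
}

console.log(extractFirstPassIds(html));
```

Only "1000" is collected, and resumeAt[0] points just past <!--End1000-->, which is already beyond <!--Start2000-->, so the second start marker is never examined.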

Based on Mike Samuel's answer, I updated my JS to the following:

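function GetAllIds() {
    var regex = /<!--Start([0-9]{4})-->([\s\S]*?)<!--End\1-->/g;
    var text = $("#myHtml").html();
    // Each pass strips every non-overlapping Start/End pair it can match,
    // so a region hidden by an earlier match is found on a later pass.
    while (regex.test(text)) {
        text = text.replace(
            regex,
            function (_, id, content) {
                console.log(id);
                return content;
            });
    }
}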
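The same loop can be checked outside the page with a plain-string version. This is a sketch; getAllIds is a hypothetical DOM-free variant of GetAllIds that returns the ids instead of logging them. The comment inside it explains why calling test() on a g-flagged regex does not make the loop skip matches.

```javascript
// Hypothetical DOM-free variant of GetAllIds: takes the markup as a string
// and returns the ids in the order they are stripped.
function getAllIds(html) {
  const regex = /<!--Start([0-9]{4})-->([\s\S]*?)<!--End\1-->/g;
  const ids = [];
  let text = html;
  // With the g flag, test() moves regex.lastIndex past its match, but the
  // global replace() below restarts from index 0 and leaves lastIndex at 0
  // when it finishes, so the next test() scans the whole string again.
  while (regex.test(text)) {
    text = text.replace(regex, (_, id, content) => {
      ids.push(id);
      return content; // keep the text between the markers, drop the markers
    });
  }
  return ids;
}

const sample =
  "<!--Start1000-->Text on<!--Start2000-->the left<!--End1000-->" +
  "Text on the right<!--End2000-->";
console.log(getAllIds(sample));
```

On the sample markup the first pass removes the 1000 pair, which exposes the 2000 pair to the second pass, so the function returns ["1000", "2000"].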

In

 <!--Start1000-->Text on<!--Start2000-->the left<!--End1000-->Text on the right<!--End2000--> 

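the "1000" region overlaps the "2000" region, but the exec loop only finds non-overlapping matches: with the g flag, each call to exec on the same regex and string resumes at lastIndex, the end of the previous match, so <!--Start2000-->, which lies inside the first match, is never revisited. Separately, <!--End[0-9]{4}--> accepts an End marker with any id, so the version below uses the backreference \1 to require the End marker whose id matches its Start. To solve this problem, try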

var regex = /<!--Start([0-9]{4})-->([\s\S]*?)<!--End\1-->/g;
for (var s = $("#myHtml").html(), sWithoutComment;
     // Keep going until we fail to replace a comment bracketed chunk
     // with the chunk minus comments.
     true;
     s = sWithoutComment) {
  // Replace one group of non-overlapping comment pairs.
  sWithoutComment = s.replace(
     regex,
     function (_, id, content) {
       console.log(id);
       // Replace the whole thing with the body.
       return content;
     });
  if (s === sWithoutComment) { break; }
}
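The loop above rewrites the string until nothing changes. If only the ids are needed, a single exec pass is another option, sketched here as a swapped-in technique rather than the answer's method: put the End check inside a lookahead (?=...), which is zero-width, so lastIndex only moves past the Start marker and the overlapping 2000 region is still scanned. It assumes an id counts only when the End marker with the same id appears somewhere after its Start marker; findIdsWithLookahead is a hypothetical name.

```javascript
// Hypothetical helper: one exec pass, no string rewriting.
function findIdsWithLookahead(html) {
  // The lookahead (?=...) checks that the End marker with the same id (\1)
  // appears later, without consuming any of the text in between.
  const regex = /<!--Start([0-9]{4})-->(?=[\s\S]*?<!--End\1-->)/g;
  const ids = [];
  let match;
  while ((match = regex.exec(html)) !== null) {
    ids.push(match[1]);
  }
  return ids;
}

const markup =
  "<!--Start1000-->Text on<!--Start2000-->the left<!--End1000-->" +
  "Text on the right<!--End2000-->";
console.log(findIdsWithLookahead(markup));
```

On the sample markup this yields the same ids in the same order as the replace loop, ["1000", "2000"]; a Start marker with no matching End later in the string is skipped.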

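Another option is to match the Start markers with a global regex and then extract the digits from each match with a second regex. With the g flag, String.prototype.match returns only the whole matches (e.g. "<!--Start1000"), not the capture groups, which is why the second regex is needed. Note that this lists every Start marker, whether or not it has a matching End marker:

var regex = /(<!--Start)([0-9]{4})/ig;
var str = document.getElementById('myHtml').innerHTML;
// match() returns null when nothing matches, so fall back to an empty array.
var matches = str.match(regex) || [];
for (var i = 0; i < matches.length; i++) {
    // Pull the digits out of the whole match, e.g. "<!--Start1000" -> "1000".
    var num = matches[i].match(/(\d+)/)[1];
    console.log(num);
}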
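On engines with ES2020 support, String.prototype.matchAll returns every match of a global regex together with its capture groups, so no second regex is needed and an input with no markers simply yields an empty array. A sketch, with listStartIds as a hypothetical name; like the loop above, it lists every Start marker whether or not a matching End exists.

```javascript
// Hypothetical helper: collect the id captured from every Start marker.
function listStartIds(html) {
  // matchAll requires the g flag and yields full match objects, so m[1] is the id.
  return Array.from(html.matchAll(/<!--Start([0-9]{4})-->/g), (m) => m[1]);
}

const page =
  "<!--Start1000-->Text on<!--Start2000-->the left<!--End1000-->" +
  "Text on the right<!--End2000-->";
console.log(listStartIds(page));
```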
