regex split with non-capturing groups

I want to match HTML tag names (e.g. div in &lt;div&gt;) and then split the string at the position of the match.

var str = '&lt;div&gt; div';
var regex = /(?:&lt;)(\w*)(?=&gt;)?/g;
var arr = str.split(regex);
console.log(arr);
//result: ["", "div", "&gt; div"]
//expected: ["&lt;", "&gt; div"]

However, this loses the "&lt;", and I also want the div between &lt; and &gt; removed. How can I achieve that?
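A short illustration of the split behavior behind the first result: when the separator regex contains a capturing group, String.prototype.split splices each captured substring into the output array, whereas text matched outside a capturing group (including a non-capturing group) is simply consumed as part of the separator. The two regexes below are simplified variants of the one in the question, chosen only to isolate that difference:

```javascript
const str = '&lt;div&gt; div';

// Capturing group: the captured tag name is inserted into the result array.
const withCapture = str.split(/&lt;(\w+)/);
// Non-capturing group: only the pieces between separator matches are returned.
const withoutCapture = str.split(/&lt;(?:\w+)/);

console.log(withCapture);    // ["", "div", "&gt; div"]
console.log(withoutCapture); // ["", "&gt; div"]
```

In both cases "&lt;" disappears because it is part of the matched separator; keeping it means the separator has to match nothing but the tag name itself.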


This attempt doesn't work either, because the plain "div" at the end of the string is split as well, even though it is not between &lt; and &gt;:

var str = '&lt;div&gt; div';
var regex = /(?:&lt;)(\w*)(?=&gt;)?/g;
var match = regex.exec(str);
var arr = match.input.split(match[1]);
console.log(arr);
//result: ["&lt;", "&gt; ", ""]
//expected: ["&lt;", "&gt; div"]
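The second attempt fails because match.input.split(match[1]) splits on every occurrence of the text "div", not at the place where the regex matched. A minimal sketch of cutting at the match position instead, assuming the goal is only to remove each tag name that sits between "&lt;" and "&gt;": it walks the matches with exec and slices around match.index. The helper name splitAtTagNames is hypothetical, not an existing API:

```javascript
// Hypothetical helper: removes every tag name found between "&lt;" and "&gt;"
// by slicing at each match's index, so a bare "div" elsewhere is left alone.
function splitAtTagNames(str) {
  const lt = '&lt;';
  const regex = /&lt;(\w+)(?=&gt;)/g;
  const parts = [];
  let last = 0; // start of the text not yet pushed into parts
  let match;
  while ((match = regex.exec(str)) !== null) {
    const nameStart = match.index + lt.length; // tag name begins after "&lt;"
    parts.push(str.slice(last, nameStart));
    last = nameStart + match[1].length; // skip over the tag name
  }
  parts.push(str.slice(last)); // remainder after the last tag name
  return parts;
}

console.log(splitAtTagNames('&lt;div&gt; div'));
// ["&lt;", "&gt; div"]
console.log(splitAtTagNames('&lt;div&gt; &lt;img&gt; div'));
// ["&lt;", "&gt; &lt;", "&gt; div"]
```

Because only the tag names are cut out, the text between two tags stays together as one piece (for example "&gt; &lt;"), with its whitespace intact.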

If you want to use only a single regex, one of the closest results you can get is:

var regex = /\b(?:\w+)(?=&gt;)/gi;
'&lt;div&gt; div'.split(regex);//["&lt;", "&gt; div"]

It gives the expected result, but the obvious problem is that it does not check for a preceding &lt;. JavaScript originally had no native lookbehind support to express that check; lookbehind assertions were only added in ES2018.
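Since ES2018, JavaScript does support lookbehind assertions ((?<=...)), so on an engine that implements them (current Node.js and modern browsers) the single-regex version can also require the preceding "&lt;". A minimal sketch, assuming such an engine:

```javascript
// (?<=&lt;) requires "&lt;" immediately before the tag name and
// (?=&gt;) requires "&gt;" after it; neither assertion consumes text,
// so split removes only the tag name itself.
const tagName = /(?<=&lt;)\w+(?=&gt;)/g;

console.log('&lt;div&gt; div'.split(tagName));
// ["&lt;", "&gt; div"]
console.log('&lt;div&gt; &lt;img&gt; div'.split(tagName));
// ["&lt;", "&gt; &lt;", "&gt; div"]
console.log('div&gt; div'.split(tagName));
// ["div&gt; div"]
```

The last call shows the difference: the \b-based regex above would also strip the "div" in "div&gt;" (giving ["", "&gt; div"]), even though it has no opening "&lt;".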

A better approach might be to handle &lt; and &gt; in two separate split passes and then combine the results:

// Pass 1 splits in front of each "&lt;" (consuming one preceding whitespace
// character if there is one); pass 2 removes the tag name before "&gt;".
var ltRgx = /(?:\s|\b|^)(?=&lt)/gi;
var gtRgx = /\b(?:\w+)(?=&gt;)/gi;

function splitTags(str) {
    return str.split(ltRgx).map(function (d) {
        return d.split(gtRgx);
    }).reduce(function (ac, d) {
        // Flatten the nested arrays into a single array.
        return ac.concat(d);
    }, []);
}

console.log(splitTags('&lt;div&gt; div')); //["&lt;", "&gt; div"]
/* Another example */
console.log(splitTags('&lt;div&gt; &lt;img&gt; div')); //["&lt;", "&gt;", "&lt;", "&gt; div"]
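On engines with ES2019's Array.prototype.flatMap, the map(...).reduce(... concat ...) step can be collapsed into a single call, since flatMap maps each piece and flattens the result by one level. A sketch reusing the two regexes from the answer unchanged; splitTags is a hypothetical helper name:

```javascript
const ltRgx = /(?:\s|\b|^)(?=&lt)/gi;
const gtRgx = /\b(?:\w+)(?=&gt;)/gi;

// Split in front of each "&lt;" first, then split every piece on the
// tag name that precedes "&gt;"; flatMap flattens the nested arrays.
function splitTags(str) {
  return str.split(ltRgx).flatMap(piece => piece.split(gtRgx));
}

console.log(splitTags('&lt;div&gt; div'));
// ["&lt;", "&gt; div"]
console.log(splitTags('&lt;div&gt; &lt;img&gt; div'));
// ["&lt;", "&gt;", "&lt;", "&gt; div"]
```

As with the map/reduce version, the space between "&gt;" and the next "&lt;" is consumed by the \s alternative in ltRgx, so it does not appear in the output.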
