
Regex to separate strings by HTML tags and by spaces in normal strings

I have spent a lot of time on this. This is the current state of my code:

var str = '<div class="x"><p>this is <span> example </span>text</p></div>';
var arr = str.split(/\s*(<[^>]*>)/g );
arr = arr.filter(function(n){ return n != '' }); 
alert(arr);

I am not a regex fan, but after some struggle I got this output:

["<div class="x">", "<p>", "this is", "<span>", " example", "</span>", "text", "</p>", "</div>"]

What I expect is:

["<div class="x">", "<p>", "this", "is", "<span>", " example", "</span>", "text", "</p>", "</div>"]

The difference between the expected and the current output is very minor. All I need is this: if a string contains multiple words and no HTML tags, those words should also be returned as separate strings.

Look at the difference at the third element. I would like to achieve this in the same regex if possible; otherwise, some post-processing afterwards is fine.


Note: I am using Jsoup on the back end for further processing, so a Jsoup/Java solution would also be fine.

Try this:

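var str = '<div class="x"><p>this is <span> example </span>text</p></div>';
// Split on tags; the capture group keeps each tag in the result.
var arr = str.split(/\s*(<[^>]*>)/g);
arr = arr.filter(function (n) { return n !== ''; });

var c = [];
for (var i = 0; i < arr.length; i++) {
  if (arr[i].includes("<")) {
    // A tag: keep it whole.
    c.push(arr[i]);
  } else {
    // Plain text: split it into words on spaces.
    var u = arr[i].split(" ");
    for (var j = 0; j < u.length; j++) {
      c.push(u[j]);
    }
  }
}
// Drop the empty strings left by leading or repeated spaces.
c = c.filter(function (n) { return n !== ''; });
console.log(c);
alert(c);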

Try this:


var re = /<[^>]+>|\w+/g;
var str = '<div class="x"><p>this is <span> example </span>text</p></div>';
var m;
while ((m = re.exec(str)) !== null) {
  // Guard against an infinite loop on a zero-length match.
  if (m.index === re.lastIndex) {
    re.lastIndex++;
  }
  // Append each match (a tag or a word) on its own line.
  document.getElementById('console').value += m[0] + '\n';
}
 <textarea id="console" cols="40" rows="15"> </textarea> 
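
For reference, the same "tag or word" alternation can also be used with String.prototype.match, which returns the array directly without a DOM element. The following is only a sketch: the function name tokenize is made up for illustration, and `[^<\s]+` is swapped in for `\w+` on the assumption that punctuation inside the text should stay attached to its word. Like the patterns above, it assumes no attribute value contains a `>` character, and it drops the leading space of " example" that the expected output in the question still shows.

```javascript
// Hypothetical helper (not part of the original answers):
// returns the tags and the words of an HTML string, in document order.
function tokenize(html) {
  // Alternative 1: a whole tag, "<" up to the next ">".
  // Alternative 2: a run of characters that are neither "<" nor whitespace.
  var re = /<[^>]+>|[^<\s]+/g;
  // match() returns null when nothing matches, so fall back to an empty array.
  return html.match(re) || [];
}

var str = '<div class="x"><p>this is <span> example </span>text</p></div>';
console.log(tokenize(str));
```

Whitespace between tokens is never matched by either alternative, so no empty strings appear and no filter step is needed afterwards.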


 