I have spent a lot of time on this. Here is my current code:
var str = '<div class="x"><p>this is <span> example </span>text</p></div>';
var arr = str.split(/\s*(<[^>]*>)/g );
arr = arr.filter(function(n){ return n != '' });
alert(arr);
I am not a regex fan and have struggled to get the output I want. The current output is:
["<div class="x">", "<p>", "this is", "<span>", " example", "</span>", "text", "</p>", "</div>"]
What I expect is:
["<div class="x">", "<p>", "this", "is", "<span>", " example", "</span>", "text", "</p>", "</div>"]
The difference between the expected and the current output is very minor. All I need is that any string containing multiple words is also split into separate strings, provided it contains no HTML tags at all.
Look at the third element to see the difference. I would like to achieve this in the same regex if possible; otherwise it is fine to do some processing afterwards.
Note: I am using Jsoup on the back end for further processing, so a Jsoup/Java solution would also be fine.
Try this:
var str = '<div class="x"><p>this is <span> example </span>text</p></div>';
var arr = str.split(/\s*(<[^>]*>)/g);
arr = arr.filter(function(n){ return n !== ''; });
var c = [];
for (var i = 0; i < arr.length; i++) {
  if (arr[i].includes("<")) {
    // A tag: keep it whole.
    c.push(arr[i]);
  } else {
    // Plain text: split on spaces into separate words.
    var u = arr[i].split(" ");
    for (var j = 0; j < u.length; j++) {
      c.push(u[j]);
    }
  }
}
// Drop the empty strings left by leading or repeated spaces.
c = c.filter(function(n){ return n !== ''; });
console.log(c);
alert(c);
Try this:
var re = /<[^>]+>|\w+/g;
var str = '<div class="x"><p>this is <span> example </span>text</p></div>';
var m;
while ((m = re.exec(str)) !== null) {
  // Guard against an infinite loop on zero-length matches.
  if (m.index === re.lastIndex) { re.lastIndex++; }
  // Append each tag or word on its own line in the textarea.
  document.getElementById('console').value += m[0] + '\n';
}
<textarea id="console" cols="40" rows="15"> </textarea>
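The same "tag or word" idea can probably be written without the exec loop or the textarea by using String.prototype.match with a global regex. The following is only a sketch: the helper name tokenizeHtml is made up for illustration, and the second alternative is changed from \w+ to [^<\s]+ on the assumption that a "word" means any run of non-space characters (\w+ would split something like "don't" into "don" and "t" and drop punctuation). Note that it returns "example" without the leading space shown in the question's expected output.

```javascript
// Hypothetical helper (name not from the original post).
// Returns an array of tags and whitespace-separated words.
function tokenizeHtml(str) {
  // Alternative 1: a whole tag, from "<" up to the next ">".
  // Alternative 2: a run of characters that are neither "<" nor whitespace.
  // match() returns null when nothing matches, so fall back to [].
  return str.match(/<[^>]+>|[^<\s]+/g) || [];
}

var str = '<div class="x"><p>this is <span> example </span>text</p></div>';
console.log(tokenizeHtml(str));
// → ['<div class="x">', '<p>', 'this', 'is', '<span>', 'example', '</span>', 'text', '</p>', '</div>']
```

Like the regexes above, this assumes no attribute value contains a ">" character; for arbitrary HTML a real parser (such as Jsoup on the back end) is likely the safer choice.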