
Select text between several different kinds of tags in JavaScript by using regex

I am trying to write some code in HTML, CSS and JavaScript, and I am having trouble with a regex.

Let me use a simple example to explain the problem, because I can't find the solution.

    <script>
    var str = "I am <b>a tennis player</b> but I like also playing <i>football</i> and <i>rugby</i>, I am <b>34</b> years old, I like <u>cooking</u> even if there is nothing in common with <i>tennis</i>, <i>football</i> or <i>rugby</i>.";
    var result = str.match(/<b>(.*?)<\/b>/g).map(function (val) {
        return val.replace(/<\/?b>/g, '');
    });
    alert(result);
    </script>

As you may have guessed, I want to select all the text between the tags <b></b>, <i></i> and <u></u>. To be clearer, I want to be able to extract "a tennis player", "football", "rugby", "34", "cooking", etc.

So far I have managed to handle only one tag. When I try with several, I fail. I have no experience with regex (I didn't study or work in this field), and the courses I found on the internet didn't answer my question. I don't think it is difficult to combine three regexes, but I get lost with classes, AND, OR, etc. :/

You can use the following regex to extract the inner text of the elements.

/<([biu])>(.*?)<\/\1>/gi

Explanation:

  1. <([biu])> : Matches < followed by one of b / i / u and then > . The tag name is stored in the first capture group. Can also be written as <(b|i|u)> .
  2. (.*?) : Non-greedy (lazy) match. Matches as few characters as possible while still letting the rest of the pattern match, so it stops at the first closing tag.
  3. <\/\1> : Matches </ followed by the content of the first capture group (see #1 above) followed by > , i.e. the matching closing tag.
  4. gi : g: Global flag to match all possible results. i : Case-insensitive match.
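To see why the lazy quantifier in step 2 matters, here is a minimal sketch comparing a greedy and a lazy version of the same pattern on a short string. The variable names are only illustrative and are not part of the original answer.

```javascript
// Same input for both patterns: two <i> elements on one line.
const sample = "I like <i>football</i> and <i>rugby</i>.";

// Greedy: .* grabs as much as it can, so the match runs to the LAST </i>.
const greedy = sample.match(/<i>(.*)<\/i>/)[1];

// Lazy: .*? grabs as little as it can, so the match stops at the FIRST </i>.
const lazy = sample.match(/<i>(.*?)<\/i>/)[1];

console.log(greedy); // football</i> and <i>rugby
console.log(lazy);   // football
```

With the greedy version the two elements are merged into one match, which is why the regex above uses (.*?) .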

    var str = "I am <b>a tennis player</b> but I like also playing <i>football</i> and <i>rugby</i>, I am <b>34</b> years old, I like <u>cooking</u> even if there is nothing in common with <i>tennis</i>, <i>football</i> or <i>rugby</i>.";
    var regex = /<([biu])>(.*?)<\/\1>/gi,
        result = [],
        match; // declared here so it does not leak as an implicit global

    // exec() advances through the string on each call because of the g flag
    while ((match = regex.exec(str)) !== null) {
        result.push(match[2]); // group 2 is the text between the tags
    }

    console.log(result);
    document.body.innerHTML = '<pre>' + JSON.stringify(result, 0, 4) + '</pre>';


You can also use jQuery.

    var str = "I am <b>a tennis player</b> but I like also playing <i>football</i> and <i>rugby</i>, I am <b>34</b> years old, I like <u>cooking</u> even if there is nothing in common with <i>tennis</i>, <i>football</i> or <i>rugby</i>.";
    var result = [];

    // Parse the string into a detached <div>, then collect the text of each b / i / u element
    $('<div/>').html(str).find('b, i, u').each(function (i, e) {
        result.push(e.innerText);
    });

    console.log(result);
    $('body').html('<pre>' + JSON.stringify(result, 0, 4) + '</pre>');

    <script src="https://ajax.googleapis.com/ajax/libs/jquery/2.0.0/jquery.min.js"></script>

Getting all the text from u , b and i tags can easily be achieved with the plain JS DOM API:

    function getTagTexts(str, tag) {
        // create an empty element
        var el = document.createElement('html');
        // init the innerHTML property of the element
        el.innerHTML = '<faketag>' + str + '</faketag>';
        // declare the array for the results
        var arr = [];
        // iterate through the tags we want
        [].forEach.call(el.getElementsByTagName(tag), function (v) {
            // and add the innerText property to the array
            arr.push(v.innerText);
        });
        return arr;
    }

    var txt = "I am <b>a tennis player</b> but I like also playing <i>football</i> and <i>rugby</i>, I am <b>34</b> years old, I like <u>cooking</u> even if there is nothing in common with <i>tennis</i>, <i>football</i> or <i>rugby</i>.";
    var arrayI = getTagTexts(txt, "i");
    var arrayU = getTagTexts(txt, "u");
    var arrayB = getTagTexts(txt, "b");

    document.body.innerHTML += JSON.stringify(arrayI, 0, 4) + "<br/>"; // => ["football", "rugby", "tennis", "football", "rugby"]
    document.body.innerHTML += JSON.stringify(arrayU, 0, 4) + "<br/>"; // => ["cooking"]
    document.body.innerHTML += JSON.stringify(arrayB, 0, 4);           // => ["a tennis player", "34"]

Note that the faketag is necessary if you need to parse an HTML fragment without html / body tags.

See code below:

    var str = "I am <b>a tennis player</b> but I like also playing <i>football</i> and <i>rugby</i>, I am <b>34</b> years old, I like <u>cooking</u> even if there is nothing in common with <i>tennis</i>, <i>football</i> or <i>rugby</i>.";
    var result = str.match(/<(b|i|u)>(.*?)<\/\1>/g).map(function (val) {
        // strip the opening and closing b / i / u tags from each match
        return val.replace(/<\/?b>|<\/?i>|<\/?u>/g, '');
    });
    alert(result);
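One caveat with the snippet above: str.match() returns null when nothing matches, so calling .map() on it would throw. A possible variation, assuming an environment that supports String.prototype.matchAll (ES2020), reads the inner text straight from the second capture group and returns an empty array when there is no match. The helper name extractTagTexts is hypothetical and not part of the original answers.

```javascript
// Hypothetical helper (not from the original answers): returns the text
// inside every <b>, <i> or <u> element, or [] when nothing matches.
function extractTagTexts(str) {
  const regex = /<([biu])>(.*?)<\/\1>/gi;
  // matchAll yields one match object per hit; index 2 is the inner text.
  return Array.from(str.matchAll(regex), (m) => m[2]);
}

const text = "I am <b>34</b> years old, I like <u>cooking</u> and <i>tennis</i>.";
console.log(extractTagTexts(text));           // [ '34', 'cooking', 'tennis' ]
console.log(extractTagTexts("no tags here")); // []
```

Because the capture group already holds the text, the second replace() pass is not needed in this version.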
