
Regular expression (JavaScript): how to match anything between two tags any number of times

I'm trying to find all occurrences of items in an HTML page that are between <nobr> and </nobr> tags. EDIT: (nobr is just an example. I need to find content between arbitrary strings, not always tags.)

I tried this

var match = /<nobr>(.*?)<\/nobr>/img.exec(document.documentElement.innerHTML);
alert (match);

But it gives only one occurrence, and it appears twice: once with the <nobr></nobr> tags and once without them. I need only the version without the tags.

Use the DOM:

var nobrs = document.getElementsByTagName("nobr");

You can then loop through all the nobrs and extract their innerHTML, or apply any other action to them.
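
A minimal sketch of that loop, assuming the content of interest really is wrapped in elements. The helper name collectInnerHTML is made up for illustration; root is anything that exposes getElementsByTagName, such as document.

```javascript
// Hypothetical helper: gather the innerHTML of every <tagName> element under root.
function collectInnerHTML(root, tagName) {
  var elements = root.getElementsByTagName(tagName); // an HTMLCollection in a browser
  var contents = [];
  for (var i = 0; i < elements.length; i++) {
    contents.push(elements[i].innerHTML);
  }
  return contents;
}

// In a browser this would be called as: collectInnerHTML(document, "nobr")
```

This only helps when the delimiters are real tags; for arbitrary strings a regex is still needed.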

You need to do it in a loop:

var match, re = /<nobr>(.*?)<\/nobr>/img;
while((match = re.exec(document.documentElement.innerHTML)) !== null){
   alert(match[1]);
}

(Since I can't comment on Rafael's correct answer...)

exec is doing what it is supposed to do - finding the first match, returning the result in the match object, and setting you up for the next exec call. The match object contains (at index 0) the whole of the string matched by the whole of the regex. In subsequent slots are the bits of the string matched by the parenthesized subgroups. So match[1] contains the bit of the string matched by "(.*?)" in your example.
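
To make that concrete, here is a small sketch that steps through exec by hand on a sample string (the string and the variable names are only for illustration). It shows index 0 versus index 1 of the result, and how lastIndex on a regex with the g flag tells the next call where to resume.

```javascript
var re = /<nobr>(.*?)<\/nobr>/gi;
var text = "foo <nobr> hello </nobr> bar <nobr> world </nobr> foobar";

var first = re.exec(text);
var firstWhole = first[0];   // whole match: "<nobr> hello </nobr>"
var firstInner = first[1];   // first parenthesized group: " hello "
var resumeAt = re.lastIndex; // position just after the first match

var second = re.exec(text);  // search continues from lastIndex
var secondInner = second[1]; // " world "

var third = re.exec(text);   // nothing left: returns null and resets lastIndex to 0
```

Because the position is stored on the regex object, the same object has to be reused for every call.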

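You can use a loop, but keep the regex in a variable. In ES5 and later, a regex literal inside the loop condition creates a new RegExp object on every evaluation, so lastIndex would start from 0 each time and the loop would never end.

var match, re = /<nobr>(.*?)<\/nobr>/img;
var str = "foo <nobr> hello </nobr> bar <nobr> world </nobr> foobar";
while ((match = re.exec(str)) !== null)
    alert(match[1]);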

If the strings you're using aren't XML elements and you're sticking with regexes, the return value you're getting can be explained by the bracketing: .exec returns the whole matching string followed by the contents of the bracketed expressions.

If your doc contains:

This is out.
Bzz. This is in. unBzz.

then

/Bzz\. (.*?) unBzz\./img.exec(document.documentElement.innerHTML)

will give you 'Bzz. This is in. unBzz.' in element 0 of the returned array and 'This is in.' in element 1. Trying to display the whole array gives both as a comma-separated list, because that is how JavaScript converts an array to a string for display.

So alert(match[1]); is what you're after.
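
For delimiters that are arbitrary strings rather than tags, one possible approach is to escape them and build the regex at run time. The sketch below is an illustration, not a definitive implementation: findBetween and escapeRegExp are hypothetical helper names, and both delimiters are assumed to be non-empty.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return every piece of text found between open and close.
// Assumes open and close are non-empty strings.
function findBetween(text, open, close) {
  // [\s\S]*? is used instead of .*? so that content spanning several lines also matches.
  var re = new RegExp(escapeRegExp(open) + "([\\s\\S]*?)" + escapeRegExp(close), "g");
  var results = [];
  var match;
  while ((match = re.exec(text)) !== null) {
    results.push(match[1]); // keep only the captured group, not the delimiters
  }
  return results;
}

var found = findBetween("This is out.\nBzz. This is in. unBzz.", "Bzz.", "unBzz.");
// found is [" This is in. "]; the surrounding spaces are part of the capture
```

Call .trim() on each result if the whitespace next to the delimiters is not wanted.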

It takes two steps, but you could do it like this:

var match = document.documentElement.innerHTML.match(/<nobr>(.*?)<\/nobr>/img) || []; // with g, match() returns only the full matches (null if none)
alert(match); // each entry includes '<nobr>'

var match_length = match.length;
for (var i = 0; i < match_length; i++)
{
    var match2 = match[i].match(/<nobr>(.*?)<\/nobr>/im); // same regex without the g option, so the captured group is returned
    alert(match2[1]);
}
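
In environments that support String.prototype.matchAll (ES2020 and later, which is an assumption about the target browsers), the two passes can be collapsed into one, since matchAll yields a full match array, capture groups included, for every occurrence. A short sketch on a sample string:

```javascript
var html = "foo <nobr> hello </nobr> bar <nobr> world </nobr> foobar";

// matchAll requires the g flag; each yielded item looks like the result of exec.
var inner = Array.from(html.matchAll(/<nobr>(.*?)<\/nobr>/gi), function (m) {
  return m[1]; // keep only the text between the tags
});
// inner is [" hello ", " world "]
```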

