
Regular expression (JavaScript): how to match anything between two tags any number of times

I'm trying to find all occurrences of items in an HTML page that are between <nobr> and </nobr> tags. EDIT: (nobr is an example. I need to find content between arbitrary strings, not always tags.)

I tried this:

var match = /<nobr>(.*?)<\/nobr>/img.exec(document.documentElement.innerHTML);
alert (match);

But it gives only one occurrence, and it appears twice: once with the <nobr></nobr> tags and once without them. I need only the version without the tags.

Use the DOM:

var nobrs = document.getElementsByTagName("nobr")

You can then loop through all the nobr elements and extract the innerHTML, or apply any other action to them.

You need to do it in a loop:

var match, re = /<nobr>(.*?)<\/nobr>/img;
while((match = re.exec(document.documentElement.innerHTML)) !== null){
   alert(match[1]);
}

(Since I can't comment on Rafael's correct answer...)

exec is doing what it is supposed to do: finding the first match, returning the result in the match object, and setting you up for the next exec call. The match object contains (at index 0) the whole of the string matched by the whole of the regex. The subsequent slots hold the parts of the string matched by the parenthesized subgroups. So match[1] contains the part of the string matched by "(.*?)" in your example.
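
As a minimal sketch of the behaviour described above, the following collects only the captured group from each exec call. The helper name collectCaptures and the sample string are illustrative assumptions, not part of the question.

```javascript
// Hypothetical helper: gather capture group 1 from every match of a global regex.
function collectCaptures(re, text) {
    if (!re.global) {
        // Without the g flag, exec always restarts at index 0 and the loop never ends.
        throw new Error("collectCaptures expects a regex with the g flag");
    }
    var results = [];
    var match;
    re.lastIndex = 0; // start scanning from the beginning of the text
    // With the g flag, each exec call resumes where the previous match ended.
    while ((match = re.exec(text)) !== null) {
        // match[0] is the whole match, match[1] is the first parenthesized group.
        results.push(match[1]);
        // Guard against zero-length matches, which would not advance lastIndex.
        if (match[0] === "") {
            re.lastIndex++;
        }
    }
    return results;
}

var sample = "foo <nobr> hello </nobr> bar <nobr> world </nobr> foobar";
var found = collectCaptures(/<nobr>(.*?)<\/nobr>/ig, sample);
console.log(found);
```

Because only match[1] is pushed, the result holds the text between the tags without the tags themselves.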

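You can use:

// Keep the regex in a variable: a literal inside the loop condition is recreated on every pass, which resets lastIndex and loops forever.
var match, re = /<nobr>(.*?)<\/nobr>/img;
while ((match = re.exec("foo <nobr> hello </nobr> bar <nobr> world </nobr> foobar")))
    alert(match[1]);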

If the strings you're using aren't XML elements and you're sticking with regexes, the return value you're getting can be explained by the parentheses: .exec returns the whole matching string followed by the contents of the parenthesized groups.

If your doc contains:

This is out.
Bzz. This is in. unBzz.

then

/Bzz.(.*?)unBzz./img.exec(document.documentElement.innerHTML)

will give you 'Bzz. This is in. unBzz.' in element 0 of the returned array and ' This is in. ' (the text between the delimiters, including the surrounding spaces) in element 1. Displaying the whole array shows both as a comma-separated list, because that is how JavaScript converts an array to a string.

So alert(match[1]); is what you're after.
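
For arbitrary delimiters that are not tags, one possible approach is to escape the delimiters before building the regex, so that characters such as "." are matched literally. This is only a sketch: the helper names escapeRegExp and between are hypothetical, and the sample text mirrors the Bzz example above.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return every piece of text found between open and close.
function between(text, open, close) {
    if (open === "" || close === "") {
        throw new Error("between expects non-empty delimiters");
    }
    // [\s\S]*? lazily matches any character, including line breaks.
    var re = new RegExp(escapeRegExp(open) + "([\\s\\S]*?)" + escapeRegExp(close), "g");
    var results = [];
    var match;
    while ((match = re.exec(text)) !== null) {
        results.push(match[1]); // capture group only, without the delimiters
    }
    return results;
}

var doc = "This is out.\nBzz. This is in. unBzz.";
var inside = between(doc, "Bzz.", "unBzz.");
console.log(inside);
```

The sketch uses [\s\S] rather than "." because "." does not match line breaks; the m flag in the original regex only changes how ^ and $ behave.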

It takes two steps, but you could do it like this:

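// match() with the g flag returns whole matches only, or null when nothing matches
var match = document.documentElement.innerHTML.match(/<nobr>(.*?)<\/nobr>/img) || [];
alert(match); // each entry still includes '<nobr>' and '</nobr>'

var match_length = match.length;
for (var i = 0; i < match_length; i++)
{
    // same regex without the g option, so match() also returns the capture group
    var match2 = match[i].match(/<nobr>(.*?)<\/nobr>/im);
    alert(match2[1]);
}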
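
In environments that support String.prototype.matchAll (ES2020), the two steps above can probably be collapsed into one pass. The following is a sketch on a sample string rather than on the page's innerHTML; the variable names are illustrative.

```javascript
var html = "foo <nobr> hello </nobr> bar <nobr> world </nobr> foobar";

// matchAll requires the g flag and yields one match array per occurrence,
// each with the capture groups included.
var inner = Array.from(html.matchAll(/<nobr>(.*?)<\/nobr>/gi), function (m) {
    return m[1]; // keep only the text between the tags
});

console.log(inner);
```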

