Find text not within HTML tags with Javascript (regex)

I have a string from a DOM element that contains something similar to the following:

<span class='greenhornet'>Can you catch the green?</span>

I need to know the position of the word green.

In this case, if I set up a pattern /green/, JS exec() will of course return the first occurrence of green (position 13).

Is there a way to tell a JS regexp to ignore the word green if it's between < and >, or is there an easier way to do this?

Oh, and I can't just strip the HTML either!

Thanks.

As the commenters (and user1883592) have suggested, stripping the HTML or parsing the text out of the HTML is the correct answer here. Using regular expressions with HTML is a loser's game; you've been warned.
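For reference, here is a minimal sketch of the "parse the text out" idea that still reports positions in the original, unstripped string. The helper name findOutsideTags is made up for illustration, and it assumes simple markup: no unescaped < in text, no > inside attribute values, and no comments or script blocks.

```javascript
// Hypothetical helper (not from the question): return every index of `word`
// in `html` that lies outside a <...> tag. Indices refer to the original,
// unstripped string. Assumes tags never contain a literal ">" in attributes.
function findOutsideTags(html, word) {
  var positions = [];
  var insideTag = false;
  for (var i = 0; i < html.length; i++) {
    var ch = html.charAt(i);
    if (ch === "<") {
      insideTag = true;          // entering a tag
    } else if (ch === ">") {
      insideTag = false;         // leaving a tag
    } else if (!insideTag && html.startsWith(word, i)) {
      positions.push(i);         // word starts here, in text content
    }
  }
  return positions;
}

var html = "<span class='greenhornet'>Can you catch the green?</span>";
console.log(findOutsideTags(html, "green")); // [ 44 ]
```

The green inside the class attribute (index 13) is skipped because the scanner is still inside the opening tag at that point.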

But, that being said, if you really want to play that game, I'd start by ensuring there are no opening brackets between your term and the last closing bracket; in other words:

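var greenRegex = />[^<]+(green)/;
var html = "<span class='greenhornet'>Can you catch the green?</span>";
var match = greenRegex.exec(html);
// match.index is 25: the ">" where the match starts, not the word itself
// (search() would return that same 25)
var position = match.index + match[0].length - match[1].length; // 44, where "green" starts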
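A related variant, sketched here under the same "simple markup" assumption, uses a negative lookahead so that the match index is the word's own position. The function name findWithLookahead is hypothetical. The pattern rejects any green that is followed by a > before the next <, which is a rough proxy for "inside a tag" and will misfire if a literal > appears in text or inside an attribute value.

```javascript
// Hypothetical helper: collect the index of every "green" that is not
// followed by ">" before the next "<" or ">" (i.e. not inside a tag,
// assuming well-behaved markup).
function findWithLookahead(html) {
  var re = /green(?![^<>]*>)/g;
  var positions = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    positions.push(m.index);     // index of the word itself
  }
  return positions;
}

var sample = "<span class='greenhornet'>Can you catch the green?</span>";
console.log(findWithLookahead(sample)); // [ 44 ]
```

Because of the g flag, exec() keeps advancing through the string, so this also picks up repeated occurrences in the text.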

You can get the innerHTML of the span element. No regex needed.
