
Matching a string only if it is not in <script> or <a> tags

I'm working on a browser plugin that replaces all instances of "someString" (as defined by a complicated regex) with <a href="http://domain.com/$1">$1</a>. This generally works OK when I just do a global replace on the body's innerHTML. However, it breaks the page when it finds (and replaces) "someString" inside <script> tags (i.e. as a JS variable or other JS reference). It also breaks if "someString" is already part of an anchor.

So basically I want to do a global replace on all instances of "someString" unless it falls inside a <script></script> or <a></a> tag set.

Essentially what I have now is:

var body = document.getElementsByTagName('body')[0].innerHTML;
body = body.replace(/(someString)/gi, '<a href="http://domain.com/$1">$1</a>');
document.getElementsByTagName('body')[0].innerHTML = body;

But obviously that's not good enough. I've been struggling for a couple of hours now and reading all of the answers here (including the many adamant ones that insist regex should not be used with HTML), so I'm open to suggestions on how to do this. I'd prefer using straight JS, but can use jQuery if necessary.

Edit - Sample HTML:

<body>
  someString
  <script type="text/javascript">
  var someString = 'blah';
  console.log(someString);
  </script>
  <a href="someString.html">someString</a>
</body>

In that case, only the very first instance of "someString" should be replaced.
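To make the failure concrete, here is a minimal sketch that runs the same global replace over the sample markup held in a plain string rather than a live page. The names `sample`, `naive` and `linkCount` are only for illustration.

```javascript
// The sample HTML from above, held as a string (illustration only)
const sample = [
    '<body>',
    '  someString',
    '  <script type="text/javascript">',
    "  var someString = 'blah';",
    '  console.log(someString);',
    '  </script>',
    '  <a href="someString.html">someString</a>',
    '</body>'
].join('\n');

// The same replace as in the question, applied to the whole markup
const naive = sample.replace(/(someString)/gi, '<a href="http://domain.com/$1">$1</a>');

// Count how many links were injected
const linkCount = (naive.match(/http:\/\/domain\.com\//g) || []).length;
console.log(linkCount); // 5: the script body and the existing anchor were rewritten too
```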

Well, you can use XPath with Mozilla (assuming you're writing the plugin for Firefox). The call is document.evaluate. Or you can use an XPath library to do it (there are a few out there)...

var matches = document.evaluate(
    '//*[not(name() = "a") and not(name() = "script") and contains(., "string")]',
    document,
    null,
    XPathResult.UNORDERED_NODE_ITERATOR_TYPE,
    null
);

Then replace using a callback function:

var callback = function(node) {
    var text = node.nodeValue;
    text = text.replace(/(someString)/gi, '<a href="http://domain.com/$1">$1</a>');
    // Parse the replaced markup in a detached element
    var div = document.createElement('div');
    div.innerHTML = text;
    // childNodes is live: each insertBefore moves a node out of the div,
    // so keep taking the first child until none are left
    while (div.firstChild) {
        node.parentNode.insertBefore(div.firstChild, node);
    }
    node.parentNode.removeChild(node);
};
var nodes = [];
//cache the tree since we want to modify it as we iterate
var node = matches.iterateNext();
while (node) {
    nodes.push(node);
    node = matches.iterateNext();
}
for (var key = 0, length = nodes.length; key < length; key++) {
    node = nodes[key];
    // Check for a Text node
    if (node.nodeType == Node.TEXT_NODE) {
        callback(node);
    } else {
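        // Copy the live childNodes list first, because callback() adds and removes children
        var children = Array.prototype.slice.call(node.childNodes);
        for (var i = 0, l = children.length; i < l; i++) {
            var child = children[i];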
            if (child.nodeType == Node.TEXT_NODE) {
                callback(child);
            }
        }
    }
}
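One caveat with the callback above: it assigns the replaced text to innerHTML, so any "<" or "&" that was plain text in the original node gets parsed as markup. A minimal sketch of escaping the text before wrapping the matches follows. Both `escapeHtml` and `linkify` are hypothetical helpers, not part of the answer, and the sketch assumes the real pattern never needs to match across an escaped character.

```javascript
// Hypothetical helper: escape text so it can be assigned to innerHTML
// without being parsed as markup
function escapeHtml(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Escape first, then wrap each match in an anchor
function linkify(text) {
    return escapeHtml(text).replace(
        /(someString)/gi,
        '<a href="http://domain.com/$1">$1</a>'
    );
}

console.log(linkify('if (a < b) someString'));
// if (a &lt; b) <a href="http://domain.com/someString">someString</a>
```

Inside the callback, `text = linkify(node.nodeValue)` would then stand in for the direct replace call.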

Try this and see if it meets your needs (tested in IE 8 and Chrome).

<script src="jquery-1.4.4.js" type="text/javascript"></script>
<script>
  var pattern = /(someString)/gi;
  var replacement = "<a href=\"http://domain.com/$1\">$1</a>";

  $(function() {
    $("body :not(a,script)")
      .contents()
      .filter(function() { 
        return this.nodeType == 3 && this.nodeValue.search(pattern) != -1;
      })
      .each(function() {
        var span = document.createElement("span");
        span.innerHTML = "&nbsp;" + $.trim(this.nodeValue.replace(pattern, replacement));
        this.parentNode.insertBefore(span, this);
        this.parentNode.removeChild(this);
      });
  });
</script>

The code uses jQuery to find all the text nodes within the document's <body> that are not in <a> or <script> blocks and that contain the search pattern. Once those are found, a span is injected containing the target node's modified content, and the old text node is removed.

The only issue I saw was that IE 8 handles text nodes containing only whitespace differently than Chrome, so sometimes a replacement would lose a leading space; hence the insertion of the non-breaking space before the text containing the regex replacements.
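A possible alternative to the span-plus-innerHTML step is to split the text node's value into plain and matched segments and build DOM nodes from those. This is only a sketch under assumptions: `splitMatches` is a hypothetical helper, not something from the answer, and the DOM part is left as comments because it needs a browser. Since text nodes created with document.createTextNode keep their whitespace as given, this approach may make the non-breaking space workaround unnecessary.

```javascript
// Hypothetical helper: split text into plain segments and matched segments
function splitMatches(text, pattern) {
    // Work on a global copy so exec() walks through every match
    const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
    const re = new RegExp(pattern.source, flags);
    const segments = [];
    let last = 0;
    let match;
    while ((match = re.exec(text)) !== null) {
        if (match[0].length === 0) {
            re.lastIndex++; // skip empty matches so the loop always advances
            continue;
        }
        if (match.index > last) {
            segments.push({ type: 'text', value: text.slice(last, match.index) });
        }
        segments.push({ type: 'link', value: match[0] });
        last = match.index + match[0].length;
    }
    if (last < text.length) {
        segments.push({ type: 'text', value: text.slice(last) });
    }
    return segments;
}

const segments = splitMatches('  see someString and SomeString.', /(someString)/gi);
console.log(segments.length); // 5

// In a browser, each segment could then become a node (sketch only):
//   type 'text' -> document.createTextNode(segment.value)
//   type 'link' -> an <a> element with href 'http://domain.com/' + segment.value
```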

I know you don't want to hear this, but this doesn't sound like a job for a regex. Regular expressions don't do negative matches very well before becoming complicated and unreadable.

Perhaps this regex might be close enough though:

/>[^<]*(someString)[^<]*</

It captures any instance of someString that sits between a > and a <.

Another idea is if you do use jQuery, you can use the :contains pseudo-selector.
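A quick way to see what this pattern does and does not cover is to test it against a few strings. The inputs below are made up for illustration, and `matchesBetweenTags` is a hypothetical wrapper around the regex from this answer.

```javascript
const betweenTags = />[^<]*(someString)[^<]*</;

// Hypothetical wrapper, for illustration only
function matchesBetweenTags(html) {
    return betweenTags.test(html);
}

const results = [
    '<p>see someString here</p>',          // text between tags
    '<a href="someString.html">link</a>',  // only inside an attribute value
    '<a href="x.html">someString</a>',     // text of an existing anchor
    'someString'                           // no surrounding tags at all
].map(matchesBetweenTags);

console.log(results); // [ true, false, true, false ]
```

So the pattern skips attribute values, but it still matches text inside an existing anchor (and, by the same logic, inside a script block), which is why it is only "close enough". Note also that the whole match includes the surrounding > and <, so a replace has to put that text back.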

$('*:contains(someString)').each(function(i)
{
    var markup = $(this).html();
    // modify markup to insert anchor tag
    $(this).html(markup)
});

This will grab any DOM item that contains 'someString' in its text. I don't think it will traverse <script> tags, so you should be good.

You could try the following:

/(someString)(?![^<]*?(<\/a>|<\/script>))/

I didn't test every scenario, but it basically uses a negative lookahead to look for the next opening bracket following someString, and if that bracket is part of an anchor or script closing tag, it does not match.

Your example seems to work in this fiddle, although it certainly doesn't cover all possibilities. In cases where the innerHTML in your <a></a> contains tags (like <b> or <span>), or the code in your script tags generates HTML (contains strings with tags in them), you would need something more complex.
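The behaviour described above can be checked on the sample markup as a plain string. In this sketch the `gi` flags are an assumption carried over from the replace call in the question (the regex as posted has no flags), and the variable names are only for illustration.

```javascript
// Lookahead regex from this answer, with the question's "gi" flags added (assumption)
const pattern = /(someString)(?![^<]*?(<\/a>|<\/script>))/gi;
const replacement = '<a href="http://domain.com/$1">$1</a>';

const html = [
    '<body>',
    '  someString',
    '  <script type="text/javascript">',
    "  var someString = 'blah';",
    '  console.log(someString);',
    '  </script>',
    '  <a href="someString.html">someString</a>',
    '</body>'
].join('\n');

const result = html.replace(pattern, replacement);
const linkCount = (result.match(/http:\/\/domain\.com\//g) || []).length;
console.log(linkCount); // 1: only the bare text on the second line is wrapped

// The limitation mentioned above: a tag nested inside the anchor hides the </a>
const nested = '<a href="#"><b>someString</b></a>';
console.log(nested.replace(pattern, replacement));
// <a href="#"><b><a href="http://domain.com/someString">someString</a></b></a>
```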
