Search text with a regular expression while ignoring HTML tags

Question

I need to wrap the text I search for in a highlight class, but other HTML tags get in the way. Here is an example:

Starting with:

<div class="source">your <b><i>text</i></b> using <a href="#">regex ignoring html</a> tags</div>

I search for: text using regex

Expected result (in this example I use a span for the highlight):

<div class="source">your <b><i><span>text</span></i></b><span> using </span><a href="#"><span>regex</span> ignoring html</a> tags</div>

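I have a solution for this, but it needs a specific regular expression that finds the text while ignoring HTML tags. If there is another way to solve it, please suggest it below. It does not have to be written in vanilla JS. Below is a simplified version of my current solution, without the regular expression mentioned above.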

The example below does not work as intended because the regular expression is missing:

var source = document.querySelector('.source').innerHTML; // html from example
var text = 'text using regex'; // the text we are searching for
var htmlTag = new RegExp('(<\\/?([a-z]+)([^<]+)*(?:>))+', 'g'); // find html tags
var missingRegExp = new RegExp('', 'i'); // << missing regex

// Wrap searched text with span tag
var result = source.replace(missingRegExp, function (searchedText) {
  // Wrap html tags inside searched text with span tag
  searchedText = searchedText.replace(htmlTag, function (match) {
    return '</span>' + match + '<span>';
  });

  return '<span>' + searchedText + '</span>';
});

console.log('Result: ' + result);

Removing the HTML tags is not an option in this case.

Answer 1

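You have a search string like text using regex. The spaces between the words are what matter: each one has to be replaced with a regular expression that also matches HTML tags. Before doing that, wrap every word in parentheses so that it becomes a capture group: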

> '(' + "text using regex".split(' ').join(') (') + ')'
< "(text) (using) (regex)"

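The next step is to replace each space with this regular expression (written here with doubled backslashes, as it would appear inside a JavaScript string literal): ((?:\\s*(?:<\\/?\\w[^<>]*>)?\\s*)*) so the final pattern is: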

< "(text)((?:\s*(?:<\/?\w[^<>]*>)?\s*)*)(using)((?:\s*(?:<\/?\w[^<>]*>)?\s*)*)(regex)"
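The two steps above can be combined into one small function. This is only a sketch: the names escapeRegExp, GAP and buildSearchPattern are made up for illustration, and escaping regex metacharacters in the search words is an extra precaution that the answer itself does not mention.

```javascript
// Hypothetical helper: escape characters that are special inside a regular expression.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// The gap pattern from the answer: any run of whitespace and HTML tags between two words.
var GAP = '((?:\\s*(?:<\\/?\\w[^<>]*>)?\\s*)*)';

// Hypothetical helper: "text using regex" -> "(text)" + GAP + "(using)" + GAP + "(regex)"
function buildSearchPattern(text) {
  var words = text.trim().split(/\s+/).map(escapeRegExp);
  return '(' + words.join(')' + GAP + '(') + ')';
}

var pattern = buildSearchPattern('text using regex');
console.log(pattern);
```

The pattern is returned as a string so that it can be passed to new RegExp(pattern, 'i') together with whatever flags the search needs.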

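With 3 search words we end up with 5 capture groups in total (n words -> n + (n - 1) capture groups: one per word plus one per gap), so the replacement string has to be built accordingly. Here it should look like this: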

<span>$1</span>$2<span>$3</span>$4<span>$5</span>
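For an arbitrary number of words, the replacement string can be generated instead of typed by hand. A minimal sketch, assuming the group layout described above (odd-numbered groups are words, even-numbered groups are the gaps between them); buildReplacement is a hypothetical name:

```javascript
// Hypothetical helper: build the replacement string for `wordCount` search words.
// Odd group numbers hold words (wrapped in a span), even ones hold gaps (kept as-is).
function buildReplacement(wordCount) {
  var parts = [];
  for (var i = 1; i <= 2 * wordCount - 1; i++) {
    parts.push(i % 2 === 1 ? '<span>$' + i + '</span>' : '$' + i);
  }
  return parts.join('');
}

console.log(buildReplacement(3));
// prints <span>$1</span>$2<span>$3</span>$4<span>$5</span>
```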

Now that you have both the compiled regular expression and the replacement string, a single call to .replace() does the rest.
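Putting it together on the example from the question might look like the sketch below. It uses a string literal in place of document.querySelector('.source').innerHTML so that it runs outside a browser, and it assumes the search words contain no regex metacharacters.

```javascript
// The inner HTML of the .source element from the question, as a plain string.
var source = 'your <b><i>text</i></b> using <a href="#">regex ignoring html</a> tags';
var text = 'text using regex';

// Gap pattern from the answer: whitespace and HTML tags between two words.
var gap = '((?:\\s*(?:<\\/?\\w[^<>]*>)?\\s*)*)';
var words = text.split(' '); // assumes plain words, no regex metacharacters

// (text)gap(using)gap(regex), case-insensitive
var regex = new RegExp('(' + words.join(')' + gap + '(') + ')', 'i');

// Word i sits in group 2i + 1; the gap after it sits in group 2i + 2.
var replacement = words.map(function (_, i) {
  var wordGroup = '<span>$' + (2 * i + 1) + '</span>';
  var gapGroup = i < words.length - 1 ? '$' + (2 * i + 2) : '';
  return wordGroup + gapGroup;
}).join('');

var result = source.replace(regex, replacement);
console.log(result);
// prints your <b><i><span>text</span></i></b> <span>using</span> <a href="#"><span>regex</span> ignoring html</a> tags
```

Note that this wraps only the words themselves: the spaces around "using" stay outside the spans, which differs slightly from the expected result shown in the question.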


Solution 1
-1 Accepted 2016-09-23 09:03:13
