
JavaScript RegExp match text ignoring HTML

Is it possible to match "the dog is really really fat" in "The <strong>dog</strong> is really <em>really</em> fat!" and add "<span class="highlight">WHAT WAS MATCHED</span>" around it?

I don't mean this example specifically; more generally, I want to search text while ignoring the HTML, keep the HTML in the end result, and just add the span above around the whole match.

EDIT:
Considering the HTML tag overlapping problem, would it be possible to match a phrase and just add the span around each of the matched words? The problem here is that I don't want the word "dog" matched when it's not in the searched context, in this case, "the dog is really really fat."

Update:

Here is a working fiddle that does what you want. However, you will need to update the htmlTagRegEx to handle matching on any HTML tag, as this just performs a simple match and will not handle all the cases.

http://jsfiddle.net/briguy37/JyL4J/

Also, below is the code. Basically, it takes out the HTML elements one by one, then does a replace in the text to add the highlight span around the matched selection, and then pushes the HTML elements back in one by one. It's ugly, but it's the easiest way I could think of to get it to work...

function highlightInElement(elementId, text){
    var element = document.getElementById(elementId);
    var elementHtml = element.innerHTML;
    var tags = [];
    var tagLocations = [];
    //Simple pattern: only matches tags without attributes, such as <em> or </em>
    var htmlTagRegEx = /<\/?\w+>/;

    //Strip the tags from the elementHtml and keep track of them
    var htmlTag;
    while((htmlTag = elementHtml.match(htmlTagRegEx))){
        tagLocations.push(htmlTag.index);
        tags.push(htmlTag[0]);
        elementHtml = elementHtml.replace(htmlTag[0], '');
    }

    //Search for the literal text in the stripped html (indexOf returns -1 when it is absent)
    var textLocation = elementHtml.indexOf(text);
    if(textLocation === -1){
        //Nothing to highlight, so leave the element untouched
        return;
    }

    //Add the highlight
    var highlightHTMLStart = '<span class="highlight">';
    var highlightHTMLEnd = '</span>';
    var textEndLocation = textLocation + text.length;
    elementHtml = elementHtml.substring(0, textLocation) + highlightHTMLStart + text +
        highlightHTMLEnd + elementHtml.substring(textEndLocation);

    //Plug the HTML tags back in, last tag first, shifting locations past the inserted span
    for(var i = tagLocations.length - 1; i >= 0; i--){
        var location = tagLocations[i];
        if(location > textEndLocation){
            location += highlightHTMLStart.length + highlightHTMLEnd.length;
        } else if(location > textLocation){
            location += highlightHTMLStart.length;
        }
        elementHtml = elementHtml.substring(0, location) + tags[i] + elementHtml.substring(location);
    }

    //Update the innerHTML of the element
    element.innerHTML = elementHtml;
}
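
Regarding the overlap problem raised in the question's EDIT, one possible variation is to wrap each text run inside the match separately, so that no span ever crosses a tag boundary. The following is only a sketch under some assumptions: `highlightPhrase` is a hypothetical helper (it is not part of the fiddle), it works on an HTML string rather than a DOM element, it treats anything of the form `<...>` as a tag, it does not decode entities such as `&amp;`, and it assumes lowercasing does not change the length of the text.

```javascript
// Hypothetical helper: highlight a phrase that may span several tags
// by wrapping each text run inside the match in its own span.
function highlightPhrase(html, phrase) {
  // Splitting on a capture group keeps the tags: even indexes hold text, odd indexes hold tags.
  var parts = html.split(/(<[^>]+>)/);
  var plain = '';
  var runs = [];
  parts.forEach(function (part, index) {
    if (index % 2 === 0) {
      // Remember where this text run starts in the tag-free text.
      runs.push({ index: index, start: plain.length });
      plain += part;
    }
  });

  // Case-insensitive search in the tag-free text.
  var matchStart = plain.toLowerCase().indexOf(phrase.toLowerCase());
  if (matchStart === -1) {
    return html; // phrase not found: return the markup unchanged
  }
  var matchEnd = matchStart + phrase.length;

  runs.forEach(function (run) {
    var text = parts[run.index];
    // Clamp the match range to this run.
    var from = Math.max(matchStart - run.start, 0);
    var to = Math.min(matchEnd - run.start, text.length);
    if (from < to) {
      parts[run.index] = text.slice(0, from) +
        '<span class="highlight">' + text.slice(from, to) + '</span>' +
        text.slice(to);
    }
  });
  return parts.join('');
}
```

Because the phrase is searched as a whole in the tag-free text, a lone "dog" elsewhere in the markup is not highlighted unless the full phrase matches there.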

Naah... just use the good old RegExp ;)

var htmlString = "The <strong>dog</strong> is really <em>really</em> fat!";
var regexp = /<\/?\w+((\s+\w+(\s*=\s*(?:\".*?"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>/gi;
var result = '<span class="highlight">' + htmlString.replace(regexp, '') + '</span>';

A simpler way with jQuery would be:

var originalHtml = $("#div").html();

// The lookahead (?![^<>]*>) rejects any match that sits inside a tag
var newHtml = originalHtml.replace(new RegExp(keyword + "(?![^<>]*>)", "g"), function(e){
    return "<span class='highlight'>" + e + "</span>";
});

$("#div").html(newHtml);

This works just fine for me.
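
To see what the negative lookahead `(?![^<>]*>)` does, here is a small sketch of the same idea on a plain string, without jQuery. The names `escapeRegExp` and `highlightKeyword` are hypothetical helpers introduced for illustration. Escaping the keyword is an addition, not part of the answer: it is assumed here that the keyword should be matched literally rather than as a regex pattern.

```javascript
// Hypothetical helper: escape regex metacharacters so the keyword is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: wrap every occurrence of keyword that is not inside a tag.
function highlightKeyword(html, keyword) {
  // A match is rejected when the text after it reaches ">" before any "<",
  // which means the match sits inside a tag (for example in an attribute).
  var pattern = new RegExp(escapeRegExp(keyword) + '(?![^<>]*>)', 'g');
  return html.replace(pattern, function (matched) {
    return "<span class='highlight'>" + matched + "</span>";
  });
}

// The attribute name "class" is left alone; only the visible word is wrapped.
console.log(highlightKeyword('<p class="note">top class (A+)</p>', 'class'));
```

Note that this approach only finds keywords that sit within a single text run; a phrase interrupted by a tag, such as "dog is" in "<strong>dog</strong> is", will not match.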

Here is a working regex example to exclude matches inside HTML tags as well as scripts:

http://refiddle.com/lwy6

Use this regex in a replace() script.

    /(a)(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$)/gi

this.keywords.forEach(keyword => {
  el.innerHTML = el.innerHTML.replace(
    RegExp(keyword + '(?![^<>]*>)', 'ig'),
    matched => `<span class=highlight>${matched}</span>`
  )
})

You can use the expression </?\\w*> in a string replace, and you will get your string.
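
As a quick illustration of that expression, the sketch below applies it as a regex literal with the global flag. The function name `stripSimpleTags` is hypothetical. Note the assumption built into the pattern: it only matches tags without attributes, so something like `<a href="...">` would be left in place.

```javascript
// Hypothetical helper: remove attribute-free tags such as <strong> or </em>.
function stripSimpleTags(html) {
  return html.replace(/<\/?\w*>/g, '');
}

console.log(stripSimpleTags('The <strong>dog</strong> is really <em>really</em> fat!'));
```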

If you use jQuery, you can call the text() method on the element containing the text you're searching for. Given this markup:

<p id="the-text">
  The <strong>dog</strong> is really <em>really</em> fat!
</p>

This would yield "The dog is really really fat!" (plus the surrounding whitespace from the markup):

$('#the-text').text();

You could do your regex search on that text instead of trying to do so in the markup.

Without jQuery, I'm unsure of an easy way to extract and concatenate the text nodes from all child elements.
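
For what it's worth, browsers expose the standard `textContent` property on DOM nodes, which returns the concatenated text of all descendants, so `document.getElementById('the-text').textContent` should give the same text without jQuery. To show what that amounts to, here is a sketch of a manual walk; `collectText` is a hypothetical helper that relies only on the standard `nodeType`, `nodeValue`, and `childNodes` properties.

```javascript
// Hypothetical helper: concatenate the text nodes below a DOM node, in document order.
function collectText(node) {
  if (node.nodeType === 3) { // 3 is the value of Node.TEXT_NODE
    return node.nodeValue;
  }
  var text = '';
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    text += collectText(children[i]);
  }
  return text;
}
```

In a browser this could be called as `collectText(document.getElementById('the-text'))`, and the regex search can then run on the returned string.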
