
Match characters at start of string, ignore strings in html tags

A little help required please...

I have a regular expression that matches characters at the start of a word. Given a set of strings like so:

Ray Fox 
Foster Joe
Finding Forrester

REGEX

/\bfo[^\b]*?\b/gi 

This matches the 'Fo' in Fox, Foster, and Forrester, as expected.

However, I run into a problem when the strings are wrapped in HTML tags, like so:

<span class="fontColor1">Ray Fox</span>
<span class="fontColor2">Foster Joe</span>
<span class="fontColor3">Finding Forrester</span>

Now the regex also matches the 'fo' in the fontColor* class names.

I'm fairly green with regular expressions. I need a little help updating the pattern so that it only searches the text between HTML tags when tags are present, but still works correctly when there are no tags.

What about

<.*?span.*?>(.*?)<\s?\/.*?span.*?>

And where would you have text with no HTML tags around it? That part makes no sense to me.

EDIT:

This solution will not match nested tags, but as the question is written, that doesn't seem to be an issue.
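
As a rough sketch of how the pattern above could be combined with the regex from the question: collect the text captured between the span tags, and fall back to the whole input when no tags are found. The function name matchFoOutsideTags and the fallback behaviour are assumptions made for illustration, not part of the original answer, and the nested-tag limitation still applies.

```javascript
// Hypothetical helper (not from the original answer): returns the words
// starting with "fo", searching only the text between <span> tags when
// such tags are present, and the whole input otherwise.
function matchFoOutsideTags(input) {
    var tagPattern = /<.*?span.*?>(.*?)<\s?\/.*?span.*?>/g; // pattern from this answer
    var wordPattern = /\bfo[^\b]*?\b/gi;                    // pattern from the question
    var chunks = [];
    var m;

    // Collect capture group 1 (the text inside each span) for every match.
    while ((m = tagPattern.exec(input)) !== null) {
        chunks.push(m[1]);
    }

    // Assumption: if no span tags matched, treat the input as plain text.
    if (chunks.length === 0) {
        chunks.push(input);
    }

    // String.prototype.match returns null when nothing matches, hence "|| []".
    return chunks.join("\n").match(wordPattern) || [];
}

var wrapped = '<span class="fontColor1">Ray Fox</span>\n' +
              '<span class="fontColor2">Foster Joe</span>\n' +
              '<span class="fontColor3">Finding Forrester</span>';
var plain = "Ray Fox\nFoster Joe\nFinding Forrester";

console.log(matchFoOutsideTags(wrapped).join(", ")); // Fox, Foster, Forrester
console.log(matchFoOutsideTags(plain).join(", "));   // Fox, Foster, Forrester
```

Because the class attribute is never part of the captured group, fontColor1 and friends are not searched.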

You can use an HTML parser to extract the plain text, and then match against that.

// Detached container: the markup is parsed without being added to the live page
var root;

try {
    root = document.implementation.createHTMLDocument("").body;
}
catch(e) {
    // Fallback for browsers that do not support createHTMLDocument
    root = document.createElement("body");
}

root.innerHTML = '<span class="fontColor1">Ray Fox</span>\
            <span class="fontColor2">Foster Joe</span>\
            <span class="fontColor3">Finding Forrester</span>';

// If you are using jQuery
var text = $(root).text();

// Proceed as normal with the text variable

If you are not using jQuery, you can replace $(root).text() with findText(root), where findText is:

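// Recursively concatenates the text of every text node under root
function findText(root) {
    var ret = "",
        nodes = root.childNodes;
    for (var i = 0; i < nodes.length; ++i) {
        if (nodes[i].nodeType === 3) {
            // nodeType 3 is a text node: take its content
            ret += nodes[i].nodeValue;
        } else if (nodes[i].nodeType === 1) {
            // nodeType 1 is an element: descend into its children
            ret += findText(nodes[i]);
        }
    }
    return ret;
}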
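
Once the plain text has been extracted by either route, "proceed as normal" could look something like the sketch below, which applies the regex from the question to that text. The function name findFoWords and the sample string are assumptions for illustration. The string stands in for the value that $(root).text() or findText(root) would produce, so the snippet runs without a DOM.

```javascript
// Hypothetical helper: applies the question's regex to already-extracted text.
function findFoWords(text) {
    // match() returns null when nothing is found, so default to an empty array.
    return text.match(/\bfo[^\b]*?\b/gi) || [];
}

// Stand-in for the extracted text (assumed value, whitespace-separated).
var text = "Ray Fox   Foster Joe   Finding Forrester";

console.log(findFoWords(text).join(", ")); // Fox, Foster, Forrester
```

Since the tags and their attributes are gone before the regex runs, class names such as fontColor1 can no longer produce a match.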
