
Matching words that may contain special characters

I'm trying to replace all occurrences of a given word in a string, but the word may contain special characters that need to be escaped. Here's an example:

The ERA is the mean of earned runs given up by a pitcher per nine innings pitched. Meanwhile, the ERA+, the adjusted ERA, is a pitcher's earned run average (ERA) according to the pitcher's ballpark (in case the ballpark favors batters or pitchers) and the ERA of the pitcher's league.

I would like to be able to do the following:

string = "The ERA..." // from above
string = string.replaceAll("ERA", "<b>ERA</b>");
string = string.replaceAll("ERA+", "<u>ERA+</u>");

without ERA and ERA+ conflicting. I've been using the replaceAll prototype posted previously, along with a regular expression I found elsewhere on SO (unfortunately I can't find the link in my history):

String.prototype.replaceAll = function (find, replace) {
    var str = this;
    // Escape regex metacharacters in `find`, then replace every occurrence of it
    return str.replace(new RegExp(find.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'), 'g'), replace);
};
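A minimal standalone sketch of the same escape-then-replace technique is below; the helper names `escapeRegExp` and `replaceAllLiteral` are hypothetical. It avoids patching `String.prototype` (current engines already provide a native `String.prototype.replaceAll`, which the prototype above would overwrite) and passes a replacer function so that `$&` or `$1` in the replacement stay literal. It also shows the root problem: escaping makes `ERA+` safe to use as a pattern, but a plain `ERA` still matches the start of `ERA+`.

```javascript
// Escape every character that has a special meaning in a RegExp pattern.
function escapeRegExp(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Replace every literal occurrence of `find` in `str` with `replacement`.
// A replacer function is used so "$&", "$1", etc. in `replacement` stay literal.
function replaceAllLiteral(str, find, replacement) {
  return str.replace(new RegExp(escapeRegExp(find), "g"), () => replacement);
}

const sample = "the ERA and the ERA+";
console.log(replaceAllLiteral(sample, "ERA+", "[adj]"));
// the ERA and the [adj]
console.log(replaceAllLiteral(sample, "ERA", "[era]"));
// the [era] and the [era]+   <- "ERA" also hits the start of "ERA+"
```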

function loadfunc() {
    var markup = document.getElementById('thetext').innerHTML;
    var terms = Object.keys(acronyms);
    // Wrap each term in an <abbr> whose title is the term's definition
    for (var i = 0; i < terms.length; i++) {
        markup = markup.replaceAll(terms[i], '<abbr title="' + acronyms[terms[i]] + '">' + terms[i] + '</abbr>');
    }
    document.getElementById('thetext').innerHTML = markup;
}

Basically, the code wraps each abbreviation in an <abbr> tag so that its definition shows up on mouseover. The problem is that the current regular expression is far too loose: it matches the term anywhere, even inside a longer term. My previous attempts worked partially, but they either failed to distinguish between things like ERA and ERA+, or completely skipped over terms like "K/9" or "IP/GS" (which should match as a whole, not as "IP" or "GS" individually).

I should mention that acronyms is an object (used as a map) that looks like:

var acronyms = {
    "ERA": "Earned Run Average: ...",
    "ERA+": "Earned Run Average adjusted to ..."
};

Also (although this is fairly obvious) 'thetext' is a dummy div containing some text. The loadfunc() function is executed from <body onload="loadfunc()">

Thanks!

OK, after looking at your jsFiddle, this is a lot to work with.

I think the best you're going to get is searching for whole words that begin with a capital letter and may contain /, % or + (the + is needed so that ERA+ is matched as a single word). Something like this: ([A-Z][\w/%+]+)
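To see which tokens a pattern like this picks up, here is a small sketch that only lists the candidates. It assumes `+` is allowed in the character class (without it, `ERA+` is cut down to `ERA`), and the sample sentence is invented for illustration. Ordinary capitalized words such as `He` are candidates too; they are filtered out later by looking them up in `acronyms`.

```javascript
// Candidate tokens: a capital letter followed by word characters, "/", "%" or "+".
const tokenPattern = /[A-Z][\w\/%+]+/g;

const text = "He posted a 3.10 ERA (ERA+ of 131), 9.8 K/9 and 6.1 IP/GS.";
const tokens = text.match(tokenPattern);
console.log(tokens);
// [ 'He', 'ERA', 'ERA+', 'K/9', 'IP/GS' ]
```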

Caveat: no matter how you do this, if you're doing it in the browser (e.g., because you can't update the raw data), it's going to be processor-intensive.

And you can implement it like this:

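var repl = str.replace(/([A-Z][\w\/%+]+)/g, function (match) {
    // Only wrap tokens that are actual keys of the acronyms map
    if (Object.prototype.hasOwnProperty.call(acronyms, match)) {
        // Escape double quotes so a definition can't break out of the title attribute
        var title = acronyms[match].replace(/"/g, '&quot;');
        return '<abbr title="' + title + '">' + match + '</abbr>';
    }
    return match;
});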

Here's a working jsFiddle: http://jsfiddle.net/remus/9z6fg/

Note that jQuery isn't required; I only used it here to make updating the DOM in jsFiddle easier.
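Without the DOM or jQuery, the same callback technique can be wrapped in a plain function and tested on a string. This is a sketch under a few assumptions: the names `annotateAcronyms` and `escapeAttr` are hypothetical, `+` is included in the token class, lookups use `hasOwnProperty` so inherited object properties are never treated as acronyms, and `&` and `"` in definitions are escaped so they can't break the `title` attribute. The sample definitions are shortened placeholders.

```javascript
// Escape characters that would break a double-quoted HTML attribute value.
// "&" must be replaced first so the "&" in "&quot;" isn't escaped again.
function escapeAttr(s) {
  return s.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Wrap every token that is a key of `acronyms` in an <abbr> tag, in a single pass,
// so text inserted for one acronym is never rescanned.
function annotateAcronyms(text, acronyms) {
  return text.replace(/[A-Z][\w\/%+]+/g, (match) => {
    if (!Object.prototype.hasOwnProperty.call(acronyms, match)) {
      return match; // not a known acronym: leave it untouched
    }
    return '<abbr title="' + escapeAttr(acronyms[match]) + '">' + match + "</abbr>";
  });
}

const defs = {
  "ERA": "Earned Run Average",
  "ERA+": "Adjusted ERA",
  "K/9": "Strikeouts per nine innings",
};
console.log(annotateAcronyms("ERA, ERA+ and K/9", defs));
```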

You want to use regular expressions with negative lookahead:

string = string.replace(/\bERA(?!\+)\b/g, "<b>ERA</b>");

and

string = string.replace(/\bERA\+/g, "<u>ERA+</u>");

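The zero-width word boundary \b has been added for good measure, so you don't accidentally match strings like 'BERA' (or, thanks to the trailing \b, 'ERAS'). The second pattern has no trailing \b because + is not a word character, so a boundary right after it would only match when a letter or digit follows.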
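Putting both lookahead replacements together on a made-up sentence, as a sketch: because `replace` returns a new string instead of modifying the original, each result is assigned back. The order of the two calls doesn't matter here, since the first pattern rejects any `ERA` that is followed by `+`, including the one inside `<u>ERA+</u>`.

```javascript
let s = "ERA vs ERA+ (not BERA or ERAS)";
// Plain ERA: not followed by "+", and not part of a longer word.
s = s.replace(/\bERA(?!\+)\b/g, "<b>ERA</b>");
// ERA+: no trailing \b, because "+" is not a word character.
s = s.replace(/\bERA\+/g, "<u>ERA+</u>");
console.log(s);
// <b>ERA</b> vs <u>ERA+</u> (not BERA or ERAS)
```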

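Another idea is to sort the list of acronyms from the longest key to the shortest. This way you are sure to substitute every 'ERA+' before 'ERA'. Note that this only avoids the substring conflict if all keys are replaced in a single pass; with separate replaceAll calls, the later 'ERA' pass still matches the ERA inside the markup already inserted for 'ERA+'.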
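A sketch of the longest-first idea applied in a single pass: the escaped keys are sorted by length and joined into one alternation, so at any position `ERA+` is tried before `ERA` (JavaScript tries alternatives left to right), and text that has already been inserted is never rescanned. The name `buildAcronymRegExp` is hypothetical, the lookbehind `(?<!\w)` assumes an ES2018-capable engine, and the definitions are shortened placeholders containing no quotes (escape them if yours might).

```javascript
// Escape RegExp metacharacters in a literal string.
function escapeRegExp(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Build one RegExp matching any key, longest keys first so "ERA+" wins over "ERA".
// (?<!\w) and (?!\w) stop matches inside longer words such as "BERA" or "ERAS".
function buildAcronymRegExp(keys) {
  const alternatives = keys
    .slice()                             // don't reorder the caller's array
    .sort((a, b) => b.length - a.length) // longest first
    .map(escapeRegExp);
  return new RegExp("(?<!\\w)(?:" + alternatives.join("|") + ")(?!\\w)", "g");
}

const acronyms = {
  "ERA": "Earned Run Average",
  "ERA+": "Adjusted ERA",
  "IP": "Innings Pitched",
  "IP/GS": "Innings Pitched per Game Started",
};

const re = buildAcronymRegExp(Object.keys(acronyms));
const out = "ERA, ERA+, IP and IP/GS".replace(re, (key) =>
  '<abbr title="' + acronyms[key] + '">' + key + "</abbr>"
);
console.log(out);
```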

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this post, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 