
Matching on words with possibly special characters

I'm trying to replace all occurrences of a given word in a string, but the word may contain a special character that needs to be escaped. Here's an example:

The ERA is the mean of earned runs given up by a pitcher per nine innings pitched. Meanwhile, the ERA+, the adjusted ERA, is a pitcher's earned run average (ERA) according to the pitcher's ballpark (in case the ballpark favors batters or pitchers) and the ERA of the pitcher's league.

I would like to be able to do the following:

string = "The ERA..." // from above
string = string.replaceAll("ERA", "<b>ERA</b>");
string = string.replaceAll("ERA+", "<u>ERA+</u>");

without ERA and ERA+ conflicting. I've been using the prototype replaceAll posted previously, along with a regular expression found somewhere else on SO (unfortunately, I can't seem to find the link in my history):

String.prototype.replaceAll = function (find, replace) {
    var str = this;
    return str.replace(new RegExp(find.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'), 'g'), replace);
};

function loadfunc() {
    var markup = document.getElementById('thetext').innerHTML;
    var terms = Object.keys(acronyms);
    for (var i = 0; i < terms.length; i++) {
        markup = markup.replaceAll(terms[i], '<abbr title=\"' + acronyms[terms[i]] + '\">' + terms[i] + '</abbr>');
    }
    document.getElementById('thetext').innerHTML = markup;
}

Basically, the code wraps each abbreviation in an <abbr> tag so that its definition appears on mouseover. The problem is that the current regular expression is way too loose. My previous attempts worked partially, but they failed to distinguish between things like ERA and ERA+, or completely skipped over something like "K/9" or "IP/GS" (which should be a match by itself, not a match for "IP" or "GS" individually).

I should mention that acronyms is an object (used as an associative array) that looks like:

var acronyms = {
    "ERA": "Earned Run Average: ...",
    "ERA+": "Earned Run Average adjusted to ..."
};

Also (although this is fairly obvious), 'thetext' is a dummy div containing some text. The loadfunc() function is executed from <body onload="loadfunc()">.

Thanks!

OK, after looking at your jsFiddle, this is a lot to work with.

I think the best you're going to get is searching for whole words that begin with a capital letter and may contain / or %. Something like this: ([A-Z][\w/%]+)

Caveat: no matter how you do this, if you're doing it in the browser (e.g. because you can't update the raw data), it's going to be processing-intensive.

And you can implement it like this:

var repl = str.replace(/([A-Z][\w\/%]+)/g, function(match) {
    //alert(match);
    if (match in acronyms)
        return "<abbr title='" + acronyms[match] + "'>" + match + "</abbr>";
    else
        return match;
});

Here's a working jsFiddle: http://jsfiddle.net/remus/9z6fg/

Note that jQuery isn't required; I just used it in this case for ease of updating the DOM in jsFiddle.
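As written, the character class in the pattern above does not include "+", so a token such as ERA+ would be read as ERA. A minimal sketch of one possible adjustment follows, assuming it is acceptable to treat "+" as part of a token; the annotate function, the "K/9" entry and its placeholder definition are hypothetical names added for illustration, and hasOwnProperty is swapped in for the in operator so that inherited properties are never treated as acronyms.

```javascript
// Sample data: the first two entries come from the question,
// the third is a hypothetical placeholder.
var acronyms = {
    'ERA': 'Earned Run Average: ...',
    'ERA+': 'Earned Run Average adjusted to ...',
    'K/9': 'placeholder definition'
};

// Scan capitalised tokens in one pass and wrap only those that are
// keys of the acronyms object. "+" is added to the character class
// so that "ERA+" is read as a single token.
function annotate(text) {
    return text.replace(/[A-Z][\w\/%+]+/g, function (match) {
        if (Object.prototype.hasOwnProperty.call(acronyms, match)) {
            return "<abbr title='" + acronyms[match] + "'>" + match + "</abbr>";
        }
        return match; // not an acronym: leave the word untouched
    });
}

var annotated = annotate('The ERA+ and ERA (K/9)');
```

Because every token is consumed whole, ERA can never match inside ERA+. The trade-off is that two acronyms joined by one of the class characters (for example "ERA/ERA+") would be read as a single unknown token and left alone.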

You want to use regular expressions with negative lookahead:

string.replace(/\bERA(?!\+)\b/g, "<b>ERA</b>");

and

string.replace(/\bERA\+/g, "<u>ERA+</u>");

The zero-width word boundary \b has been added for good measure, so you don't accidentally match strings like 'BERA', etc.
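The two hand-written patterns above could be generalised to arbitrary terms. The following is only a sketch under stated assumptions: wholeTermPattern is a hypothetical helper, the set of "edge" characters (word characters, "+", "/" and "%") is a guess at what counts as part of a term in this text, and it relies on lookbehind assertions, which are available in current JavaScript engines but not in older browsers.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
    return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical helper: match `term` only when it is not directly
// preceded or followed by a word character, "+", "/" or "%".
function wholeTermPattern(term) {
    var edge = '[\\w+/%]';
    return new RegExp(
        '(?<!' + edge + ')' + escapeRegExp(term) + '(?!' + edge + ')',
        'g'
    );
}

var text = 'ERA, ERA+, BERA, IP/GS and IP';
var marked = text
    .replace(wholeTermPattern('ERA'), '<b>ERA</b>')
    .replace(wholeTermPattern('ERA+'), '<u>ERA+</u>');
// marked is '<b>ERA</b>, <u>ERA+</u>, BERA, IP/GS and IP'
```

Including "/" among the edge characters is what keeps "IP" from matching inside "IP/GS". If the replacement markup contains a term itself (for instance in a title attribute), a later pass could still match there, so this approach fits best when the inserted text is known not to repeat any term on its own.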

Another idea is to sort the list of acronyms from longest key to shortest. This way you are sure to substitute all 'ERA+' before 'ERA', so there is no substring conflict.
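One way the longest-first idea might be put into practice is sketched below. It is not taken from the answer itself: markAcronyms is a hypothetical function name, and the sketch assumes the sorted keys are joined into a single alternation so the text is scanned only once. With separate sequential passes, a later, shorter term such as ERA could still match inside markup that an earlier pass inserted for ERA+; a single pass avoids that.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
    return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical helper: wrap every known acronym in an <abbr> tag.
function markAcronyms(text, acronyms) {
    // Longest keys first, so "ERA+" is tried before "ERA".
    var terms = Object.keys(acronyms).sort(function (a, b) {
        return b.length - a.length;
    });
    if (terms.length === 0) {
        return text; // nothing to look for
    }
    // One alternation, one pass: inserted markup is never rescanned.
    var pattern = new RegExp(terms.map(escapeRegExp).join('|'), 'g');
    return text.replace(pattern, function (match) {
        return '<abbr title="' + acronyms[match] + '">' + match + '</abbr>';
    });
}

var result = markAcronyms('ERA and ERA+', { 'ERA': 'a', 'ERA+': 'b' });
// result is '<abbr title="a">ERA</abbr> and <abbr title="b">ERA+</abbr>'
```

Sorting only resolves conflicts between keys. It does not stop ERA from matching inside an unrelated word such as 'BERA', so a boundary check would still be needed for that.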
