Case insensitive string replacement in JavaScript?

I need to highlight given keywords in a JavaScript string, case-insensitively.

For example:

  • highlight("foobar Foo bar FOO", "foo") should return "<b>foo</b>bar <b>Foo</b> bar <b>FOO</b>"

I need the code to work for any keyword, so a hardcoded regular expression like /foo/i is not a sufficient solution.

What is the easiest way to do this?

(This is an instance of the more general problem named in the title, but I feel it's best tackled with a concrete, useful example.)

You can use regular expressions if you prepare the search string first. PHP, for example, has a function preg_quote, which replaces all regex characters in a string with their escaped versions.

Here is such a function for JavaScript (the source is linked in its header comment):

function preg_quote (str, delimiter) {
  //  discuss at: https://locutus.io/php/preg_quote/
  // original by: booeyOH
  // improved by: Ates Goral (https://magnetiq.com)
  // improved by: Kevin van Zonneveld (https://kvz.io)
  // improved by: Brett Zamir (https://brett-zamir.me)
  // bugfixed by: Onno Marsman (https://twitter.com/onnomarsman)
  //   example 1: preg_quote("$40")
  //   returns 1: '\\$40'
  //   example 2: preg_quote("*RRRING* Hello?")
  //   returns 2: '\\*RRRING\\* Hello\\?'
  //   example 3: preg_quote("\\.+*?[^]$(){}=!<>|:")
  //   returns 3: '\\\\\\.\\+\\*\\?\\[\\^\\]\\$\\(\\)\\{\\}\\=\\!\\<\\>\\|\\:'

  return (str + '')
    .replace(new RegExp('[.\\\\+*?\\[\\^\\]$(){}=!<>|:\\' + (delimiter || '') + '-]', 'g'), '\\$&')
}

So you could do the following:

function highlight(str, search) {
    return str.replace(new RegExp("(" + preg_quote(search) + ")", 'gi'), "<b>$1</b>");
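}

function highlightWords( line, word )
{
     // Note: `word` is used as a regex pattern as-is, so this is only safe
     // for keywords that contain no regex special characters.
     var regex = new RegExp( '(' + word + ')', 'gi' );
     return line.replace( regex, "<b>$1</b>" );
}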

You can enhance the RegExp object with a function that does special character escaping for you:

RegExp.escape = function(str) 
{
  var specials = /[.*+?|()\[\]{}\\$^]/g; // .*+?|()[]{}\$^
  return str.replace(specials, "\\$&");
}

Then you would be able to use what the others suggested without any worries:

function highlightWordsNoCase(line, word)
{
  var regex = new RegExp("(" + RegExp.escape(word) + ")", "gi");
  return line.replace(regex, "<b>$1</b>");
}
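
To see why the escaping step matters, here is a small sketch that runs the same approach on a keyword containing regex special characters. The names escapeRegExp and highlightEscaped are made up for this illustration, and the character list is simply the one used in RegExp.escape above:

```javascript
// Hypothetical helper: same character list as the RegExp.escape shown above.
function escapeRegExp(str) {
  return String(str).replace(/[.*+?|()\[\]{}\\$^]/g, "\\$&");
}

// Hypothetical helper: wrap every case-insensitive match of `word` in <b>...</b>.
function highlightEscaped(line, word) {
  var regex = new RegExp("(" + escapeRegExp(word) + ")", "gi");
  return line.replace(regex, "<b>$1</b>");
}

var plain = highlightEscaped("foobar Foo bar FOO", "foo");
var punct = highlightEscaped("Is C++ like c++?", "c++");

console.log(plain); // <b>foo</b>bar <b>Foo</b> bar <b>FOO</b>
console.log(punct); // Is <b>C++</b> like <b>c++</b>?
```

Without the escaping step, new RegExp("(c++)", "gi") throws a SyntaxError, because the second + has nothing to repeat.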

What about something like this:

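if (typeof String.prototype.highlight !== 'function') {
  String.prototype.highlight = function(match, spanClass) {
    // Note: `match` is used as a regex pattern as-is (no escaping).
    var pattern = new RegExp( match, "gi" );
    // Declared with var so it does not leak as an implicit global.
    var replacement = "<span class='" + spanClass + "'>$&</span>";

    return this.replace(pattern, replacement);
  };
}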

This could then be called like so:

var result = "The Quick Brown Fox Jumped Over The Lazy Brown Dog".highlight("brown","text-highlight");

Regular expressions are fine as long as keywords are really words; you can just use the RegExp constructor instead of a literal to create one from a variable:

var re= new RegExp('('+word+')', 'gi');
return s.replace(re, '<b>$1</b>');

The difficulty arises if 'keywords' can contain punctuation, as punctuation tends to have special meaning in regexps. Unfortunately, unlike most other languages and libraries with regexp support, JavaScript has traditionally had no standard function to escape punctuation for regexps.

And you can't be totally sure exactly which characters need escaping, because not every browser's regexp implementation is guaranteed to be exactly the same. (In particular, newer browsers may add new functionality.) Backslash-escaping characters that are not special is also not guaranteed to keep working, although in practice it does.

So about the best you can do is one of:

  • attempting to catch each special character in common browser use today [add: see Sebastian's recipe]
  • backslash-escaping all non-alphanumerics. Take care: \W will also match non-ASCII Unicode characters, which you don't really want to escape.
  • just ensuring that there are no non-alphanumerics in the keyword before searching
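
The second and third options could be sketched roughly as follows. Both function names are hypothetical, and the explicit ASCII ranges are an assumption chosen so that non-ASCII letters are left alone (unlike \W). The sketch also assumes the resulting pattern is used without the u flag, which is stricter about escaping non-special characters:

```javascript
// Option 2 (sketch): backslash-escape every printable ASCII character that
// is not a letter or digit. Explicit ranges are used instead of \W so that
// non-ASCII characters such as "é" are left untouched.
function escapeNonAlphanumeric(keyword) {
  return String(keyword).replace(/[\x20-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E]/g, "\\$&");
}

// Option 3 (sketch): accept only keywords made of ASCII letters and digits,
// so that no escaping is needed at all.
function isPlainKeyword(keyword) {
  return /^[A-Za-z0-9]+$/.test(keyword);
}

var pattern = new RegExp(escapeNonAlphanumeric("$40 (usd)"), "i");
console.log(pattern.test("Costs $40 (USD)")); // true
console.log(isPlainKeyword("c++"));           // false
```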

If you are using this to highlight words in HTML that already contains markup, though, you've got trouble. Your 'word' might appear in an element name or attribute value, in which case attempting to wrap a <b> around it will break the markup. In more complicated scenarios it could even open an HTML-injection hole leading to XSS. If you have to cope with markup, you will need a more complicated approach: split out the '<...>' markup first, then process each stretch of text on its own.
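
A minimal sketch of that splitting idea, assuming simple, well-formed markup. The function names are hypothetical, and a real page (comments, script or style content, a ">" inside an attribute value) would need a proper HTML parser rather than this regex split:

```javascript
// Hypothetical helper: escape regex special characters in the keyword.
function escapeRegExp(str) {
  return String(str).replace(/[.*+?|()\[\]{}\\$^]/g, "\\$&");
}

// Sketch: highlight `word` only in the text between tags.
function highlightOutsideTags(html, word) {
  if (!word) return html; // an empty pattern would match everywhere
  var regex = new RegExp(escapeRegExp(word), "gi");
  // split() with a capturing group keeps the tags in the result:
  // even indexes are text, odd indexes are "<...>" markup.
  return html.split(/(<[^>]*>)/).map(function (part, i) {
    return i % 2 === 1 ? part : part.replace(regex, "<b>$&</b>");
  }).join("");
}

var marked = highlightOutsideTags('<a href="foo.html">Foo page</a>', "foo");
console.log(marked); // <a href="foo.html"><b>Foo</b> page</a>
```

The "foo" inside the href attribute is left alone because it sits in a tag segment, while the link text is wrapped.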

For those poor with disregexia or regexophobia:

function replacei(str, sub, f) {
    // Split a lower-cased copy on the lower-cased keyword, then use the
    // piece lengths to slice the original string so its casing is kept.
    let A = str.toLowerCase().split(sub.toLowerCase());
    let B = [];
    let x = 0;
    for (let i = 0; i < A.length; i++) {
        let n = A[i].length;
        B.push(str.substr(x, n));
        if (i < A.length - 1)
            B.push(f(str.substr(x + n, sub.length)));
        x += n + sub.length;
    }
    return B.join('');
}

s = 'Foo and FOO (and foo) are all -- Foo.'
t = replacei(s, 'Foo', sub => '<' + sub + '>')
console.log(t)

Output:

<Foo> and <FOO> (and <foo>) are all -- <Foo>.

Why not just create a new regex on each call to your function? You can use:

new RegExp([pat], [flags])

where [pat] is a string for the pattern, and [flags] is a string of flags.
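
As a small illustration of the constructor form, here is a sketch that builds the regex from a variable on every call. The function name replaceIgnoreCase is hypothetical, and the sketch assumes the keyword contains no regex special characters (otherwise it should be escaped first):

```javascript
// Hypothetical helper: build a fresh regex from a variable on each call.
function replaceIgnoreCase(text, keyword, replacement) {
  var re = new RegExp(keyword, "gi"); // pattern string first, then flags
  // A function replacement inserts `replacement` literally, so sequences
  // like "$&" in it are not interpreted.
  return text.replace(re, function () { return replacement; });
}

var replaced = replaceIgnoreCase("foobar Foo bar FOO", "foo", "baz");
console.log(replaced); // bazbar baz bar baz
```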
