
Is it possible to use regex in JS to replace several different chars?

I need to replace every accented character in a string with its unaccented version, for sorting. I found how to match the accented ones, but is it possible to use a regex to replace each one with its own counterpart? I mean:

var re = /ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ/g;
var str = "ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ";
var newstr = str.replace(re, 'M');
console.log(newstr);

This prints 'M', but I need 'uUuUaaaeeeiiiooouuuAAAEEEIIIOOOUUnN'.

Is this possible? Thanks.

You need to use a character class.

var re = /[ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ]/g;
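To see why the brackets matter, here is a small sketch (the variable names and the sample string are only for illustration). Without brackets the pattern matches that exact sequence of characters, whereas a character class matches any single character from the set:

```javascript
// Illustration only: literal sequence vs. character class.
var sequence = /ùÙ/g;   // matches the two-character sequence "ùÙ"
var anyOf = /[ùÙ]/g;    // matches any single "ù" or "Ù"

var sample = "ùÙ Ùù";

var bySequence = sample.replace(sequence, 'M');
var byClass = sample.replace(anyOf, 'M');

console.log(bySequence); // M Ùù
console.log(byClass);    // MM MM
```

That is why the pattern in the question, which matches the whole string as one sequence, produces a single 'M'.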

Then you can pass a function as the second argument to replace. This function contains the conversion logic. A simple way is to use a conversion map.

E.g.:

var re = /[ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ]/g;

//incomplete but you get the idea
var conversionMap = {
    'ù': 'u',
    'Ù': 'U',
    'ü': 'u',
    'Ü': 'U',
    'ä': 'a'
};

"ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ".replace(re, function (c) {
    return conversionMap[c] || c;
}); //uUuUaàáëèéïìíöòóuuúÄÀÁËÈÉÏÌÍÖÒÓUÚñÑ

FIDDLE

http://jsfiddle.net/Victornpb/YPtaN/4

var deaccentuate = (function(){

    var accent = "ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ",
        latin  = "uUuUaaaeeeiiiooouuuAAAEEEIIIOOOUUnN".split("");

    var re = new RegExp("["+accent+"]", "g");

    return function(str){
        return str.replace(re, function(c){
            // accent and latin are parallel: same index, same letter
            return latin[accent.indexOf(c)];
        });
    };
})();

deaccentuate("Olá, como estás?"); //Ola, como estas?

Benchmark

I ran a benchmark with a 2 KB text, and my function was faster than the other answers, reaching 59,000 ops/sec.

http://jsperf.com/deaccentuate

This is fairly verbose, in order to be readable. (Well, to each their own, anyway.)

var deaccentuate = (function() {
  var conversion =
      { 'a' : /[äàá]/g
      , 'e' : /[ëèé]/g
      , 'i' : /[ïìí]/g
      , 'o' : /[öòó]/g
      , 'u' : /[üùú]/g
      , 'n' : /ñ/g
      , 'A' : /[ÄÀÁ]/g
      , 'E' : /[ËÈÉ]/g
      , 'I' : /[ÏÌÍ]/g
      , 'O' : /[ÖÒÓ]/g
      , 'U' : /[ÜÙÚ]/g
      , 'N' : /Ñ/g
      }

  return function(str) {
    return Object.keys(conversion).reduce(function(str, c) {
      return str.replace(conversion[c], c)
    }, str)
  }
}())

Usage (http://jsbin.com/UFEbuho/1/):

var input = "ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ"

console.log(deaccentuate(input))

The idea is to loop over the keys of the conversion table and replace anything that matches the pattern stored under a key with the key itself. This makes one pass over the string per key, so it is certainly not the most efficient way to do this, but unless the input strings are fairly long it shouldn't matter much.
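If the repeated passes ever became a concern, one possible variation is to invert a table of the same shape into a per-character lookup and replace everything in a single pass. This is only a sketch, not part of the answer above; `groups`, `lookup`, `accentedRe` and `deaccentuateOnce` are hypothetical names:

```javascript
// Hypothetical single-pass variant: target letter -> accented characters.
var groups = {
  a: 'äàá', e: 'ëèé', i: 'ïìí', o: 'öòó', u: 'üùú', n: 'ñ',
  A: 'ÄÀÁ', E: 'ËÈÉ', I: 'ÏÌÍ', O: 'ÖÒÓ', U: 'ÜÙÚ', N: 'Ñ'
};

// Invert the table: accented character -> plain letter.
var lookup = {};
Object.keys(groups).forEach(function (plain) {
  groups[plain].split('').forEach(function (accented) {
    lookup[accented] = plain;
  });
});

// One character class covering every accented character in the table.
var accentedRe = new RegExp('[' + Object.keys(lookup).join('') + ']', 'g');

function deaccentuateOnce(str) {
  return str.replace(accentedRe, function (c) {
    return lookup[c];
  });
}

console.log(deaccentuateOnce("ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ"));
// uUuUaaaeeeiiiooouuuAAAEEEIIIOOOUUnN
```

Characters that are not in the table never match the class, so they pass through unchanged.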

I can't think of an easier way to efficiently remove all diacritics from a string than this amazing solution.

See it in action:

var str = "ùÙüÜäàáëèéïìíöòóüùúÄÀÁËÈÉÏÌÍÖÒÓÜÚñÑ";
var str_norm = str.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
console.log(str_norm);
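Since the question was about sorting, here is a minimal sketch of how the normalize-based approach could serve as a sort key. `stripDiacritics` and `sortIgnoringAccents` are hypothetical helper names, and the sketch assumes an environment where `String.prototype.normalize` is available (ES2015 and later):

```javascript
// Hypothetical helper wrapping the normalize-based approach.
function stripDiacritics(str) {
  // NFD splits e.g. "é" into "e" + U+0301; the regex then removes
  // everything in the Combining Diacritical Marks block (U+0300-U+036F).
  return str.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Sort a copy of the list by the unaccented form of each word.
function sortIgnoringAccents(words) {
  return words.slice().sort(function (a, b) {
    var ka = stripDiacritics(a);
    var kb = stripDiacritics(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}

console.log(stripDiacritics("Olá, como estás?")); // Ola, como estas?
console.log(sortIgnoringAccents(["été", "eta", "etz"]));
// [ 'eta', 'été', 'etz' ]
```

Comparing the raw strings would put "été" last, because "é" has a higher code unit than "z". The original words are kept in the result; only the comparison uses the stripped form.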
