
How to convert a string of mixed Latin and Unicode characters

I have a number of strings consisting of mixed Latin characters and Unicode-encoded Cyrillic symbols. What I need is a JavaScript function to convert these strings into a human-readable form. Here is what I came up with:

var EGstr = '&#1043;&#1088;&#1080;&#1092; Kettler &#1087;&#1088;&#1103;&#1084;&#1086;&#1081;';
var newStr = EGstr.replace(/&#(\d+);/g, String.fromCharCode('$1') );

This is supposed to work, but it doesn't. Please tell me how to fix the code.

You can use:

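var d = document.createElement('div');
// Let the browser's HTML parser decode the numeric entities.
d.innerHTML = '&#1043;&#1088;&#1080;&#1092; Kettler &#1087;&#1088;&#1103;&#1084;&#1086;&#1081;';
// textContent returns the decoded plain text.
alert(d.textContent); //Гриф Kettler прямой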

instead of a regex.

Or if we put it into a function...

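function getText(txt) {
  var d = document.createElement('div');
  // Parsing as HTML decodes the entities; only pass trusted input here.
  d.innerHTML = txt;
  // Read the decoded plain text rather than the serialized markup.
  return d.textContent;
}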

You can supply a replacement function to the replace method:

var newStr = EGstr.replace(/&#(\d+);/g, function(_, $1) {
    return String.fromCharCode($1);
});

The 1st argument to the replacement function is the text that matches the whole expression (which we don't need).

From the 2nd argument onwards, the arguments are the text captured by each capturing group.

The next-to-last argument and the last argument contain the offset of the match and the source string respectively (we don't need them here either, so I don't declare them in the replacement function).
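To make the argument order concrete, here is a small sketch that records what replace passes to the callback for each match. The sample string and the names calls, digits, eager and broken are made up for illustration. The last two lines are my reading of why the code in the question fails: String.fromCharCode('$1') is evaluated once, before replace runs, so it never sees the captured digits.

```javascript
// Hypothetical sample input: "Гриф" written as decimal HTML entities, then Latin text.
const source = '&#1043;&#1088;&#1080;&#1092; Kettler';

// Record the arguments that replace() hands to the callback on every match.
const calls = [];
const decoded = source.replace(/&#(\d+);/g, function (match, digits, offset, whole) {
  // match  : the full matched text, e.g. "&#1043;"
  // digits : the first capturing group, e.g. "1043" (a string)
  // offset : index of the match within the source string
  // whole  : the source string itself
  calls.push({ match: match, digits: digits, offset: offset });
  // Convert the captured digits to a number explicitly before decoding.
  return String.fromCharCode(Number(digits));
});

console.log(decoded);   // → "Гриф Kettler"
console.log(calls[1]);  // → { match: '&#1088;', digits: '1088', offset: 7 }

// By contrast, this expression is evaluated immediately: '$1' is not a number,
// so fromCharCode receives NaN and returns the NUL character '\u0000'.
const eager = String.fromCharCode('$1');

// Passing that value to replace() swaps every entity for the same NUL character.
const broken = source.replace(/&#(\d+);/g, eager);
```

Wrapping the conversion in a function defers it until each match is found, which is the reason the callback form works where the original one-liner does not.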
