
How to convert a string of mixed Latin and Unicode-encoded characters

I have a number of strings consisting of mixed Latin characters and Cyrillic symbols encoded as numeric HTML entities (&#NNNN;). I need a JavaScript function that converts these strings into human-readable form. Here is what I came up with:

var EGstr = '&#1043;&#1088;&#1080;&#1092; Kettler &#1087;&#1088;&#1103;&#1084;&#1086;&#1081;';
var newStr = EGstr.replace(/&#(\d+);/g, String.fromCharCode('$1') );

This looks like it should work, but it doesn't. How should I change the code?

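Instead of a regex, you can let the browser's HTML parser decode the entities:

var d = document.createElement('div');
d.innerHTML = '&#1043;&#1088;&#1080;&#1092; Kettler &#1087;&#1088;&#1103;&#1084;&#1086;&#1081;';
// textContent returns the decoded text; innerHTML would re-escape &, < and >
alert(d.textContent); //Гриф Kettler прямой

Only do this with trusted input, because assigning to innerHTML parses the string as markup.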

Or, wrapped in a function:

function getText(txt) {
  var d = document.createElement('div');
  d.innerHTML = txt;      // the HTML parser decodes the entities
  return d.textContent;   // decoded text, without re-escaping &, < and >
}

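In your code, String.fromCharCode('$1') is evaluated once, before replace is even called, so it converts the literal string '$1' rather than the captured digits. To run code for each match, pass a replacement function to the replace method: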

var newStr = EGstr.replace(/&#(\d+);/g, function(_, $1) {
    return String.fromCharCode($1);
});

The first argument to the replacement function is the text that matched the whole expression (which we don't need here).

The second argument onwards hold whatever the capturing groups captured, one argument per group.

After the captured groups come the offset of the match and the source string (which we also don't need here, so I don't declare them in the replacement function).
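As a rough sketch building on the above (not part of the original answer), the same callback approach can be extended to hexadecimal references such as &#x413; and to code points above 0xFFFF, which String.fromCharCode cannot produce as a single character. The names decodeNumericEntities and collectReplaceArgs are hypothetical helpers introduced only for illustration; the second one simply records what replace passes to the callback.

```javascript
// Hypothetical helper (an assumption, not from the original answer): decodes
// decimal (&#1043;) and hexadecimal (&#x413;) numeric character references.
function decodeNumericEntities(input) {
  return input.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, function (match, hex, dec) {
    // Exactly one of the two groups participates; the other is undefined.
    var codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range references untouched instead of throwing a RangeError.
    if (codePoint > 0x10FFFF) {
      return match;
    }
    // fromCodePoint also handles code points above 0xFFFF (e.g. emoji).
    return String.fromCodePoint(codePoint);
  });
}

// Hypothetical helper: records the arguments replace() hands to the callback.
function collectReplaceArgs(input) {
  var calls = [];
  input.replace(/&#(\d+);/g, function (match, digits, offset, source) {
    calls.push({ match: match, digits: digits, offset: offset, source: source });
    return match; // leave the string unchanged
  });
  return calls;
}

var sample = '&#1043;&#1088;&#1080;&#1092; Kettler';
var decoded = decodeNumericEntities(sample);
var calls = collectReplaceArgs('a&#1043;b');
```

Named entities such as &amp; are deliberately left alone here; handling those needs a lookup table or the DOM-based approach.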
