JavaScript equivalent to Java's Charset/String class combination for decoding byte arrays

In Java, if we know the encoding of a byte array, we can decode it and get the corresponding characters as follows:

Charset charset = Charset.forName(encoding);
String decodedString = new String(byteArray, charset);

How can the same result be achieved in JavaScript?

Suppose I read a file that I know is windows-1253 encoded (Greek). To display the file contents correctly, I would have to decode the bytes in the file.

If we do not decode (or if we open the file in a text editor that doesn't know the encoding), we may see something like this:

ÁõôÞ åßíáé ç åëëçíéêÞ.

But when this text (i.e. the bytes) is decoded, we get

Αυτή είναι η ελληνική.

In JavaScript (ECMAScript), strings are always UTF-16 encoded.
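Since the target is always UTF-16, decoding comes down to mapping bytes in the source encoding to UTF-16 code units. Where the WHATWG Encoding API is available, the built-in TextDecoder is probably the closest counterpart to Java's Charset/String pair. The sketch below is only an illustration: decodeBytes is a hypothetical helper name, and it assumes the runtime recognises the "windows-1253" label (browsers do; Node.js needs a build with full ICU data).

```javascript
// Sketch: decode a byte array of known encoding with the built-in TextDecoder.
// decodeBytes is a hypothetical helper, not part of any library.
function decodeBytes(byteArray, encoding) {
    // fatal: true makes unmappable or malformed bytes throw a TypeError
    // instead of silently producing U+FFFD replacement characters.
    var decoder = new TextDecoder(encoding, { fatal: true });
    return decoder.decode(Uint8Array.from(byteArray));
}

// The first word of the sample text as windows-1253 bytes
// (the same bytes that show up as "ÁõôÞ" when read with the wrong encoding).
var greekBytes = [0xC1, 0xF5, 0xF4, 0xDE];
var decoded = decodeBytes(greekBytes, 'windows-1253');
console.log(decoded);
```

An unsupported encoding label makes the TextDecoder constructor throw a RangeError, which is roughly the counterpart of Java's UnsupportedCharsetException.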

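Hope this will help you. Note that the function below decodes a UTF-8 byte array; a single-byte encoding such as windows-1253 needs its own byte-to-character mapping: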

var getString = function (strBytes) {
    // Decodes an array of UTF-8 bytes into a JavaScript string.

    // Flush threshold: keeps the argument list of fromCharCode.apply small.
    var MAX_SIZE = 0x4000;
    var codeUnits = [];
    var highSurrogate;
    var lowSurrogate;
    var index = -1;

    var result = '';

    while (++index < strBytes.length) {
        var codePoint = Number(strBytes[index]);

        if (codePoint === (codePoint & 0x7F)) {
            // Single byte (ASCII): the byte value is already the code point.
        } else if (0xF0 === (codePoint & 0xF0)) {
            // Lead byte of a 4-byte sequence: three continuation bytes follow.
            codePoint ^= 0xF0;
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
        } else if (0xE0 === (codePoint & 0xE0)) {
            // Lead byte of a 3-byte sequence: two continuation bytes follow.
            codePoint ^= 0xE0;
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
        } else if (0xC0 === (codePoint & 0xC0)) {
            // Lead byte of a 2-byte sequence: one continuation byte follows.
            codePoint ^= 0xC0;
            codePoint = (codePoint << 6) | (strBytes[++index] ^ 0x80);
        }

        if (!isFinite(codePoint) || codePoint < 0 || codePoint > 0x10FFFF || Math.floor(codePoint) !== codePoint)
            throw RangeError('Invalid code point: ' + codePoint);

        if (codePoint <= 0xFFFF)
            codeUnits.push(codePoint);
        else {
            // Code points above the BMP are stored as a UTF-16 surrogate pair.
            codePoint -= 0x10000;
            highSurrogate = (codePoint >> 10) | 0xD800;
            lowSurrogate = (codePoint % 0x400) | 0xDC00;
            codeUnits.push(highSurrogate, lowSurrogate);
        }
        // Flush at the end of the input or once the buffer is full.
        if (index + 1 === strBytes.length || codeUnits.length > MAX_SIZE) {
            result += String.fromCharCode.apply(null, codeUnits);
            codeUnits.length = 0;
        }
    }

    return result;
};
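For comparison, here is a minimal sketch of the same kind of UTF-8 decoding done with the built-in TextDecoder instead of hand-written bit manipulation. It assumes a runtime that provides the WHATWG Encoding API (modern browsers and Node.js); the variable names are illustrative only.

```javascript
// "Αυτή" as UTF-8: each Greek letter takes two bytes.
var utf8Bytes = new Uint8Array([0xCE, 0x91, 0xCF, 0x85, 0xCF, 0x84, 0xCE, 0xAE]);

// Decode the UTF-8 bytes into a regular (UTF-16) JavaScript string.
var text = new TextDecoder('utf-8').decode(utf8Bytes);
console.log(text);

// TextEncoder goes the other way and always produces UTF-8.
var roundTrip = new TextEncoder().encode(text);
console.log(roundTrip.length); // 8
```

Unlike TextDecoder, TextEncoder only emits UTF-8, so writing text back out as windows-1253 would still require a separate mapping.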

All the best!
