简体   繁体   English

Javascript Base64编码UTF8字符串在webkit / safari中失败

[英]Javascript Base64 encoding UTF8 string fails in webkit/safari

I'm trying to base64 encode a utf8 string containing Thai characters. 我正在尝试base64编码包含泰语字符的utf8字符串。 I'm using the browser's built in btoa function. 我正在使用浏览器内置的btoa功能。 It works for ascii text, however Thai is causing it to throw a INVALID_CHARACTER_ERR: DOM Exception 5 exception. 它适用于ascii文本,但是Thai会导致它抛出INVALID_CHARACTER_ERR: DOM Exception 5异常。

Here's a sample that fails (the character that looks like an "n" is Thai) 这是一个失败的样本(看起来像“n”的字符是泰语)

btoa('aก')


What do I need to do to base64 encode non-ascii strings? 我需要做什么来base64编码非ascii字符串?

var Base64 = {
  encode: function(s) {
    return btoa(unescape(encodeURIComponent(s)));
  },
  decode: function(s) {
    return decodeURIComponent(escape(atob(s)));
  }
};

Unfortunately btoa/atob aren't specified in any standard, but the implementations in firefox and webkit both fail on multibyte characters so even if they were now specified those builtin functions would not be able to support multibyte characters (as the input and output strings would necessarily change). 不幸的是btoa / atob没有在任何标准中指定,但firefox和webkit中的实现都在多字节字符上失败,所以即使它们现在被指定,那些内置函数也不能支持多字节字符(因为输入和输出字符串会必然会改变)。

It would seem your only option would be to roll your own base64 encode+decode routines 看起来您唯一的选择就是推出自己的base64编码+解码程序

I know this is old, but I was recently looking for a UTF8-to-Base64 encoder as well. 我知道这是旧的,但我最近也在寻找UTF8-to-Base64编码器。 I found a handy little script at http://www.webtoolkit.info/javascript-base64.html , and a performance improved version at http://jsbase64.codeplex.com/ . 我在http://www.webtoolkit.info/javascript-base64.html找到了一个方便的小脚本,在http://jsbase64.codeplex.com/找到了性能改进的版本。

Here is the script: 这是脚本:

var B64 = {
    alphabet: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=',
    lookup: null,
    ie: /MSIE /.test(navigator.userAgent),
    ieo: /MSIE [67]/.test(navigator.userAgent),
    encode: function (s) {
        var buffer = B64.toUtf8(s),
            position = -1,
            len = buffer.length,
            nan0, nan1, nan2, enc = [, , , ];
        if (B64.ie) {
            var result = [];
            while (++position < len) {
                nan0 = buffer[position];
                nan1 = buffer[++position];
                enc[0] = nan0 >> 2;
                enc[1] = ((nan0 & 3) << 4) | (nan1 >> 4);
                if (isNaN(nan1))
                    enc[2] = enc[3] = 64;
                else {
                    nan2 = buffer[++position];
                    enc[2] = ((nan1 & 15) << 2) | (nan2 >> 6);
                    enc[3] = (isNaN(nan2)) ? 64 : nan2 & 63;
                }
                result.push(B64.alphabet.charAt(enc[0]), B64.alphabet.charAt(enc[1]), B64.alphabet.charAt(enc[2]), B64.alphabet.charAt(enc[3]));
            }
            return result.join('');
        } else {
            var result = '';
            while (++position < len) {
                nan0 = buffer[position];
                nan1 = buffer[++position];
                enc[0] = nan0 >> 2;
                enc[1] = ((nan0 & 3) << 4) | (nan1 >> 4);
                if (isNaN(nan1))
                    enc[2] = enc[3] = 64;
                else {
                    nan2 = buffer[++position];
                    enc[2] = ((nan1 & 15) << 2) | (nan2 >> 6);
                    enc[3] = (isNaN(nan2)) ? 64 : nan2 & 63;
                }
                result += B64.alphabet[enc[0]] + B64.alphabet[enc[1]] + B64.alphabet[enc[2]] + B64.alphabet[enc[3]];
            }
            return result;
        }
    },
    decode: function (s) {
        if (s.length % 4)
            throw new Error("InvalidCharacterError: 'B64.decode' failed: The string to be decoded is not correctly encoded.");
        var buffer = B64.fromUtf8(s),
            position = 0,
            len = buffer.length;
        if (B64.ieo) {
            var result = [];
            while (position < len) {
                if (buffer[position] < 128)
                    result.push(String.fromCharCode(buffer[position++]));
                else if (buffer[position] > 191 && buffer[position] < 224)
                    result.push(String.fromCharCode(((buffer[position++] & 31) << 6) | (buffer[position++] & 63)));
                else
                    result.push(String.fromCharCode(((buffer[position++] & 15) << 12) | ((buffer[position++] & 63) << 6) | (buffer[position++] & 63)));
            }
            return result.join('');
        } else {
            var result = '';
            while (position < len) {
                if (buffer[position] < 128)
                    result += String.fromCharCode(buffer[position++]);
                else if (buffer[position] > 191 && buffer[position] < 224)
                    result += String.fromCharCode(((buffer[position++] & 31) << 6) | (buffer[position++] & 63));
                else
                    result += String.fromCharCode(((buffer[position++] & 15) << 12) | ((buffer[position++] & 63) << 6) | (buffer[position++] & 63));
            }
            return result;
        }
    },
    toUtf8: function (s) {
        var position = -1,
            len = s.length,
            chr, buffer = [];
        if (/^[\x00-\x7f]*$/.test(s)) while (++position < len)
            buffer.push(s.charCodeAt(position));
        else while (++position < len) {
            chr = s.charCodeAt(position);
            if (chr < 128)
                buffer.push(chr);
            else if (chr < 2048)
                buffer.push((chr >> 6) | 192, (chr & 63) | 128);
            else
                buffer.push((chr >> 12) | 224, ((chr >> 6) & 63) | 128, (chr & 63) | 128);
        }
        return buffer;
    },
    fromUtf8: function (s) {
        var position = -1,
            len, buffer = [],
            enc = [, , , ];
        if (!B64.lookup) {
            len = B64.alphabet.length;
            B64.lookup = {};
            while (++position < len)
                B64.lookup[B64.alphabet.charAt(position)] = position;
            position = -1;
        }
        len = s.length;
        while (++position < len) {
            enc[0] = B64.lookup[s.charAt(position)];
            enc[1] = B64.lookup[s.charAt(++position)];
            buffer.push((enc[0] << 2) | (enc[1] >> 4));
            enc[2] = B64.lookup[s.charAt(++position)];
            if (enc[2] == 64)
                break;
            buffer.push(((enc[1] & 15) << 4) | (enc[2] >> 2));
            enc[3] = B64.lookup[s.charAt(++position)];
            if (enc[3] == 64)
                break;
            buffer.push(((enc[2] & 3) << 6) | enc[3]);
        }
        return buffer;
    }
};

Disclaimer: I haven't tested this with Thai characters specifically, but assume it will work. 免责声明:我没有专门用泰语字符测试过这个,但是假设它会起作用。

Sav SAV

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM