
Correcting incorrectly encoded string (ASCII characters back to UTF-8)

Here is a sample WiFi SSID that I extracted from an Android "wifi config file" (wpa_supplicant.conf).

I'm trying to display all the SSIDs in the file. Most of them are fine, because they are plain strings wrapped in quotes, for example:

network={
    ssid="Linksys"
    ...
}

However, some entries just have to be different and special, for example:

network={
    ssid=e299aa20e6b7a1e5ae9ae69c89e98ca2e589a920e299ab
    ...
}

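Now the question is: how do I convert this back into a readable string (preferably in JS)? I suspect the encoding is wrong (although it displays correctly on the device itself).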
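To tell the two forms apart while reading the file, something like the following might help. It is a minimal sketch, assuming only the two shapes shown above (a quoted string, or a bare run of hex digits); `extractSsids` and the `kind` labels are hypothetical names, and the hex form is left undecoded.

```javascript
// Hypothetical helper: collect every ssid= value from wpa_supplicant.conf text
// and label it as a quoted string or as a bare run of hex digits.
function extractSsids(confText) {
    var results = [];
    var lines = confText.split(/\r?\n/);

    for (var i = 0; i < lines.length; i++) {
        // Anchored at the start of the line so keys such as scan_ssid= are skipped.
        var match = /^\s*ssid=(.*)$/.exec(lines[i]);
        if (!match) continue;

        var raw = match[1].trim();
        if (/^".*"$/.test(raw)) {
            // Quoted form: the text between the quotes is the SSID itself.
            results.push({ kind: "quoted", value: raw.slice(1, -1) });
        } else if (/^(?:[0-9a-fA-F]{2})+$/.test(raw)) {
            // Unquoted form: an even number of hex digits, still to be decoded.
            results.push({ kind: "hex", value: raw });
        } else {
            // Anything else is passed through untouched (assumption: not handled here).
            results.push({ kind: "unknown", value: raw });
        }
    }
    return results;
}
```

Each returned entry keeps the raw value, so the hex entries can be handed to whatever decoder is chosen afterwards.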

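Apparently, the string is hex-encoded: each pair of hex digits is one byte of the UTF-8 text. By converting it back to binary with some string manipulation, I was able to decode it into a readable form.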

function HextoUTF8(txt) {
    function HexStringToBytes(str) {
        if (str.length % 2) throw TypeError("Not a valid length");

        return [].map.call(str, function(e) {
            return ("000" + parseInt(e, 16).toString(2)).slice(-4);
        }).join("").match(/.{8}/g);
    }

    function BytesToUTF8(bytes) {
        var inExpectationMode = false,
            itr = new Iterator(bytes),
            byte,
            // Number of payload bits at the end of the lead byte (as a negative
            // slice index), keyed by the total sequence length in bytes.
            availableBitsTable = {
                "1": -7,
                "2": -5,
                "3": -4,
                "4": -3
            },
            expectingBitsLeft = 0,
            currentCharacter = "",
            result = "";

        while (byte = itr.next(), !byte.ended) {
            byte = byte.value;

            if (inExpectationMode) {
                // Continuation byte (10xxxxxx): keep the low 6 bits
                currentCharacter += byte.slice(-6);
            } else {
                //First in sequence
                expectingBitsLeft = determineSequenceLength(byte);
                currentCharacter += byte.slice(availableBitsTable[expectingBitsLeft]);
            }

            inExpectationMode = true;
            expectingBitsLeft--;

            if (!expectingBitsLeft) {
                inExpectationMode = false;
                // fromCodePoint, unlike fromCharCode, also handles 4-byte sequences above U+FFFF
                result += String.fromCodePoint(parseInt(currentCharacter, 2));
                currentCharacter = "";
            }
        }

        return result;
    }

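    function determineSequenceLength(byte) {
        if (byte[0] === "0") return 1;
        else if (byte.slice(0, 3) === "110") return 2;
        else if (byte.slice(0, 4) === "1110") return 3;
        else if (byte.slice(0, 5) === "11110") return 4;
        // Anything else (e.g. a stray continuation byte) is not a valid lead byte
        throw RangeError("Invalid UTF-8 lead byte: " + byte);
    }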

    function Iterator(array) {
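        // Guard against calling without `new`; also works where `window` is undefined (e.g. Node.js)
        if (!(this instanceof Iterator)) throw TypeError("This is a class");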
        if (!Array.isArray(array)) throw TypeError("An array is required");

        this.i = -1;
        this.ended = !array.length;
        this.array = function() {
            return array;
        };
    }

    Iterator.prototype.next = function() {
        if (this.ended || ++this.i == this.array().length) {
            this.ended = true;
            return {
                ended: true
            };
        } else {
            return {
                ended: this.ended,
                value: this.array()[this.i]
            };
        }
    }

    return BytesToUTF8(HexStringToBytes(txt));
}

Ideally I should be doing this with bitwise operations, but anyway:

> HextoUTF8("e299aa20e6b7a1e5ae9ae69c89e98ca2e589a920e299ab");
> "♪ 淡定有錢剩 ♫"
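For comparison, the bitwise approach mentioned above could look roughly like this. It is a sketch under stated assumptions, not the answer's own code: `hexToByteValues` and `decodeUtf8` are illustrative names, and the decoder checks only lead and continuation byte patterns (it does not reject overlong encodings or surrogate code points).

```javascript
// Sketch only: function names are illustrative, not from the original answer.

// Turn "e299aa..." into an array of numbers 0..255.
function hexToByteValues(hex) {
    if (hex.length % 2 || /[^0-9a-fA-F]/.test(hex)) {
        throw new TypeError("Expected an even number of hex digits");
    }
    var bytes = [];
    for (var i = 0; i < hex.length; i += 2) {
        bytes.push(parseInt(hex.slice(i, i + 2), 16));
    }
    return bytes;
}

// Decode UTF-8 byte values into a string using masks and shifts.
function decodeUtf8(bytes) {
    var result = "";
    var i = 0;

    while (i < bytes.length) {
        var lead = bytes[i++];
        var codePoint, extra;

        if ((lead & 0x80) === 0x00) {          // 0xxxxxxx: 1-byte sequence
            codePoint = lead;
            extra = 0;
        } else if ((lead & 0xE0) === 0xC0) {   // 110xxxxx: 2-byte sequence
            codePoint = lead & 0x1F;
            extra = 1;
        } else if ((lead & 0xF0) === 0xE0) {   // 1110xxxx: 3-byte sequence
            codePoint = lead & 0x0F;
            extra = 2;
        } else if ((lead & 0xF8) === 0xF0) {   // 11110xxx: 4-byte sequence
            codePoint = lead & 0x07;
            extra = 3;
        } else {
            throw new RangeError("Invalid UTF-8 lead byte at index " + (i - 1));
        }

        while (extra-- > 0) {
            var next = bytes[i++];
            // Every continuation byte must look like 10xxxxxx.
            if (next === undefined || (next & 0xC0) !== 0x80) {
                throw new RangeError("Invalid or missing UTF-8 continuation byte");
            }
            codePoint = (codePoint << 6) | (next & 0x3F);
        }

        result += String.fromCodePoint(codePoint);
    }
    return result;
}
```

Calling `decodeUtf8(hexToByteValues("e299aa20e6b7a1e5ae9ae69c89e98ca2e589a920e299ab"))` should give the same "♪ 淡定有錢剩 ♫" as above. Where the standard `TextDecoder` API is available, `new TextDecoder("utf-8").decode(Uint8Array.from(bytes))` is likely the simpler choice.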

