ICQ encoding of Special Characters

I'm working with the ICQ protocol and I've run into a problem with special characters (e.g. diacritics). I read that ICQ uses a different encoding (CP-1251, if I remember correctly).

How can I decode the message bytes into a string using the correct encoding?

I've tried using the UTF8Encoding class, but without success.

I'm using the ICQ-sharp library.

    private void ParseMessage (string uin, byte[] data)
    {
        ushort capabilities_length = LittleEndianBitConverter.Big.ToUInt16 (data, 2);
        ushort msg_tlv_length = LittleEndianBitConverter.Big.ToUInt16 (data, 6 + capabilities_length);
        string message = Encoding.UTF8.GetString (data, 12 + capabilities_length, msg_tlv_length - 4);

        Debug.WriteLine(message);
    }

If the contact is using the same client, everything is fine; otherwise, incoming and outgoing messages with diacritics are unreadable.

I've determined (using this answer: https://stackoverflow.com/a/12853721/846232 ) that the text is in the BigEndianUnicode encoding. But if the string contains no diacritics, decoding it that way makes it unreadable (it comes out as Chinese characters), whereas decoding the same text as UTF-8 works fine. I don't know how to decode it so that it always comes out right.

If UTF-8 kind of works (i.e. it works for "English", or any US-ASCII characters), then you don't have UTF-16. Latin-1 (or Windows-1252, Microsoft's variant), or e.g. Windows-1251 or Windows-1250, are perfectly possible though, since the first part of each of these, which contains the Latin letters without diacritics, is the same.
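To make that concrete, here is a small sketch that decodes the same bytes three ways. The class and method names are made up for illustration and are not part of ICQ-sharp. It uses ISO-8859-1 for the single-byte case only because that encoding is available on every .NET runtime without extra setup.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of ICQ-sharp.
public static class EncodingDemo
{
    // Decodes the same bytes as UTF-8, UTF-16 big-endian and ISO-8859-1,
    // and returns the three results in that order.
    public static string[] DecodeThreeWays(byte[] bytes)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return new[]
        {
            Encoding.UTF8.GetString(bytes),             // lenient: invalid bytes become U+FFFD
            Encoding.BigEndianUnicode.GetString(bytes), // every 2 bytes become one UTF-16 code unit
            latin1.GetString(bytes)                     // every byte becomes one character
        };
    }
}
```

For the ASCII bytes 0x68 0x69 ("hi"), the UTF-8 and ISO-8859-1 results are both "hi", while the UTF-16 big-endian result is the single character U+6869, a CJK ideograph. That is the "Chinese letters" effect. For 0x63 0x61 0x66 0xE9 ("café" in a single-byte code page), the UTF-8 result contains the replacement character U+FFFD, because a lone 0xE9 is not valid UTF-8.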

Decode like this:

    var encoding = Encoding.GetEncoding("Windows-1250");
    string message = encoding.GetString(data, 12 + capabilities_length, msg_tlv_length - 4);
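If some clients really do send UTF-8 and others send a single-byte code page, one possible approach is to try strict UTF-8 first and fall back to the legacy encoding when the bytes are not valid UTF-8. The following is only a sketch under that assumption: `MessageDecoder` is a hypothetical helper, not part of ICQ-sharp, and the choice of fallback encoding (Windows-1250, Windows-1251, ...) is still up to you. Note that on .NET Core and .NET 5+, `Encoding.GetEncoding("Windows-1250")` only works after calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`; on the classic .NET Framework it works out of the box.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of ICQ-sharp.
public static class MessageDecoder
{
    // Strict UTF-8: no BOM, and throw on invalid byte sequences
    // instead of silently substituting U+FFFD.
    private static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Tries UTF-8 first; if the bytes are not valid UTF-8,
    // decodes them with the caller-supplied legacy encoding.
    public static string Decode(byte[] data, int offset, int count, Encoding legacy)
    {
        try
        {
            return StrictUtf8.GetString(data, offset, count);
        }
        catch (DecoderFallbackException)
        {
            return legacy.GetString(data, offset, count);
        }
    }
}
```

In `ParseMessage` this would be called with the same offset and length as above, e.g. `MessageDecoder.Decode(data, 12 + capabilities_length, msg_tlv_length - 4, Encoding.GetEncoding("Windows-1250"))`. This is a heuristic: pure ASCII text decodes identically either way, but a short legacy-encoded string can occasionally happen to be valid UTF-8 and would then be decoded wrongly.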
