Decode Cyrillic quoted-printable content

I'm using this sample to fetch mail from a server. The problem is that the response contains Cyrillic characters that I cannot decode. Here are the relevant headers:

Content-type: text/html; charset="koi8-r"
Content-Transfer-Encoding: quoted-printable

And here is the function that receives the response:

static void receiveResponse(string command)
{
    try
    {
        if (command != "")
        {
            if (tcpc.Connected)
            {
                dummy = Encoding.ASCII.GetBytes(command);
                ssl.Write(dummy, 0, dummy.Length);
            }
            else
            {
                throw new ApplicationException("TCP CONNECTION DISCONNECTED");
            }
        }
        ssl.Flush();

        byte[] bigBuffer = new byte[1024*16];
        int bites = ssl.Read(bigBuffer, 0, bigBuffer.Length);

        byte[] buffer = new byte[bites];
        Array.Copy(bigBuffer, 0, buffer, 0, bites);

        sb.Append(Encoding.ASCII.GetString(buffer));

        string result = sb.ToString();

        // here is an unsuccessful attempt at decoding
        result = Regex.Replace(result, @"=([0-9a-fA-F]{2})",
            m => m.Groups[1].Success
            ? Convert.ToChar(Convert.ToInt32(m.Groups[1].Value, 16)).ToString()
            : "");

        byte[] bytes = Encoding.Default.GetBytes(result);
        result = Encoding.GetEncoding("koi8r").GetString(bytes);
    }
    catch (Exception ex)
    {
        throw new ApplicationException(ex.ToString());
    }
}

How do I decode the stream correctly? In the result string I get <p>=F0=D2=C9=D7=C5=D4 =D1 =F7=C1=CE=D1</p> instead of <p>Привет я Ваня</p>.

As @Max pointed out, you will need to decode the content using the encoding algorithm declared in the Content-Transfer-Encoding header.

In your case, it is the quoted-printable encoding.

First decode the quoted-printable text of the message into an array of bytes, then convert that array of bytes into a string using the appropriate System.Text.Encoding. The name of the encoding to use is typically given by the charset parameter of the Content-Type header (in your case, koi8-r).
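To illustrate where the charset name comes from, here is a minimal sketch that pulls the charset parameter out of a Content-Type header value. ContentTypeHelper and GetCharset are hypothetical names introduced for this example, and the regular expression is a simplification that does not cover every form the MIME grammar allows.

```csharp
using System;
using System.Text.RegularExpressions;

static class ContentTypeHelper
{
    // Hypothetical helper: returns the charset parameter of a Content-Type
    // header value, or the fallback when no charset is present.
    // "us-ascii" is the MIME default for text parts without a charset.
    public static string GetCharset(string contentType, string fallback = "us-ascii")
    {
        // Matches charset=koi8-r as well as charset="koi8-r".
        Match m = Regex.Match(
            contentType,
            @"charset\s*=\s*""?([^"";\s]+)""?",
            RegexOptions.IgnoreCase);

        return m.Success ? m.Groups[1].Value : fallback;
    }
}
```

For the header in the question, ContentTypeHelper.GetCharset("text/html; charset=\"koi8-r\"") returns koi8-r, which can then be passed to Encoding.GetEncoding.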

Since you already have the raw bytes in the bigBuffer variable, simply perform the decoding on those, writing the result into buffer:

byte[] buffer = new byte[bites];
int decodedLength = 0;

for (int i = 0; i < bites; i++) {
    if (bigBuffer[i] == (byte) '=') {
        if (bites > i + 2) {
            // possible hex sequence
            byte b1 = bigBuffer[i + 1];
            byte b2 = bigBuffer[i + 2];

            if (IsXDigit (b1) && IsXDigit (b2)) {
                // decode
                buffer[decodedLength++] = (byte) ((ToXDigit (b1) << 4) | ToXDigit (b2));
                i += 2;
            } else if (b1 == (byte) '\r' && b2 == (byte) '\n') {
                // folded line, drop the '=\r\n' sequence
                i += 2;
            } else {
                // error condition, just pass it through
                buffer[decodedLength++] = bigBuffer[i];
            }
        } else {
            // truncated? just pass it through
            buffer[decodedLength++] = bigBuffer[i];
        }
    } else {
        buffer[decodedLength++] = bigBuffer[i];
    }
}

string result = Encoding.GetEncoding ("koi8-r").GetString (buffer, 0, decodedLength);

Custom functions:

static byte ToXDigit (byte c)
{
    if (c >= 0x41) {
        if (c >= 0x61)
            return (byte) (c - (0x61 - 0x0a));

        return (byte) (c - (0x41 - 0x0A));
    }

    return (byte) (c - 0x30);
}

static bool IsXDigit (byte c)
{
    return (c >= (byte) 'A' && c <= (byte) 'F') || (c >= (byte) 'a' && c <= (byte) 'f') || (c >= (byte) '0' && c <= (byte) '9');
}
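Putting the pieces together, the following is a minimal self-contained sketch of the same two-step approach: quoted-printable to bytes, then bytes to string via the charset. The QuotedPrintable class and its DecodeBytes and Decode methods are names made up for this example, not part of any library. It assumes the whole encoded text is already in memory and only treats =\r\n as a soft line break.

```csharp
using System;
using System.Text;

static class QuotedPrintable
{
    static bool IsXDigit(byte c)
    {
        return (c >= (byte)'0' && c <= (byte)'9')
            || (c >= (byte)'A' && c <= (byte)'F')
            || (c >= (byte)'a' && c <= (byte)'f');
    }

    static int ToXDigit(byte c)
    {
        if (c >= (byte)'a') return c - 'a' + 10;
        if (c >= (byte)'A') return c - 'A' + 10;
        return c - '0';
    }

    // Step 1: turn quoted-printable bytes into the raw bytes they represent.
    public static byte[] DecodeBytes(byte[] input, int count)
    {
        byte[] output = new byte[count]; // decoded data is never longer than the input
        int n = 0;

        for (int i = 0; i < count; i++)
        {
            byte c = input[i];
            bool hasTwoMore = c == (byte)'=' && i + 2 < count;

            if (hasTwoMore && IsXDigit(input[i + 1]) && IsXDigit(input[i + 2]))
            {
                // "=XX" hex escape
                output[n++] = (byte)((ToXDigit(input[i + 1]) << 4) | ToXDigit(input[i + 2]));
                i += 2;
            }
            else if (hasTwoMore && input[i + 1] == (byte)'\r' && input[i + 2] == (byte)'\n')
            {
                // soft line break "=\r\n": drop it
                i += 2;
            }
            else
            {
                // ordinary byte, or a malformed/truncated escape passed through as-is
                output[n++] = c;
            }
        }

        Array.Resize(ref output, n);
        return output;
    }

    // Step 2: interpret the raw bytes using the charset from the Content-Type header.
    public static string Decode(string text, Encoding charset)
    {
        byte[] raw = Encoding.ASCII.GetBytes(text); // quoted-printable text is plain ASCII
        return charset.GetString(DecodeBytes(raw, raw.Length));
    }
}
```

Calling QuotedPrintable.Decode("<p>=F0=D2=C9=D7=C5=D4 =D1 =F7=C1=CE=D1</p>", Encoding.GetEncoding("koi8-r")) should give <p>Привет я Ваня</p>. Note that on .NET Core and .NET 5+, legacy code pages such as KOI8-R are only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) once at startup.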

Of course, instead of writing your own hodgepodge IMAP library, you could just use MimeKit and MailKit ;-)
