Vietnamese email subject encoding?

Question

Subject: Re:
 =?UTF-8?Q?Th=E1=BA=A7y_g=E1=BB=ADi_b=C3=A0i_t=E1=BA=ADp_cho_em_v=E1=BB?=
 =?UTF-8?Q?=9Bi.?=

I received an email with this subject header. How should it be decoded?

Answer 1

It's a MIME encoded-word. The syntax is =?charset?transfer-encoding?encoded-data?=. The transfer encoding is either B (Base64) or Q (a variant of quoted-printable).

To decode it:

1. Split each encoded-word into its 3 parts.
2. Decode the data (3rd part) into a byte[] according to its transfer encoding (2nd part). In this case the Q encoding is used, so replace each =XX sequence with the corresponding octet and each underscore with a space (octet 32), as RFC 2047 specifies. This gives you the two byte arrays [84, 104, 225, 186, 167, 121, 32, 103, 225, 187, 173, 105, 32, 98, 195, 160, 105, 32, 116, 225, 186, 173, 112, 32, 99, 104, 111, 32, 101, 109, 32, 118, 225, 187] and [155, 105, 46].
3. Decode these byte arrays according to the specified charset (1st part).

In this particular example, both of the encoded-words are invalid on their own: the first one is missing the trail byte of a 3-byte UTF-8 character, and the second one starts with that trail byte. Combined, however, they are valid UTF-8 and decode to the string "Thầy gửi bài tập cho em với." (which Google Translate renders as "Teacher sent me to exercise.")
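The steps above can be sketched in Python roughly as follows. This is only an illustration, not a complete RFC 2047 decoder: the names ENCODED_WORD, q_decode and decode_encoded_words are made up for this example, and it assumes the input contains only encoded-words separated by whitespace (the plain "Re:" prefix is left out). To cope with senders that split a multi-byte character across two encoded-words, it concatenates the bytes of adjacent words that share a charset before decoding them to text.

```python
import base64
import re

# Matches one encoded-word: =?charset?encoding?data?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def q_decode(data: str) -> bytes:
    """Turn Q-encoded text into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == "_":
            out.append(0x20)  # underscore stands for a space
            i += 1
        elif ch == "=":
            out.append(int(data[i + 1:i + 3], 16))  # =XX is one octet
            i += 3
        else:
            out.append(ord(ch))  # any other character is a literal ASCII byte
            i += 1
    return bytes(out)


def decode_encoded_words(value: str) -> str:
    """Decode a run of whitespace-separated encoded-words to text."""
    chunks = []  # list of [charset, accumulated bytes]
    for charset, encoding, data in ENCODED_WORD.findall(value):
        if encoding in "qQ":
            raw = q_decode(data)
        else:
            raw = base64.b64decode(data)
        charset = charset.lower()
        if chunks and chunks[-1][0] == charset:
            # Same charset as the previous word: join the bytes first so a
            # character split across two words is reassembled.
            chunks[-1][1] += raw
        else:
            chunks.append([charset, raw])
    return "".join(raw.decode(charset) for charset, raw in chunks)


subject = (
    "=?UTF-8?Q?Th=E1=BA=A7y_g=E1=BB=ADi_b=C3=A0i_t=E1=BA=ADp_cho_em_v=E1=BB?=\n"
    " =?UTF-8?Q?=9Bi.?="
)
print(decode_encoded_words(subject))
```

Decoding each encoded-word to text separately would fail here with a UnicodeDecodeError, which is why the sketch joins the bytes first.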

Answer 2

This is defined in RFC 2047: http://tools.ietf.org/html/rfc2047

See section 4 on encodings. I'm not sure whether the base framework includes anything that handles this, or handles it correctly.

Edit: here's one person's attempt at this: http://vsevolodp.blogspot.com/2010/11/how-to-decode-encoded-word-header.html
