
Python: gb2312 codec can't decode bytes

I have an encoded-word string (RFC 2047) from a received email. When I parse it in Python 3, I get the exception

'gb2312' codec can't decode bytes in position 18-19: illegal multibyte sequence

raised by the make_header function.

from email.header import decode_header, make_header

hdr = decode_header("""=?gb2312?B?QSBWIM34IMXMILP2IMrbICAgqEMgs8kgyMsg?=""")
make_header(hdr)

Parsing the same encoded string with online tools works without problems ( http://dogmamix.com/MimeHeadersDecoder/ ). Any suggestions as to what I am doing wrong? Thanks

The error message tells you that the bytes in position 18-19 are not valid for this encoding.

decode_header simply extracts a bunch of bytes and an encoding. make_header actually attempts to interpret those bytes in that encoding, and fails, because these bytes are not valid in that encoding.

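Similarly, iconv rejects the same input at the same offset (base64 -D is the macOS spelling of the decode flag; GNU coreutils uses -d):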

bash$ base64 -D <<<'QSBWIM34IMXMILP2IMrbICAgqEMgs8kgyMsg' |
> iconv -f gb2312 -t utf-8
A V 网 盘 出 售   
iconv: (stdin):1:18: cannot convert

So the error message simply tells you that this data is not valid GB2312. Without more information, we cannot tell what the data should have been, and neither can Python or your program.

For a rough parable, you can g??ss which b?t?s are m?ss?ng here, but not in ?h?? l?ng?? s???e???.
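If a best-effort result is acceptable, there are two common ways to get one; what follows is a sketch, not a fix for the data itself. The first marks the undecodable bytes with a replacement character. The second is purely a guess: it assumes the sender actually used GBK (a superset of GB2312) and mislabeled it as gb2312, which may also be why a more lenient online decoder accepts the header. Nothing in the message confirms that assumption.

```python
from email.header import decode_header

encoded = "=?gb2312?B?QSBWIM34IMXMILP2IMrbICAgqEMgs8kgyMsg?="
raw, charset = decode_header(encoded)[0]

# Option 1: keep the declared charset, but replace undecodable
# bytes with U+FFFD instead of raising.
lossy = raw.decode(charset, errors="replace")
print(lossy)

# Option 2 (an assumption, not a fact): retry with GBK, a superset
# of GB2312 that is sometimes mislabeled as gb2312 by mail clients.
try:
    guessed = raw.decode("gbk")
except UnicodeDecodeError:
    guessed = None  # the guess was wrong; fall back to the lossy text
print(guessed)
```

Either way, the result is only as trustworthy as the guess behind it, so it is worth keeping the raw bytes around if the header matters.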
