
What is the point of .decode()?

>>> import urllib.request
>>> infile = urllib.request.urlopen("http://www.yahoo.com")

With decoding:

>>> infile.read(100).decode()

'<!DOCTYPE html>\n<html lang="en-US" class="dev-desktop uni-purple-border  bkt901 https  uni-dark-purp'

Without decoding:

>>> infile.read(100)

b'le" style="">\n<!-- m2 template  -->\n<head>\n    <meta http-equiv="Content-Type" content="text/html; c'

It appears the only difference is the b prefix before the output, which I assume means bytes. Besides that, the output looks exactly the same.

No, the output is not the same; one is a Unicode string (str), the other an undecoded bytes value.
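To make the distinction concrete, here is a minimal sketch of how the two types behave differently in Python 3 even when they print almost identically. The variable names text and raw are just illustrative, and the sample string is an assumption standing in for the start of a downloaded page.

```python
# Illustrative sample; stands in for the first characters of a web page.
text = "<!DOCTYPE html>"      # str: a sequence of Unicode code points
raw = text.encode("ascii")    # bytes: a sequence of integers in range 0..255

# The reprs differ only by the b prefix, but the types are different.
print(repr(text), type(text).__name__)
print(repr(raw), type(raw).__name__)

# A str never compares equal to a bytes object.
print(text == raw)            # False

# Indexing a str yields a one-character str; indexing bytes yields an int.
print(text[0], raw[0])        # < 60

# Decoding turns the bytes back into the original text.
print(raw.decode("ascii") == text)  # True
```

Called without arguments, bytes.decode() uses UTF-8, which is why the plain .decode() call in the question works for this page.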

For ASCII, that looks the same, but when you load any web page that uses characters outside the ASCII character set, the difference will be much clearer.

Take UTF-8 encoded data, for example:

>>> '–'
'–'
>>> '–'.encode('utf8')
b'\xe2\x80\x93'

That's a simple U+2013 EN DASH character. The bytes representation shows the 3 bytes UTF-8 uses to encode the code point.
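As a rough sketch of why the decoding step matters, the same EN DASH can be taken through a round trip, and then decoded with the wrong codec to show what goes wrong. The names dash, encoded, and mojibake are only illustrative.

```python
dash = "\u2013"                     # U+2013 EN DASH, one code point
encoded = dash.encode("utf-8")      # three bytes: e2 80 93

print(len(dash), len(encoded))      # 1 3

# Decoding with the codec that was used for encoding restores the text.
print(encoded.decode("utf-8") == dash)  # True

# Decoding with a different codec silently yields different characters.
mojibake = encoded.decode("latin-1")
print(len(mojibake), mojibake == dash)  # 3 False

# A codec that cannot represent these bytes raises an error instead.
try:
    encoded.decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii cannot decode this:", exc.reason)
```

The bytes themselves carry no record of which encoding produced them, so the caller has to supply the right codec when decoding.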

You really want to read up on Unicode vs. encoded data here; I recommend:
