What is the point of .decode()?

Question

>>> import urllib.request
>>> infile = urllib.request.urlopen("http://www.yahoo.com")

With decoding:

>>> infile.read(100).decode()

'<!DOCTYPE html>\n<html lang="en-US" class="dev-desktop uni-purple-border  bkt901 https  uni-dark-purp'

Without decoding:

>>> infile.read(100)

b'le" style="">\n<!-- m2 template  -->\n<head>\n    <meta http-equiv="Content-Type" content="text/html; c'

It appears the only difference is the b prefix before the output, which I assume means bytes. Besides that, the output is exactly the same, though.

Answer 1

No, the output is not the same; one is a Unicode value, the other an undecoded bytes value.

For ASCII, that looks the same, but when you load any web page that uses characters outside the ASCII character set, the difference will be much clearer.
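Even for pure ASCII data, the two values are different types that only print alike. A minimal sketch, assuming a short hypothetical byte string (`raw`) standing in for the start of a response body:

```python
# Hypothetical sample standing in for the first bytes of an HTTP response body.
raw = b'<!DOCTYPE html>'
text = raw.decode()          # bytes.decode() defaults to UTF-8 in Python 3

print(type(raw).__name__)    # bytes
print(type(text).__name__)   # str
print(raw == text)           # False: bytes and str never compare equal
print(raw[0], text[0])       # 60 <  (indexing bytes gives an int, indexing str gives a character)
```

So the b prefix is not just cosmetic: string operations such as comparison and indexing behave differently on the two values.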

Take UTF-8 encoded data, for example:

>>> '–'
'–'
>>> '–'.encode('utf8')
b'\xe2\x80\x93'

That's a simple U+2013 EN DASH character. The bytes representation shows the 3 bytes UTF-8 uses to encode the codepoint.
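Going the other way, .decode() turns those 3 bytes back into the single character, provided the codec matches the one used to encode. A small sketch of the round trip; the variable names and the choice of latin-1 as the "wrong" codec are only illustrative:

```python
dash = '\u2013'                         # U+2013 EN DASH
encoded = dash.encode('utf8')           # b'\xe2\x80\x93'

print(len(dash), len(encoded))          # 1 3
print(encoded.decode('utf8') == dash)   # True

# Decoding with the wrong codec does not fail here, it silently gives wrong text:
# latin-1 maps each of the 3 bytes to its own character.
mojibake = encoded.decode('latin-1')
print(len(mojibake))                    # 3

# A codec that cannot represent these bytes raises instead.
try:
    encoded.decode('ascii')
except UnicodeDecodeError as exc:
    print('ascii cannot decode this:', exc.reason)
```

This is why the decoded and undecoded results only look interchangeable while the page happens to be plain ASCII.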

You really want to read up on Unicode vs. encoded data here. I recommend:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder

Answer 1
3, accepted, 2014-04-26 19:00:40