Data types and documentation for for-loop in Python 3

Question

I am very confused about data types and UTF-8 encoding. What is actually happening under the hood? I am reading messy JSON data without delimiters in Python 3 (the data contains Japanese/Chinese characters from time to time).

I am reading in the data:

import urllib.request

url = "http://localhost:8001"
data = urllib.request.urlopen(url).read()  # raw response body
type(data)

At this point it returns bytes.

Then I want to read it letter by letter:

for letter in data:
    type(letter)

It tells me that letter is now an integer. Why was it bytes before, and an integer now? PS: I understand that the integer I am getting is the decimal representation of the character. But this jumping back and forth confuses me.

PS: I also couldn't find the official documentation for the for-loop. Is there any?

Thank you.

Answer 1

Decoding the data as Padraic Cunningham suggested should work:

data = urllib.request.urlopen(url).read().decode("utf-8")

You also asked for the official documentation for the for-loop. I'm not sure whether you mean this or the iteration behaviour of data.
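As a rough illustration of the for statement itself (my own sketch, not text from the documentation; the sample bytes and variable names are made up): the loop asks the object for an iterator with iter() and then calls next() on it until StopIteration is raised. For a bytes object, each next() call hands back an int.

```python
data = b"<a>"  # made-up sample; stands in for the bytes returned by read()

# Roughly what `for letter in data:` does, written out by hand.
iterator = iter(data)
letters = []
while True:
    try:
        letter = next(iterator)   # an int for a bytes object
    except StopIteration:
        break                     # the for loop ends here
    letters.append(letter)

print(letters)                 # [60, 97, 62]
print(letters == list(data))   # True
```

So the loop does not convert anything; it just yields whatever the iterator of the object produces.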

The iteration behaviour of a bytes object is as stated here:

Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1)
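A small sketch of the quoted behaviour, using a made-up two-character Japanese string instead of the real response:

```python
b = "日本".encode("utf-8")   # made-up sample; each character is 3 bytes in UTF-8

first_item = b[0]     # indexing a bytes object -> int
first_slice = b[0:1]  # slicing a bytes object  -> bytes of length 1

print(len(b))                           # 6
print(type(first_item), first_item)     # <class 'int'> 230
print(type(first_slice), first_slice)   # <class 'bytes'> b'\xe6'

s = b.decode("utf-8")
# For str, indexing and slicing both give a str of length 1.
print(s[0], s[0:1])                     # 日 日
```

Iterating with a for loop behaves like indexing, which is why each letter came out as an int.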

Not enough rep to post this as a comment on the previous answer, I'm sorry.

Answer 2

You need to decode the bytes to str:

In [12]: data = urllib.request.urlopen("http://stackoverflow.com/questions/38014233/data-types-and-documentation-for-for-loop-in-python-3/38014292#38014292").read()

In [13]: type(data)
Out[13]: bytes

In [14]: type(data.decode("utf-8"))
Out[14]: str

In [15]: data[0]
Out[15]: 60

In [16]: data.decode("utf-8")[0]
Out[16]: '<'

After decoding, you will see the characters when you loop and print. urllib.request.urlopen(url).read() returns bytes; it is up to you to decode the bytes into a str.
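To make the difference concrete without a network call, here is a sketch with a made-up payload standing in for the result of read(). It also shows why decoding should happen on the complete data and not byte by byte: a multi-byte character that is cut in half cannot be decoded.

```python
# Made-up payload; stands in for urllib.request.urlopen(url).read()
raw = '{"name": "東京"}'.encode("utf-8")

byte_values = [b for b in raw]                  # ints, one per byte
chars = [c for c in raw.decode("utf-8")]        # str, one per character

print(len(byte_values), len(chars))             # 18 14
print(chars[10], chars[11])                     # 東 京

# Cutting the data in the middle of a 3-byte character breaks decoding.
try:
    raw[:11].decode("utf-8")
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False
print(truncated_ok)                             # False
```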

Answer 1
Score 3, accepted, 2016-06-24 13:17:47

Answer 2
Score 1, 2016-06-24 13:10:42
