
Data types and documentation for for-loop in Python 3

I am very confused about data types and UTF-8 encoding. What is actually happening under the hood? I am reading messy JSON data without delimiters in Python 3 (the data sometimes contains Japanese/Chinese characters).

I am reading in the data:

import urllib.request

url = "http://localhost:8001"
data = urllib.request.urlopen(url).read()
print(type(data))

At the moment this reports bytes.

Then I want to read it character by character:

for letter in data:
    print(type(letter))

Now it tells me that letter is an integer. Why was it bytes before and an integer now? I understand that each integer I get is the decimal value of a single byte (for ASCII text, that is the character's code), but this jumping back and forth confuses me.

PS: I also couldn't find official documentation for the for loop. Is there any?

Thank you.

Decoding the data as Padraic Cunningham suggested should work:

data = urllib.request.urlopen(url).read().decode("utf-8")
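To see the effect without a running server, here is a small sketch that replaces the HTTP response with a hard-coded UTF-8 payload (the JSON string below is made up for illustration). After decode("utf-8"), iterating yields one-character str objects, and the length counts characters rather than bytes:

```python
# Stand-in for urllib.request.urlopen(url).read(); this payload is invented for the demo.
raw = '{"name": "東京"}'.encode("utf-8")

# Decode the whole byte sequence once, before looping.
text = raw.decode("utf-8")

print(type(raw), len(raw))    # bytes: each of the two Japanese characters takes 3 bytes
print(type(text), len(text))  # str: one element per character

for ch in text:
    print(repr(ch), type(ch))  # every item is a str of length 1
```

The byte length (18) is larger than the character length (14) because UTF-8 encodes each of 東 and 京 as three bytes.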

You also asked for the official documentation of the for loop. I'm not sure whether you are referring to the documentation of the for statement itself or to the iteration behaviour of data.
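As a rough picture of what the for statement does under the hood: it asks the object for an iterator with iter() and calls next() on it until StopIteration is raised. The sketch below is a simplified equivalent (it ignores the loop's else clause and break handling) using a short bytes literal chosen for illustration:

```python
data = b"<ab"

# The ordinary loop: each item taken from a bytes object is an int.
loop_items = []
for letter in data:
    loop_items.append(letter)

# Simplified version of what the for statement does with the same object.
manual_items = []
it = iter(data)              # ask the bytes object for an iterator
while True:
    try:
        letter = next(it)    # fetch the next element
    except StopIteration:    # raised once the sequence is exhausted
        break
    manual_items.append(letter)

print(loop_items)    # [60, 97, 98]
print(manual_items)  # [60, 97, 98]
```

Both versions produce the same integers, because the element type comes from the object being iterated, not from the for statement.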

The iteration behaviour of a bytes object is described in the Python documentation:

Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1)
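A short sketch of the quoted rule, using an arbitrary two-character string that contains one non-ASCII character (é, encoded as two bytes in UTF-8):

```python
b = "<é".encode("utf-8")   # 3 bytes: 0x3C, 0xC3, 0xA9
s = b.decode("utf-8")      # 2 characters: '<' and 'é'

print(b[0], type(b[0]))       # indexing bytes gives an int
print(b[0:1], type(b[0:1]))   # slicing bytes gives bytes of length 1
print(s[0], type(s[0]))       # indexing str gives a str of length 1
print(s[0:1], type(s[0:1]))   # slicing str also gives a str of length 1
print(list(b))                # iterating bytes yields the integer byte values
```

Note that list(b) has three entries while s has two characters: a single byte is not the same thing as a character once non-ASCII text is involved.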

Sorry, I don't have enough reputation to post this as a comment on the previous answer.

You need to decode the bytes to a str:

In [12]: data = urllib.request.urlopen("http://stackoverflow.com/questions/38014233/data-types-and-documentation-for-for-loop-in-python-3/38014292#38014292").read()

In [13]: type(data)
Out[13]: bytes

In [14]: type(data.decode("utf-8"))
Out[14]: str

In [15]: data[0]
Out[15]: 60

In [16]: data.decode("utf-8")[0]
Out[16]: '<'

After decoding, you will see the characters when you loop and print. urllib.request.urlopen(url).read() returns bytes, and it is up to you to decode those bytes into a str.
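One tempting shortcut is to loop over the bytes and turn each integer back into a character with chr(). The sketch below (using the made-up sample text "日本") shows why that only works for ASCII: each Japanese character is three bytes in UTF-8, so converting byte by byte produces garbled text, while decoding the whole sequence first gives the real characters:

```python
raw = "日本".encode("utf-8")   # 6 bytes for 2 characters

# Wrong: treat each byte value as if it were a character code.
per_byte = "".join(chr(n) for n in raw)

# Right: decode the whole byte sequence first, then loop over characters.
decoded = raw.decode("utf-8")

print(len(raw), repr(per_byte))  # 6 garbled characters
print(len(decoded), decoded)     # 2 real characters

for ch in decoded:
    print(ch)
```

Decoding the complete response in one call avoids splitting a multi-byte character in the middle.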
