bytes vs bytearray in Python 2.6 and 3

I'm experimenting with bytes vs bytearray in Python 2.6. I don't understand the reason for some differences.

A bytes iterator returns strings:

for i in bytes(b"hi"):
    print(type(i))

Gives:

<type 'str'>
<type 'str'>

But a bytearray iterator returns ints:

for i in bytearray(b"hi"):
    print(type(i))

Gives:

<type 'int'>
<type 'int'>

Why the difference?

I'd like to write code that will translate well into Python 3. So, is the situation the same in Python 3?

For (at least) Python 3.7

According to the docs:

bytes objects are immutable sequences of single bytes

bytearray objects are a mutable counterpart to bytes objects.

And that's pretty much it as far as bytes vs bytearray goes. In fact, they're fairly interchangeable and designed to be flexible enough to be mixed in operations without throwing errors. There is a whole section in the official documentation dedicated to showing the similarities between the bytes and bytearray APIs.
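As a small illustration of that interchangeability, here is a sketch for Python 3. The variable names are made up for the example. The two types compare equal when they hold the same octets, and they can be concatenated; the result takes the type of the left operand.

```python
# Python 3: bytes and bytearray can be mixed in comparisons and concatenation.
immutable = b"hi"
mutable = bytearray(b"hi")

same_content = immutable == mutable        # compared octet by octet
joined_left_bytes = immutable + mutable    # left operand is bytes
joined_left_array = mutable + immutable    # left operand is bytearray

print(same_content)
print(type(joined_left_bytes).__name__, type(joined_left_array).__name__)
```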

Some clues as to why, from the docs:

Since many major binary protocols are based on the ASCII text encoding, bytes objects offer several methods that are only valid when working with ASCII compatible data and are closely related to string objects in a variety of other ways.
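To make the quoted passage concrete, here is a minimal sketch of a few of those str-like methods applied to ASCII-compatible data. The header value is invented for the example and does not come from any real protocol exchange.

```python
# bytes objects carry str-like methods that assume ASCII-compatible data.
header = b"Content-Length: 42"  # hypothetical sample data

name, _, value = header.partition(b": ")
lowered = name.lower()       # only ASCII letters are affected
is_number = value.isdigit()  # True only when every byte is an ASCII digit
length = int(value)          # int() accepts ASCII digits stored in bytes

print(lowered, is_number, length)
```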

In Python 2.6, bytes is merely an alias for str.
This "pseudo type" was introduced to [partially] prepare programs [and programmers!] for conversion to, and compatibility with, Python 3.0, where there is a strict distinction in semantics and use between str (which is systematically Unicode) and bytes (which are arrays of octets, for storing data, but not text).

Similarly, the b prefix for string literals has no effect in 2.6, but it is a useful marker in the program: it explicitly flags the programmer's intent to treat the string as a data string rather than a text string. This information can then be used by the 2to3 converter or similar utilities when the program is ported to Py3k.

You may want to check this SO Question for additional info.

TL;DR

python2.6+ bytes = python2.6+ str = python3.x bytes != python3.x str

python2.6+ bytearray = python3.x bytearray

python2.x unicode = python3.x str
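One way to express that mapping in code is sketched below. The names PY2, text_type and binary_type are hypothetical helpers chosen for this example, not something the interpreter provides.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)
    binary_type = str    # on Python 2.6+, bytes is the same object as str
else:
    text_type = str      # Python 3 str is what Python 2 called unicode
    binary_type = bytes

# True on Python 2.6+, False on Python 3.x
bytes_is_str = bytes is str
print(bytes_is_str)
```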

Long Answer

bytes and str have changed meaning in Python since Python 3.x.

First, to answer your question briefly: in Python 2.6, bytes(b"hi") is an immutable array of bytes (8-bit values, or octets). Because bytes is the same type as str in Python 2.6+, each element you get from it is a length-1 str. (This is not the case in Python 3.x.)

bytearray(b"hi") is, by contrast, a mutable array of bytes. When you ask for the type of one of its elements, you get an int, because Python represents each element of a bytearray as an integer in the range 0-255 (all possible values of an unsigned 8-bit integer). An element of a Python 2 bytes object, however, is shown as the one-character string for that byte (its ASCII character, for values below 128).

For example, consider in Python 2.6+:

>>> barr=bytearray(b'hi')
>>> bs=bytes(b'hi')
>>> barr[0] # bytearray element: an int for the 8 bits 0110 1000
104
>>> bs[0] # bytes (str) element: the one-character string for the same 8 bits
'h'
>>> chr(barr[0]) # chr converts 104 to its corresponding ASCII character
'h'
>>> bs[0]==chr(barr[0]) # compare the first byte of bs with the character for the first byte of barr
True

Now Python 3.x is an entirely different story. As you might have suspected, it is odd that a str literal would mean bytes in Python 2.6+. This answer explains that.

In Python 3.x, a str is Unicode text (it was previously just an array of bytes; note that Unicode and bytes are two completely different things). bytearray is a mutable array of bytes, while bytes is an immutable array of bytes. They both have almost the same methods. Now if I run the same code again in Python 3.x, here is the result.

In Python 3.x

>>> barr=bytearray(b'hi')
>>> bs=bytes(b'hi')
>>> barr[0]
104
>>> bs[0]
104
>>> bs[0]==barr[0] # bytes and bytearray are same thing in python 3.x
True

bytes and bytearray are the same thing in Python 3.x, except for their mutability.
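The mutability difference can be seen in a short sketch like the following (variable names are illustrative). Item assignment works on a bytearray and raises TypeError on bytes, on both Python 2.6+ and 3.x.

```python
# bytearray supports in-place modification.
data = bytearray(b"hi")
data[0] = ord("H")   # replace the first octet (104 -> 72)
data.extend(b"!")    # grow the array in place

# bytes does not: item assignment raises TypeError.
frozen = bytes(b"hi")
try:
    frozen[0] = ord("H")
    bytes_allowed_assignment = True
except TypeError:
    bytes_allowed_assignment = False

print(data, bytes_allowed_assignment)
```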

What happened to str, you might ask? str in Python 3 became what unicode was in Python 2, and the unicode type was subsequently removed from Python 3 as it was redundant.

I'd like to write code that will translate well into Python 3. So, is the situation the same in Python 3?

It depends on what you are trying to do. Are you dealing with bytes, or are you dealing with an ASCII representation of bytes?

If you are dealing with bytes, then my advice is to use bytearray in Python 2, which behaves the same in Python 3. But you lose immutability, if that matters to you.
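A minimal sketch of that advice: wrapping the data in bytearray gives integer elements on both Python 2.6+ and 3.x. The helper name byte_values is made up for this example.

```python
def byte_values(data):
    """Return the octets of `data` as a list of ints on Python 2.6+ and 3.x."""
    # bytearray always yields ints, regardless of how bytes itself iterates.
    return list(bytearray(data))

values = byte_values(b"hi")
print(values)
```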

If you are dealing with ASCII or text, then represent your string as u'hi' in Python 2, which has the same meaning in Python 3. The u prefix has special meaning in Python 2: it instructs Python 2 to treat a string literal as the unicode type. In Python 3 the u prefix has no effect (it is accepted again since Python 3.3), because all string literals in Python 3 are Unicode by default (the type is, confusingly, called str in Python 3 and unicode in Python 2).
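For text, a sketch along these lines keeps the text/bytes boundary explicit. It assumes Python 2.6+ or Python 3.3+, where the u prefix is accepted, and the choice of UTF-8 is an assumption for the example rather than something the question requires.

```python
# Text lives in a unicode string; bytes appear only at the encode/decode boundary.
text = u"caf\u00e9"               # 4 characters of text
encoded = text.encode("utf-8")    # 5 octets, suitable for files or sockets
decoded = encoded.decode("utf-8") # back to text

print(len(text), len(encoded), decoded == text)
```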

I am not sure since which version, but in Python 2 bytes is actually a str, which you can see if you do type(bytes(b"hi")) -> <type 'str'>.

bytearray is a mutable array of bytes, one constructor of which takes a string.

I tried it on Python 3.0.

In Python 3.0, a bytes iterator returns ints, not strings as Python 2.6 did:

for i in bytes(b"hi"):
    print(type(i))

Gives:

<class 'int'>
<class 'int'>

A bytearray iterator also returns ints (class 'int').
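If the code needs one-byte bytes objects rather than ints, slicing is one option that behaves the same on Python 2.6 and 3.x, since a slice of bytes is always bytes. This is a sketch with illustrative variable names:

```python
data = bytes(b"hi")

first_by_index = data[0]    # version-dependent: a str on 2.6, an int on 3.x
first_by_slice = data[0:1]  # always a length-1 bytes object

# Split into one-byte chunks in a version-independent way.
chunks = [data[i:i + 1] for i in range(len(data))]
print(first_by_slice, chunks)
```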
