bytes vs bytearray in Python 2.6 and 3
I'm experimenting with bytes vs bytearray in Python 2.6. I don't understand the reason for some differences.
A bytes iterator returns strings:
for i in bytes(b"hi"):
    print(type(i))
Gives:
<type 'str'>
<type 'str'>
But a bytearray iterator returns ints:
for i in bytearray(b"hi"):
    print(type(i))
Gives:
<type 'int'>
<type 'int'>
Why the difference?
I'd like to write code that will translate well into Python 3. So, is the situation the same in Python 3?
For (at least) Python 3.7:
bytes objects are immutable sequences of single bytes.
bytearray objects are a mutable counterpart to bytes objects.
And that's pretty much it as far as bytes vs bytearray goes. They're fairly interchangeable and designed to be flexible enough to be mixed in operations without throwing errors. In fact, there is a whole section in the official documentation dedicated to showing the similarities between the bytes and bytearray APIs.
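To make the "interchangeable" claim concrete, here is a minimal sketch, assuming Python 3. The variable names are only illustrative. It shows the two types comparing equal by content, mixing in concatenation, and differing only in whether item assignment is allowed.

```python
# Python 3 sketch: bytes and bytearray mix freely in most operations.
frozen = bytes(b"hi")       # immutable sequence of bytes
buf = bytearray(b"hi")      # mutable counterpart

# Comparison looks at the content, not at the type.
same_content = (frozen == buf)

# Concatenation accepts either type; the result takes the type
# of the left-hand operand (bytes here).
joined = frozen + buf

# Item assignment works on bytearray...
buf[0] = ord("H")

# ...but bytes does not support it and raises TypeError.
try:
    frozen[0] = ord("H")
    bytes_is_mutable = True
except TypeError:
    bytes_is_mutable = False

print(same_content, type(joined).__name__, bytes(buf), bytes_is_mutable)
```

If the mutable buffer needs to be handed on as read-only data, bytes(buf) makes an immutable copy.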
Some clues as to why, from the docs:
Since many major binary protocols are based on the ASCII text encoding, bytes objects offer several methods that are only valid when working with ASCII compatible data and are closely related to string objects in a variety of other ways.
In Python 2.6, bytes is merely an alias for str.
This "pseudo type" was introduced to [partially] prepare programs [and programmers!] for conversion to, and compatibility with, Python 3.0, where there is a strict distinction in semantics and use between str (which is systematically Unicode) and bytes (which are arrays of octets, for storing data, but not text).
Similarly, the b prefix for string literals has no effect in 2.6, but it is a useful marker in the program: it explicitly flags the programmer's intent to treat the string as a data string rather than a text string. This information can then be used by the 2to3 converter or similar utilities when the program is ported to Py3k.
You may want to check this SO Question for additional info.
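A quick way to see the alias for yourself is the check below. It is only a sketch and is meant to run unchanged on Python 2.6+ and Python 3; the names PY2, bytes_is_str and literal_type are mine, not anything from the standard library.

```python
import sys

# True when running under Python 2.x, False under Python 3.x.
PY2 = sys.version_info[0] == 2

# On Python 2.6+ bytes is just another name for str, so this is True there.
# On Python 3 bytes and str are separate types, so this is False.
bytes_is_str = bytes is str

# The b prefix changes nothing on Python 2.6+ (the literal is still a str),
# but on Python 3 it produces a bytes object.
literal_type = type(b"hi").__name__

print(PY2, bytes_is_str, literal_type)
```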
TL;DR
python2.6+ bytes = python2.6+ str = python3.x bytes != python3.x str
python2.6+ bytearray = python3.x bytearray
python2.x unicode = python3.x str
Long Answer
bytes and str have changed meaning in Python since Python 3.x.
First, to answer your question briefly: in Python 2.6, bytes(b"hi") is an immutable array of bytes (8 bits each, i.e. octets). Each element has type bytes, which is the same as str in Python 2.6+ (this is not the case in Python 3.x).
bytearray(b"hi") is also an array of bytes, but a mutable one. When you ask for the type of an element, it is an int, because Python represents each element of a bytearray as an integer in the range 0-255 (all possible values of an 8-bit integer). An element of a bytes object in Python 2.6+, by contrast, is represented as the one-character string whose ASCII code is that byte.
For example, consider in Python 2.6+:
>>> barr=bytearray(b'hi')
>>> bs=bytes(b'hi')
>>> barr[0] # python shows you an int value for the 8 bits 0110 1000
104
>>> bs[0] # python shows you an ASCII value for the 8 bits 0110 1000
'h'
>>> chr(barr[0]) # chr converts 104 to its corresponding ASCII value
'h'
>>> bs[0]==chr(barr[0]) # python compares ASCII value of 1st byte of bs and ASCII value of integer represented by first byte of barr
True
Now Python 3.x is an entirely different story. As you might have suspected, it is odd that a str literal would mean a byte string in Python 2.6+. Well, this answer explains that.
In Python 3.x, a str is Unicode text (it was previously just an array of bytes; note that Unicode and bytes are two completely different things). bytearray is a mutable array of bytes, while bytes is an immutable array of bytes. They both have almost the same functions. Now if I run the same code again in Python 3.x, here is the result.
In Python 3.x:
>>> barr=bytearray(b'hi')
>>> bs=bytes(b'hi')
>>> barr[0]
104
>>> bs[0]
104
>>> bs[0]==barr[0] # bytes and bytearray are same thing in python 3.x
True
bytes and bytearray are the same thing in Python 3.x, except for their mutability.
What happened to str, you might ask? str in Python 3 became what unicode was in Python 2, and the unicode type was subsequently removed from Python 3 as it was redundant.
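To illustrate how strictly Python 3 separates text from bytes, here is a small sketch, assuming Python 3 and ASCII-only data. The variable names are just for illustration.

```python
text = "hi"                         # str: Unicode text on Python 3
data = text.encode("ascii")         # bytes: the encoded octets, b"hi"
round_trip = data.decode("ascii")   # back to str

# On Python 3 a str and a bytes object never compare equal...
equal = (text == data)

# ...and they cannot be concatenated without an explicit encode/decode.
try:
    text + data
    can_concat = True
except TypeError:
    can_concat = False

print(type(data).__name__, round_trip, equal, can_concat)
```

On Python 2.6+ the same comparison would be True, because both literals are plain str there.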
I'd like to write code that will translate well into Python 3. So, is the situation the same in Python 3?
It depends on what you are trying to do. Are you dealing with bytes, or are you dealing with an ASCII representation of bytes?
If you are dealing with bytes, then my advice is to use bytearray in Python 2, which is the same in Python 3. But you lose immutability, if that matters to you.
If you are dealing with ASCII or text, then represent your string as u'hi' in Python 2, which has the same meaning in Python 3. The u prefix has special meaning in Python 2: it instructs Python 2 to treat a string literal as the unicode type. The u prefix in Python 3 has no effect, because all string literals in Python 3 are Unicode by default (the type is, confusingly, called str in Python 3 and unicode in Python 2).
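Both pieces of advice can be sketched in code that should behave the same on Python 2.6+ and Python 3. The helper names byte_values and freeze are hypothetical, made up for this example. One caveat I'm adding: the u prefix is a syntax error on Python 3.0 to 3.2 and was only accepted again from Python 3.3, so the text part assumes 2.6+ or 3.3+.

```python
def byte_values(data):
    """Return the bytes of data as a list of ints (0-255).

    Going through bytearray gives ints on Python 2.6+ and Python 3 alike.
    """
    return list(bytearray(data))


def freeze(buf):
    """Copy a bytearray into an immutable object (str on 2.x, bytes on 3.x)."""
    return bytes(buf)


# Binary data: work on a bytearray, freeze it when immutability matters.
values = byte_values(b"hi")
buf = bytearray(b"hi")
buf[0] = 0x48                  # overwrite the first byte with 'H'
frozen = freeze(buf)

# Text: u'hi' is unicode on Python 2 and str on Python 3.3+.
text = u"hi"
encoded = text.encode("ascii")

print(values, frozen, encoded)
```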
I am not sure since which version, but in Python 2.6 bytes is actually a str, which you can see if you do type(bytes(b"hi")) -> <type 'str'>.
bytearray is a mutable array of bytes, one constructor of which takes a string.
I tried it on Python 3.0.
In Python 3.0, a bytes iterator returns ints, not strings as Python 2.6 did:
for i in bytes(b"hi"):
    print(type(i))
Gives:
<class 'int'>
<class 'int'>
A bytearray iterator also returns <class 'int'> values.
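If you want the old Python 2.6 behaviour (one-byte strings) when walking over a bytes object, slicing is one option, since a slice of length 1 keeps the container type on both versions. This is only a sketch; iter_single_bytes is a hypothetical helper, not a built-in.

```python
def iter_single_bytes(data):
    """Yield each byte of data as a length-1 object of the same type.

    Indexing (data[i]) gives str on Python 2.6+ but int on Python 3;
    slicing (data[i:i + 1]) gives a length-1 str/bytes on both.
    """
    for i in range(len(data)):
        yield data[i:i + 1]


pieces = list(iter_single_bytes(b"hi"))

# Going the other way, bytearray yields ints on both versions.
ints = list(bytearray(b"hi"))

print(pieces, ints)
```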