Converting bytes to string with str() returns string with speech marks
Say I have a variable containing bytes:
>>> a = b'Hello World'
It can be verified with:
>>> type(a)
<class 'bytes'>
Now I try to convert a into a string with str():
>>> b = str(a)
and sure enough it is a string:
>>> type(b)
<class 'str'>
Now I try to print b, but I get a totally unexpected result:
>>> print(b)
b'Hello World'
It returns a string, as I would expect, but it also keeps the b (the bytes prefix) and the ' (quotation marks).
Why does it do this, and not just print the message between the quotation marks?
Don't think of a bytes value as a string in some default 8-bit encoding. It's just binary data. As such, str(a) returns an encoding-agnostic string to represent the value of the byte string. If you want 'Hello World', be specific and decode the value.
>>> b = a.decode()
>>> type(b)
<class 'str'>
>>> print(b)
Hello World
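To illustrate why being specific matters, here is a small sketch showing that the same bytes become different text under different encodings. The sample value b'\xc3\xa9' is an assumption chosen for illustration and does not appear in the question.

```python
# The same two bytes mean different text depending on the encoding.
data = b'\xc3\xa9'  # illustrative sample, not from the question

as_utf8 = data.decode('utf-8')      # one character (U+00E9)
as_latin1 = data.decode('latin-1')  # two characters (U+00C3, U+00A9)

print(len(as_utf8))    # 1
print(len(as_latin1))  # 2

# str() does not pick an encoding; it just shows the raw bytes literal.
print(str(data))       # b'\xc3\xa9'
```

Only decode() with a known encoding recovers the intended text; str() alone never guesses.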
In Python 2, the distinction between bytes and text was blurred. Python 3 went to great lengths to separate the two: bytes for binary data, and str for readable text.
For another perspective, compare
>>> list("Hello")
['H', 'e', 'l', 'l', 'o']
with
>>> list(b"Hello")
[72, 101, 108, 108, 111]
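The same difference shows up when indexing. The following is a minimal sketch, with variable names (text, data) chosen here for illustration:

```python
text = "Hello"
data = b"Hello"

print(text[0])    # H   (a one-character str)
print(data[0])    # 72  (an int)
print(data[0:1])  # b'H' (slicing keeps the bytes type)

# The integers and the bytes object convert back and forth.
assert bytes([72, 101, 108, 108, 111]) == data
assert [ord(ch) for ch in text] == list(data)
```

A str is a sequence of characters, while a bytes object is a sequence of integers in the range 0 to 255.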
What str(b) does for a bytes object b is call the object's __str__. For bytes, that conversion does not decode anything: it gives the same result as __repr__, which returns the string required to create this object in the REPL.
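This can be checked directly. The sketch below only uses the standard library; ast.literal_eval is used here to show that the resulting string is the literal form of the object.

```python
import ast

data = b'Hello World'

# For bytes, str() and repr() produce the same text: the literal form.
assert str(data) == repr(data) == "b'Hello World'"

# print() calls str() on its argument, which is why the b'' shows up.
print(data)          # b'Hello World'
print(str(data)[0])  # b

# Evaluating the literal form gives back an equal bytes object.
assert ast.literal_eval(str(data)) == data
```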
If you think about it, just converting bytes to a str makes little sense, as you need to know the encoding. You can use bytes.decode(encoding) to convert bytes to str properly.
b.decode("utf-8")
The encoding can also be omitted, in which case the default is used (utf-8 in Python 3).
str usually transforms an object into a string that represents it. For a bytes object, there is no better representation than its literal form, b'...' with the contents inside. You probably want to use decode, where you also specify the encoding of the bytes object that should be used when transforming it to a string.
In Python 3.x, when you type-cast a byte string using str(s), it creates a new string such as b'Hello World' (keeping the "b" that denotes a byte string at the start). This is because the string conversion of a byte string does not decode it. It gives the same result as __repr__, which returns the literal form of the object (i.e. the contents preceded by "b"). For example:
>>> a = b'Hello World'
>>> str(a)
"b'Hello World'"
There are two ways to convert a bytes-like object to a string:
Decode the byte-string to a string: you can decode your byte-string a to a string as:
>>> a.decode()
'Hello World'
Convert the byte-string to a string with str(), passing the encoding (here utf-8) as the second argument:
>>> str(a, 'utf-8')
'Hello World'
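Both approaches give the same result. As a minimal sketch, they can be wrapped in a small helper; the name to_text and its default encoding are hypothetical and not part of any library.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes or bytearray.

    Hypothetical helper for illustration; utf-8 is an assumed default.
    """
    if isinstance(value, (bytes, bytearray)):
        return str(value, encoding)  # same result as value.decode(encoding)
    return str(value)


a = b'Hello World'

# The two conversions are interchangeable.
assert str(a, 'utf-8') == a.decode('utf-8') == 'Hello World'

print(to_text(a))   # Hello World
print(to_text(42))  # 42
```

Note that str(a, 'utf-8') decodes, while str(a) with no encoding returns the b'...' representation.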
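Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this page, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.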