

Converting bytes to string with str() returns string with speech marks

Say I have a variable containing bytes:

>>> a = b'Hello World'

It can be verified with:

>>> type(a)
<class 'bytes'>

Now I try to convert a into a string with str():

>>> b = str(a)

and sure enough it is a string:

>>> type(b)
<class 'str'>

Now I try to print b, but I get a totally unexpected result:

>>> print(b)
b'Hello World'

It returns a string, as I would expect, but it also keeps the b (the bytes prefix) and the ' (quotation marks).

Why does it do this, and not just print the message between the quotation marks?

Don't think of a bytes value as a string in some default 8-bit encoding. It's just binary data. As such, str(a) returns an encoding-agnostic string that represents the value of the bytes object. If you want 'Hello World', be specific and decode the value.

>>> b = a.decode()
>>> type(b)
<class 'str'>
>>> print(b)
Hello World

In Python 2, the distinction between bytes and text was blurred. Python 3 went to great lengths to separate the two: bytes for binary data, and str for readable text.
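As a minimal sketch of that separation in Python 3 (the variable names are chosen here purely for illustration), bytes and str with the same ASCII content do not compare equal, cannot be concatenated, and only convert through an explicit encode or decode:

```python
data = b"Hello World"   # binary data
text = "Hello World"    # readable text

# The two types never compare equal, even with identical ASCII content.
same = (data == text)

# Mixing them is an error rather than an implicit conversion.
error_message = None
try:
    combined = data + text
except TypeError as exc:
    error_message = str(exc)

# Conversion is always explicit: encode() goes str -> bytes,
# decode() goes bytes -> str.
round_trip = text.encode("utf-8").decode("utf-8")

print(same)                  # False
print(error_message is not None)
print(round_trip == text)    # True
```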

For another perspective, compare

>>> list("Hello")
['H', 'e', 'l', 'l', 'o']


>>> list(b"Hello")
[72, 101, 108, 108, 111]
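The same difference shows up when indexing. The following sketch (names are illustrative only) suggests that a str yields one-character strings, while a bytes object yields integers in the range 0 to 255 and can be rebuilt from such integers:

```python
text = "Hello"
data = b"Hello"

first_char = text[0]     # a one-character str
first_byte = data[0]     # an int, not a length-1 bytes object
first_slice = data[0:1]  # slicing, unlike indexing, returns bytes

# A bytes object can be rebuilt from its integer values.
rebuilt = bytes([72, 101, 108, 108, 111])

# ord() gives the code point of each character; for ASCII text these
# match the byte values.
code_points = [ord(ch) for ch in text]

print(first_char, first_byte, first_slice, rebuilt)
```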

What str(a) does here is produce the same text as repr(a): bytes provides no human-readable string conversion of its own, so the result is the __repr__ output, which is the literal you would type to create this object in the REPL.
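A quick way to check this, sketched here with the standard-library ast module (not part of the original answer), is to compare str() with repr() and then evaluate the resulting text back into a bytes object:

```python
import ast

a = b"Hello World"

s = str(a)    # text of the literal, including the b prefix and the quotes
r = repr(a)

# For bytes, str() and repr() give the same text.
same_text = (s == r)

# The text is a valid bytes literal, so evaluating it recreates the object.
recreated = ast.literal_eval(s)

print(same_text)        # True
print(len(a), len(s))   # 11 14
print(recreated == a)   # True
```

The three extra characters in s are the b prefix and the two quotation marks.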

If you think about it, just converting bytes to a str makes little sense, as you need to know the encoding. You can use bytes.decode(encoding) to convert bytes to str properly.


The encoding argument can also be omitted, in which case the default, utf-8 in Python 3, is used.
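To illustrate why the encoding matters, here is a small sketch using a non-ASCII word (the sample text and variable names are assumptions made for this example). The same bytes decode to different text under different encodings, and bytes that are not valid UTF-8 fail to decode when the encoding is left out:

```python
# "caf\u00e9" is the word "cafe" with an acute accent on the e.
word = "caf\u00e9"

utf8_bytes = word.encode("utf-8")      # the accented e takes two bytes
latin_bytes = word.encode("latin-1")   # the accented e takes one byte

default_decoded = utf8_bytes.decode()          # no argument: utf-8 in Python 3
explicit_decoded = utf8_bytes.decode("utf-8")
wrong_decoded = utf8_bytes.decode("latin-1")   # decodes, but to the wrong text

# Latin-1 bytes are not valid UTF-8 here, so the default decode fails.
try:
    latin_bytes.decode()
    failed = False
except UnicodeDecodeError:
    failed = True

print(default_decoded == word, wrong_decoded == word, failed)
```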

str usually transforms an object into a string that represents it. There is no better representation of a bytes object than b'contents'. You probably want to use decode, where you also specify the encoding of the bytes object that should be used when converting it to a string.

In Python 3.x, when you type-cast a byte string using str(s), it creates a new string that reads b'Hello World' (keeping the "b" that denotes a byte string at the start). This is because bytes does not define a text-producing __str__ of its own. Hence str() ends up with the same string that __repr__ returns, which is the representation bytes uses for its values (i.e. the content preceded by "b"). For example:

>>> a = b'Hello World'
>>> str(a)
"b'Hello World'"

There are two ways to convert a bytes-like object to a string:

  1. Decode the byte string to a string: you can decode your byte string a with:

     >>> a.decode()
     'Hello World'

  2. Convert the byte string to a string with str(), passing the encoding (here utf-8):

     >>> str(a, 'utf-8')
     'Hello World'
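The two approaches above can be compared side by side. This is only a sketch: the damaged sample bytes and the use of errors="replace" are additions for illustration and do not come from the original answer.

```python
a = b"Hello World"

via_decode = a.decode("utf-8")
via_str = str(a, "utf-8")     # with an encoding, str() decodes
without_encoding = str(a)     # without one, it returns the repr text

# Both forms also work on other bytes-like objects such as bytearray.
from_bytearray = str(bytearray(a), "utf-8")

# Both accept an errors argument; "replace" substitutes U+FFFD for
# bytes that cannot be decoded instead of raising UnicodeDecodeError.
damaged = b"Hello \xff World"
replaced = damaged.decode("utf-8", errors="replace")

print(via_decode == via_str)   # True
print(without_encoding)        # b'Hello World'
```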

Notice: technical posts on this site are published under the CC BY-SA 4.0 license; if you repost one, credit this site's URL or the original address.
