Converting bytes to string with str() returns string with speech marks

Say I have a variable containing bytes:

>>> a = b'Hello World'

It can be verified with:

>>> type(a)
<class 'bytes'>

Now I try to convert a into a string with str():

>>> b = str(a)

and sure enough it is a string:

>>> type(b)
<class 'str'>

Now I try to print b, but I get a totally unexpected result:

>>> print(b)
b'Hello World'

It returns a string, as I would expect, but it also keeps the b (byte symbol) and the ' (quotation marks).

Why does it do this, and not just print the message between the quotation marks?

Don't think of a bytes value as a string in some default 8-bit encoding. It's just binary data. As such, str(a) returns an encoding-agnostic string that represents the value of the byte string. If you want 'Hello World', be specific and decode the value.

>>> b = a.decode()
>>> type(b)
<class 'str'>
>>> print(b)
Hello World
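
To illustrate why bytes should not be treated as text in some default encoding, here is a minimal sketch. The byte values are chosen purely for illustration; the same two bytes decode to different text depending on which encoding is assumed, while str() without an encoding does not decode at all.

```python
# Two raw bytes, chosen only for illustration.
raw = b'\xc3\xa9'

# Decoding with different encodings gives different text.
as_utf8 = raw.decode('utf-8')      # one character, U+00E9
as_latin1 = raw.decode('latin-1')  # two characters, U+00C3 and U+00A9

print(len(as_utf8), len(as_latin1))

# str() with no encoding returns the literal form, the same text as repr().
literal = str(raw)
print(literal == repr(raw))
```

Since the bytes alone do not say which reading is intended, the encoding has to come from the caller.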

In Python 2, the distinction between bytes and text was blurred. Python 3 went to great lengths to separate the two: bytes for binary data, and str for readable text.

For another perspective, compare

>>> list("Hello")
['H', 'e', 'l', 'l', 'o']

with

>>> list(b"Hello")
[72, 101, 108, 108, 111]
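
The comparison above can be taken a step further. As a small sketch (the variable names are only for illustration), indexing a bytes object yields an integer, slicing yields bytes, and a bytes object can be rebuilt from a list of integers:

```python
data = b"Hello"

first = data[0]      # indexing gives an int in range(256)
piece = data[0:1]    # slicing gives a bytes object of length 1

# A bytes object can be built back from the integer values.
rebuilt = bytes([72, 101, 108, 108, 111])

print(first, piece, rebuilt)
print(chr(first))    # the code point 72 as a str character
```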

What str(a) does here is ask the object for its string form through __str__. bytes provides no __str__ that decodes the data, so the result is the same as __repr__, which returns the string you would type to create this object in the REPL.
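
The relationship between str() and __repr__ can be seen with any class. In this sketch, Tag is a hypothetical class invented for illustration; it defines only __repr__, and str() on an instance returns that same text. For a bytes object, str() and repr() likewise agree.

```python
class Tag:
    """Hypothetical example class that defines __repr__ but not __str__."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Tag({self.name!r})"


t = Tag("x")
print(str(t))  # falls back to __repr__

a = b'Hello World'
print(str(a) == repr(a))  # the two forms are the same for bytes
```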

If you think about it, just converting bytes to a str makes little sense, as you need to know the encoding. You can use bytes.decode(encoding) to convert bytes to str properly.

a.decode("utf-8")

The encoding argument can also be omitted, in which case the default, UTF-8, is used.
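
As a rough sketch of what happens when the bytes do not match the encoding used, the example below decodes a byte string that is valid Latin-1 but not valid UTF-8. The sample values are assumptions made for illustration; the errors parameter of bytes.decode controls how invalid input is handled.

```python
good = b'caf\xc3\xa9'  # valid UTF-8
bad = b'caf\xe9'       # Latin-1 bytes, not valid UTF-8

# With no argument, decode() uses UTF-8.
same = good.decode() == good.decode('utf-8')

# Strict decoding (the default) raises on invalid input.
failed = False
try:
    bad.decode()
except UnicodeDecodeError as exc:
    failed = True
    print("decode failed:", exc.reason)

# errors='replace' substitutes U+FFFD for the invalid byte.
replaced = bad.decode('utf-8', errors='replace')

# Decoding with the encoding the data was actually written in succeeds.
latin = bad.decode('latin-1')
print(same, failed, ascii(replaced), ascii(latin))
```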

str usually transforms an object into a string that represents it. For a bytes object, there is no better representation than b'...' around its contents. You probably want to use decode, where you also specify the encoding of the bytes object that should be used when converting to a string.

In Python 3.x, when you type-cast a byte string using str(s), it creates a new string such as "b'Hello World'" (keeping the "b" that denotes a byte string at the start). This is because str() on a byte string does not decode it. The result is the same string that __repr__ returns, which is the representation of the object's value (i.e. the string preceded by "b"). For example:

>>> a = b'Hello World'
>>> str(a)
"b'Hello World'"

There are two ways to convert a bytes-like object to a string:

  1. Decode byte-string to string: you can decode your byte-string a to a string as:

     >>> a.decode()
     'Hello World'

  2. Convert byte-string to a UTF-8 string as:

     >>> str(a, 'utf-8')
     'Hello World'
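
The two approaches in the list above produce the same result. A minimal sketch, assuming the data really is UTF-8; the bytearray is included only to show that str() with an encoding also accepts other bytes-like objects:

```python
a = b'Hello World'

via_decode = a.decode('utf-8')
via_str = str(a, 'utf-8')
via_keywords = str(a, encoding='utf-8', errors='strict')

# str() with an encoding also works on a bytearray.
buf = bytearray(a)
via_bytearray = str(buf, 'utf-8')

print(via_decode == via_str == via_keywords == via_bytearray)

# Without an encoding, str() returns the literal form instead.
print(str(a))
```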
