How do I convert a string into bytes in Python?

In my code, I encode a string with utf-8. I get the output, convert it to a string, and send it to my other program. The other program gets this string, but when I try to decode it, I get an error: AttributeError: 'str' object has no attribute 'decode'. I need to send the encoded data as a string because my other program receives it as JSON. My first program is in Python 3, and the other program is in Python 2.

# my first program
x = u"宇宙"
x = str(x.encode('utf-8'))


# my other program
text = x.decode('utf-8')
print(text)

What should I do to convert the string received by the second program to bytes so the decode works?

The most important piece of information for answering this properly is how you pass these objects to the Python 2 program: you are using JSON.

So, stay with me:

After the .encode step in program 1, you have a bytes object. Calling str(...) on it just puts an escaping layer on that bytes object and turns it back into a string. When this string is written as-is to a file, or transmitted over the network, it is encoded again: non-ASCII characters are usually escaped with the \u prefix and the code point of each character, but the original Chinese characters themselves are now encoded in utf-8 and doubly escaped.

Python's JSON load methods already decode the contents of JSON data into text strings, so no decode step is needed at all.

In short: to pass data around, simply encode your original text as JSON in the first program, and do not bother with any decoding after json.loads in the target Python 2 program:

# my first program
import json

x = "宇宙"
# No str-encode-decode dance needed here.
...
data = json.dumps({"example_key": x, ...})
# code to transmit json string by network or file as it is...


# my other program
import json

text = json.loads(data)["example_key"]
# text is a Unicode text string ready to be used!
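
As a rough, self-contained sketch of the round trip described above (run here in a single Python 3 process, with the network or file transfer left out; the variable names `original`, `payload`, and `received` are made up for illustration):

```python
import json

original = "宇宙"

# Sender side: serialize the text directly, with no .encode() or str() step.
payload = json.dumps({"example_key": original})

# With the default ensure_ascii=True, the JSON text is pure ASCII:
# each non-ASCII character is written as a \uXXXX escape.
print(payload)

# Receiver side: json.loads already returns a text string,
# so there is nothing left to decode.
received = json.loads(payload)["example_key"]
print(received == original)
```

Because the payload is plain ASCII, it should survive transport unchanged. On the Python 2 side, json.loads returns a unicode object for string values, which is already decoded text.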

With what you are doing now, you are probably getting the text doubly encoded. I will mimic it on the Python 3 console, printing the result of each step so you can understand the transforms that are taking place.

In [1]: import json

In [2]: x = "宇宙"

In [3]: print(x.encode("utf-8"))
b'\xe5\xae\x87\xe5\xae\x99'

In [4]: text = str(x.encode("utf-8"))

In [5]: print(text)
b'\xe5\xae\x87\xe5\xae\x99'

In [6]: json_data = json.dumps(text)

In [7]: print(json_data)
"b'\\xe5\\xae\\x87\\xe5\\xae\\x99'"
# as you can see, it is doubly escaped, and it is mostly useless in this form

In [8]: recovered_from_json = json.loads(json_data)

In [9]: print(recovered_from_json)
b'\xe5\xae\x87\xe5\xae\x99'

In [10]: print(repr(recovered_from_json))
"b'\\xe5\\xae\\x87\\xe5\\xae\\x99'"

In [11]: # and if you have data like this in files/databases you need to recover:

In [12]: import ast

In [13]: recovered_text = ast.literal_eval(recovered_from_json).decode("utf-8")

In [14]: print(recovered_text)
宇宙
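
If a lot of stored data needs this repair, the last two steps of the session could be wrapped in a small helper. This is only a sketch, assuming Python 3 and assuming the stored value is exactly the output of str() on a bytes object; the name `recover_text` is hypothetical:

```python
import ast


def recover_text(mangled, encoding="utf-8"):
    """Undo str(bytes): parse the b'...' literal, then decode the bytes."""
    value = ast.literal_eval(mangled)
    if not isinstance(value, bytes):
        # Refuse anything that is not a bytes literal instead of guessing.
        raise ValueError("expected a bytes literal, got %s" % type(value).__name__)
    return value.decode(encoding)


# Reproduce the mangled form from the session above, then repair it.
mangled = str("宇宙".encode("utf-8"))
print(recover_text(mangled))
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for data read back from files or databases.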

The main issue is that you are dealing with two different Python versions, whose libraries handle strings differently.

The six library addresses this.

Six provides simple utilities for wrapping over differences between Python 2 and Python 3. It is intended to support codebases that work on both Python 2 and 3 without modification.

Use this library and decode like this:

import six

def bytes_to_str(s, encoding='utf-8'):
    """Returns a str if a bytes object is given."""
    if six.PY2 and isinstance(s, bytes):
        return s.decode(encoding)
    return s

text = bytes_to_str(x)
print(text)
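
For comparison, here is a sketch of the same idea without the six dependency, assuming Python 3. The name `to_text` is hypothetical. It decodes real bytes objects and passes text through, which also shows that a helper of this kind cannot repair a string produced by str() on bytes:

```python
def to_text(value, encoding="utf-8"):
    """Return text: decode bytes, pass any other value through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Real bytes are decoded.
print(to_text(b"\xe5\xae\x87\xe5\xae\x99"))

# Text is returned as it is.
print(to_text("宇宙"))

# The output of str(bytes) is already text, so it is NOT repaired here.
print(to_text("b'\\xe5\\xae\\x87'"))
```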

Note: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original source.

 