
How do I convert a string into bytes in python?

In my code, I encode a string with UTF-8. I take the output, convert it to a string, and send it to my other program. The other program receives this string, but when I try to decode it, I get an error: AttributeError: 'str' object has no attribute 'decode'. I need to send the encoded data as a string because my other program receives it as JSON. My first program is in Python 3, and the other program is in Python 2.

# my first program
x = u"宇宙"
x = str(x.encode('utf-8'))


# my other program
text = x.decode('utf-8')
print(text)

What should I do to convert the string received by the second program to bytes so the decode works?

The most important part to properly answer this is the information on how you pass these objects to the Python 2 program: you are using JSON.

So, stay with me:

After you do the .encode step in program 1, you have a bytes object. Calling str(...) on it does not decode it: it just puts an escaping layer on this bytes object and turns it back into a string (the text of the b'...' literal, with each non-ASCII byte written as a \x escape). When this string is written as is to a file, or transmitted over the network, it will be encoded again. In JSON, non-ASCII characters are usually escaped with the \u prefix and the codepoint of each character, but here the original Chinese characters are already encoded in UTF-8 and written as \x escapes, so the backslashes get escaped a second time and the text ends up doubly escaped.

Python's JSON load methods already decode the contents of JSON data into text strings, so there is no need to call a decode method at all.

In short: to pass data around, simply encode your original text as JSON in the first program, and do not bother with any decoding after json.loads in the target Python 2 program:

# my first program
import json

x = "宇宙"
# No str-encode-decode dance needed here.
...
data = json.dumps({"example_key": x})  # add any other keys you need
# code to transmit json string by network or file as it is...


# my other program
import json

text = json.loads(data)["example_key"]
# text is a Unicode text string ready to be used!
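
To see the whole round trip in one place, here is a minimal sketch that plays both roles in a single Python 3 process. The key name example_key comes from the snippet above; the variable names wire and received are only illustrative. Python 2's json.loads should likewise hand back a unicode object for this value, so no .decode call is needed there either.

```python
import json

x = "宇宙"

# Program 1: serialize the text directly; no .encode() or str() needed.
wire = json.dumps({"example_key": x})
print(wire)  # {"example_key": "\u5b87\u5b99"}

# Program 2: json.loads already returns text, not bytes.
received = json.loads(wire)["example_key"]
print(received == x)  # True
```

Note that the JSON text on the wire is plain ASCII: json.dumps escapes non-ASCII characters by default, which is why nothing has to be encoded by hand.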

As you are doing it now, you are probably getting the text doubly encoded. I will mimic it on the Python 3 console, printing the result of each step so you can understand the transformations that are taking place.

In [1]: import json

In [2]: x = "宇宙"

In [3]: print(x.encode("utf-8"))
b'\xe5\xae\x87\xe5\xae\x99'

In [4]: text = str(x.encode("utf-8"))

In [5]: print(text)
b'\xe5\xae\x87\xe5\xae\x99'

In [6]: json_data = json.dumps(text)

In [7]: print(json_data)
"b'\\xe5\\xae\\x87\\xe5\\xae\\x99'"
# as you can see, it is doubly escaped, and it is mostly useless in this form

In [8]: recovered_from_json = json.loads(json_data)

In [9]: print(recovered_from_json)
b'\xe5\xae\x87\xe5\xae\x99'

In [10]: print(repr(recovered_from_json))
"b'\\xe5\\xae\\x87\\xe5\\xae\\x99'"

In [11]: # and if you have data like this in files/databases you need to recover:

In [12]: import ast

In [13]: recovered_text = ast.literal_eval(recovered_from_json).decode("utf-8")

In [14]: print(recovered_text)
宇宙
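
If you have many such values to repair, the ast.literal_eval step can be wrapped in a small helper. This is only a sketch for Python 3: recover_text is a hypothetical name, and it assumes that any string starting with b' or b" and parsing as a bytes literal is an accidental str(bytes) that should be undone. Anything else is returned unchanged.

```python
import ast


def recover_text(value, encoding="utf-8"):
    """Undo an accidental str(bytes) if `value` looks like one.

    `recover_text` is a hypothetical helper, not a library function.
    """
    if value.startswith(("b'", 'b"')):
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return value  # not a valid bytes literal after all
        if isinstance(parsed, bytes):
            return parsed.decode(encoding)
    return value


print(recover_text("b'\\xe5\\xae\\x87\\xe5\\xae\\x99'"))  # 宇宙
print(recover_text("plain text"))  # plain text
```

The decode call can still raise UnicodeDecodeError if the recovered bytes are not valid in the given encoding, so you may want to handle that where you call it.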

The main issue is that you are dealing with two different Python versions, and they handle strings and bytes differently.

The six library helps with this.

Six provides simple utilities for wrapping over differences between Python 2 and Python 3. It is intended to support codebases that work on both Python 2 and 3 without modification.

Use this library and decode like this:

import six

def bytes_to_str(s, encoding='utf-8'):
    """Returns a str if a bytes object is given."""
    if six.PY2 and isinstance(s, bytes):
        return s.decode(encoding)
    return s

text = bytes_to_str(x)
print(text)
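
If adding a dependency is not an option, a similar helper can be sketched without six. The name to_text is hypothetical. It relies only on isinstance(value, bytes), which is available on both versions because bytes is an alias of str on Python 2; the sketch below is written for Python 3. Keep in mind that a helper like this only helps when the value really is bytes: it will not undo a str(...) that was applied to a bytes object before sending.

```python
def to_text(value, encoding="utf-8"):
    """Return text for bytes input; pass anything else through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


print(to_text(b"\xe5\xae\x87\xe5\xae\x99"))  # 宇宙
print(to_text("宇宙"))  # 宇宙
```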

