
Python string encode and decode

Encoding in JS means converting a string with special characters into an escaped, usable string. For example, encodeURIComponent converts spaces to %20 and so on, so that the result can be used in URIs.

So encoding here means converting to a particular format.
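For comparison, here is a rough Python 3 counterpart of the encodeURIComponent example, using urllib.parse.quote from the standard library. It is only a sketch: the sample string is made up for illustration, and quote is not an exact equivalent of the JS function. It shows that percent-escaping is a second step layered on top of encoding the text to UTF-8 bytes.

```python
from urllib.parse import quote, unquote

# Hypothetical sample text: three CJK characters, a space, and ASCII letters.
text = "\u5965\u591a\u6bd4 inc"

# quote() first encodes the text to UTF-8 bytes (its default),
# then writes each byte that is not URL-safe as %XX.
escaped = quote(text, safe="")

# unquote() reverses both steps and returns text again.
restored = unquote(escaped)

print(escaped)
print(restored == text)
```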

In Python 2.7, I have a string: 奥多比. To convert it into UTF-8 format, however, I need to use the decode() function, like this: "奥多比".decode("utf-8") == u'\u5965\u591a\u6bd4'

I want to understand how the meaning of encode and decode changes between languages. To me, it seems I should be doing "奥多比".encode("utf-8") instead.

What am I missing here?

You appear to be confusing Unicode text (represented in Python 2 as the unicode type, indicated by the u prefix on the literal syntax), with one of the standard Unicode encodings, UTF-8.

You are not creating UTF-8; you created a Unicode text object by decoding from a UTF-8 byte stream.
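A minimal sketch of that direction, written for Python 3, where the two types are named bytes and str (Python 2 calls them str and unicode). The variable names are only for illustration. Decoding goes from bytes to text, and encoding goes from text back to bytes.

```python
# The nine UTF-8 bytes that a UTF-8 terminal or source file holds for 奥多比.
raw = b"\xe5\xa5\xa5\xe5\xa4\x9a\xe6\xaf\x94"

# decode: bytes -> Unicode text (three code points).
text = raw.decode("utf-8")

# encode: Unicode text -> bytes, giving back the original UTF-8 data.
back = text.encode("utf-8")

print(type(raw).__name__, len(raw))    # bytes 9
print(type(text).__name__, len(text))  # str 3
print(text == "\u5965\u591a\u6bd4")    # True
print(back == raw)                     # True
```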

The byte string literal "奥多比" is a sequence of binary data: bytes. You either entered these in a text editor and saved the file as UTF-8 (and told Python to treat your source code as UTF-8 by starting the file with a PEP 263 codec header), or you typed it into the Python interactive prompt in a terminal that was configured to send UTF-8 data.

I strongly urge you to read more about the difference between bytes, codecs and Unicode text. The following links are highly recommended:

In Python 2, the literal "奥多比" is of type str, i.e. a sequence of bytes. To convert it to a Unicode string, you need to decode this sequence of bytes using a codec. Simply put, the codec specifies how the bytes should be converted to a sequence of Unicode code points. See the Unicode HOWTO for a more in-depth treatment.
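To illustrate why the codec matters, here is a small Python 3 sketch (in Python 3 the byte sequence has type bytes rather than str). The same nine bytes are decoded with three different codecs; the choice of latin-1 and ascii as contrasting codecs is my own, purely for demonstration.

```python
# Start from the UTF-8 bytes for 奥多比.
raw = "\u5965\u591a\u6bd4".encode("utf-8")

# The right codec recovers the three intended code points.
as_utf8 = raw.decode("utf-8")
code_points = [hex(ord(ch)) for ch in as_utf8]

# The wrong codec still "works", but maps every byte to its own
# character, so the result is nine characters of mojibake.
as_latin1 = raw.decode("latin-1")

# ASCII cannot represent bytes >= 0x80, so decoding fails outright.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(code_points)     # ['0x5965', '0x591a', '0x6bd4']
print(len(as_latin1))  # 9
print(ascii_ok)        # False
```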
