
Python string encode and decode

Original post: 2018-01-08 13:11:06. Tags: python, encoding, utf-8

Encoding in JS means converting a string with special characters into an escaped, usable string. For example, encodeURIComponent converts spaces to %20 and so on, so that the result can be used in URIs.

So encoding here means converting to a particular format.

In Python 2.7, I have a string: 奥多比. To convert it into UTF-8 format, however, I need to use the decode() function. Like: "奥多比".decode("utf-8") == u'\u5965\u591a\u6bd4'

I want to understand how the meaning of encode and decode changes from one language to another. To me, essentially, I should be doing "奥多比".encode("utf-8").

What am I missing here?

2 answers

You appear to be confusing Unicode text (represented in Python 2 as the unicode type, indicated by the u prefix on the literal syntax) with one of the standard Unicode encodings, UTF-8.

You are not creating UTF-8; you created a Unicode text object by decoding from a UTF-8 byte stream.
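The direction of each operation can be sketched as follows. This is a minimal illustration written for Python 3, where the bytes type plays the role of Python 2's str and str plays the role of Python 2's unicode; the variable names are illustrative only.

```python
# The UTF-8 bytes for 奥多比, as they would sit inside a Python 2 byte string.
utf8_bytes = b"\xe5\xa5\xa5\xe5\xa4\x9a\xe6\xaf\x94"

# decode: bytes -> Unicode text (three code points).
text = utf8_bytes.decode("utf-8")
print(text == "\u5965\u591a\u6bd4")        # True

# encode: Unicode text -> bytes in a chosen encoding.
round_trip = text.encode("utf-8")
print(round_trip == utf8_bytes)            # True
```

In other words, decode goes from an encoding to text, and encode goes from text to an encoding.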

The byte string literal "奥多比" is a sequence of binary data: bytes. You either entered these in a text editor and saved the file as UTF-8 (and told Python to treat your source code as UTF-8 by starting the file with a PEP 263 codec header), or you typed it into the Python interactive prompt in a terminal that was configured to send UTF-8 data.
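To see why the editor or terminal configuration matters, here is a small Python 3 sketch showing that the same three characters become different byte sequences under different encodings. GBK is used purely as an illustrative alternative; nothing in the question suggests it was involved.

```python
text = "奥多比"  # three Unicode code points

utf8_bytes = text.encode("utf-8")   # what a UTF-8 editor or terminal would produce
gbk_bytes = text.encode("gbk")      # what a GBK-configured one would produce (assumption)

print(len(text), len(utf8_bytes), len(gbk_bytes))   # 3 9 6

# Decoding bytes with a codec other than the one that produced them goes wrong:
# here it raises an error, with other codec pairs it can silently yield mojibake.
try:
    gbk_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)   # False
```

The byte string only decodes back to the intended text if the codec passed to decode() matches the encoding that actually produced the bytes.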

I strongly urge you to read more about the difference between bytes, codecs and Unicode text. The following links are highly recommended:

Ned Batchelder's Pragmatic Unicode
The Python Unicode HOWTO
Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

In Python 2, your string is of type str, i.e. a sequence of bytes. To convert it to a Unicode string, you need to decode this sequence of bytes using a codec. Simply put, the codec specifies how bytes should be converted to a sequence of Unicode code points. See the Unicode HOWTO for a more in-depth article on this.
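As a rough illustration of the codec's role, the Python 3 sketch below decodes the same nine bytes with two different codecs. Python 3's bytes stands in for Python 2's str here, and latin-1 is chosen only because it maps every byte to exactly one code point.

```python
utf8_bytes = b"\xe5\xa5\xa5\xe5\xa4\x9a\xe6\xaf\x94"

# Decoded as UTF-8, every three bytes form one code point.
as_utf8 = utf8_bytes.decode("utf-8")
code_points = [hex(ord(ch)) for ch in as_utf8]
print(code_points)            # ['0x5965', '0x591a', '0x6bd4']

# Decoded as latin-1, every single byte becomes its own code point.
as_latin1 = utf8_bytes.decode("latin-1")
print(len(as_latin1))         # 9
print(as_latin1 == as_utf8)   # False
```

The bytes are identical in both cases; only the codec, and therefore the resulting text, differs.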