
Python string encode and decode

In JavaScript, encoding means converting a string with special characters into an escaped, usable string. For example, encodeURIComponent converts spaces to %20 and so on, so that the result can be used in URIs.

So encoding here means converting to a particular format.

In Python 2.7, I have a string: 奥多比. To convert it into UTF-8 format, however, I need to use the decode() function, like: "奥多比".decode("utf-8") == u'\u5965\u591a\u6bd4'

I want to understand how the meaning of encode and decode changes between languages. To me, essentially, I should be doing "奥多比".encode("utf-8").

What am I missing here?

You appear to be confusing Unicode text (represented in Python 2 as the unicode type, indicated by the u prefix on the literal syntax) with one of the standard Unicode encodings, UTF-8.

You are not creating UTF-8; you created a Unicode text object by decoding from a UTF-8 byte stream.
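To make the direction of each call concrete, here is a minimal sketch. It assumes the bytes were produced by a UTF-8 editor or terminal, and it spells them out as escapes so the snippet behaves the same on Python 2.7 and Python 3; the names raw and text are only illustrative.

```python
# -*- coding: utf-8 -*-
# The nine bytes a UTF-8 editor or terminal produces for the three characters.
raw = b"\xe5\xa5\xa5\xe5\xa4\x9a\xe6\xaf\x94"

# decode: bytes -> Unicode text (three code points).
text = raw.decode("utf-8")

# encode: Unicode text -> bytes, the direction the question expected.
round_trip = text.encode("utf-8")

print(len(raw), len(text))
```

So decode() goes from bytes to text, and encode() goes from text back to bytes. In Python 2 the plain literal "奥多比" already is the bytes, which is why decode() was the call that worked.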

The byte string literal "奥多比" is a sequence of binary data, bytes. You either entered these in a text editor and saved the file as UTF-8 (and told Python to treat your source code as UTF-8 by starting the file with a PEP 263 codec header), or you typed it into the Python interactive prompt in a terminal that was configured to send UTF-8 data.

I strongly urge you to read more about the difference between bytes, codecs and Unicode text. The following links are highly recommended:

In Python 2, "奥多比" is of type str, i.e. a sequence of bytes. To convert it to a Unicode string, you need to decode this sequence of bytes using a codec. Simply put, the codec specifies how bytes should be converted to a sequence of Unicode code points. See the Unicode HOWTO for a more in-depth article on this.
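As a rough illustration of the codec's role, the sketch below encodes the same text with two different codecs. GBK is picked here only as a second example codec and is not mentioned in the question; the variable names are hypothetical.

```python
# -*- coding: utf-8 -*-
# The same three code points, written with escapes so this runs on Python 2.7 and 3.
text = u"\u5965\u591a\u6bd4"

# Two codecs map the same text to two different byte sequences.
utf8_bytes = text.encode("utf-8")  # three bytes per character here
gbk_bytes = text.encode("gbk")     # two bytes per character here

# Decoding with the matching codec recovers the same Unicode text each time.
from_utf8 = utf8_bytes.decode("utf-8")
from_gbk = gbk_bytes.decode("gbk")

print(len(utf8_bytes), len(gbk_bytes))
```

The text is the same in both cases; only the byte representation differs, so you have to know which codec produced the bytes before you can decode them correctly.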
