
Python's string is Unicode characters

What does "Unicode characters" mean in a Python 3 string?

Since Python 3.0, the language features a str type that contain Unicode characters, meaning any string created using "unicode rocks!", 'unicode rocks!', or the triple-quoted string syntax is stored as Unicode.

(from the Python documentation)

For a string abc, does Python hold [61, 62, 63] in memory (since a is U+0061)?

Does "Unicode character" mean Unicode code point?

Does "Unicode character" mean Unicode code point?

Yes and no. It depends on the version of Python and how it was built.

For versions 2.2 to 3.2 inclusive, Python supported both narrow and wide Unicode builds (see PEP 261). On a narrow build, the Unicode range is restricted to the BMP (Basic Multilingual Plane, U+0000 to U+FFFF):

Python 3.2.6 (default, Feb 21 2016, 12:42:00)
[GCC 5.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxunicode
65535

and so characters outside this range have to be represented as a surrogate pair:

>>> s = '😬'
>>> ord(s)
128556
>>> len(s)
2
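As a rough illustration of where the length of 2 comes from, the sketch below splits a code point above U+FFFF into the two 16-bit surrogates defined by UTF-16. The helper name to_surrogate_pair is hypothetical and used only for this example. It is not something Python provides.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need a surrogate pair")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go into the low surrogate
    return high, low


high, low = to_surrogate_pair(128556)   # U+1F62C, the emoji used above
print(hex(high), hex(low))              # 0xd83d 0xde2c
```

On a narrow build, the string holds these two 16-bit units, which is why len(s) reports 2.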

With the introduction of PEP 393 (implemented in Python 3.3), Python 3 no longer has narrow builds, and so one character is always equivalent to one code point:

Python 3.5.1 (default, Mar 3 2016, 09:29:07)
[GCC 5.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxunicode
1114111
>>> s = '😬'
>>> ord(s)
128556
>>> len(s)
1
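To tie this back to the abc example in the question, here is a minimal sketch, assuming Python 3.3 or later. It inspects the code points of a string with ord() and compares them with the number of code units in two encodings. Note that the 61, 62, 63 in the question are hexadecimal. How CPython lays the string out in memory is an implementation detail: under PEP 393 it uses 1, 2, or 4 bytes per character, depending on the widest code point in the string.

```python
s = "abc"
code_points = [ord(ch) for ch in s]
print(code_points)                           # [97, 98, 99]
print([hex(cp) for cp in code_points])       # ['0x61', '0x62', '0x63']

emoji = "\U0001F62C"                         # the same emoji as above
print(len(emoji))                            # 1 (one code point)
print(len(emoji.encode("utf-16-le")) // 2)   # 2 (UTF-16 code units, a surrogate pair)
print(len(emoji.encode("utf-8")))            # 4 (UTF-8 bytes)
```

So at the language level a str is a sequence of code points, while the number of bytes or code units depends on the encoding or internal representation chosen.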
