
How to find out if Python is compiled with UCS-2 or UCS-4?

Just what the title says.

$ ./configure --help | grep -i ucs
  --enable-unicode[=ucs[24]]

Searching the official documentation, I found this:

sys.maxunicode: An integer giving the largest supported code point for a Unicode character. The value of this depends on the configuration option that specifies whether Unicode characters are stored as UCS-2 or UCS-4.

What is not clear here is which value corresponds to UCS-2 and which to UCS-4.

The code is expected to work on Python 2.6+.

When built with --enable-unicode=ucs4:

>>> import sys
>>> print sys.maxunicode
1114111

When built with --enable-unicode=ucs2:

>>> import sys
>>> print sys.maxunicode
65535

It's 0xFFFF (or 65535) for UCS-2, and 0x10FFFF (or 1114111) for UCS-4:

Py_UNICODE
PyUnicode_GetMax(void)
{
#ifdef Py_UNICODE_WIDE
    return 0x10FFFF;
#else
    /* This is actually an illegal character, so it should
       not be passed to unichr. */
    return 0xFFFF;
#endif
}

The maximum character in UCS-4 mode is defined by the maximum value representable in UTF-16.
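To see where 0x10FFFF comes from, the UTF-16 limit can be worked out from the surrogate ranges. This is only an illustrative calculation; the constant names are made up for the example and do not come from CPython.

```python
# UTF-16 encodes code points above the Basic Multilingual Plane (BMP)
# as a pair: one high surrogate followed by one low surrogate.
HIGH_SURROGATES = 0xDBFF - 0xD800 + 1   # 1024 possible high surrogates
LOW_SURROGATES = 0xDFFF - 0xDC00 + 1    # 1024 possible low surrogates
BMP_SIZE = 0x10000                      # code points 0x0000..0xFFFF

# Every (high, low) pair addresses one code point beyond the BMP.
max_code_point = BMP_SIZE + HIGH_SURROGATES * LOW_SURROGATES - 1

print(hex(max_code_point))
```

The result matches the value returned for wide builds in the C function above, while narrow builds stop at the last BMP code point, 0xFFFF.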

I had this same issue once. I documented it for myself on my wiki at

http://arcoleo.org/dsawiki/Wiki.jsp?page=Python%20UTF%20-%20UCS2%20or%20UCS4

I wrote:

import sys
sys.maxunicode > 65536 and 'UCS4' or 'UCS2'
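The one-liner relies on the old and/or idiom. A slightly more readable variant, as a sketch, uses a conditional expression (available since Python 2.5) and the 0xFFFF boundary. The function name unicode_build and its optional argument are hypothetical, added only to make the check easy to try with both values.

```python
import sys

def unicode_build(maxunicode=None):
    """Return 'UCS2' or 'UCS4' for the given sys.maxunicode value.

    With no argument, the running interpreter is inspected.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow builds report 0xFFFF; wide builds report 0x10FFFF.
    return 'UCS4' if maxunicode > 0xFFFF else 'UCS2'

print(unicode_build())
```

Calling unicode_build(65535) gives 'UCS2' and unicode_build(1114111) gives 'UCS4', matching the two builds shown earlier.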

sysconfig reports the Unicode character size from Python's build configuration variables.

The build flags can be queried like this.

Python 2.7:

import sysconfig
sysconfig.get_config_var('Py_UNICODE_SIZE')

Python 2.6:

import distutils.sysconfig
distutils.sysconfig.get_config_var('Py_UNICODE_SIZE')
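The two variants can be combined so the same snippet runs on 2.6 and 2.7. The sketch below is one possible arrangement, not an official recipe: unicode_size is a hypothetical helper name, and the fallback to sys.maxunicode when the variable is unset (which can happen on some platforms and on newer interpreters) is an assumption of this example.

```python
import sys

try:
    import sysconfig                  # available from Python 2.7
except ImportError:
    from distutils import sysconfig   # Python 2.6

def unicode_size():
    """Return the size of a Unicode character in bytes (2 or 4)."""
    size = sysconfig.get_config_var('Py_UNICODE_SIZE')
    if size is None:
        # Assumption: derive the size from sys.maxunicode when the
        # build configuration does not expose Py_UNICODE_SIZE.
        size = 4 if sys.maxunicode > 0xFFFF else 2
    return int(size)

print(unicode_size())
```

A result of 2 means a UCS-2 (narrow) build and 4 means a UCS-4 (wide) build.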

I had the same issue and found a semi-official piece of code that does exactly that and may be interesting for people with the same issue: https://bitbucket.org/pypa/wheel/src/cf4e2d98ecb1f168c50a6de496959b4a10c6b122/wheel/pep425tags.py?at=default&fileviewer=file-view-default#pep425tags.py-83:89

It comes from the wheel project, which needs to check whether Python was compiled with UCS-2 or UCS-4 because that changes the name of the generated binary file.

Another way is to create a Unicode array and look at the itemsize:

import array
bytes_per_char = array.array('u').itemsize

Quote from the array docs:

The 'u' typecode corresponds to Python's unicode character. On narrow Unicode builds this is 2-bytes, on wide builds this is 4-bytes.

Note that the distinction between narrow and wide Unicode builds is dropped from Python 3.3 onward; see PEP 393. The 'u' typecode for array is deprecated since 3.3 and scheduled for removal in Python 4.0.
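As a rough illustration of the PEP 393 change, the sketch below checks whether the build flag can matter at all on the running interpreter. The helper name build_flag_matters is hypothetical, and the version cut-off simply restates the note above.

```python
import sys

def build_flag_matters():
    """True only before Python 3.3, where narrow and wide builds exist."""
    return sys.version_info < (3, 3)

# A code point outside the Basic Multilingual Plane.
astral = u'\U0001F600'

# On 3.3+ this is always one character; on an older narrow build it
# is stored as a surrogate pair and has length 2.
print(build_flag_matters(), len(astral))
```

On Python 3.3 and later, sys.maxunicode is always 0x10FFFF, so any check for UCS-2 is only meaningful on older interpreters.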
