ValueError: unichr() arg not in range(0x10000) (narrow Python build)

I am trying to convert an HTML entity to a Unicode character. The entity is &#976918;. When I try the following:

unichr(int(976918))

I get this error:

ValueError: unichr() arg not in range(0x10000) (narrow Python build)

It seems the value is out of range for unichr().

You can decode a string that contains a Unicode escape (\U followed by 8 hex digits, zero-padded) using the "unicode-escape" encoding:

>>> s = "\\U%08x" % 976918
>>> s
'\\U000ee816'

>>> c = s.decode('unicode-escape')
>>> c
u'\U000ee816'

On a narrow build it's stored as a UTF-16 surrogate pair:

>>> list(c)
[u'\udb7a', u'\udc16']
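
The two values above follow from the standard UTF-16 arithmetic for code points beyond U+FFFF. A minimal sketch of that calculation, using only integers so it runs on both Python 2 and Python 3 (the helper name to_surrogate_pair is made up for illustration, not part of any library):

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % code_point)
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits go into the low surrogate
    return high, low

high, low = to_surrogate_pair(976918)
print("%04x %04x" % (high, low))         # db7a dc16
```

This matches the pair that list(c) reports on a narrow build.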

This surrogate pair is handled correctly as a single code point during encoding:

>>> c.encode('utf-8')
'\xf3\xae\xa0\x96'

>>> '\xf3\xae\xa0\x96'.decode('utf-8')
u'\U000ee816'
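
To see where those four bytes come from, here is a rough sketch of the UTF-8 layout for code points in the range U+10000..U+10FFFF. The function name utf8_bytes_supplementary is hypothetical, and the sketch covers only the four-byte case:

```python
def utf8_bytes_supplementary(code_point):
    """Return the four UTF-8 byte values for a code point in U+10000..U+10FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % code_point)
    return [
        0xF0 | (code_point >> 18),           # 11110xxx: top 3 bits
        0x80 | ((code_point >> 12) & 0x3F),  # 10xxxxxx: next 6 bits
        0x80 | ((code_point >> 6) & 0x3F),   # 10xxxxxx: next 6 bits
        0x80 | (code_point & 0x3F),          # 10xxxxxx: last 6 bits
    ]

print(" ".join("%02x" % b for b in utf8_bytes_supplementary(976918)))  # f3 ae a0 96
```

The encoder combines the surrogate pair back into U+EE816 before applying this layout, which is why the output is one four-byte sequence and not two three-byte sequences.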

Here's an alternate workaround that I developed with the struct module.

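import struct

def unichar(i):
    try:
        return unichr(i)
    except ValueError:
        # Narrow build: pack the code point as a little-endian 32-bit
        # integer and decode those four bytes as UTF-32-LE.
        return struct.pack('<i', i).decode('utf-32-le')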

>>> unichar(int('976918'))
u'\U000ee816'

For unichr() to accept this value directly, you either need to build Python yourself, specifying

./configure --enable-unicode=ucs4

before compiling, or else you need to move to Python 3.

Even if you do this, there are apparently problems on Windows, which should be fixed in Python 3.3, where the narrow/wide build distinction was removed (PEP 393).
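
For comparison, a minimal sketch of the same conversion on Python 3.4 or later, assuming the input is a numeric character reference such as &#976918;. Here chr() replaces unichr(), and html.unescape() from the standard library parses the entity itself:

```python
import html

code_point = 976918

# chr() accepts the full range U+0000..U+10FFFF on Python 3.3+.
char = chr(code_point)
print(len(char))                  # 1: a single code point, no surrogate pair

# html.unescape() resolves the numeric character reference directly.
from_entity = html.unescape("&#976918;")
print(from_entity == char)        # True

# The UTF-8 encoding matches the bytes produced by the Python 2 workarounds.
print(char.encode("utf-8"))       # b'\xf3\xae\xa0\x96'
```

On these versions no struct or unicode-escape workaround is needed.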
