Python: How to convert an 8-bit ASCII string to 16-bit Unicode
Although Python 3.x fixed uppercase and lowercase conversion for some locales (for example tr_TR.utf8), the Python 2.x branch still lacks this. There are several workarounds for this issue, such as https://github.com/emre/unicode_tr/, but I did not like that kind of solution.
So I am implementing new upper/lower/capitalize/title methods, to be monkey-patched onto the unicode class, using the string.maketrans method.
The problem with maketrans is that its two string arguments must have the same length. The nearest solution that came to my mind is "How can I convert a 1-byte char to 2 bytes?"
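The length mismatch comes from UTF-8 itself: as a byte string, i occupies one byte while İ (U+0130) occupies two. A minimal check, written so that it should run on both Python 2 and Python 3 (the variable names are only illustrative):

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Encode both characters to UTF-8 byte strings and compare their lengths.
dotted_lower = u'i'.encode('utf-8')        # plain ASCII letter, one byte
dotted_upper = u'\u0130'.encode('utf-8')   # İ, encoded as two bytes in UTF-8

print(len(dotted_lower), len(dotted_upper))
```

Because a byte-oriented maketrans maps single bytes to single bytes, it cannot express a one-byte to two-byte substitution like this one.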
Note: the translate method works only with ASCII encoding; when I pass u'İ' (1 byte length, \u0130) as an argument to translate, it gives an ASCII encoding error.
from string import maketrans
import unicodedata
c1 = unicodedata.normalize('NFKD',u'i').encode('utf-8')
c2 = unicodedata.normalize('NFKD',u'İ').encode('utf-8')
c1,len(c1)
('\xc4\xb1', 2)
# c2,len(c2)
# ('I', 1)
'istanbul'.translate( maketrans(c1,c2))
ValueError: maketrans arguments must have same length
Unicode objects allow multi-character translation via a dictionary, instead of two byte strings mapped through maketrans.
#!python2
#coding:utf8
D = {ord(u'i'):u'İ'}
print u'istanbul'.translate(D)
Output:
İstanbul
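Building on the dictionary form of translate, the Turkish-aware upper and lower methods the question asks for might be sketched as below. This is only a sketch under assumptions: the names TR_UPPER, TR_LOWER, tr_upper and tr_lower are hypothetical, only the two dotted/dotless i pairs are handled, and the input is assumed to be a Unicode string (unicode on Python 2, str on Python 3).

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Hypothetical translation tables; keys are Unicode ordinals.
TR_UPPER = {ord(u'i'): u'\u0130', ord(u'\u0131'): u'I'}   # i -> İ, ı -> I
TR_LOWER = {ord(u'\u0130'): u'i', ord(u'I'): u'\u0131'}   # İ -> i, I -> ı

def tr_upper(text):
    # Fix the Turkish-specific letters first, then apply the default mapping.
    return text.translate(TR_UPPER).upper()

def tr_lower(text):
    # Same idea in the other direction.
    return text.translate(TR_LOWER).lower()

print(tr_upper(u'istanbul'))
print(tr_lower(u'ISPARTA'))
```

Translating before calling upper() or lower() keeps the default case mapping from touching the four special letters, since they have already been replaced by their final forms.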
If you start with an ASCII byte string and want the result in UTF-8, simply decode before the translation and encode after it:
#!python2
#coding:utf8
D = {ord(u'i'):u'İ'}
s = 'istanbul'.decode('ascii')
t = s.translate(D)
s = t.encode('utf8')
print repr(s)
Output:
'\xc4\xb0stanbul'
The following technique can do the job of maketrans. Note that the dictionary keys must be Unicode ordinals, but the values can be Unicode ordinals, Unicode strings or None. If a value is None, the character is deleted when translated.
#!python2
#coding:utf8
def maketrans(a,b):
    # Map the ordinal of each character in a to the matching character in b.
    return dict(zip(map(ord,a),b))
D = maketrans(u'àáâãäå',u'ÀÁÂÃÄÅ')
print u'àbácâdãeäfåg'.translate(D)
Output:
ÀbÁcÂdÃeÄfÅg
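The three kinds of dictionary value mentioned above (ordinal, string, None) can be seen side by side in a small illustration. The table and the input string are made up for this example, and the code is written so that it should run on both Python 2 and Python 3:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

table = {
    ord(u'a'): ord(u'A'),   # value is an ordinal: a -> A
    ord(u's'): u'ss',       # value is a string: one character becomes two
    ord(u'-'): None,        # value is None: the character is deleted
}

result = u'a-s-a'.translate(table)
print(result)   # AssA
```

Characters that do not appear as keys are passed through unchanged.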
Reference: str.translate