
Python: How to convert an 8-bit ASCII string to 16-bit Unicode

Although Python 3.x solved the problem of upper- and lowercase conversion for some locales (for example tr_TR.utf8), the Python 2.x branch still lacks this. There are several workarounds for this issue, such as https://github.com/emre/unicode_tr/, but I did not like that kind of solution.

So I am implementing new upper/lower/capitalize/title methods to monkey-patch the unicode class, using the string.maketrans function.

The problem with maketrans is that its two string arguments must have the same length. The nearest solution that came to my mind is: "How can I convert a 1-byte char to 2 bytes?"

Note: the translate method seems to work only with ASCII; when I pass u'İ' (a single character) as an argument, translate raises an ASCII encoding error.

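#!python2
#coding:utf8
from string import maketrans
import unicodedata

# NFKD-normalize each character, then encode it as a UTF-8 byte string.
c1 = unicodedata.normalize('NFKD', u'i').encode('utf-8')
c2 = unicodedata.normalize('NFKD', u'İ').encode('utf-8')

# c1, len(c1)
# ('i', 1)
# c2, len(c2)
# ('I\xcc\x87', 3)    NFKD splits İ into I + combining dot above (U+0307)

# The byte strings differ in length, so maketrans rejects them.
'istanbul'.translate(maketrans(c1, c2))
# ValueError: maketrans arguments must have same length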

Unicode objects can be translated through a dictionary, which allows multi-character replacements, instead of through two equal-length byte strings mapped by maketrans.

#!python2
#coding:utf8
D = {ord(u'i'):u'İ'}
print u'istanbul'.translate(D)

Output:

İstanbul
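The same dictionary idea could be extended toward the Turkish-aware upper/lower methods the question asks for. The following is only a minimal sketch: tr_upper, tr_lower, TR_UPPER and TR_LOWER are hypothetical names, not part of Python or of unicode_tr, and it assumes that only the i/İ and ı/I pairs need special handling. It is written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch only: tr_upper/tr_lower are hypothetical helpers, not a library API.

# Translate only the letters whose Turkish casing differs from the default;
# every other character is left to the built-in upper()/lower().
TR_UPPER = {ord(u'i'): u'İ', ord(u'ı'): u'I'}
TR_LOWER = {ord(u'İ'): u'i', ord(u'I'): u'ı'}


def tr_upper(text):
    """Uppercase a Unicode string, mapping i -> İ and ı -> I first."""
    return text.translate(TR_UPPER).upper()


def tr_lower(text):
    """Lowercase a Unicode string, mapping İ -> i and I -> ı first."""
    return text.translate(TR_LOWER).lower()


upper_result = tr_upper(u'istanbul')   # u'İSTANBUL'
lower_result = tr_lower(u'ISPARTA')    # u'ısparta'
```

Translating before calling upper() or lower() matters: the special letters are already in their final form when the built-in method runs, so it leaves them alone. capitalize and title are not covered here and would need the same treatment for the first letter of each word.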

If you start with an ASCII byte string and want the result in UTF-8, simply decode/encode around the translation:

#!python2
#coding:utf8
D = {ord(u'i'):u'İ'}
s = 'istanbul'.decode('ascii')
t = s.translate(D)
s = t.encode('utf8')
print repr(s)

Output:

'\xc4\xb0stanbul'

The following technique can do the job of maketrans. Note that the dictionary keys must be Unicode ordinals, but the values can be Unicode ordinals, Unicode strings or None. If the value is None, the character is deleted when translated.

#!python2
#coding:utf8
def maketrans(a,b):
    return dict(zip(map(ord,a),b))
D = maketrans(u'àáâãäå',u'ÀÁÂÃÄÅ')
print u'àbácâdãeäfåg'.translate(D)

Output:

ÀbÁcÂdÃeÄfÅg
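The three kinds of dictionary values described above can be combined in a single table. This is a small illustrative sketch; the table contents are made up for the example, and the code is written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch: one translation table using all three accepted value types.
table = {
    ord(u'i'): ord(u'İ'),   # Unicode ordinal: one-to-one replacement
    ord(u'ß'): u'SS',       # Unicode string: one character becomes two
    ord(u'-'): None,        # None: the character is deleted
}

result = u'i-ß'.translate(table)   # u'İSS'
```

The multi-character and None cases are exactly what the byte-string maketrans cannot express, since it requires both arguments to have the same length.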

Reference: str.translate
