
Python: How to convert an 8-bit ASCII string to 16-bit Unicode

Although Python 3.x fixed uppercase and lowercase conversion for some locales (for example tr_TR.utf8), the Python 2.x branch still lacks this. There are several workarounds for this issue, such as https://github.com/emre/unicode_tr/, but I did not like that kind of solution.

So I am implementing new upper/lower/capitalize/title methods to monkey-patch the unicode class, using the string.maketrans method.

The problem with maketrans is that the two strings must have the same length. The nearest solution that came to my mind is: "How can I convert a 1-byte char to 2 bytes?"
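To make the length mismatch concrete, here is a small sketch (written against Python 3, with a fallback import for Python 2; the variable names are only illustrative). It shows that 'i' and 'İ' occupy a different number of bytes in UTF-8, which is why a byte-for-byte table cannot map one to the other:

```python
try:
    from string import maketrans      # Python 2 location
except ImportError:
    maketrans = bytes.maketrans       # Python 3 equivalent for byte strings

lower_i = u'i'.encode('utf-8')             # one byte
upper_dotted_i = u'\u0130'.encode('utf-8') # two bytes: 0xC4 0xB0
print(len(lower_i), len(upper_dotted_i))

# A byte translation table maps single bytes to single bytes,
# so arguments of different lengths are rejected.
try:
    maketrans(lower_i, upper_dotted_i)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Padding one side to "convert 1 byte to 2 bytes" would not help, because the table still replaces one byte with exactly one byte.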


Note: the translate method works only with ASCII encoding; when I pass u'İ' (a single character, \u0130) as an argument, translate raises an ASCII encoding error.

from string import maketrans
import unicodedata

# Dotless ı (U+0131) takes two bytes in UTF-8; plain I takes one.
c1 = unicodedata.normalize('NFKD', u'ı').encode('utf-8')
c2 = unicodedata.normalize('NFKD', u'I').encode('utf-8')

c1, len(c1)
# ('\xc4\xb1', 2)

c2, len(c2)
# ('I', 1)

'istanbul'.translate(maketrans(c1, c2))
# ValueError: maketrans arguments must have same length

Unicode objects allow multi-character translation via a dictionary, instead of two byte strings mapped through maketrans.

#!python2
#coding:utf8
D = {ord(u'i'):u'İ'}
print u'istanbul'.translate(D)

Output:

İstanbul
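As a sketch of how the dictionary approach could be extended toward the Turkish-aware upper/lower the question asks about: the names TR_UPPER, TR_LOWER, tr_upper and tr_lower below are hypothetical, and the tables cover only the two letter pairs whose casing differs in Turkish. It is an illustration, not a complete locale implementation.

```python
# Hypothetical tables: only the dotted/dotless i pairs are special-cased.
TR_UPPER = {ord(u'i'): u'\u0130',   # i -> İ
            ord(u'\u0131'): u'I'}   # ı -> I
TR_LOWER = {ord(u'\u0130'): u'i',   # İ -> i
            ord(u'I'): u'\u0131'}   # I -> ı

def tr_upper(text):
    # Fix the Turkish-specific letters first, then let the
    # built-in method handle everything else.
    return text.translate(TR_UPPER).upper()

def tr_lower(text):
    return text.translate(TR_LOWER).lower()

upper_result = tr_upper(u'istanbul')
lower_result = tr_lower(u'ISPARTA')
print(repr(upper_result.encode('utf-8')))
print(repr(lower_result.encode('utf-8')))
```

Translating before calling upper() or lower() matters: the default casing rules would otherwise turn 'i' into 'I' and 'I' into 'i' before the table gets a chance to apply.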

If you start with an ASCII byte string and want the result in UTF-8, simply decode/encode around the translation:

#!python2
#coding:utf8
D = {ord(u'i'):u'İ'}
s = 'istanbul'.decode('ascii')
t = s.translate(D)
s = t.encode('utf8')
print repr(s)

Output:

'\xc4\xb0stanbul'
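The decode/translate/encode round trip can be wrapped in a small helper. This is only a sketch: translate_bytes and its parameter names are made up for illustration, and the default encodings are assumptions that mirror the example above.

```python
def translate_bytes(data, table, source_encoding='ascii', target_encoding='utf-8'):
    """Decode a byte string, translate it as Unicode text, and re-encode it."""
    text = data.decode(source_encoding)   # bytes -> Unicode text
    translated = text.translate(table)    # dictionary-based translation
    return translated.encode(target_encoding)

table = {ord(u'i'): u'\u0130'}
result = translate_bytes(b'istanbul', table)
print(repr(result))
```

The translation itself always happens on Unicode text, so the result may be longer in bytes than the input, as it is here (8 bytes in, 9 bytes out).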

The following technique can do the job of maketrans. Note that the dictionary keys must be Unicode ordinals, but the values can be Unicode ordinals, Unicode strings, or None. If the value is None, the character is deleted when translated.

#!python2
#coding:utf8
def maketrans(a,b):
    return dict(zip(map(ord,a),b))
D = maketrans(u'àáâãäå',u'ÀÁÂÃÄÅ')
print u'àbácâdãeäfåg'.translate(D)

Output:

ÀbÁcÂdÃeÄfÅg
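The same idea can be pushed a little further to exercise the other value types mentioned above. In this sketch, make_table and its delete parameter are hypothetical names; it adds a length check, maps characters to None to delete them, and shows a single character expanding to a multi-character string.

```python
def make_table(source, target, delete=u''):
    """Build a translation dict for Unicode text (hypothetical helper)."""
    if len(source) != len(target):
        # zip() would silently drop the extra characters otherwise
        raise ValueError('source and target must have the same length')
    table = dict(zip(map(ord, source), target))
    for ch in delete:
        table[ord(ch)] = None   # None means "remove this character"
    return table

table = make_table(u'\u00e0\u00e1', u'\u00c0\u00c1', delete=u'-')
table[ord(u'\u00df')] = u'SS'   # one character may map to several

result = u'\u00e0-b-\u00e1 stra\u00dfe'.translate(table)
print(repr(result.encode('utf-8')))
```

On Python 3, the built-in str.maketrans(x, y, z) builds a comparable table directly, with z listing the characters to delete.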

Reference: str.translate
