Approximately converting unicode string to ascii string in python

I don't know whether this is trivial or not, but I need to convert a unicode string to an ascii string, and I'd rather not have all those escape characters around. I mean, is it possible to have an "approximate" conversion to some quite similar ascii character?

For example: Gavin O’Connor gets converted to Gavin O\x92Connor, but I'd really like it to be just converted to Gavin O'Connor. Is this possible? Did anyone write some util to do it, or do I have to manually replace all chars?

Thank you very much! Marco

Use the Unidecode package to transliterate the string.

>>> import unidecode
>>> unidecode.unidecode(u'Gavin O’Connor')
"Gavin O'Connor"

b = str(a.encode('utf-8').decode('ascii', 'ignore'))

This should work fine, although non-ASCII characters are dropped rather than approximated.
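For comparison, here is a small sketch of what the standard `errors` handlers of `str.encode` do with the example string (Python 3; the variable names are only for illustration). None of them approximates the character: they drop it, substitute a placeholder, or escape it.

```python
name = "Gavin O\u2019Connor"  # U+2019 is the curly apostrophe from the question

# Each handler deals with the unencodable character differently.
ignored = name.encode("ascii", "ignore").decode("ascii")
replaced = name.encode("ascii", "replace").decode("ascii")
escaped = name.encode("ascii", "backslashreplace").decode("ascii")

print(ignored)   # Gavin OConnor
print(replaced)  # Gavin O?Connor
print(escaped)   # Gavin O\u2019Connor
```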

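import unicodedata

unicode_string = u"Gavin O’Connor"
print(unicodedata.normalize('NFKD', unicode_string).encode('ascii', 'ignore').decode('ascii'))

Output:

Gavin OConnor

Note that the curly apostrophe is dropped rather than converted, because U+2019 has no ASCII decomposition; NFKD only helps with characters such as accented letters (é becomes e).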

Here's the document that describes the normalization forms: http://unicode.org/reports/tr15/

There is a technique to strip accents from characters, but other characters need to be directly replaced. Check this article: http://effbot.org/zone/unicode-convert.htm
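A minimal sketch of that two-step approach, using only the standard library: decompose with NFKD and drop the combining marks to strip accents, then map the remaining characters through a small hand-written table. The function name `to_ascii` and the contents of `FALLBACKS` are assumptions made for illustration, not taken from the linked article; extend the table for the characters you actually encounter.

```python
import unicodedata

# Hypothetical replacement table for characters that NFKD does not decompose.
FALLBACKS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u00df": "ss",  # sharp s
}

def to_ascii(text):
    """Approximate text with ASCII: strip accents, then apply FALLBACKS."""
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent marks that NFKD split off
        out.append(FALLBACKS.get(ch, ch))
    # Anything still outside ASCII is dropped.
    return "".join(out).encode("ascii", "ignore").decode("ascii")

print(to_ascii("Gavin O\u2019Connor"))                # Gavin O'Connor
print(to_ascii("Caf\u00e9 \u201cna\u00efve\u201d"))   # Cafe "naive"
```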

Try simple character replacement:

str1 = "“I am the greatest”, said Gavin O’Connor"
print(str1)
print(str1.replace("’", "'").replace("“","\"").replace("”","\""))

PS: add # -*- coding: utf-8 -*- to the top of your .py file if you get an error.
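When there are more than a few characters to replace, the chained `replace` calls can be collapsed into a single translation table, as in this sketch for Python 3. The name `QUOTE_TABLE` is made up for the example.

```python
# Map each typographic quote to its ASCII counterpart in a single pass.
QUOTE_TABLE = str.maketrans({
    "\u2018": "'",
    "\u2019": "'",
    "\u201c": '"',
    "\u201d": '"',
})

str1 = "\u201cI am the greatest\u201d, said Gavin O\u2019Connor"
result = str1.translate(QUOTE_TABLE)
print(result)  # "I am the greatest", said Gavin O'Connor
```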
