Approximately converting a Unicode string to an ASCII string in Python
I don't know whether this is trivial or not, but I need to convert a Unicode string to an ASCII string, and I wouldn't like to have all those escape characters around. I mean, is it possible to have an "approximate" conversion to some quite similar ASCII character?
For example: Gavin O’Connor gets converted to Gavin O\x92Connor, but I'd really like it to be converted to just Gavin O'Connor. Is this possible?
Did anyone write some utility to do it, or do I have to manually replace all the characters?
Thank you very much!
Marco
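b = a.encode('ascii', 'ignore').decode('ascii')
This works, but note that it drops every non-ASCII character instead of approximating it, so Gavin O’Connor becomes Gavin OConnor.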
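import unicodedata
unicode_string = "Gavin O’Connor"
print(unicodedata.normalize('NFKD', unicode_string).encode('ascii', 'ignore').decode('ascii'))
Output:
Gavin OConnor
NFKD splits accented letters such as é into a base letter plus a combining mark, and 'ignore' then drops the mark, so é becomes e. The curly apostrophe (U+2019) has no decomposition, however, so it is dropped entirely rather than approximated.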
Here's the document that describes the normalization forms: http://unicode.org/reports/tr15/
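A minimal sketch, assuming Python 3, that makes the effect of NFKD visible: instead of encoding with 'ignore', it keeps the text as str and filters out only the combining marks that the decomposition splits off. The helper name strip_accents is hypothetical. Accented letters lose their accents, while a character with no decomposition, such as the curly apostrophe, passes through unchanged.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper: remove accents via NFKD decomposition.

    Characters without a decomposition (e.g. U+2019 ’) are left untouched.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop code points with a nonzero canonical combining class,
    # i.e. the accents and other combining marks NFKD split off.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Café Crème"))      # → Cafe Creme
print(strip_accents("Gavin O’Connor"))  # ’ survives: Gavin O’Connor
```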
There is a technique to strip accents from characters, but other characters need to be replaced directly. Check this article: http://effbot.org/zone/unicode-convert.htm
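Combining both ideas, a minimal sketch (Python 3) might first apply an explicit replacement table for punctuation that has no useful decomposition, then strip accents via NFKD. The table and the name approx_ascii are hypothetical and not taken from the linked article; extend the table with whatever characters your data actually contains.

```python
import unicodedata

# Hypothetical replacement table for characters that NFKD does not
# decompose into ASCII. Extend it for your own data.
PUNCTUATION_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (the apostrophe in O’Connor)
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
})


def approx_ascii(text: str) -> str:
    # 1. Replace characters that need an explicit ASCII substitute.
    text = text.translate(PUNCTUATION_MAP)
    # 2. Split accented letters into base letter + combining mark.
    text = unicodedata.normalize("NFKD", text)
    # 3. Drop whatever is still non-ASCII (mostly the combining marks).
    return text.encode("ascii", "ignore").decode("ascii")


print(approx_ascii("Gavin O’Connor"))    # → Gavin O'Connor
print(approx_ascii("“Café” – déjà vu"))  # → "Cafe" - deja vu
```

The table is applied before the final encode step because encode('ascii', 'ignore') would otherwise discard those characters silently.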