Convert a unicode string to an ascii string approximately in Python

Question

I don't know whether this is trivial or not, but I need to convert a unicode string to an ascii string, and I wouldn't like to have all those escape chars around. I mean, is it possible to have an "approximate" conversion to some quite similar ascii character?

For example: Gavin O’Connor gets converted to Gavin O\x92Connor, but I'd really like it to be just converted to Gavin O'Connor. Is this possible? Did anyone write some util to do it, or do I have to manually replace all chars?

Thank you very much! Marco

Answer 1

Use the Unidecode package to transliterate the string.

>>> import unidecode
>>> unidecode.unidecode(u'Gavin O’Connor')
"Gavin O'Connor"

Answer 2

b = str(a.encode('utf-8').decode('ascii', 'ignore'))

This should work fine, though note that it drops non-ASCII characters instead of approximating them.
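To make the effect of the "ignore" error handler concrete, here is a minimal sketch. The helper name drop_non_ascii is made up for illustration; it encodes straight to ASCII, which for this purpose gives the same result as going through UTF-8 first.

```python
def drop_non_ascii(text):
    # errors="ignore" silently discards every character that has no ASCII encoding.
    return text.encode("ascii", "ignore").decode("ascii")


# \u2019 is the right single quotation mark, \u00e9 is e with an acute accent.
print(drop_non_ascii("Gavin O\u2019Connor"))  # Gavin OConnor
print(drop_non_ascii("caf\u00e9"))            # caf
```

Nothing is approximated here: the curly apostrophe and the accented letter are simply lost.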

Answer 3

import unicodedata

unicode_string = u"Gavin O’Connor"
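print(unicodedata.normalize('NFKD', unicode_string).encode('ascii', 'ignore').decode('ascii'))

Output:

Gavin OConnor

Note that NFKD only decomposes characters such as accented letters. The right single quotation mark (U+2019) has no ASCII decomposition, so it is dropped here rather than replaced with an apostrophe.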

Here's the document that describes the normalization forms: http://unicode.org/reports/tr15/
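As a rough sketch of what NFKD normalization can and cannot do, the snippet below wraps the same calls in a helper (strip_accents is a hypothetical name, not part of unicodedata). Accented letters decompose into a base letter plus a combining mark, and the mark is then discarded by the ASCII encoding step. Characters without a decomposition, such as the curly apostrophe, are dropped entirely.

```python
import unicodedata


def strip_accents(text):
    # NFKD splits e.g. U+00E9 into "e" followed by the combining acute accent U+0301.
    decomposed = unicodedata.normalize("NFKD", text)
    # The combining marks are non-ASCII, so "ignore" throws them away.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_accents("Cr\u00e8me br\u00fbl\u00e9e"))  # Creme brulee
print(strip_accents("Gavin O\u2019Connor"))           # Gavin OConnor
```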

Answer 4

There is a technique to strip accents from characters, but other characters need to be directly replaced. Check this article: http://effbot.org/zone/unicode-convert.htm
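One way to combine the two steps is sketched below. This is not the code from the linked article: the REPLACEMENTS table and the to_ascii name are assumptions chosen for illustration, and the table only covers a handful of punctuation characters that you would extend for your own data.

```python
import unicodedata

# Hypothetical replacement table for characters that NFKD cannot decompose.
REPLACEMENTS = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
})


def to_ascii(text):
    # Step 1: directly replace the characters listed in the table.
    text = text.translate(REPLACEMENTS)
    # Step 2: decompose accented letters, then drop whatever is still non-ASCII.
    text = unicodedata.normalize("NFKD", text)
    return text.encode("ascii", "ignore").decode("ascii")


print(to_ascii("\u201cCaf\u00e9\u201d, said Gavin O\u2019Connor"))
# "Cafe", said Gavin O'Connor
```

Doing the direct replacement first matters, because anything still non-ASCII after normalization is discarded by the final encoding step.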

Answer 5

Try simple character replacement:

str1 = "“I am the greatest”, said Gavin O’Connor"
print(str1)
print(str1.replace("’", "'").replace("“","\"").replace("”","\""))

PS: add # -*- coding: utf-8 -*- to the top of your .py file if you get an error.

Answer 1: 28, 2011-11-10 22:49:24
Answer 2: 8, 2011-11-10 22:50:26
Answer 3: 3, 2011-11-10 22:48:39
Answer 4: 1, 2011-11-10 22:47:48
Answer 5: 0, 2018-01-01 12:33:14