Approximately converting a unicode string to an ascii string in Python

Question

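I don't know whether this is trivial, but I need to convert a unicode string to an ascii string, and I don't want all those escape characters in the result. What I mean is: is it possible to do an "approximate" conversion to some quite similar ascii characters?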

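For example: Gavin O’Connor gets converted to Gavin O\x92Connor, but I would really like it to be converted to Gavin O'Connor. Is this possible? Has anyone written a utility to do this, or do I have to replace all the characters manually?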

Thank you very much! Marco

Answer 1

Use the Unidecode package to transliterate the string.

>>> import unidecode
>>> unidecode.unidecode(u'Gavin O’Connor')
"Gavin O'Connor"

Answer 2

b = str(a.encode('utf-8').decode('ascii', 'ignore'))

This should work, although it simply drops any non-ASCII characters instead of approximating them.
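As a rough illustration of what the 'ignore' error handler in the line above does, the sketch below compares it with the other standard handlers that str.encode() accepts. The helper name encode_ascii and the sample string are only assumptions for the demo.

```python
# Compare the standard error handlers accepted by str.encode().
text = "Gavin O\u2019Connor"  # U+2019 is the curly apostrophe


def encode_ascii(s, errors):
    """Encode to ASCII with the given error handler and return a str."""
    return s.encode("ascii", errors).decode("ascii")


for handler in ("ignore", "replace", "backslashreplace", "xmlcharrefreplace"):
    print(handler, "->", encode_ascii(text, handler))

# ignore -> Gavin OConnor
# replace -> Gavin O?Connor
# backslashreplace -> Gavin O\u2019Connor
# xmlcharrefreplace -> Gavin O&#8217;Connor
```

None of these handlers picks a look-alike character; they only drop, mask, or escape whatever ASCII cannot represent.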

Answer 3

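import unicodedata

unicode_string = u"Gavin O’Connor"
# NFKD decomposes characters where possible; 'ignore' then drops whatever is still non-ASCII.
print(unicodedata.normalize('NFKD', unicode_string).encode('ascii', 'ignore').decode('ascii'))

Output:

Gavin OConnor

Note that the curly apostrophe (U+2019) has no decomposition, so it is dropped rather than converted to '. Accented letters such as é, on the other hand, are reduced to their base letter.

Here is the document describing the normalization forms: http://unicode.org/reports/tr15/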

Answer 4

There is a technique for stripping the accents from characters, but other characters have to be replaced directly. See this article: http://effbot.org/zone/unicode-convert.htm
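A minimal sketch of that idea, not the code from the linked article: decompose with NFKD to strip accents, and use a small table for letters that have no decomposition. The names DIRECT_REPLACEMENTS and strip_accents are hypothetical, and the table is deliberately incomplete.

```python
import unicodedata

# Hypothetical, deliberately incomplete table of letters that have no
# Unicode decomposition and therefore need a direct replacement.
DIRECT_REPLACEMENTS = {
    "\u00df": "ss",  # ß
    "\u00e6": "ae",  # æ
    "\u00c6": "AE",  # Æ
    "\u00f8": "o",   # ø
    "\u00d8": "O",   # Ø
    "\u0142": "l",   # ł
    "\u0141": "L",   # Ł
}


def strip_accents(text):
    """Return an ASCII approximation of text (illustrative helper)."""
    # Step 1: direct replacements for letters without a decomposition.
    text = "".join(DIRECT_REPLACEMENTS.get(ch, ch) for ch in text)
    # Step 2: NFKD splits e.g. "é" into "e" plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Step 3: drop everything that is not ASCII, including the accents.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_accents("Crème brûlée"))         # Creme brulee
print(strip_accents("Łódź, Straße, Søren"))  # Lodz, Strasse, Soren
```

Anything that is neither decomposable nor listed in the table is silently dropped, so the table would have to grow with the data you actually see.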

Answer 5

Try simple character replacement:

str1 = "“I am the greatest”, said Gavin O’Connor"
print(str1)
print(str1.replace("’", "'").replace("“","\"").replace("”","\""))

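PS: If you get an error, add # -*- coding: utf-8 -*- at the top of the .py file (this is only needed on Python 2; Python 3 reads source files as UTF-8 by default).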
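If the list of characters grows, the chained replace() calls can be swapped for a single translation table. This is only a sketch: the PUNCTUATION mapping and the asciify_punctuation helper are hypothetical names, and the set of characters covered is an assumption about what typically shows up in such text.

```python
# Hypothetical mapping of typographic punctuation to ASCII look-alikes.
PUNCTUATION = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def asciify_punctuation(text):
    """Replace the mapped punctuation in one pass; leave the rest alone."""
    return text.translate(PUNCTUATION)


str1 = "\u201cI am the greatest\u201d, said Gavin O\u2019Connor"
print(asciify_punctuation(str1))  # "I am the greatest", said Gavin O'Connor
```

Characters outside the table pass through unchanged, so this can be combined with one of the other answers to handle accented letters.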