從字符串中刪除特殊字符（¡）

Question

我正在嘗試從集合中寫入文件。 集合中有特殊字符¡ 例如，集合中的內容具有以下詳細信息：

{...，名稱：¡嗨！，...}

現在我試圖將相同內容寫入文件，但出現錯誤

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' in position 0: ordinal not in range(128)

我嘗試使用這里提供的解決方案但徒勞。 如果有人可以幫助我，這將是很棒的:)

因此，示例如下所示：

我有一個收藏，其中包含以下詳細信息

{ "_id":ObjectId("5428ead854fed46f5ec4a0c9"), 
   "author":null,
   "class":"culture",
   "created":1411967707.356593,
   "description":null,
   "id":"eba9b4e2-900f-4707-b57d-aa659cbd0ac9",
   "name":"¡Hola!",
   "reviews":[

   ],
   "screenshot_urls":[

   ]
}

現在，我嘗試從集合中訪問此處的name條目，並通過在集合中進行迭代來做到這一點，即

f = open("sample.txt","w");

for val in exampleCollection:
   f.write("%s"%str(exampleCollection[val]).encode("utf-8"))

f.close();

Answer 1

刪除不需要的字符的最簡單方法是指定要使用的字符。

>>> import string
>>> validchars = string.ascii_letters + string.digits + ' '
>>> s = '¡Hi there!'
>>> clean = ''.join(c for c in s if c in validchars)
>>> clean
'Hi there'

如果某些形式的標點符號可以，請將它們添加到有效字符中。

Answer 2

這將刪除字符串中所有無效ASCII字符。

>>> '¡Hola!'.encode('ascii', 'ignore').decode('ascii')
'Hola!'

或者，您可以將文件編寫為UTF-8 ，它可以表示地球上幾乎所有的字符。

Answer 3

正如一位用戶在此頁面上發布的那樣，您應該看一下以下文檔中的Unicode教程： https : //docs.python.org/2/howto/unicode.html

您正在嘗試使用超出ASCII范圍（僅128個符號）的字符。 我在不久前發現了一篇非常好的文章，我將嘗試在此處找到並發布。

編輯：啊，這是： http : //www.joelonsoftware.com/articles/Unicode.html

Answer 4

您正在嘗試以“嚴格”模式將unicode轉換為ascii：

>>> help(str.encode)
Help on method_descriptor:

encode(...)
    S.encode([encoding[,errors]]) -> object

    Encodes S using the codec registered for encoding. encoding defaults
    to the default encoding. errors may be given to set a different error
    handling scheme. Default is 'strict' meaning that encoding errors raise
    a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
    'xmlcharrefreplace' as well as any other name registered with
    codecs.register_error that is able to handle UnicodeEncodeErrors.

您可能想要以下內容之一：

s = u'¡Hi there!'

print s.encode('ascii', 'ignore')    # removes the ¡
print s.encode('ascii', 'replace')   # replaces with ?
print s.encode('ascii','xmlcharrefreplace') # turn into xml entities
print s.encode('ascii', 'strict')    # throw UnicodeEncodeErrors

從字符串中刪除特殊字符（¡）

問題描述

4 個解決方案

解決方案1
2 已采納 2015-10-23 18:52:51

解決方案2
1 2015-10-23 19:07:27

解決方案3
0 2015-10-23 18:45:49

解決方案4
0 2015-10-23 19:18:06

從字符串中刪除特殊字符（¡）

問題描述

4 個解決方案

解決方案1 2 已采納 2015-10-23 18:52:51

解決方案2 1 2015-10-23 19:07:27

解決方案3 0 2015-10-23 18:45:49

解決方案4 0 2015-10-23 19:18:06

解決方案1
2 已采納 2015-10-23 18:52:51

解決方案2
1 2015-10-23 19:07:27

解決方案3
0 2015-10-23 18:45:49

解決方案4
0 2015-10-23 19:18:06