UnicodeDecodeError in a function under Python 3.5.2
def getWordFreqs(textPath, stopWordsPath):
    wordFreqs = dict()
    # open the file in read mode and open stop words
    file = open(textPath, 'r')
    stopWords = set(line.strip() for line in open(stopWordsPath))
    # read the text
    text = file.read()
    # exclude punctuation and convert to lower case; exclude numbers as well
    punctuation = set('!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')
    text = ''.join(ch.lower() for ch in text if ch not in punctuation)
    text = ''.join(ch for ch in text if not ch.isdigit())
    # read through the words and add to frequency dictionary
    # if it is not a stop word
    for word in text.split():
        if word not in stopWords:
            if word in wordFreqs:
                wordFreqs[word] += 1
            else:
                wordFreqs[word] = 1
    return wordFreqs
Whenever I try to run this function in Python 3.5.2 I get the following error, but it works fine in 3.4.3, and I can't figure out what is causing it.
  line 9, in getWordFreqs
    text = file.read()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x97 in position 520: ordinal not in range(128)
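In Python 3, open by default uses the encoding returned by locale.getpreferredencoding(False). That is usually not ascii, but it can be if you are running under some kind of framework, as your error message indicates.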
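To see which default applies in a given environment, a quick check like the one below may help. It is only a sketch: the variable names are made up for illustration, and os.devnull is used simply as a file that is always available to open.

```python
import codecs
import locale
import os

# Encoding that open() falls back to in text mode when no encoding= is passed
default_encoding = locale.getpreferredencoding(False)

# A text-mode file object exposes the encoding it actually picked
with open(os.devnull, "r") as probe:
    picked_encoding = probe.encoding

# codecs.lookup() maps aliases to canonical names, e.g. "UTF8" becomes "utf-8"
print("locale default:", codecs.lookup(default_encoding).name)
print("open() picked: ", codecs.lookup(picked_encoding).name)
```

If either line prints ascii, any byte above 0x7F in the input file will raise the UnicodeDecodeError shown in the question.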
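Rather than relying on that default, specify the encoding of the file you are trying to read. If the file was created under Windows, the encoding is likely cp1252, especially since the byte \x97 is EM DASH in that encoding.

Try: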
file = open(textPath, 'r', encoding='cp1252')
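The claim about byte 0x97 can be checked in isolation. The sketch below uses a made-up byte string and a temporary file (both hypothetical, not taken from the question) to show that ascii rejects the byte, while cp1252 decodes it to U+2014. It assumes the real file is in fact cp1252, which the answer only suggests as likely.

```python
import os
import tempfile

raw = b"word \x97 another"  # 0x97 is outside the 7-bit ASCII range

# ASCII cannot decode the byte, which reproduces the error from the traceback
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# cp1252 maps 0x97 to U+2014 EM DASH
decoded = raw.decode("cp1252")

# Write the bytes to a throwaway file (hypothetical sample for this sketch)
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(raw)
    sample_path = tmp.name

# Read it back with an explicit encoding, independent of the locale default
try:
    with open(sample_path, "r", encoding="cp1252") as f:
        text = f.read()
finally:
    os.remove(sample_path)

# ascii() escapes non-ASCII characters, so printing is safe on any console
print(ascii_failed, ascii(decoded), text == decoded)
```

If the file turns out to use a different encoding, such as UTF-8, pass that name to encoding= instead. cp1252 will decode almost any byte sequence without complaint, so a wrong guess tends to show up as garbled characters and not as an exception.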