Python UnicodeWarning: Unicode equal comparison. How do I solve this error?

Question

Just like here and here, I am running this code:

with open(fin, 'r') as inFile, open(fout, 'w') as outFile:
    for line in inFile:
        line = line.replace('."</documents', '"').replace('. ', ' ')
        print(' '.join([word for word in line.lower().split() if len(word) >= 3 and word not in stopwords.words('english')]), file=outFile)

I get the following error:

**UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  print(' '.join([word for word in line.lower().split() if len(word) >=3 and word not in stopwords.words('english')]), file = outFile)**

How can I solve this?

Answer 1

word not in stopwords.words('english') uses comparison. Either word, or at least one of the values in stopwords.words('english'), is not a Unicode value.
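The mismatch can be reproduced in isolation. The following is a minimal sketch, not the asker's actual data: the one-item stop_words list and the sample word are made up for illustration, standing in for stopwords.words('english') and for a word read from a file as bytes. On Python 2 the first membership test is what triggers the UnicodeWarning; on Python 3 it is simply False without a warning.

```python
# Hypothetical stand-in for stopwords.words('english'): a list of Unicode strings.
stop_words = [u'\u00e9t\u00e9']  # the single word u'été'

# Hypothetical word as it arrives from a file read as bytes (UTF-8 encoded).
raw_word = u'\u00e9t\u00e9'.encode('utf8')

# Bytes compared with Unicode text: the comparison cannot succeed.
found_raw = raw_word in stop_words

# Decoded first, so Unicode text is compared with Unicode text.
found_decoded = raw_word.decode('utf8') in stop_words

print(found_raw, found_decoded)
```

Decoding before the membership test is what makes the two sides comparable.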

Since you are reading from a file, the most likely candidate here is word; decode it, or use a file object that decodes the data as it is read:

print(' '.join([word for word in line.lower().split()
                if len(word) >= 3 and
                   word.decode('utf8') not in stopwords.words('english')]),
      file=outFile)

or

import io

with io.open(fin,'r', encoding='utf8') as inFile,\
        io.open(fout,'w', encoding='utf8') as outFile:

where the io.open() function gives you a file object in text mode that encodes or decodes as needed.
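Put together, the io.open() approach might look like the sketch below. The names STOP_WORDS, clean_line and clean_file are hypothetical helpers introduced here, and the small STOP_WORDS set is only a stand-in for stopwords.words('english') so that the example runs without NLTK installed. The sample text is made up.

```python
import io
import os
import tempfile

# Hypothetical stand-in for stopwords.words('english'); all values are Unicode.
STOP_WORDS = {u'the', u'and', u'for', u'with'}


def clean_line(line, stop_words=STOP_WORDS):
    """Keep words of at least 3 characters that are not stop words."""
    line = line.replace(u'."</documents', u'"').replace(u'. ', u' ')
    return u' '.join(word for word in line.lower().split()
                     if len(word) >= 3 and word not in stop_words)


def clean_file(fin, fout):
    """Read fin as UTF-8 text, filter each line, write the result to fout."""
    with io.open(fin, 'r', encoding='utf8') as inFile, \
            io.open(fout, 'w', encoding='utf8') as outFile:
        for line in inFile:
            # line is already Unicode text here, so no decode() call is needed.
            outFile.write(clean_line(line) + u'\n')


# Usage with throwaway files and made-up sample text.
tmp_dir = tempfile.mkdtemp()
fin = os.path.join(tmp_dir, 'in.txt')
fout = os.path.join(tmp_dir, 'out.txt')
with io.open(fin, 'w', encoding='utf8') as f:
    f.write(u'The caf\u00e9 is open. Come for the cr\u00e8me br\u00fbl\u00e9e\n')

clean_file(fin, fout)

with io.open(fout, 'r', encoding='utf8') as f:
    result = f.read()
```

Because both files are opened in text mode, every word is Unicode by the time it is compared, and the same encoding is applied again on output.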

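The latter is less error prone. For example, you test the length of word, but what you are really testing is the number of bytes. Any word containing characters outside the ASCII code point range takes more than one UTF-8 byte per such character, so len(word) is not the same as len(word.decode('utf8')).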
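The difference between byte count and character count is easy to check. This is a small illustration with made-up sample words, not taken from the asker's data:

```python
word = u'caf\u00e9'                 # 4 characters; the last one is non-ASCII
encoded = word.encode('utf8')       # what a byte-mode file read would yield

byte_count = len(encoded)                  # counts UTF-8 bytes
char_count = len(encoded.decode('utf8'))   # counts characters

# A 2-character word that wrongly passes a "length >= 3" test when measured in bytes.
short = u'\u00e9\u00e9'
passes_as_bytes = len(short.encode('utf8')) >= 3
passes_as_text = len(short) >= 3

print(byte_count, char_count, passes_as_bytes, passes_as_text)
```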

Answer 1: 3, accepted, 2015-01-19 11:51:25
