Python codecs error exception?

Question

File "/usr/lib/python3.1/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 805: invalid start byte

Hi, I get this exception. How do I catch it and continue reading my files when it occurs?

My program has a loop that reads a text file line by line and tries to do some processing. However, some of the files I encounter may not be text files, or may have lines that are not properly formatted (foreign language, etc.). I want to ignore those lines.

The following is not working:

for line in sys.stdin:
   if line != "":
      try:
         matched = re.match(searchstuff, line, re.IGNORECASE)
         print (matched)
      except (UnicodeDecodeError, UnicodeEncodeError):
         continue

Answer 1

Look at http://docs.python.org/py3k/library/codecs.html . When you open the codecs stream, you probably want to pass the additional argument errors='ignore'.

In Python 3, sys.stdin is by default opened as a text stream (see http://docs.python.org/py3k/library/sys.html ), and has strict error checking.
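To illustrate why a try/except placed inside the loop body does not help, here is a small sketch. It assumes an in-memory io.BytesIO wrapped in a strict io.TextIOWrapper as a stand-in for sys.stdin (the sample bytes and variable names are made up for illustration). With strict error checking, the decode error is raised while the stream is being iterated, before the loop body ever sees the offending line:

```python
import io

# Stand-in for stdin: a strict UTF-8 text stream over raw bytes.
# 0x92 on its own is not a valid UTF-8 start byte.
raw = io.BytesIO(b"good line\nbad \x92 line\n")
strict_stream = io.TextIOWrapper(raw, encoding="utf-8", errors="strict")

caught = None
try:
    # The error comes from fetching the next line, not from the loop body,
    # so the try has to wrap the whole loop to see it.
    for line in strict_stream:
        pass
except UnicodeDecodeError as exc:
    caught = exc

print(type(caught).__name__)
```

Catching the exception this way ends the loop, so it does not let you skip the bad line and carry on, which is why changing the stream's error handling is the more practical route.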

You need to reopen it as an error-tolerant utf-8 stream. Something like this will work:

sys.stdin = codecs.getreader('utf8')(sys.stdin.detach(), errors='ignore')
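As a rough sketch of how this fits into the loop from the question, the example below wraps a binary stream in an error-tolerant reader and then matches each line. The helper names tolerant_lines and find_matches, the sample bytes, and the pattern are all hypothetical; an io.BytesIO object stands in for sys.stdin.detach() so the snippet can run on its own:

```python
import codecs
import io
import re


def tolerant_lines(binary_stream, encoding="utf8"):
    """Yield decoded lines from a binary stream, dropping undecodable bytes."""
    reader = codecs.getreader(encoding)(binary_stream, errors="ignore")
    for line in reader:
        yield line


def find_matches(binary_stream, pattern):
    """Return lines whose beginning matches `pattern`, ignoring case."""
    matches = []
    for line in tolerant_lines(binary_stream):
        if re.match(pattern, line, re.IGNORECASE):
            matches.append(line.rstrip("\n"))
    return matches


# In-memory stand-in for sys.stdin.detach(); the 0x92 byte is invalid UTF-8.
sample = io.BytesIO(b"Hello world\nit\x92s broken\nhello again\n")
result = find_matches(sample, r"hello")
print(result)
```

Note that errors='ignore' silently drops the bad bytes rather than skipping the whole line, so a line containing an invalid byte is still processed, just without that byte.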
