Except Python codec errors?

File "/usr/lib/python3.1/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 805: invalid start byte

Hi, I get this exception. How do I catch it and continue reading my files when it occurs?

My program has a loop that reads a text file line by line and tries to do some processing. However, some files I encounter may not be text files, or may have lines that are not properly formatted (foreign-language text, etc.). I want to ignore those lines.

The following is not working:

for line in sys.stdin:
    if line != "":
        try:
            matched = re.match(searchstuff, line, re.IGNORECASE)
            print(matched)
        except (UnicodeDecodeError, UnicodeEncodeError):
            continue

Look at http://docs.python.org/py3k/library/codecs.html. When you open the codecs stream, you probably want to pass the additional argument errors='ignore'.
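As a rough sketch of that suggestion, assuming the input is a file on disk rather than stdin: the built-in open() accepts the same errors argument as the codecs stream classes, so it is used here. The helper name read_lines_ignoring_errors is made up for illustration.

```python
def read_lines_ignoring_errors(path, encoding="utf-8"):
    """Return the lines of the file at `path`, dropping undecodable bytes.

    Hypothetical helper, not part of any library.
    """
    # errors="ignore" makes the decoder skip invalid byte sequences
    # instead of raising UnicodeDecodeError.
    with open(path, "r", encoding=encoding, errors="ignore") as stream:
        return list(stream)
```

Note that errors='ignore' removes only the offending bytes; the rest of the line is kept.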

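In Python 3, sys.stdin is opened as a text stream by default (see http://docs.python.org/py3k/library/sys.html) and uses strict error handling. The bytes are decoded while the for statement fetches the next line, so the UnicodeDecodeError is raised outside your try block and the except clause never sees it.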
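To illustrate where the error comes from, here is a small sketch that uses an in-memory stand-in for stdin (io.BytesIO wrapped in io.TextIOWrapper). The function name reads_cleanly is hypothetical, and the real sys.stdin may be configured differently on your system.

```python
import io

def reads_cleanly(raw_bytes, errors="strict"):
    """Return True if all of raw_bytes can be read line by line as UTF-8."""
    # In-memory stand-in for sys.stdin: a binary buffer under a text layer.
    stream = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding="utf-8", errors=errors)
    try:
        for line in stream:  # decoding happens here, while lines are fetched
            pass             # the loop body never sees the undecodable bytes
    except UnicodeDecodeError:
        return False
    return True
```

With the default strict handling, reads_cleanly(b"ok\n\x92\n") returns False. Passing errors="ignore" makes the same input readable.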

You need to reopen it as an error-tolerant UTF-8 stream. Something like this should work:

import codecs, sys
sys.stdin = codecs.getreader('utf8')(sys.stdin.detach(), errors='ignore')
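If you would rather drop an undecodable line entirely, as the question asks, instead of keeping it with the bad bytes removed, a different approach is to read bytes and decode each line yourself. This is only a sketch of that alternative, not the method above. The generator name matching_lines and its parameters are made up for illustration, and it assumes the input is newline-delimited.

```python
import re

def matching_lines(binary_stream, pattern, encoding="utf-8"):
    """Yield decoded lines that match `pattern`, skipping undecodable lines.

    Hypothetical helper: `binary_stream` is any iterable of bytes lines,
    for example sys.stdin.buffer or a file opened in "rb" mode.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    for raw_line in binary_stream:
        try:
            line = raw_line.decode(encoding)  # strict: raises on bad bytes
        except UnicodeDecodeError:
            continue  # skip the whole line and keep reading
        if regex.match(line):
            yield line
```

It could be used as: for line in matching_lines(sys.stdin.buffer, searchstuff): print(line, end=""). Here the try block wraps the decode call itself, so the except clause actually catches the error.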
