
“Unicode error” when reading a file

This is my first post here, so I hope this isn't in the wrong topic or something, but I've run into a somewhat unusual problem with a Python app I'm writing.

Basically, what I'm trying to get it to do is read from a text file and insert part of it into a Tkinter text widget. The text file contains the usual "\n" line breaks, but when I run the code I get this bizarre error that I haven't been able to cook up a workaround for:

(BTW, sorry for the lousy set-up here... not sure how to work this new code-entering system; it seems to "play by its own rules" and have its own syntax, so I just copied/pasted it below.)

    Exception in Tkinter callback
    Traceback (most recent call last):
      File "C:\Python33\lib\idlelib\run.py", line 107, in main
        seq, request = rpc.request_queue.get(block=True, timeout=0.05)
      File "C:\Python33\lib\queue.py", line 175, in get
        raise Empty
    queue.Empty

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "C:\Python33\lib\tkinter\__init__.py", line 1442, in __call__
        return self.func(*args)
      File "C:\Users\Owner\Desktop\Python projects\The Ultimate Joke Book.py", line 89, in search
        results.create()
      File "C:\Users\Owner\Desktop\Python projects\The Ultimate Joke Book.py", line 31, in create
        joke = linecache.getline('Jokes/jokelist.txt',x)
      File "C:\Python33\lib\linecache.py", line 15, in getline
        lines = getlines(filename, module_globals)
      File "C:\Python33\lib\linecache.py", line 41, in getlines
        return updatecache(filename, module_globals)
      File "C:\Python33\lib\linecache.py", line 127, in updatecache
        lines = fp.readlines()
      File "C:\Python33\lib\codecs.py", line 300, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 627: invalid start byte

So the function that caused the problem -- just a "linecache.getline" used in a for loop -- works perfectly when there is no "\" in the text, but for whatever reason it doesn't like the "\" and starts spittin' errors. : /

So tonight I've spent nearly an hour on the "docs" ( http://docs.python.org/3/howto/unicode.html ), reading all the history and basic concepts of Unicode, but it was loaded with assumed knowledge, and while it was informative and helpful on a concept-only level, it didn't seem to offer much in terms of practical information and potential solutions.

The only solution I can come up with to defeat this annoying little bug is to use "/n" instead and programmatically split the strings into an array (or a "list" as they seem to be called in Python), then use a loop to break it up into more than 1 line... but that sounds like a lot of unnecessary steps, especially if there is a common workaround already in existence. So I would appreciate any insights on how to solve this particularly mysterious problem.

Thanks.

The data that the UTF-8 decoder has been given is not UTF-8. That's why you get the error. You need to give us the code that fails and some data examples to explain exactly what is happening.
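To illustrate the point, here is a minimal sketch that reproduces the same kind of error. The sample text is made up for the demonstration and is not the asker's data; it only assumes the file was saved in a single-byte Windows encoding such as CP-1252.

```python
# Hypothetical sample text; in CP-1252 the character "¿" is the single byte 0xBF.
raw = "¿Por qué?".encode("cp1252")

error = None
try:
    # 0xBF is a continuation byte in UTF-8, so it cannot start a character.
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# The same bytes decode cleanly once the matching encoding is named.
decoded = raw.decode("cp1252")
print(decoded)
```

The bytes themselves are fine; the failure comes only from decoding them with a codec that does not match how they were written.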

The byte in question (0xbf) is "¿" in Latin-1 and CP-1252. Perhaps this is Spanish text written on a Windows machine? In that case, specify the encoding when opening the file.
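As a rough sketch of what that could look like: `linecache.getline` has no encoding parameter, so one option is to read the file with the built-in `open()` and pick out the wanted line yourself. The helper name `get_line` and the default of `"cp1252"` are assumptions for illustration; the real encoding of the file has to be confirmed.

```python
def get_line(path, lineno, encoding="cp1252"):
    """Return line `lineno` (1-based) of the file at `path`.

    Returns an empty string when the line does not exist, which mirrors
    how linecache.getline behaves. The default encoding is a guess.
    """
    with open(path, encoding=encoding) as fp:
        for current, line in enumerate(fp, start=1):
            if current == lineno:
                return line
    return ""
```

In the code from the traceback, the failing call would then become something like `joke = get_line('Jokes/jokelist.txt', x)`. This re-reads the file on every call, which should be acceptable for a small joke list; for a large file, reading all lines once into a list would be the more sensible choice.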
