

“Unicode error” when reading a file

This is my first post here, so I hope this isn't in the wrong topic or something, but I've run into a somewhat unusual problem with a Python app I'm writing.

Basically, what I'm trying to do is read from a text file and insert part of it into a Tkinter Text widget. The text file contains the usual "\n" line breaks, but when I run the code I get this bizarre error that I haven't been able to find a workaround for:

(BTW, sorry for the lousy formatting here. I'm not sure how to work this new code-entering system; it seems to "play by its own rules" and have its own syntax, so I just copied and pasted the error below.)

    Exception in Tkinter callback
    Traceback (most recent call last):
      File "C:\Python33\lib\idlelib\run.py", line 107, in main
        seq, request = rpc.request_queue.get(block=True, timeout=0.05)
      File "C:\Python33\lib\queue.py", line 175, in get
        raise Empty
    queue.Empty

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "C:\Python33\lib\tkinter\__init__.py", line 1442, in __call__
        return self.func(*args)
      File "C:\Users\Owner\Desktop\Python projects\The Ultimate Joke Book.py", line 89, in search
        results.create()
      File "C:\Users\Owner\Desktop\Python projects\The Ultimate Joke Book.py", line 31, in create
        joke = linecache.getline('Jokes/jokelist.txt',x)
      File "C:\Python33\lib\linecache.py", line 15, in getline
        lines = getlines(filename, module_globals)
      File "C:\Python33\lib\linecache.py", line 41, in getlines
        return updatecache(filename, module_globals)
      File "C:\Python33\lib\linecache.py", line 127, in updatecache
        lines = fp.readlines()
      File "C:\Python33\lib\codecs.py", line 300, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 627: invalid start byte

So the function that causes the problem (just a "linecache.getline" call inside a for loop) works perfectly when there is no "\" in the text, but for whatever reason it doesn't like the "\" and starts spitting out errors. :/

So tonight I spent nearly an hour on the docs ( http://docs.python.org/3/howto/unicode.html ) reading about the history and basic concepts of Unicode, but it assumed a lot of prior knowledge, and while it was informative on a conceptual level, it didn't offer much in the way of practical information or potential solutions.

The only workaround I can come up with for this annoying little bug is to use "/n" instead, programmatically split the strings into an array (or a "list", as they seem to be called in Python), and then loop over it to break the text into multiple lines... but that sounds like a lot of unnecessary steps, especially if a common workaround already exists. So I would appreciate any insights on how to solve this mysterious problem.
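For reference, the split-and-loop workaround described above can be sketched in a few lines with `str.split` and `str.join`, assuming (hypothetically) that each joke sits on one line of the file and a literal marker such as `/n` stands for a line break inside the joke. The helper name `expand_breaks` is made up for illustration. Note that this only changes how line breaks are stored; on its own it does nothing about the `UnicodeDecodeError`.

```python
def expand_breaks(line: str, marker: str = "/n") -> str:
    """Turn literal break markers inside one line of text into real newlines.

    Hypothetical helper: assumes `marker` (e.g. "/n", or a backslash followed
    by "n") is used in the file to mean "line break here".
    """
    # Drop the file's own trailing newline, then split on the marker and
    # rejoin the pieces with real newline characters.
    parts = line.rstrip("\n").split(marker)
    return "\n".join(parts)


if __name__ == "__main__":
    raw = "Why did the chicken cross the road?/nTo get to the other side.\n"
    print(expand_breaks(raw))
    # A file that stores a backslash followed by "n" works the same way:
    print(expand_breaks("First line\\nSecond line\n", marker="\\n"))
```

No explicit loop is needed: `line.rstrip("\n").replace(marker, "\n")` gives the same result in a single call.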

Thanks.

The data that the UTF-8 decoder has been given is not UTF-8, and that's why you get the error. You need to show us the code that fails and some sample data so we can explain exactly what is happening.
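A quick way to see what this means, independent of the asker's code: a backslash is plain ASCII (0x5C) and decodes fine as UTF-8, whereas 0xBF is a UTF-8 continuation byte (bit pattern 10xxxxxx), so it can never begin a character. That is exactly the "invalid start byte" in the traceback. The sample bytes and the `try_decode` helper below are made up for illustration.

```python
def try_decode(data: bytes, encoding: str = "utf-8") -> str:
    """Decode `data`, or describe the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the offending byte, exc.reason the cause.
        return f"error at byte {exc.start}: 0x{data[exc.start]:02x} ({exc.reason})"


if __name__ == "__main__":
    # A backslash is ASCII, so it is also valid UTF-8.
    print(try_decode(b"line one\\nline two"))
    # 0xBF on its own is not valid UTF-8 ...
    print(try_decode(b"\xbfQu\xe9 pasa?"))
    # ... but in CP-1252 it decodes to the inverted question mark.
    print(try_decode(b"\xbfQu\xe9 pasa?", "cp1252"))
```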

The byte in question, 0xBF, is the character "¿" in Latin-1 and CP-1252. Perhaps this is Spanish text written on a Windows machine? In that case, specify the encoding when opening the file.
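Since `linecache.getline(filename, lineno)` takes no encoding argument (it is intended for reading Python source files), following this advice means opening the file yourself. A minimal sketch, assuming the file really is CP-1252 (a guess based on the 0xBF byte, not something the question confirms); `get_line` is a hypothetical helper that mimics `getline`'s 1-based line numbers and its habit of returning an empty string for a missing line.

```python
import os
import tempfile


def get_line(path: str, lineno: int, encoding: str = "cp1252") -> str:
    """Return line `lineno` (1-based) of a text file decoded with `encoding`.

    Hypothetical stand-in for linecache.getline(), which cannot be told
    which encoding to use. Like getline, returns "" when the line is missing.
    """
    with open(path, encoding=encoding) as fp:
        for number, line in enumerate(fp, start=1):
            if number == lineno:
                return line
    return ""


if __name__ == "__main__":
    # Write a small CP-1252 file containing "¿" (byte 0xBF) to try it out.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write("Hola\n\u00bfQu\u00e9 tal?\n".encode("cp1252"))
    try:
        print(get_line(tmp.name, 2))  # decodes fine as CP-1252
    finally:
        os.remove(tmp.name)
```

If the jokes are looked up many times, reading the whole file once with `fp.readlines()` and indexing into the list avoids reopening it on every call. Alternatively, re-saving the text file as UTF-8 in the editor would let the default UTF-8 decoding succeed.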
