
Cannot open text files in Python 3: Unicode error

I am having a problem opening a simple text file in Python 3.8. I set up a simple test.

Here is my test code:

import os

file_path = "c:\Users\username\Documents\folder1\some_file.txt"

with open(file_path, 'r') as f:
    for line in f:
        print(line)

I get the following error: Unicode Error "unicodeescape" codec can't decode bytes in position 2-3.

I have read other posts about putting an 'r' in front of the file path, and when I do I get "No such file or directory: 'c:\Users\username\Documents\folder1\some_file.txt'".

import os

file_path = r"c:\Users\username\Documents\folder1\some_file.txt"

with open(file_path, 'r') as f:
    for line in f:
        print(line)

I have also tried using double backslashes in the path, c:\\Users\\username\\Documents\\folder1\\some_file.txt, and that did not work either.

I have tried a test using pathlib and still get the Unicode error.

from pathlib import Path

file_path = "c:\Users\username\Documents\folder1\some_file.txt"

file_path = Path(file_path).absolute()

with open(fpath, 'r', encoding='utf-8') as f:
    line = f.readlines()
    for line in f:
        print(line)

In your first example, file_path = "c:\Users\username\Documents\folder1\some_file.txt", the \U in \Users starts a Unicode escape sequence. Python expects exactly eight hexadecimal digits after \U, and the characters that follow ("sers...") are not hex digits, so the string literal cannot be decoded. That is also why the error points at position 2-3, which is where the \U sits.

On my machine, the double backslash seems to work, but of course I do not have a text file at that path, so I cannot really test.

First try doubling the backslash for just the \U.
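As a minimal sketch of the point above, the following compares three ways of writing the same Windows path. The path is the placeholder from the question and need not exist, because nothing here touches the file system. The variable names are only illustrative.

```python
# Three spellings of the same Windows path (placeholder from the question).
raw = r"c:\Users\username\Documents\folder1\some_file.txt"
doubled = "c:\\Users\\username\\Documents\\folder1\\some_file.txt"
forward = "c:/Users/username/Documents/folder1/some_file.txt"

# The raw string and the doubled-backslash string are identical.
print(raw == doubled)

# Forward slashes give a different string that names the same location.
print(forward.replace("/", "\\") == raw)

# Without the r prefix, \U must be followed by eight hex digits, so the
# literal is rejected when the source is compiled, before any file is opened.
bad_source = 'p = "c:\\Users\\username"'
try:
    compile(bad_source, "<example>", "exec")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)
```

If the raw-string form still reports "No such file or directory", the string itself is being parsed correctly, and the remaining problem is most likely that the path does not exist as written.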

Further information:

As Andrew Kaluzniacki's answer rightly pointed out, the trouble was with the path string.

Using pathlib.Path

To avoid trouble with Windows paths and delimiters, one could opt to always use forward slashes, as both open() and Windows can handle them.

However, for the sake of backward compatibility and robustness, it is arguably better to use pathlib.Path from the standard library (still using forward slashes as path separators). pathlib automatically converts the file path to one suited to your OS.

Lastly, os.sep (also available as os.path.sep) holds the path separator for the host OS. It is a string attribute, not a function.
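The conversion described above can be sketched as follows. This example uses PureWindowsPath rather than Path so that it shows the Windows rendering on any OS. That choice is an assumption made for illustration; on a Windows machine, plain Path behaves the same way. The path is again the placeholder from the question.

```python
import os
from pathlib import PureWindowsPath

# Placeholder path from the question, written with forward slashes.
text = "c:/Users/username/Documents/folder1/some_file.txt"

# Pure paths only manipulate strings, so no file needs to exist.
win = PureWindowsPath(text)
print(win)              # rendered with backslashes
print(win.name)         # file name component
print(win.parent.name)  # name of the containing folder

# as_posix() converts back to forward slashes.
print(win.as_posix() == text)

# os.sep is the separator of the machine running this script.
print(repr(os.sep))
```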

Another potential cause for people having a similar issue

There may be a character in the file not covered by your operating system's default encoding.

Try other encodings by passing a different value to the encoding parameter of open().

You could try open(file_path, 'r', encoding='utf-8'). Note that when no encoding is given, open() uses the locale's preferred encoding, which on Windows 10 (assuming that is your OS, based on your file path examples) is often a legacy code page such as cp1252 rather than UTF-8, so passing the encoding explicitly can make a difference.

Without knowing what is in the file, however, it is hard to know which encoding would work.

from pathlib import Path

fpath = "some_file.txt"  # placeholder; use your own path here
fpath = Path(fpath).absolute()
# ^^ absolute() is not necessary if
# the path is relative to the current
# working directory and you just
# pass a filename.

with open(fpath, 'r', encoding='utf-8') as filehandle:
    for line in filehandle:  # stand-in for your own processing
        print(line, end='')
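One way to "try other encodings" is sketched below. The helper read_text_with_fallback and its default list of encodings are hypothetical, not part of any library, and the demo file is created in a temporary directory so the example is self-contained.

```python
import tempfile
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: return (text, encoding) for the first
    encoding in the list that decodes the whole file."""
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as fh:
                return fh.read(), enc
        except UnicodeDecodeError:
            continue  # this encoding does not fit; try the next one
    raise ValueError(f"none of {encodings} could decode {path}")


# Demo on a throwaway file containing a byte that is invalid as UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes("caf\xe9\n".encode("cp1252"))
    text, used = read_text_with_fallback(sample)
    print(used, repr(text))
```

A successful decode only means the bytes are valid in that encoding, not that it is the encoding the file was written in, so it is still worth checking the resulting text by eye.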
