Invalid start byte when reading multiple files in Python
My function reads multiple .sgm files. I get an error when reading the content from a file, specifically at the line contents = f.read()
def block_reader(path):
    filePaths = []
    for filename in os.listdir(path):
        if filename.endswith(".sgm"):
            filePaths.append(os.path.join(path, filename))
            continue
        else:
            continue
    for file in filePaths:
        with open(file, 'r') as f:
            print(f)
            contents = f.read()
            soup = BeautifulSoup(contents, "lxml")
    return ["test content"]
Error message
Traceback (most recent call last):
  File "./block-1-reader.py", line 32, in <module>
    for reuters_file_content in solutions.block_reader(path):
  File "/home/ragith/Documents/A-School/Fall-2020/COMP_479/Assignment_1/solutions.py", line 29, in block_reader
    contents = f.read()
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 1519554: invalid start byte
Try this: with open(path, 'rb') as f:
The b in the mode specifier passed to open() tells Python to treat the file as binary, so contents will remain a bytes object. No decoding is attempted this way. More details at: this link
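A minimal sketch of the binary-mode suggestion, using only the standard library. The helper name read_sgm_files and the latin-1 default are assumptions for illustration and do not come from the original post. In Latin-1 the byte 0xfc is "ü", which makes it a plausible guess for this kind of data, but the real encoding of the .sgm files should be checked. The demo writes a throwaway file containing 0xfc, shows that reading it as UTF-8 text fails, and then reads it in binary mode and decodes it explicitly.

```python
import os
import tempfile


def read_sgm_files(path, encoding="latin-1"):
    """Yield (file_path, text) for every .sgm file in `path`.

    Files are opened in binary mode, so read() never fails on a stray byte;
    decoding happens explicitly afterwards. The latin-1 default is an
    assumption, not something the original post confirms.
    """
    for filename in sorted(os.listdir(path)):
        if not filename.endswith(".sgm"):
            continue
        file_path = os.path.join(path, filename)
        with open(file_path, "rb") as f:
            raw = f.read()  # bytes: no decoding is attempted here
        yield file_path, raw.decode(encoding)


# Demonstration with a temporary file that contains the byte 0xfc.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "sample.sgm")
with open(demo_file, "wb") as f:
    f.write(b"<BODY>M\xfcnchen</BODY>")

# Text mode with UTF-8 reproduces the error from the question.
try:
    with open(demo_file, "r", encoding="utf-8") as f:
        f.read()
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Binary mode plus an explicit decode succeeds.
results = list(read_sgm_files(demo_dir))
```

If the bytes are only going to be handed to BeautifulSoup, the explicit decode step can be skipped, since BeautifulSoup accepts bytes and tries to detect the encoding itself. Passing errors="replace" to decode() is another option when losing the occasional character is acceptable.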