Invalid start byte when reading multiple files in Python

My function reads multiple .sgm files. I get an error when reading the content of a file, specifically at the line contents = f.read().

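import os

from bs4 import BeautifulSoup


def block_reader(path):
    # Collect the paths of all .sgm files in the directory
    filePaths = []
    for filename in os.listdir(path):
        if filename.endswith(".sgm"):
            filePaths.append(os.path.join(path, filename))

    for file in filePaths:
        # Text mode: the file is decoded with the default encoding (UTF-8 here)
        with open(file, 'r') as f:
            print(f)
            contents = f.read()  # UnicodeDecodeError is raised here
            soup = BeautifulSoup(contents, "lxml")

    return ["test content"]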

Error message:

Traceback (most recent call last):
  File "./block-1-reader.py", line 32, in <module>
    for reuters_file_content in solutions.block_reader(path):
  File "/home/ragith/Documents/A-School/Fall-2020/COMP_479/Assignment_1/solutions.py", line 29, in block_reader
    contents = f.read()
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 1519554: invalid start byte

Try this: with open(path, 'rb') as f: The b in the mode specifier tells open() to treat the file as binary, so contents will remain a bytes object. No decoding is attempted this way. More details at: this link
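A minimal sketch of the binary-mode approach, using only the standard library. The function names read_sgm_bytes and decode_with_fallback are hypothetical, and the Latin-1 fallback is an assumption: byte 0xfc is "ü" in Latin-1, but the files' real encoding is not stated in the question. If you only need to parse the markup, BeautifulSoup also accepts bytes directly and will try to detect the encoding itself.

```python
import os


def read_sgm_bytes(path):
    """Yield (file_path, raw_bytes) for every .sgm file directly under path."""
    for filename in sorted(os.listdir(path)):
        if filename.endswith(".sgm"):
            file_path = os.path.join(path, filename)
            # 'rb' skips text decoding entirely, so f.read() returns bytes
            with open(file_path, "rb") as f:
                yield file_path, f.read()


def decode_with_fallback(raw, encodings=("utf-8", "latin-1")):
    """Try each encoding in order and return the first successful decode.

    The default fallback to latin-1 is an assumption about the data;
    latin-1 maps every byte to a character, so it never raises.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached if a custom encodings tuple failed entirely:
    # decode as UTF-8 and mark undecodable bytes with U+FFFD
    return raw.decode("utf-8", errors="replace")
```

Inside block_reader you could then loop over read_sgm_bytes(path) and pass either the raw bytes or decode_with_fallback(raw) to BeautifulSoup. If you already know the files are Latin-1, opening them with open(file, 'r', encoding='latin-1') is a simpler alternative.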


