
Reading a binary file in read mode Python 3 - passes on Windows, fails on Linux

I am executing this piece of code against

Python on Windows

'3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]'

and

Python on Linux

'3.6.6 (default, Mar 29 2019, 00:03:27) \\n[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]'

The code writes some bytes into a file using wb mode and then reads them back as plain text in r mode. I understand that I should be reading as bytes ( rb ), but I am curious: why does it break on Linux while passing on Windows?

import os
import tempfile
temp_dir = tempfile.mkdtemp()
temp_file = os.path.join(temp_dir, 'write_file')

expected_bytes = bytearray([123, 3, 255, 0, 100])
with open(temp_file, 'wb') as fh:
    fh.write(expected_bytes)

with open(temp_file, 'r', newline='') as fh:
    actual = fh.read()

Exception raised on Linux:

Traceback (most recent call last):
  File "<input>", line 11, in <module>
  File "/home/.../lib64/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid start byte

Getting the default system encoding (with sys.getdefaultencoding() ) shows 'utf-8' on both machines.

When you open a file in text mode, that is with 'rt' (both 'r' and 't' are the defaults), everything you read from the file is transparently decoded on the fly and returned as str objects, as explained in Text I/O .

You can force the encoding to use when opening the file, like:

f = open("myfile.txt", "r", encoding="utf-8")

As explained in the documentation for open :

The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.

(Note that sys.getdefaultencoding() is something unrelated: it returns the name of the current default string encoding used by the Unicode implementation.)

As you stated in the comments, on your systems locale.getpreferredencoding() gives 'cp1252' on Windows and 'UTF-8' on Linux.
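A minimal sketch for checking both values on a given machine. The variable names are illustrative, and the printed locale value depends on the OS and locale settings, so no particular output is assumed for it:

```python
import locale
import sys

# Encoding used internally by the Unicode implementation.
# On Python 3 this is 'utf-8' regardless of platform.
internal = sys.getdefaultencoding()

# Encoding that open() falls back to in text mode when no
# encoding= argument is given. This is the one that differs
# between the Windows and Linux machines in the question.
file_default = locale.getpreferredencoding(False)

print("sys.getdefaultencoding():      ", internal)
print("locale.getpreferredencoding(): ", file_default)
```

Comparing the second line of output across the two machines should show the difference that explains the behaviour.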

CP-1252 is a single-byte encoding in which almost every byte value corresponds to a character (Python's cp1252 codec leaves only five values undefined: 0x81, 0x8D, 0x8F, 0x90 and 0x9D). So nearly any data you read, including the bytes in your file, can be turned into a string.

UTF-8 , though, is a variable-width encoding in which not every sequence of bytes is valid. That's why reading your file on your Linux system failed: the byte 0xff can never appear in valid UTF-8, so it could not be decoded.
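The difference can be reproduced on any platform without touching a file, by decoding the question's five bytes with each codec explicitly. This is only an illustration; the names data, as_cp1252 and utf8_error are made up for the sketch:

```python
# The same five bytes the question writes to disk.
data = bytes([123, 3, 255, 0, 100])

# cp1252 maps each of these bytes to a character, so decoding succeeds
# (0xff becomes 'ÿ'), even though the result is not meaningful text.
as_cp1252 = data.decode("cp1252")
print(repr(as_cp1252))

# utf-8 rejects 0xff, which is what the Linux traceback reports.
utf8_error = None
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error = exc
    print(exc)
```

Passing encoding="cp1252" or encoding="utf-8" to open() selects the same codecs, so the outcome then no longer depends on the platform default.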

If you have written the file out as bytes, you should read it in as bytes.

f = open("myfile.txt", "rb")

If you read it in as text (using "r" or "rt" ) then an attempt will be made to decode it into Unicode. Which encoding is used by default is platform-dependent. But you clearly don't want it decoded at all.
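Applied to the code from the question, a binary round trip might look like the sketch below. It swaps tempfile.mkdtemp() for tempfile.TemporaryDirectory() so the directory is cleaned up automatically; that substitution is a convenience of this sketch, not part of the original code:

```python
import os
import tempfile

expected_bytes = bytearray([123, 3, 255, 0, 100])

# TemporaryDirectory removes the directory and its contents on exit.
with tempfile.TemporaryDirectory() as temp_dir:
    temp_file = os.path.join(temp_dir, "write_file")

    # Write raw bytes.
    with open(temp_file, "wb") as fh:
        fh.write(expected_bytes)

    # Read raw bytes back: no decoding happens, so no codec is involved.
    with open(temp_file, "rb") as fh:
        actual = fh.read()

# bytes and bytearray compare equal when their contents match.
assert actual == expected_bytes
```

Because no text decoding takes place, this behaves the same way on Windows and Linux.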
