
Reading a binary file with python

I find reading binary files with Python particularly difficult. Can you give me a hand? I need to read this file, which in Fortran 90 is easily read by

int*4 n_particles, n_groups
real*4 group_id(n_particles)
read (*) n_particles, n_groups
read (*) (group_id(j),j=1,n_particles)

In detail, the file format is:

Bytes 1-4 -- The integer 8.
Bytes 5-8 -- The number of particles, N.
Bytes 9-12 -- The number of groups.
Bytes 13-16 -- The integer 8.
Bytes 17-20 -- The integer 4*N.
Next many bytes -- The group ID numbers for all the particles.
Last 4 bytes -- The integer 4*N. 

How can I read this with Python? I tried everything but it never worked. Is there any chance I could use an f90 program in Python, reading this binary file and then saving the data that I need to use?

Read the binary file content like this:

with open(fileName, mode='rb') as file: # b is important -> binary
    fileContent = file.read()

then "unpack" binary data using struct.unpack :然后使用struct.unpack “解压”二进制数据:

The start bytes: struct.unpack("iiiii", fileContent[:20])

The body: ignore the heading bytes and the trailing bytes (24 in total); the remaining part forms the body. To get the number of integers in the body, do an integer division of its length by 4; the obtained quotient is then multiplied by the string 'i' to create the correct format for the unpack method:

struct.unpack("i" * ((len(fileContent) -24) // 4), fileContent[20:-4])

The end bytes: struct.unpack("i", fileContent[-4:])
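
Putting those three pieces together, a minimal end-to-end sketch for the layout described in the question might look like this (assuming native byte order and, as in the format strings above, that the group IDs are stored as 4-byte integers):

import struct

with open(fileName, mode='rb') as file:  # b is important -> binary
    fileContent = file.read()

# Header: the marker 8, N, the number of groups, the marker 8, the marker 4*N
eight, n_particles, n_groups, eight_again, body_bytes = struct.unpack("iiiii", fileContent[:20])

# Body: one 4-byte value per particle
group_ids = struct.unpack("i" * ((len(fileContent) - 24) // 4), fileContent[20:-4])

# Trailing marker: 4*N again
(trailing,) = struct.unpack("i", fileContent[-4:])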

In general, I would recommend that you look into using Python's struct module for this. It's standard with Python, and it should be easy to translate your question's specification into a formatting string suitable for struct.unpack().

Do note that if there's "invisible" padding between/around the fields, you will need to figure that out and include it in the unpack() call, or you will read the wrong bits.
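
For illustration, padding can be expressed directly in the format string with 'x' pad bytes, which consume input without producing a value. This is a hypothetical layout, not the one from the question:

import struct

# Hypothetical layout: a 4-byte int, 4 bytes of padding, then an 8-byte double.
packed = struct.pack('<i4xd', 7, 2.5)
number, value = struct.unpack('<i4xd', packed)  # the pad bytes are skipped, not returned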

Reading the contents of the file in order to have something to unpack is pretty trivial:

import struct

data = open("from_fortran.bin", "rb").read()

(eight, N) = struct.unpack("@II", data[:8])

This unpacks the first two fields, assuming they start at the very beginning of the file (no padding or extraneous data), and also assuming native byte order (the @ symbol). The I characters in the formatting string mean "unsigned integer, 32 bits".

To read a binary file to a bytes object:

from pathlib import Path
data = Path('/path/to/file').read_bytes()  # Python 3.5+

To create an int from bytes 0-3 of the data:

i = int.from_bytes(data[:4], byteorder='little', signed=False)

To unpack multiple ints from the data:

import struct
ints = struct.unpack('iiii', data[:16])

You could use numpy.fromfile, which can read data from both text and binary files. You would first construct a data type that represents your file format using numpy.dtype, and then read this type from the file using numpy.fromfile.
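
A minimal sketch of that approach for the layout in the question (the file name is a placeholder; it assumes 4-byte record markers, native byte order, and group IDs stored as float32 to match the Fortran real*4 declaration):

import numpy as np

# Structured dtype for the 20-byte header: marker, N, number of groups, marker, marker.
header_type = np.dtype([('marker1', np.int32),
                        ('n_particles', np.int32),
                        ('n_groups', np.int32),
                        ('marker2', np.int32),
                        ('marker3', np.int32)])

with open('from_fortran.bin', 'rb') as fh:
    header = np.fromfile(fh, dtype=header_type, count=1)[0]
    group_ids = np.fromfile(fh, dtype=np.float32, count=header['n_particles'])
    np.fromfile(fh, dtype=np.int32, count=1)  # trailing 4*N record marker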

I too found Python lacking when it comes to reading and writing binary files, so I wrote a small module (for Python 3.6+).

With binaryfile you'd do something like this (I'm guessing, since I don't know Fortran):

import binaryfile

def particle_file(f):
    f.array('group_ids')  # Declare group_ids to be an array (so we can use it in a loop)
    f.skip(4)  # Bytes 1-4
    num_particles = f.count('num_particles', 'group_ids', 4)  # Bytes 5-8
    f.int('num_groups', 4)  # Bytes 9-12
    f.skip(8)  # Bytes 13-20
    for i in range(num_particles):
        f.struct('group_ids', '>f')  # 4 bytes x num_particles
    f.skip(4)

with open('myfile.bin', 'rb') as fh:
    result = binaryfile.read(fh, particle_file)
print(result)

Which produces output like this:

{
    'group_ids': [(1.0,), (0.0,), (2.0,), (0.0,), (1.0,)],
    '__skipped': [b'\x00\x00\x00\x08', b'\x00\x00\x00\x08\x00\x00\x00\x14', b'\x00\x00\x00\x14'],
    'num_particles': 5,
    'num_groups': 3
}

I used skip() to skip the additional data Fortran adds, but you may want to add a utility to handle Fortran records properly instead. If you do, a pull request would be welcome.
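
One possible shape for such a utility, sketched here with the standard struct module rather than as part of binaryfile (it assumes the common case of 4-byte record markers in native byte order):

import struct

def read_fortran_record(fh):
    """Read one Fortran unformatted sequential record: a 4-byte length,
    the payload, then the same 4-byte length repeated."""
    header = fh.read(4)
    if not header:
        return None  # end of file
    (length,) = struct.unpack('i', header)
    payload = fh.read(length)
    (trailer,) = struct.unpack('i', fh.read(4))
    if trailer != length:
        raise ValueError('mismatched Fortran record markers')
    return payload

Each call returns the raw bytes of one record, which can then be passed to struct.unpack as shown in the other answers.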

If the data is array-like, I like to use numpy.memmap to load it.

Here's an example that loads 1000 samples from 64 channels, stored as two-byte integers.

import numpy as np
mm = np.memmap(filename, np.int16, 'r', shape=(1000, 64))

You can then slice the data along either axis:

mm[5, :] # sample 5, all channels
mm[:, 5] # all samples, channel 5

All the usual formats are available, including C- and Fortran-order, various dtypes and endianness, etc.
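
For instance, a sketch of the same mapping for a file written as big-endian 16-bit integers in Fortran (column-major) order would be:

mm_f = np.memmap(filename, dtype='>i2', mode='r', shape=(1000, 64), order='F')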

Some advantages of this approach:

  • No data is loaded into memory until you actually use it (that's what a memmap is for).
  • More intuitive syntax (no need to generate a struct.unpack string consisting of 64,000 characters).
  • Data can be given any shape that makes sense for your application.

For non-array data (e.g., compiled code), heterogeneous formats ("10 chars, then 3 ints, then 5 floats, ..."), or similar, one of the other approaches given above probably makes more sense.

#!/usr/bin/python

# Read five float values ('f' items) from a binary file into an array.array.
import array

data = array.array('f')
with open('c:\\code\\c_code\\no1.dat', 'rb') as f:
    data.fromfile(f, 5)
print(data)

# If the file contains pickled Python objects, load them one at a time
# until the end of the file is reached.
import pickle

with open("filename.dat", "rb") as f:
    try:
        while True:
            x = pickle.load(f)
            print(x)
    except EOFError:
        pass
