Read a binary file using NumPy fromfile and a given offset

I have a binary file that contains position records of a plane. Each record looks like this:

0x00: Time, float32
0x04: X, float32 // X axis position
0x08: Y, float32 // Y axis position
0x0C: Elevation, float32
0x10: float32*4 = Quaternion (x,y,z axis and w scalar)
0x20: Distance, float32 (unused)

So each record is 36 bytes long (nine float32 fields, occupying bytes 0x00 through 0x23).

I would like to get a NumPy array.

At offset 1859 there is an unsigned 32-bit integer (4 bytes) that gives the number of elements in the array: 12019 in my case.

I don't care (for now) about the header data (everything before offset 1859).

The array only starts at offset 1863 (= 1859 + 4).
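The element count stored just before the array can be read on its own before touching the records. This is a minimal sketch, assuming the integer is stored little-endian (the byte order is not stated here); `read_record_count` is a hypothetical helper name and the demo file is synthetic:

```python
import os
import struct
import tempfile

COUNT_OFFSET = 1859  # position of the uint32 element count, as described above


def read_record_count(path, offset=COUNT_OFFSET):
    """Return the uint32 stored at `offset` (assumed little-endian)."""
    with open(path, "rb") as f:
        f.seek(offset)
        raw = f.read(4)
    if len(raw) != 4:
        raise ValueError("file is too short to contain the count field")
    return struct.unpack("<I", raw)[0]


# Build a synthetic file: a zero-filled header followed by the count.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_count.bin")
with open(demo_path, "wb") as f:
    f.write(b"\x00" * COUNT_OFFSET)
    f.write(struct.pack("<I", 12019))

record_count = read_record_count(demo_path)
print(record_count)
```

If the file turns out to be big-endian, the format string would be `">I"` instead.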

I defined my own NumPy dtype like this:

dtype = np.dtype([
    ("time", np.float32),
    ("PosX", np.float32),
    ("PosY", np.float32),
    ("Alt", np.float32),
    ("Qx", np.float32),
    ("Qy", np.float32),
    ("Qz", np.float32),
    ("Qw", np.float32),
    ("dist", np.float32),
])
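The record size implied by this dtype can be checked directly, which is worth comparing against the record layout listed earlier. A small sketch (the variable names `record_size` and `dist_offset` are only for illustration):

```python
import numpy as np

dtype = np.dtype([
    ("time", np.float32),
    ("PosX", np.float32),
    ("PosY", np.float32),
    ("Alt", np.float32),
    ("Qx", np.float32),
    ("Qy", np.float32),
    ("Qz", np.float32),
    ("Qw", np.float32),
    ("dist", np.float32),
])

# Total number of bytes NumPy reads per record with this dtype
record_size = dtype.itemsize
# dtype.fields maps each name to (field dtype, byte offset within the record)
dist_offset = dtype.fields["dist"][1]

print(record_size)   # 36 (nine 4-byte floats)
print(dist_offset)   # 32, i.e. 0x20 as in the layout
```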

And I'm reading the file using fromfile:

a_bytes = np.fromfile(filename, dtype=dtype)

But I don't see any parameter of fromfile that lets me pass an offset.

You can open the file with Python's standard open, seek past the header, and then pass the file object to fromfile. Something like this:

import numpy as np
import os

dtype = np.dtype([
    ("time", np.float32),
    ("PosX", np.float32),
    ("PosY", np.float32),
    ("Alt", np.float32),
    ("Qx", np.float32),
    ("Qy", np.float32),
    ("Qz", np.float32),
    ("Qw", np.float32),
    ("dist", np.float32),
])

with open("myfile", "rb") as f:
    # Skip the header; the records start at byte 1863
    f.seek(1863, os.SEEK_SET)
    data = np.fromfile(f, dtype=dtype)

print(data)
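As an alternative, NumPy 1.17 and later accept an `offset` keyword (in bytes) in fromfile, and `count` limits how many items are read. The sketch below is an illustration under assumptions: the data is in the machine's native byte order, the demo file is synthetic, and a stand-in count of 5 replaces the real 12019.

```python
import os
import tempfile

import numpy as np

HEADER_SIZE = 1859  # bytes before the uint32 element count

dtype = np.dtype([
    ("time", np.float32), ("PosX", np.float32), ("PosY", np.float32),
    ("Alt", np.float32), ("Qx", np.float32), ("Qy", np.float32),
    ("Qz", np.float32), ("Qw", np.float32), ("dist", np.float32),
])

# Build a synthetic file: header, count, records, then unrelated trailing bytes.
n_records = 5
records = np.zeros(n_records, dtype=dtype)
records["time"] = np.arange(n_records, dtype=np.float32)

demo_path = os.path.join(tempfile.mkdtemp(), "demo_plane.bin")
with open(demo_path, "wb") as f:
    f.write(b"\x00" * HEADER_SIZE)
    f.write(np.array(n_records, dtype=np.uint32).tobytes())
    f.write(records.tobytes())
    f.write(b"trailing bytes that are not records")

# Read the element count, then exactly that many records after it.
count = int(np.fromfile(demo_path, dtype=np.uint32, count=1, offset=HEADER_SIZE)[0])
data = np.fromfile(demo_path, dtype=dtype, count=count, offset=HEADER_SIZE + 4)

print(count, data.shape)
print(data["time"])
```

Passing `count` keeps fromfile from trying to interpret whatever follows the array as more records.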

I faced a similar problem, but none of the answers above satisfied me. I needed to implement something like a virtual table with a very large number of binary records, potentially occupying more memory than I can afford in a single NumPy array. So my question was how to read and write a small set of integers from/to a binary file: a subset of the file into a subset of a NumPy array.

This is a solution that worked for me:

import numpy as np

recordLen = 10  # number of int64 values per record
recordSize = recordLen * 8  # size of a record in bytes
memArray = np.zeros(recordLen, dtype=np.int64)  # a buffer for one record

# Create a binary file and open it for writing and reading
with open('BinaryFile.dat', 'w+b') as file:
    # Write the array into the file as record number recordNo
    recordNo = 200  # the index of the target record in the file
    file.seek(recordSize * recordNo)
    raw = memArray.tobytes()  # named raw to avoid shadowing the built-in bytes
    file.write(raw)

    # Read record number recordNo from the file back into memArray
    file.seek(recordSize * recordNo)
    raw = file.read(recordSize)
    # copy() makes memArray writable; frombuffer alone returns a read-only array
    memArray = np.frombuffer(raw, dtype=np.int64).copy()
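A different technique for the same goal, not the one used above, is `np.memmap`, which exposes a file as an array and only touches the parts that are indexed. This is a rough sketch assuming fixed-size records of 10 int64 values; the table size of 1000 records and the temporary path are made up for the demo:

```python
import os
import tempfile

import numpy as np

record_len = 10    # int64 values per record, as in the example above
n_records = 1000   # hypothetical table size for this demo
path = os.path.join(tempfile.mkdtemp(), "BinaryFile.dat")

# mode="w+" creates (or overwrites) the file, sized to hold the whole shape
table = np.memmap(path, dtype=np.int64, mode="w+", shape=(n_records, record_len))
table[200] = np.arange(record_len)  # write record number 200
table.flush()                       # push pending changes to disk
del table

# Reopen the existing file for reading and writing without loading all of it
table = np.memmap(path, dtype=np.int64, mode="r+", shape=(n_records, record_len))
record = np.array(table[200])       # copy one record into ordinary memory
del table

print(record)
```

Whether this fits depends on the access pattern; the explicit seek/read/write approach gives tighter control over exactly when I/O happens.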

I suggest using NumPy's frombuffer:

import numpy as np

# file_path, seek_to_position, total_num_bytes and your_dtype_here are placeholders
with open(file_path, 'rb') as file_obj:
    file_obj.seek(seek_to_position)
    data_ro = np.frombuffer(file_obj.read(total_num_bytes), dtype=your_dtype_here)
    data_rw = data_ro.copy()  # without copy(), the result is read-only
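frombuffer also takes its own `offset` (in bytes) and `count` (in items) arguments, so a header can be skipped after the bytes are already in memory. A small self-contained sketch with a made-up 8-byte header and int32 payload, which also shows the read-only behaviour mentioned above:

```python
import numpy as np

# Synthetic buffer: 8 header bytes, six int32 values, then unrelated bytes
header = b"\x00" * 8
payload = np.arange(6, dtype=np.int32)
blob = header + payload.tobytes() + b"tail"

# Skip the header and read exactly six items
data_ro = np.frombuffer(blob, dtype=np.int32, count=6, offset=len(header))
print(data_ro.flags.writeable)  # False: the array is a view over immutable bytes

# A copy owns its memory and can be modified
data_rw = data_ro.copy()
data_rw[0] = 99
print(data_ro[0], data_rw[0])
```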
