
Fast 'Record Update' To Binary Files?

I have 3000 binary files (40 MB each) in a known format: each holds 5,000,000 records of 'int32,float32'. They were created using the numpy tofile() method.

A method that I use, WhichShouldBeUpdated(), determines which of the 3000 files should be updated and which records in that file should be changed. The method's output is the following:

(1) path_to_file_name_to_update

(2) a numpy record array with N records (N is the number of records to update), in the following format: [(recordID1, newIntValue1, newFloatValue1), (recordID2, newIntValue2, newFloatValue2), .....]

As can be seen:

(1) the file to update is known only at run time

(2) the records to update are also known only at run time

What would be the most efficient approach to updating the file with the new values for the records?

Since the records are of fixed length, you can just open the file for in-place writing and seek to each record's position: the byte offset is the record index multiplied by the record size (8 bytes here, for an int32 plus a float32). To encode the ints and floats as binary you can use struct.pack. Update: given that the files were originally generated by numpy, the fastest way may be numpy.memmap.
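Both suggestions can be sketched roughly as follows. This is a minimal illustration, not a benchmarked solution. It assumes the files hold packed (int32, float32) pairs in native byte order, which matches 5,000,000 records × 8 bytes = 40 MB. The field names (ival, fval, rec_id), the UPDATE_DTYPE layout, and both function names are hypothetical, since the question does not name them.

```python
import struct

import numpy as np

# Assumed on-disk layout: packed (int32, float32) pairs in native byte order.
# Field names are made up for this sketch.
RECORD_DTYPE = np.dtype([("ival", np.int32), ("fval", np.float32)])
RECORD_STRUCT = struct.Struct("=if")  # native order, no padding: 8 bytes

# Hypothetical layout of the updates array (recordID, newInt, newFloat).
UPDATE_DTYPE = np.dtype(
    [("rec_id", np.int64), ("ival", np.int32), ("fval", np.float32)]
)


def update_with_seek(path, updates):
    """Overwrite each listed record in place using seek() and struct.pack."""
    with open(path, "r+b") as f:  # "r+b" reads/writes without truncating
        # Visiting records in ascending order keeps the seeks moving forward.
        for rec_id, ival, fval in np.sort(updates, order="rec_id").tolist():
            f.seek(rec_id * RECORD_STRUCT.size)
            f.write(RECORD_STRUCT.pack(ival, fval))


def update_with_memmap(path, updates):
    """Map the file into memory and assign the new values by record index."""
    records = np.memmap(path, dtype=RECORD_DTYPE, mode="r+")
    idx = updates["rec_id"]
    records["ival"][idx] = updates["ival"]
    records["fval"][idx] = updates["fval"]
    records.flush()  # write the modified pages back to the file
    del records      # drop the mapping
```

Either function would be called with the two outputs of WhichShouldBeUpdated(), for example update_with_memmap(path_to_file_name_to_update, updates). Which one is faster likely depends on N and on how scattered the record IDs are, so it is worth timing both on real data.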

You're probably not interested in data conversion, but I've had very good experiences with HDF5 and pytables for large binary files. HDF5 is designed for large scientific data sets, so it is quick and efficient.

