Finding closest value in a binary file

Original post 2017-05-19 16:39:41 1 1 python

I have a large binary file (~4 GB) containing a series of image and time stamp data. I want to find the image that most closely corresponds to a user-given time stamp. There are millions of time stamps in the file, though. In Python 2.7, using seek, read, and struct.unpack, it took over 900 seconds just to read all the time stamps into an array. Is there an efficient algorithm for finding the closest value that doesn't require reading all of the values? The time stamps increase monotonically, though at very irregular intervals.

1 solution

First attempt. It works, seemingly every time, but I don't know if it's the most efficient way:

Take the first and last time stamps and the number of frames to calculate an average time step.

Use the average time step and the difference between the target and beginning time stamps to find an approximate index.

Check the time stamp at the approximate index and its 2 neighbors against the target.

If the target falls between them, take the index with the minimum difference. If not, set the approximate index as the new beginning or end, accordingly, and repeat.
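The steps above are essentially an interpolation search. Below is a minimal sketch of that idea, not the exact code behind the answer. It assumes a hypothetical file layout that the question does not specify: fixed-size records, each starting with an 8-byte little-endian double time stamp followed by the image bytes. The names read_timestamp, find_closest, TS_FORMAT and record_size are made up for illustration. It is a slight variant of the steps described, reading one time stamp per iteration instead of three, and it assumes the file holds at least one record.

```python
import struct

# Assumed layout (hypothetical, not from the original post): fixed-size
# records, each starting with an 8-byte little-endian double time stamp.
TS_FORMAT = "<d"
TS_SIZE = struct.calcsize(TS_FORMAT)


def read_timestamp(f, index, record_size):
    """Read only the time stamp of record `index`, skipping the image data."""
    f.seek(index * record_size)
    return struct.unpack(TS_FORMAT, f.read(TS_SIZE))[0]


def find_closest(f, n_records, record_size, target):
    """Return the index of the record whose time stamp is closest to target."""
    def ts(i):
        return read_timestamp(f, i, record_size)

    lo, hi = 0, n_records - 1
    t_lo, t_hi = ts(lo), ts(hi)
    if target <= t_lo:
        return lo
    if target >= t_hi:
        return hi

    # Invariant: t_lo < target < t_hi, so the answer lies in [lo, hi].
    while hi - lo > 1:
        # Estimate the index from the average time step over [lo, hi].
        step = (t_hi - t_lo) / (hi - lo)
        guess = lo + int((target - t_lo) / step)
        # Keep the guess strictly inside the interval so it always shrinks.
        guess = min(max(guess, lo + 1), hi - 1)
        t_guess = ts(guess)
        if t_guess == target:
            return guess
        if t_guess < target:
            lo, t_lo = guess, t_guess
        else:
            hi, t_hi = guess, t_guess

    # The target is bracketed by two adjacent records; pick the nearer one.
    return lo if target - t_lo <= t_hi - target else hi
```

With a real file, f would be opened in "rb" mode and n_records derived from the file size divided by record_size. Because the intervals are very irregular, interpolation can degrade toward a linear number of reads in the worst case; replacing the guess with the midpoint (lo + hi) // 2 turns this into a plain binary search, which needs only about 20 to 30 reads for millions of records.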