
Fastest way to parse/jump through a large binary file with sized elements

I need to parse binary files that contain a sequence of elements. Each element has the following format:

- 4 bytes: name of the element
- 4 bytes: size of the element
- variable size: data for the element
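For illustration, one element header in this layout could be decoded roughly as follows. The byte order and the meaning of the size field are not stated above, so this sketch assumes a little-endian unsigned 32-bit size that counts only the data bytes; `HEADER` and `parse_header` are hypothetical names.

```python
import struct

# Assumption: little-endian, unsigned 32-bit size that counts only the data bytes.
# Use ">4sI" instead if the format is big-endian.
HEADER = struct.Struct("<4sI")

def parse_header(buf: bytes, pos: int = 0):
    """Return (name, size, data_start) for the element whose header begins at pos."""
    name, size = HEADER.unpack_from(buf, pos)
    return name, size, pos + HEADER.size

# A single made-up element: 8-byte header followed by 5 bytes of data.
sample = HEADER.pack(b"DATA", 5) + b"hello"
name, size, data_start = parse_header(sample)
```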

I only need to walk through the file and extract the name, position, and size of each element. A typical element is around 100 KB, and a typical file is around 10 GB.

What is the fastest way to go through such a file: read all of the file's data, seek to the next element, or some other approach?

Does it make a difference whether the file is local or accessed over the network?

The one thing you do not want to do is use unbuffered reads (i.e., direct OS calls) to read every individual element. The naive approach of buffered reads gives acceptable performance. If memory is not a concern at all, you might save some time by using a memory-mapped file with a prefetcher thread that populates the mapping.
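A minimal sketch of the buffered approach, assuming a little-endian unsigned 32-bit size field that counts only the data bytes (the question does not specify either). The names `Element` and `index_elements` are hypothetical. The function reads each 8-byte header through a buffered file object and seeks past the data instead of reading it:

```python
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple

# Assumption: 4-byte name followed by a little-endian unsigned 32-bit data size.
HEADER = struct.Struct("<4sI")

class Element(NamedTuple):
    name: bytes
    offset: int  # file position where the element's header starts
    size: int    # number of data bytes following the header

def index_elements(f: BinaryIO) -> Iterator[Element]:
    """Yield name, position, and size of each element without reading its data."""
    while True:
        offset = f.tell()
        header = f.read(HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) < HEADER.size:
            raise ValueError(f"truncated header at offset {offset}")
        name, size = HEADER.unpack(header)
        yield Element(name, offset, size)
        # Skip the data; a seek past the end simply makes the next read return b"".
        f.seek(size, io.SEEK_CUR)
```

With a real file this would be called as `index_elements(open(path, "rb"))`, where `path` is a placeholder; `open` in binary mode returns a buffered reader by default. Whether seeking beats reading everything sequentially depends on the storage and, for network files, on round-trip latency, so it is worth measuring both on the actual setup.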

