
How to read/write a specific number of bytes to a file

I want to create a file that is structured as blocks of a known size. Essentially, I am trying to build a rudimentary file system.

I need to write a header, followed by an unbounded number of entries that all share the same size and structure. The important parts are:

  • Each block of data needs to be readable and writable individually
  • The header needs to be readable and writable as its own entity
  • I need a way to store this data and to determine its location in the file quickly

I imagine the file would look something like this:

[HEADER][DATA1][DATA2][DATA3][...]

What is the proper way to handle something like this? Let's say I want to read DATA3 from the file: how do I know where that data chunk starts?

If I understand you correctly and you need a way to assign names/IDs to your DATA chunks, you can introduce another type of chunk.

Let's call it TOC (table of contents). The file structure will then look like [HEADER][TOC1][DATA1][DATA2][DATA3][TOC2][...].

Each TOC chunk contains names/IDs and references to multiple DATA chunks. It also contains some internal data, such as a pointer to the next TOC chunk (so you can think of each TOC chunk as a linked-list node).

At runtime, all TOC chunks can be represented as a kind of HashMap, where the key is the name/ID of a DATA chunk and the value is its location in the file.
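To make the TOC idea concrete, here is a minimal Python sketch. The byte layout is purely an assumption for illustration (an 8-byte header holding the offset of the first TOC, a TOC head with a next-TOC offset and an entry count, and fixed 28-byte entries), and the names `load_toc`, `read_chunk` and `build_demo` are made up. It walks the TOC chain once and builds a dict from name to (offset, length).

```python
import io
import struct

# Hypothetical layout, little-endian, no padding:
HEADER = struct.Struct("<Q")         # offset of the first TOC chunk
TOC_HEAD = struct.Struct("<QI")      # offset of next TOC (0 = last), entry count
TOC_ENTRY = struct.Struct("<16sQI")  # name (NUL-padded), data offset, data length

def load_toc(f):
    """Follow the linked list of TOC chunks and return {name: (offset, length)}."""
    f.seek(0)
    (toc_offset,) = HEADER.unpack(f.read(HEADER.size))
    index = {}
    while toc_offset != 0:
        f.seek(toc_offset)
        next_toc, count = TOC_HEAD.unpack(f.read(TOC_HEAD.size))
        for _ in range(count):
            raw_name, offset, length = TOC_ENTRY.unpack(f.read(TOC_ENTRY.size))
            index[raw_name.rstrip(b"\0").decode("utf-8")] = (offset, length)
        toc_offset = next_toc
    return index

def read_chunk(f, index, name):
    """Seek straight to a DATA chunk using the in-memory index."""
    offset, length = index[name]
    f.seek(offset)
    return f.read(length)

def build_demo():
    """Build [HEADER][TOC1][a][b][TOC2][c] in memory for demonstration."""
    toc1 = HEADER.size
    a_off = toc1 + TOC_HEAD.size + 2 * TOC_ENTRY.size
    b_off = a_off + 5
    toc2 = b_off + 6
    c_off = toc2 + TOC_HEAD.size + TOC_ENTRY.size
    f = io.BytesIO()
    f.write(HEADER.pack(toc1))
    f.write(TOC_HEAD.pack(toc2, 2))
    f.write(TOC_ENTRY.pack(b"a", a_off, 5))
    f.write(TOC_ENTRY.pack(b"b", b_off, 6))
    f.write(b"hello" + b"world!")
    f.write(TOC_HEAD.pack(0, 1))
    f.write(TOC_ENTRY.pack(b"c", c_off, 3))
    f.write(b"xyz")
    return f

demo = build_demo()
toc = load_toc(demo)
print(read_chunk(demo, toc, "b"))
```

The same functions should work on a real file opened with `open(path, "rb")`, since they only rely on `seek` and `read`.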

We can store the chunk size in the header. If the chunks vary in size, you can instead store pointers to the actual chunks. An interesting design for variable-size data is the PostgreSQL heap file page: http://doxygen.postgresql.org/bufpage_8h_source.html
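For the fixed-size case, the location of any block follows from simple arithmetic: offset = header size + index × block size. The sketch below assumes a made-up 8-byte header (a 4-byte magic `BLK0` plus a 4-byte block size) and zero-based block indices; the function names are illustrative only.

```python
import io
import struct

HEADER = struct.Struct("<4sI")  # hypothetical header: magic, block size in bytes
MAGIC = b"BLK0"                 # made-up file identifier

def create(f, block_size):
    """Write the header at the start of the file."""
    f.seek(0)
    f.write(HEADER.pack(MAGIC, block_size))

def block_size_of(f):
    """Read the header on its own and return the stored block size."""
    f.seek(0)
    magic, block_size = HEADER.unpack(f.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError("not a block file")
    return block_size

def write_block(f, index, payload):
    """Write one block in place; short payloads are padded with NUL bytes."""
    size = block_size_of(f)
    if len(payload) > size:
        raise ValueError("payload larger than block size")
    f.seek(HEADER.size + index * size)
    f.write(payload.ljust(size, b"\0"))

def read_block(f, index):
    """Read exactly one block, located by arithmetic on its index."""
    size = block_size_of(f)
    f.seek(HEADER.size + index * size)
    data = f.read(size)
    if len(data) != size:
        raise IndexError("block index out of range")
    return data

f = io.BytesIO()             # stands in for open(path, "r+b")
create(f, 16)
write_block(f, 2, b"third")  # DATA3 is index 2 and starts at 8 + 2 * 16 = 40
print(read_block(f, 2))
```

Writing past the current end of the file leaves a gap that reads back as zero bytes, so blocks 0 and 1 exist (as zeros) even though only block 2 was written.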

I work in the opposite direction, but this may help.

I write decompilers for binary files. Generally there is a fixed header of a known number of bytes. It contains file identification data so we can recognize the file type we are dealing with.

Following that is a fixed number of bytes containing the number of sections (groups of data). This number tells us how many data pointers there will be. Each data pointer may be four bytes (or whatever you need) and gives the start of a data block. From consecutive pointers we can work out the size of each block. The decompiler then reads these entries one at a time to get the size and location in the file of each data block. The job then is to extract that block of bytes and do whatever is needed with it.

We step through the file one block at a time. The size of the last block is the distance from its start pointer to the end of the file.
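As a rough sketch of that layout in Python, assume a 4-byte magic (`SECT`, made up here), a 4-byte section count, and then one 4-byte start pointer per section. Sizes are not stored; they are derived from consecutive pointers, with the end of the file closing the last block. The helper names are hypothetical.

```python
import io
import struct

MAGIC = b"SECT"  # made-up 4-byte file identifier

def pack_sections(sections):
    """Serialize as [magic][count][start pointers...][section bytes...]."""
    pos = len(MAGIC) + 4 + 4 * len(sections)  # first byte after the pointer table
    offsets = []
    for section in sections:
        offsets.append(pos)
        pos += len(section)
    out = MAGIC + struct.pack("<I", len(sections))
    out += struct.pack(f"<{len(sections)}I", *offsets)
    return out + b"".join(sections)

def read_sections(f):
    """Read the pointer table, derive each size, then extract every block."""
    f.seek(0)
    if f.read(4) != MAGIC:
        raise ValueError("unrecognized file type")
    (count,) = struct.unpack("<I", f.read(4))
    offsets = list(struct.unpack(f"<{count}I", f.read(4 * count)))
    end_of_file = f.seek(0, io.SEEK_END)
    bounds = offsets + [end_of_file]  # the last block runs to the end of the file
    sections = []
    for start, stop in zip(bounds, bounds[1:]):
        f.seek(start)
        sections.append(f.read(stop - start))
    return sections

blob = pack_sections([b"abc", b"", b"hello"])
print(read_sections(io.BytesIO(blob)))
```

Because each size is the difference between two neighbouring pointers, this only works if the blocks are stored contiguously and in pointer order, which is an assumption of the sketch.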
