

Random-access structured archive file format with Java library

My team and I have been given a requirement for a file format, with Java library support, that holds various metadata about some larger file. In fact, the powers that be would like us to wrap the large file (maybe 100MB) and the other related files (metadata, non-destructive edits, etc.) into one bundled archive file.

For a one-off creation, that's a breeze: just throw everything in a Zip file. But we want to be able to constantly update the metadata, the non-destructive edits, etc. We don't want to dump the whole >100MB contents to a temporary directory and then zip everything back up just to add a line to one of the metadata files.
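For reference, the one-off case can be sketched with the JDK's own `java.util.zip.ZipOutputStream`. The `bundle` helper and the entry names are hypothetical. A Zip archive stores each entry's (usually compressed) bytes one after another and writes a central directory at the end, which is why adding a line to one entry later generally means writing a new archive.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipBundle {
    /** Hypothetical helper: writes each (name, bytes) pair as one entry of a brand-new Zip archive. */
    static void bundle(Path zip, Map<String, byte[]> entries) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zip))) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                zos.putNextEntry(new ZipEntry(e.getKey()));
                zos.write(e.getValue());
                zos.closeEntry();
            }
        } // closing the stream writes the central directory at the end of the archive
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        entries.put("data.bin", new byte[1024]); // small stand-in for the ~100MB file
        entries.put("metadata.txt", "author=me\n".getBytes(StandardCharsets.UTF_8));
        Path zip = Files.createTempFile("bundle", ".zip");
        bundle(zip, entries);
        System.out.println("Wrote " + Files.size(zip) + " bytes to " + zip);
    }
}
```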

There are some projects (e.g. TrueVFS) that on the surface sound ideal, claiming to abstract a zip file or other archive file format as a file system. But on closer inspection it would seem that the only in-place update functionality we get is simply appending new files, not actually changing or appending to individual files.
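For comparison, the JDK ships its own zip file system provider (`jdk.zipfs`, a separate project from TrueVFS). The sketch below, assuming Java 11 or later, appends a line to one entry through it; `appendLine` and `metadata.txt` are hypothetical names. As far as I know, the provider handles such an append by copying the old entry into a new one and then rebuilding the archive in a temporary file when the file system is closed, so it gives the convenient API but not the in-place update the question is after.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;

public class ZipFsAppend {
    /** Hypothetical helper: appends one line to an entry of a Zip archive via the JDK zip file system. */
    static void appendLine(Path zip, String entry, String line) throws IOException {
        URI uri = URI.create("jar:" + zip.toUri());
        // "create"="true" lets the provider create the archive if the file does not exist yet.
        try (FileSystem fs = FileSystems.newFileSystem(uri, Map.of("create", "true"))) {
            Path p = fs.getPath("/", entry);
            // CREATE + APPEND: create the entry if missing, otherwise keep its bytes and add to the end.
            Files.writeString(p, line + "\n", StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } // closing the file system is what writes the updated archive back to disk
    }

    public static void main(String[] args) throws IOException {
        // Use a path that does not exist yet: an existing empty file is not a valid Zip archive.
        Path zip = Files.createTempDirectory("zipfs").resolve("bundle.zip");
        appendLine(zip, "metadata.txt", "author=me");
        appendLine(zip, "metadata.txt", "edited=true");
        try (FileSystem fs = FileSystems.newFileSystem(URI.create("jar:" + zip.toUri()), Map.of())) {
            System.out.print(Files.readString(fs.getPath("/metadata.txt")));
        }
    }
}
```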

What we need is some file format that's in between a Zip file and a relational database. Something with a hierarchical structure would be great. It must efficiently support reasonably large files (over 100MB) and allow random access to add, remove, and change individual files within the archive. I was surprised to be unable to find anything. Any suggestions?

PS: Years ago I had bad experiences with the Microsoft compound file format getting corrupted. I don't know whether something like Apache POIFS is reliable and efficient with large files.

I do not believe that what you are asking for is easily doable, for one simple reason: filesystems do not generally support inserting data in the middle of a file, not without truncating and rewriting the remainder. This means that a simple append to a plain file turns into a truncate-and-rewrite operation when that file is stored inside an archive.
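To make that cost concrete, here is a minimal sketch of inserting bytes into the middle of an ordinary file with `java.io.RandomAccessFile`: everything after the insertion point has to be read and written again further along. The `insert` helper is hypothetical, and it buffers the whole tail in memory for simplicity; for 100MB files a real implementation would copy in chunks, working backwards from the end.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MidFileInsert {
    /** Hypothetical helper: inserts data at offset by rewriting everything that follows it. */
    static void insert(Path file, long offset, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            long tailLength = raf.length() - offset;
            if (offset < 0 || tailLength < 0) throw new IllegalArgumentException("offset outside file");
            // Read everything from the insertion point to EOF: this is the part that must move.
            byte[] tail = new byte[Math.toIntExact(tailLength)];
            raf.seek(offset);
            raf.readFully(tail);
            // Write the new bytes at the insertion point, then the old tail after them.
            raf.seek(offset);
            raf.write(data);
            raf.write(tail);
        }
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("insert", ".txt");
        Files.writeString(f, "line1\nline3\n");
        insert(f, 6, "line2\n".getBytes(StandardCharsets.UTF_8)); // "line1\n" is 6 bytes
        System.out.print(Files.readString(f)); // prints line1, line2, line3 on separate lines
    }
}
```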

You would have to find some block-based format that would essentially replicate much of the functionality of an actual filesystem, in order to allow such operations.
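A toy illustration of the block-based idea, assuming fixed 4 KiB blocks and an index kept only in memory; all names are hypothetical. A real format would also have to persist the index inside the container, reuse blocks freed by deletes, and journal its updates to survive crashes, which is where most of the filesystem-like complexity lives. The point shown here is that appending to one logical file only touches that file's last block, or allocates a new block at the end of the container, so no other file is ever moved.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy block container (hypothetical): logical files are chains of fixed-size blocks in one file. */
public class BlockContainer implements AutoCloseable {
    static final int BLOCK_SIZE = 4096;
    private final RandomAccessFile raf;
    // In-memory index only: logical name -> its block numbers, and its length in bytes.
    private final Map<String, List<Long>> chains = new HashMap<>();
    private final Map<String, Long> lengths = new HashMap<>();

    public BlockContainer(Path file) throws IOException {
        raf = new RandomAccessFile(file.toFile(), "rw");
    }

    /** Grows the container by one block and returns that block's number. */
    private long allocateBlock() throws IOException {
        long block = raf.length() / BLOCK_SIZE;
        raf.setLength((block + 1) * BLOCK_SIZE);
        return block;
    }

    /** Appends to one logical file; only its last block or freshly allocated blocks are written. */
    public void append(String name, byte[] data) throws IOException {
        List<Long> chain = chains.computeIfAbsent(name, k -> new ArrayList<>());
        long len = lengths.getOrDefault(name, 0L);
        int written = 0;
        while (written < data.length) {
            if (len / BLOCK_SIZE == chain.size()) {
                chain.add(allocateBlock()); // last block is full, or the file has no block yet
            }
            long block = chain.get((int) (len / BLOCK_SIZE));
            int offsetInBlock = (int) (len % BLOCK_SIZE);
            int n = Math.min(BLOCK_SIZE - offsetInBlock, data.length - written);
            raf.seek(block * BLOCK_SIZE + offsetInBlock);
            raf.write(data, written, n);
            written += n;
            len += n;
        }
        lengths.put(name, len);
    }

    /** Reads a logical file back by following its block chain. */
    public byte[] read(String name) throws IOException {
        List<Long> chain = chains.getOrDefault(name, List.of());
        long len = lengths.getOrDefault(name, 0L);
        byte[] out = new byte[Math.toIntExact(len)];
        int pos = 0;
        for (long block : chain) {
            int n = (int) Math.min(BLOCK_SIZE, len - pos);
            raf.seek(block * BLOCK_SIZE);
            raf.readFully(out, pos, n);
            pos += n;
        }
        return out;
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("container", ".bin");
        try (BlockContainer c = new BlockContainer(f)) {
            c.append("metadata.txt", "a=1\n".getBytes(StandardCharsets.UTF_8));
            c.append("edits.log", new byte[10_000]);
            c.append("metadata.txt", "b=2\n".getBytes(StandardCharsets.UTF_8));
            System.out.print(new String(c.read("metadata.txt"), StandardCharsets.UTF_8));
        }
    }
}
```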

I would look at refactoring the whole system to enforce some structure on that big data file. That would allow you to turn it into something that can be stored in a database. For example, line-based text could be stored in a table with two columns: a line number as the primary key, and the line text. Any line-based operation would then easily turn into a database operation.

You could then just use an embedded database such as SQLite to keep everything in the same file without depending on an external server.
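A minimal sketch of the line table described above, using plain JDBC against an embedded SQLite file. It assumes the third-party Xerial `sqlite-jdbc` driver is on the classpath (it handles `jdbc:sqlite:` URLs); the class, table, and column names are hypothetical. Appending a line becomes a single `INSERT` and editing a line becomes a single-row `UPDATE`, and SQLite typically rewrites only the database pages that changed rather than the whole file.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class LineStore implements AutoCloseable {
    private final Connection conn;

    /** Opens (or creates) the single-file database; assumes the sqlite-jdbc driver is available. */
    public LineStore(String dbFile) throws SQLException {
        conn = DriverManager.getConnection("jdbc:sqlite:" + dbFile);
        try (Statement st = conn.createStatement()) {
            st.execute("CREATE TABLE IF NOT EXISTS lines (line_no INTEGER PRIMARY KEY, text TEXT NOT NULL)");
        }
    }

    /** Appends a line; line_no aliases SQLite's rowid, so omitting it assigns max + 1. */
    public void appendLine(String text) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement("INSERT INTO lines (text) VALUES (?)")) {
            ps.setString(1, text);
            ps.executeUpdate();
        }
    }

    /** Editing a line touches exactly one row. */
    public void replaceLine(int lineNo, String text) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement("UPDATE lines SET text = ? WHERE line_no = ?")) {
            ps.setString(1, text);
            ps.setInt(2, lineNo);
            ps.executeUpdate();
        }
    }

    /** Returns the text of one line, or null if it does not exist. */
    public String getLine(int lineNo) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement("SELECT text FROM lines WHERE line_no = ?")) {
            ps.setInt(1, lineNo);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }

    @Override
    public void close() throws SQLException {
        conn.close();
    }

    public static void main(String[] args) throws SQLException {
        try (LineStore store = new LineStore("bundle.db")) {
            store.appendLine("first");
            store.appendLine("second");
            store.replaceLine(1, "first (edited)");
            System.out.println(store.getLine(1));
        }
    }
}
```

Inserting a line in the middle would still require renumbering the following rows, so this layout favors workloads dominated by appends and in-place edits.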

Depending on which platforms you want to run your application on, you can use our Solid File System: a virtual file system backed by an automatically resizable container file. It's written in ANSI C and has a Java JNI wrapper for Android (the wrapper can be brought to other platforms on request; we just didn't have that goal before).

There is also Codebase File System, which, as I understand it, also offers a JNI wrapper for Java.

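Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.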

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM