
Java: Optimizing file delimiters for the read speed of subdocuments

Say I have a file that has many subdocuments in it:

//file.txt

BEGIN_FILE_1
loremipsumloremipsumloremipsum
loremipsumloremipsum
END_FILE_1

BEGIN_FILE_2
cupcakeipsum
cupcakeipsumcupcakeipsum
END_FILE_2

What kind of delimitation (or some alternate strategy) could be used so that reads of these subdocuments are fast (i.e. interpreting the delimiters is fast) and, even more importantly, so that writing a subdocument is fast? Note that the container file will be very large (100MB or so).

I am planning to use FileWriter to write the file.

Thanks!

Generally, the optimal strategy depends on the context: how many subdocuments are there? Will each document be written only once, or updated/modified? Is the size of each subdocument known, or at least its maximum size? Which operation prevails (for each write operation there would be roughly 10 reads, or the opposite)?

Assuming that subdocuments will be added and read but not modified, the optimal strategy may be to use a header specifying the number of files and the lines where each file starts and ends inside your file. Something like this: the first line is always the header, then lines 1..N are FILE1, lines N+1..M are FILE2, and so on:

NUMBER_OF_FILES FILE1_NAME FILE1_START FILE1_END FILE2_NAME FILE2_START FILE2_END

This would allow reading the contents of any file by parsing only the header and then reading that file directly, instead of searching for it through the whole document. Writing would require only modifying the header and appending to the end of the file.
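A minimal sketch of the read side of that header scheme might look like the following. The class and method names (SubdocIndex, parseHeader, readSubdocument) are hypothetical, and the sketch assumes that names contain no whitespace and that line numbers are 1-based, inclusive, and counted from the first line after the header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names and header conventions are assumptions for illustration.
class SubdocIndex {

    // Parses "COUNT NAME1 START1 END1 NAME2 START2 END2 ..." into name -> {start, end}.
    // Assumes names contain no whitespace; lines are 1-based and inclusive,
    // counted from the first line after the header.
    static Map<String, int[]> parseHeader(String header) {
        String[] tokens = header.trim().split("\\s+");
        int count = Integer.parseInt(tokens[0]);
        Map<String, int[]> index = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            int base = 1 + 3 * i; // each entry occupies three tokens
            index.put(tokens[base], new int[] {
                Integer.parseInt(tokens[base + 1]),
                Integer.parseInt(tokens[base + 2])});
        }
        return index;
    }

    // Reads the header from the first line, then returns the lines of one
    // subdocument, stopping as soon as its last line has been read.
    static List<String> readSubdocument(BufferedReader reader, String name) throws IOException {
        int[] range = parseHeader(reader.readLine()).get(name);
        if (range == null) {
            throw new IllegalArgumentException("No such subdocument: " + name);
        }
        List<String> lines = new ArrayList<>();
        String line;
        for (int n = 1; n <= range[1] && (line = reader.readLine()) != null; n++) {
            if (n >= range[0]) {
                lines.add(line); // inside the requested range
            }
        }
        return lines;
    }
}
```

Note that with line numbers the reader still has to read and discard the lines before the subdocument. Storing byte offsets in the header instead, and jumping with RandomAccessFile.seek, would avoid that scan; that is a variation on the scheme above, not something the header format as written provides.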

If files are modified/overwritten but have a fixed size, this strategy may still be useful, since the overwrite operation would be fast.
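For the fixed-size case, one possible sketch uses java.io.RandomAccessFile so that a subdocument can be overwritten in place without touching its neighbours. The FixedSlotFile class, the 64-byte SLOT_SIZE, and the zero-byte padding are all assumptions made for illustration; the padding in particular assumes the text never contains a NUL character.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: every subdocument occupies one fixed-size slot.
class FixedSlotFile {
    static final int SLOT_SIZE = 64; // assumed fixed size in bytes

    // Overwrites slot number `slot` in place; shorter text is zero-padded.
    static void writeSlot(RandomAccessFile file, int slot, String text) throws IOException {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        if (data.length > SLOT_SIZE) {
            throw new IllegalArgumentException("Subdocument exceeds slot size");
        }
        byte[] padded = Arrays.copyOf(data, SLOT_SIZE); // pads with zero bytes
        file.seek((long) slot * SLOT_SIZE);             // jump straight to the slot
        file.write(padded);
    }

    // Reads slot number `slot` and strips the zero padding.
    static String readSlot(RandomAccessFile file, int slot) throws IOException {
        byte[] buffer = new byte[SLOT_SIZE];
        file.seek((long) slot * SLOT_SIZE);
        file.readFully(buffer);
        int length = 0;
        while (length < SLOT_SIZE && buffer[length] != 0) {
            length++;
        }
        return new String(buffer, 0, length, StandardCharsets.UTF_8);
    }
}
```

Because the position of slot k is simply k * SLOT_SIZE, no header lookup is needed here; the trade-off is wasted space whenever a subdocument is shorter than the slot.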
