
Java MergeSort Binary files

I have several sorted binary files that store records in a variable-length format (meaning one of the fields in each record holds the length of the variable-length segment).

I need to merge them into one sorted file, and I can do so successfully with BufferedInputStream. Nevertheless, it takes a very long time on a mechanical disk. On a machine with an SSD it is much faster, as expected.
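For reference, a k-way merge over buffered streams along these lines can be sketched as follows. The record layout here is an assumption for illustration (a 4-byte int sort key, then a 4-byte int payload length, then the payload), since the question does not specify the actual format:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class MergeSortedFiles {

    // One open input in the k-way merge. Assumed record layout (hypothetical):
    // 4-byte int sort key, 4-byte int payload length, then the payload bytes.
    static class Source implements Closeable {
        final DataInputStream in;
        int key;
        byte[] payload;                       // null once the stream is exhausted

        Source(Path p, int bufSize) throws IOException {
            in = new DataInputStream(new BufferedInputStream(Files.newInputStream(p), bufSize));
            advance();
        }

        boolean advance() throws IOException {
            try {
                key = in.readInt();           // EOF is only legal at a record boundary
            } catch (EOFException e) {
                payload = null;
                return false;
            }
            payload = new byte[in.readInt()];
            in.readFully(payload);
            return true;
        }

        @Override public void close() throws IOException { in.close(); }
    }

    // k-way merge: a heap ordered on each source's current key means only
    // one record per input file needs to be held in memory at a time.
    public static void merge(List<Path> inputs, Path output) throws IOException {
        PriorityQueue<Source> heap =
                new PriorityQueue<>(Comparator.comparingInt((Source s) -> s.key));
        for (Path p : inputs) {
            Source s = new Source(p, 1 << 20);      // 1 MiB read buffer per input
            if (s.payload != null) heap.add(s); else s.close();
        }
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(output), 1 << 20))) {
            while (!heap.isEmpty()) {
                Source s = heap.poll();
                out.writeInt(s.key);
                out.writeInt(s.payload.length);
                out.write(s.payload);
                if (s.advance()) heap.add(s); else s.close();
            }
        }
    }
}
```

A structure like this is sequential-read-heavy per input, which is about the best access pattern a mechanical disk can hope for; the per-input buffer size is a tuning knob, not a guaranteed fix.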

What bothers me is that even on the SSD the CPU utilization is very low, which makes me suspect there is a way to improve the speed. I assume this happens because the CPU spends most of its time waiting on the disk. I tried increasing the buffers to hundreds of MB, to no avail.

I have tried using a memory-mapped buffer and a file channel, but it didn't improve the runtime.

Any ideas?

Edit: Using MappedByteBuffer failed because the merged file is over 2 GB, which is the size limit of a single MappedByteBuffer. But even before merging the smaller files into multi-GB files, I didn't notice an improvement in speed or CPU utilization.
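Note that the 2 GB cap applies to one mapping, not to the file: `FileChannel.map` takes a position argument, so a file of any size can be processed through successive windows. A minimal sketch, assuming fixed 4-byte int records and a window size that is a multiple of the record size (variable-length records would additionally need each window to end on a record boundary):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WindowedMap {
    // Sums big-endian 4-byte ints from a file of arbitrary size by mapping it
    // in windows; each map() call stays under the Integer.MAX_VALUE limit.
    // Assumes windowSize is a multiple of 4 so no int straddles two windows.
    public static long sumInts(Path p, long windowSize) throws IOException {
        long sum = 0;
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += windowSize) {
                long len = Math.min(windowSize, size - pos);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (buf.remaining() >= 4) sum += buf.getInt();
            }
        }
        return sum;
    }
}
```

That said, for a single sequential pass over each file, mapping rarely beats a well-sized buffered stream, which matches the observation above that it didn't help.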

Thanks

Perhaps you can compress the files better, or is that not an option? If the bottleneck is I/O, then reducing the amount of data moved is a good angle of attack. http://www.oracle.com/technetwork/articles/java/compress-1565076.html
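Since the CPU is mostly idle here, trading CPU cycles for fewer bytes on disk is plausible. A minimal sketch of wrapping the buffered streams in GZIP (the file names and buffer sizes are arbitrary; in the merge scenario, the inputs would be written compressed and decompressed on the fly while merging):

```java
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipStreams {
    // Write bytes through a GZIP + buffered stream, so fewer bytes hit the disk.
    public static void writeCompressed(Path p, byte[] data) throws IOException {
        try (OutputStream out = new GZIPOutputStream(
                new BufferedOutputStream(Files.newOutputStream(p), 1 << 16))) {
            out.write(data);
        }
    }

    // Read them back, decompressing on the fly.
    public static byte[] readCompressed(Path p) throws IOException {
        try (InputStream in = new GZIPInputStream(
                new BufferedInputStream(Files.newInputStream(p), 1 << 16))) {
            return in.readAllBytes();
        }
    }
}
```

Whether this wins depends on how compressible the records are; incompressible data would only add CPU work without reducing I/O.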

