Smart buffering in an environment with a limited amount of memory (Java)

Dear StackOverflowers,

I am writing an application that sorts a huge number of integers from a binary file. I need to do it as quickly as possible, and the main performance issue is disk access time: I make a multitude of reads, which slows the algorithm down quite significantly.

The standard way of doing this would be to fill ~50% of the available memory with a buffered object of some sort (BufferedInputStream etc.), then transfer the integers from the buffered object into an array of integers (which takes up the rest of the free space) and sort the integers in the array. Save the sorted block back to disk, repeat the procedure until the whole file is split into sorted blocks, and then merge the blocks together. This strategy for sorting the blocks uses only 50% of the available memory, since the data is essentially duplicated (50% for the cache and 50% for the array, while they store the same data).

I am hoping that I can optimise this phase of the algorithm (sorting the blocks) by writing my own buffered class that caches data straight into an int array, so that the array could take up all of the free space, not just 50% of it. This would reduce the number of disk accesses in this phase by a factor of 2. The thing is, I am not sure where to start.

EDIT: Essentially, I would like to find a way to fill up an array of integers by executing only one read on the file. Another constraint is that the array has to use most of the free memory.

If any of the statements I made are wrong, or at least seem to be, please correct me.

Any help appreciated,

Regards

When you say limited, how limited: <1 MB, <10 MB, <64 MB?

It makes a difference, since you won't actually get much benefit, if any, from a large BufferedInputStream. In most cases the default buffer size of 8192 bytes (JDK 1.6) is enough, and increasing it doesn't usually make much difference.

Using a smaller BufferedInputStream should leave you nearly all of the heap to create and sort each chunk before writing it to disk.
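
A minimal sketch of that chunk-sorting phase, assuming the file holds 4-byte big-endian ints (the format DataInputStream reads). The class name RunWriter, the run-file naming, and the chunkInts parameter are made up for illustration. The I/O buffers stay at 8192 bytes, so the int array can be sized to take most of the heap.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RunWriter {
    // Small I/O buffer; the memory budget goes to the int[] chunk instead.
    static final int IO_BUFFER_BYTES = 8192;

    /** Reads up to chunk.length ints into chunk; returns how many were read. */
    static int fill(DataInputStream in, int[] chunk) throws IOException {
        int n = 0;
        try {
            while (n < chunk.length) {
                chunk[n] = in.readInt();
                n++;
            }
        } catch (EOFException endOfFile) {
            // Fewer ints were left than the chunk can hold.
        }
        return n;
    }

    /** Splits input into sorted run files of at most chunkInts ints each. */
    static List<Path> writeSortedRuns(Path input, Path runDir, int chunkInts) throws IOException {
        List<Path> runs = new ArrayList<>();
        int[] chunk = new int[chunkInts];
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(input), IO_BUFFER_BYTES))) {
            int n;
            while ((n = fill(in, chunk)) > 0) {
                Arrays.sort(chunk, 0, n); // sort only the filled part
                Path run = runDir.resolve("run-" + runs.size() + ".bin");
                try (DataOutputStream out = new DataOutputStream(
                        new BufferedOutputStream(Files.newOutputStream(run), IO_BUFFER_BYTES))) {
                    for (int i = 0; i < n; i++) {
                        out.writeInt(chunk[i]);
                    }
                }
                runs.add(run);
            }
        }
        return runs;
    }
}
```

How large chunkInts can safely be depends on the heap settings and on whatever else the program holds in memory, so it is best found by measurement.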

You might want to look into the Java NIO libraries, specifically file channels (FileChannel) and int buffers (IntBuffer).
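
One way this could look, as a sketch rather than a definitive implementation: read from a FileChannel into a small staging ByteBuffer and bulk-copy its IntBuffer view into the large int array. The class name NioIntReader and the method fill are hypothetical. The sketch assumes big-endian 4-byte ints and a staging buffer whose capacity is a multiple of 4; trailing bytes that do not form a whole int are dropped.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

final class NioIntReader {
    /**
     * Fills dest with ints read from the channel's current position.
     * Returns the number of ints stored (less than dest.length only at end of file).
     * Assumes staging.capacity() is a positive multiple of 4.
     */
    static int fill(FileChannel channel, ByteBuffer staging, int[] dest) throws IOException {
        int filled = 0;
        while (filled < dest.length) {
            staging.clear();
            // Do not read past what the remaining slots in dest can hold.
            long wantBytes = (long) (dest.length - filled) * Integer.BYTES;
            if (wantBytes < staging.capacity()) {
                staging.limit((int) wantBytes);
            }
            boolean endOfFile = false;
            while (staging.hasRemaining()) {
                if (channel.read(staging) < 0) {
                    endOfFile = true;
                    break;
                }
            }
            staging.flip();
            int ints = staging.remaining() / Integer.BYTES;
            // Bulk copy through an int view of the bytes just read.
            staging.asIntBuffer().get(dest, filled, ints);
            filled += ints;
            if (endOfFile) {
                break;
            }
        }
        return filled;
    }
}
```

A caller might open the file with FileChannel.open(path, StandardOpenOption.READ) and pass something like ByteBuffer.allocateDirect(64 * 1024) as the staging buffer. A direct buffer lives outside the Java heap, so the heap can go almost entirely to the int array, though the buffer still counts toward the process's total memory.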

You don't give many hints, but two things come to mind. First, if you have many integers but not that many distinct values, bucket sort could be the solution.
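
As an illustration of that idea, here is a counting variant with one bucket per distinct value. It is a sketch under the assumption that the set of distinct values fits in memory and that the data is 4-byte big-endian ints; the class and method names are invented. The file is read once, only the counts are kept, and the sorted output is regenerated from them.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

final class DistinctValueSort {
    /** Counts how often each value occurs; memory use grows with distinct values only. */
    static TreeMap<Integer, Long> countValues(DataInputStream in) throws IOException {
        TreeMap<Integer, Long> counts = new TreeMap<>();
        try {
            while (true) {
                counts.merge(in.readInt(), 1L, Long::sum);
            }
        } catch (EOFException endOfInput) {
            // Whole input consumed.
        }
        return counts;
    }

    /** Writes each value as many times as it was counted, in ascending order. */
    static void writeSorted(TreeMap<Integer, Long> counts, DataOutputStream out) throws IOException {
        for (Map.Entry<Integer, Long> entry : counts.entrySet()) {
            int value = entry.getKey();
            for (long i = entry.getValue(); i > 0; i--) {
                out.writeInt(value);
            }
        }
    }
}
```

If the values span a small known range, a plain long[] indexed by value would avoid the boxing overhead of the map.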

Secondly, one term screams in my head when I hear this: external tape sorting. In the early days of computing (i.e. the stone age), data was stored on tapes, and it was very hard to sort data spread over multiple tapes. That is very similar to your situation. Indeed, merge sort was the most commonly used sort in those days, and as far as I remember, Knuth's TAOCP has a nice chapter about it. It might contain some good hints about the sizes of caches, buffers and the like.
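
The merge step of such an external sort can be sketched as a k-way merge over already-sorted runs. This is one common way to do it with a priority queue, not the specific scheme described in TAOCP. RunMerger and Head are hypothetical names, and each run is assumed to be a stream of 4-byte big-endian ints in ascending order.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.List;
import java.util.PriorityQueue;

final class RunMerger {
    /** The current smallest unread value of one run. */
    private static final class Head {
        final DataInputStream in;
        int value;

        Head(DataInputStream in) {
            this.in = in;
        }

        /** Loads the next int of the run; returns false once the run is exhausted. */
        boolean advance() throws IOException {
            try {
                value = in.readInt();
                return true;
            } catch (EOFException endOfRun) {
                return false;
            }
        }
    }

    /** Merges sorted runs into out, holding only one int per run in memory. */
    static void merge(List<DataInputStream> runs, DataOutputStream out) throws IOException {
        PriorityQueue<Head> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.value, b.value));
        for (DataInputStream run : runs) {
            Head head = new Head(run);
            if (head.advance()) {
                heap.add(head);
            }
        }
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            out.writeInt(smallest.value);
            if (smallest.advance()) {
                heap.add(smallest); // re-insert with the run's next value
            }
        }
    }
}
```

In practice each run would be wrapped in a BufferedInputStream, so the available memory is divided among the per-run read buffers and the output buffer rather than spent on one large array.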
