
Java memory optimized [Key:Long, Value:Long] store of very large size (500M) for concurrent read-access

I have a use-case where I need to store Key-Value pairs, approximately 500 million entries, in a single VM of size 8 GB. Key and Value are both of type Long. The key is auto-incremented, starting from 1, 2, 3, and so on.

I build this Map[K,V] structure only once, at the start of the program, as an exclusive operation. Once built, it is used only for lookups; no update or delete is ever performed on it.

I have tried this with java.util.HashMap, but as expected it consumes a lot of memory and the program fails with an OOM: heap usage exceeded error.

I need some guidance on the following to help reduce the memory footprint; I am OK with some degradation in access performance.

  1. What other alternatives (from the Java collections or other libraries) can be tried here?
  2. What is a recommended way to measure the memory footprint of this Map, for comparison purposes?

Just use a long[] or long[][] .

500 million ascending keys is less than 2^31. And if you go over 2^31, use a long[][] where the first dimension is small and the second one is large.

(When the key type is an integer, you only need a complicated "map" data structure if the key space is sparse.)

The space wastage in a 1-D array is insignificant. Every Java array has a 12-byte header, and the object size is rounded up to a multiple of 8 bytes. So a 500-million-entry long[] will take so close to 500 million x 8 bytes == 4 billion bytes that the overhead doesn't matter.
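For question 2, a rough way to compare footprints is to measure used heap before and after building the structure. This is only an estimate (GC timing adds noise; a heap profiler or java.lang.instrument gives exact numbers), and the sketch below is scaled down to 10 million entries so it runs on any heap:

```java
// Rough footprint estimate: used heap before vs. after building the structure.
// Scaled down to 10 million entries (an assumption for illustration only).
public class FootprintEstimate {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();

        long[] data = new long[10_000_000]; // ~10 million x 8 bytes = ~80 MB
        data[0] = 1L;                       // keep the array reachable

        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("approx bytes used: " + (after - before));
    }
}
```

The same measurement around a HashMap build makes the overhead per entry directly visible.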

However, a JVM typically cannot allocate a single object that takes up the entire available heap space. If virtual address space is at a premium, it would be advisable to use a 2-D array; e.g. new long[4][125_000_000] . This makes the lookups slightly more complicated, but you will most likely reduce the memory footprint by doing this.
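A minimal sketch of such a chunked 2-D store (the chunk size and the scaled-down capacity of 1 million entries are illustrative assumptions, not values from the answer):

```java
// Chunked long[][] store for dense keys starting at 1.
// Each chunk holds 2^20 entries; index math is done with shifts and masks.
public class ChunkedStore {
    static final int CHUNK_BITS = 20;
    static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    static final int CHUNK_MASK = CHUNK_SIZE - 1;
    final long[][] chunks;

    ChunkedStore(long capacity) {
        int nChunks = (int) ((capacity + CHUNK_SIZE - 1) >>> CHUNK_BITS);
        chunks = new long[nChunks][CHUNK_SIZE];
    }

    // Keys start at 1, so the flat index is key - 1.
    long get(long key) {
        long i = key - 1;
        return chunks[(int) (i >>> CHUNK_BITS)][(int) (i & CHUNK_MASK)];
    }

    void put(long key, long value) {
        long i = key - 1;
        chunks[(int) (i >>> CHUNK_BITS)][(int) (i & CHUNK_MASK)] = value;
    }

    public static void main(String[] args) {
        ChunkedStore store = new ChunkedStore(1_000_000); // scaled down
        store.put(1, 42L);
        store.put(1_000_000, 7L);
        System.out.println(store.get(1));         // prints 42
        System.out.println(store.get(1_000_000)); // prints 7
    }
}
```

Because the structure is built once and then only read, lookups need no synchronization and are safe for concurrent read access after a safe publication of the reference.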


If you don't know beforehand the number of keys to expect, you could do the same thing with a combination of arrays and ArrayList objects. But an ArrayList has the problem that if you don't set an (accurate) capacity, the memory utilization is liable to be suboptimal. And if you populate an ArrayList by appending to it, the instantaneous memory demand for the append can be as much as 3 times the list's current space usage.

There is no reason to use a Map in your case.

If you just have a start index and further indices are just constant increments, just use a List :

List<Long> data = new ArrayList<>(510_000_000); // capacity should ideally not be reached; if it is, the array behind the ArrayList is reallocated, doubling the allocated memory

data.add(1337L); // insert as often as you want

long value = data.get(1 - 1); // 1 = your index, which starts at 1; subtract 1 because list indices start at 0

If you don't add more elements at all and know the size from the start, an array will be even better:

long[] data = new long[510_000_000]; // capacity should definitely not be reached; if it is, you will need to create a new array and copy all the data
int currentIndex=0;

data[currentIndex++] = 1337L; // insert, as long as currentIndex is smaller than the array length

long value = data[1 - 1]; // 1 = your index, which starts at 1; subtract 1 because array indices start at 0

Note that you should check the index ( currentIndex ) before inserting so that it is smaller than the array length.

When iterating, use currentIndex as the length instead of .length (after data[currentIndex++] , currentIndex already equals the number of inserted elements).

Create an array of the size you need and whenever you need to access it, use arr[i-1] ( -1 because your indices start at 1 instead of 0).

If you "just" have 500 million entries, you will not reach the integer limit and a simple array will be fine.

If you need more entries and you have sufficient memory, use an array of arrays.

The memory footprint of an array this big is essentially the memory footprint of the data itself, plus a small constant overhead.

However, if you don't know the size, you should use a higher length/capacity than you may need. If you use an ArrayList , the memory footprint is doubled (temporarily tripled) whenever the capacity is reached, because it needs to allocate a bigger array.

A Map would need an object for each entry, plus an array of buckets for all those objects, which greatly increases the memory footprint. The growth of the memory footprint (using HashMap ) is even worse than with ArrayList s, because the underlying array is reallocated even when the Map is not completely full.
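A back-of-the-envelope calculation shows why this overwhelms an 8 GB heap. The per-object sizes below are assumptions for a typical 64-bit JVM with compressed oops; real numbers vary by JVM and settings:

```java
// Rough per-entry cost of HashMap<Long, Long> (assumed sizes, 64-bit JVM):
public class HashMapCost {
    public static void main(String[] args) {
        long entries   = 500_000_000L;
        long entryNode = 32; // HashMap.Node: header + hash + key/value/next refs
        long boxedLong = 16; // java.lang.Long: header + long field (key or value)
        long tableSlot = 8;  // share of the Node[] bucket table
        long perEntry  = entryNode + 2 * boxedLong + tableSlot; // ~72 bytes
        System.out.println(perEntry * entries / (1L << 30) + " GiB"); // prints 33 GiB
    }
}
```

Compare that ~72 bytes per entry with the 8 bytes per entry of a plain long[] .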

But consider saving the data to an HDD/SSD if you need to store that much. In most cases, this works much better. You can use RandomAccessFile to access the data on the HDD/SSD at any position.
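A minimal sketch of this idea, assuming a file layout where the value for key k is stored as a fixed-width 8-byte long at offset (k - 1) * 8 (the file name is an illustrative assumption):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Fixed-width on-disk store: value for key k lives at offset (k - 1) * 8.
public class DiskStore {
    public static void main(String[] args) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile("values.bin", "rw")) {
            // write values for keys 1, 2, 3 sequentially
            f.writeLong(100L);
            f.writeLong(200L);
            f.writeLong(300L);

            long key = 2;
            f.seek((key - 1) * 8L);           // jump straight to key 2's slot
            System.out.println(f.readLong()); // prints 200
        }
        new File("values.bin").delete();      // clean up the demo file
    }
}
```

Since the keys are dense and start at 1, no index structure is needed on disk either; a single seek plus an 8-byte read answers any lookup.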
