Remove unused allocated memory from HashMaps

Question

I want to read some XML files and convert them to a graph (no graphics, just a model). But because the files are very large (2.2 GB), my model object, which holds all the information, becomes even larger (4x the size of the file...).

Searching the net, I tried to find ways to reduce the object size. I tried different collection types but would like to stick to a HashMap (because I need random access). The actual keys and values make up just a small part of the allocated memory. Most of the hash table is empty...

If I'm not totally wrong, a garbage collection doesn't help me free the allocated memory and reduce the size of the hashmap. Is there any other way to release unused memory and shrink the hashmap? Or is there a way to do perfect hashing? Or should I just use another collection?

Thanks in advance,

Sebastian

Answer 1

A HashMap is typically just a large array of references filled to a certain percentage of capacity. If only 80% of the map is filled, the remaining 20% of the array cells are unused (i.e., are null). The extra overhead is really just the empty (null) cells.

On a 32-bit CPU, each array cell is usually 4 bytes in size (although some JVM implementations may allocate 8 bytes). That's not really that much unused space overall.
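To put a rough number on that, here is a back-of-the-envelope sketch. It assumes a table whose capacity is a power of two (as java.util.HashMap uses) that was sized to fit the entries, and it counts only the slots that cannot possibly hold an entry; collisions, which leave further slots empty, are ignored. The class and method names are made up for illustration.

```java
final class TableOverhead {
    /**
     * Estimates the bytes taken by table slots beyond the number of entries,
     * for a power-of-two table just large enough for the given load factor.
     */
    static long emptySlotBytes(long entries, double loadFactor, int bytesPerReference) {
        // Minimum number of slots so that entries <= capacity * loadFactor.
        long needed = (long) Math.ceil(entries / loadFactor);
        // Round up to the next power of two.
        long capacity = Long.highestOneBit(Math.max(needed, 1));
        if (capacity < needed) {
            capacity <<= 1;
        }
        return (capacity - entries) * bytesPerReference;
    }

    public static void main(String[] args) {
        System.out.println(emptySlotBytes(1_000_000, 0.75, 4));
    }
}
```

Under these assumptions, one million entries with 4-byte references leave 4,388,608 bytes in surplus slots, which is small next to a multi-gigabyte model.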

Once your map is filled, you can copy it to another HashMap with a more appropriate (smaller) size, giving a larger fill percentage.
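A minimal sketch of that copy step might look like the following. The helper name compact and the 0.9 load factor are assumptions chosen for illustration; also note that java.util.HashMap rounds the requested capacity up to a power of two, so the resulting fill percentage can be lower than the one asked for.

```java
import java.util.HashMap;
import java.util.Map;

final class MapCompaction {
    /** Copies source into a new HashMap whose table is sized for the current entry count. */
    static <K, V> Map<K, V> compact(Map<K, V> source, float loadFactor) {
        // Smallest capacity that holds every entry without triggering a resize.
        int capacity = (int) Math.ceil(source.size() / (double) loadFactor);
        Map<K, V> compacted = new HashMap<>(Math.max(capacity, 1), loadFactor);
        compacted.putAll(source);
        return compacted;
    }

    public static void main(String[] args) {
        // Deliberately oversized table holding only 1000 entries.
        Map<String, Integer> big = new HashMap<>(1 << 20);
        for (int i = 0; i < 1000; i++) {
            big.put("node" + i, i);
        }
        Map<String, Integer> small = compact(big, 0.9f);
        System.out.println(small.size());
    }
}
```

The old map only becomes eligible for garbage collection once nothing references it any more, so drop the reference to it after copying.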

Your question seems to imply that there are more allocated but unused objects that you're worried about. But how is that the case?

Addendum

Once the number of entries exceeds the map's capacity multiplied by its load factor (0.75 by default for java.util.HashMap), a larger array is allocated, the old array's contents are copied into the new array, and the smaller array is left to be garbage collected. This is obviously an expensive operation, so choosing a reasonably large initial size for the map is key to improving performance.

If you can (over)estimate the number of cells needed, preallocating a map can reduce or even eliminate the resizing operations.
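As a rough illustration of preallocation, assuming the default load factor of 0.75, the initial capacity can be derived from the expected number of entries. The helper name withExpectedSize is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class PresizedMaps {
    /** Creates a HashMap that can take expectedEntries insertions without resizing. */
    static <K, V> Map<K, V> withExpectedSize(int expectedEntries) {
        // A resize happens once size exceeds capacity * 0.75, so ask for enough capacity up front.
        long needed = (long) Math.ceil(expectedEntries / 0.75);
        int capacity = (int) Math.min(Integer.MAX_VALUE, Math.max(needed, 1L));
        return new HashMap<>(capacity);
    }

    public static void main(String[] args) {
        Map<String, String> nodes = withExpectedSize(5_000_000);
        nodes.put("a", "b");
        System.out.println(nodes.size());
    }
}
```

Overestimating costs some empty slots, while underestimating only costs extra resize passes, so a generous guess is usually the safer choice when load time matters.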

Answer 2

What you are asking is not entirely clear: is the memory taken by the objects that you put inside the hashmap or by the hashmap itself? The latter shouldn't be the case, since the map only holds references.

In any case, take a look at WeakHashMap; maybe it is what you are looking for. It is a hash map that doesn't guarantee its keys are kept inside it: an entry can be dropped once its key is no longer strongly referenced elsewhere. It should be used as a sort of cache, but from your description I can't tell whether that fits your case.

Answer 3

If you get nowhere with reducing the memory footprint of your hashmap, you could always put the data in a database. Depending on how the data is accessed, you might still get reasonable performance if you introduce a cache in front of the db.
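One possible shape for such a cache is a small least-recently-used map built on java.util.LinkedHashMap. This is only a sketch: the class name LruCache is made up, the database lookup is represented by a plain Function supplied by the caller (an assumption, no particular database API is implied), and the class is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class LruCache<K, V> {
    private final Function<K, V> loader; // stands in for the database lookup
    private final Map<K, V> cache;

    LruCache(int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps the least recently used entry first in iteration order.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, loading it through the loader on a miss. */
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    int cachedEntries() {
        return cache.size();
    }
}
```

Only maxEntries values stay on the heap at any time, so the memory footprint is bounded regardless of how large the stored graph is.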

Answer 4

One thing that might come into play is that you might have substrings that reference old, larger strings, and those substrings then make it impossible for the GC to collect the oversized char arrays.

This happens when you are using XML parsers that return attributes/values as substrings of a larger string. (On JVMs before Java 7u6, a substring is only a limited view of the larger string's char array; later versions copy the characters.)

Try putting your strings in the map like this:

map.put(new String(key), new String(value));

Note that the GC might then have more work to do while you are populating the map, and this might not help you if you don't have that many substrings referencing larger strings.

Answer 5

If you're really serious about this and you have time to spare, you can make your own implementation of the Map interface based on minimal perfect hashing.

If your keys are Strings, then there is apparently a map available for you here. I haven't tried it myself, but it boasts reduced memory usage.

Answer 6

You might give the Trove collections a shot. They are advertised as a more time- and space-efficient drop-in replacement for the java.util collections.

6 answers

Answer 1
1 2011-05-10 18:27:57

Answer 2
0 2011-05-10 18:38:01

Answer 3
0 2011-05-10 18:43:37

Answer 4
0 2011-05-10 18:52:42

Answer 5
0 2011-05-10 18:57:12

Answer 6
0 2011-05-10 20:27:42
