
Multimap Space Issue: Guava

In my Java code, I am using Guava's Multimap (com.google.common.collect.Multimap) like this:

 Multimap<Integer, Integer> Index = HashMultimap.create();

Here, the Multimap key is one portion of a URL and the value is another portion of the URL (converted into an integer). I give my JVM 2560 MB (2.5 GB) of heap space (using Xmx and Xms). However, it can store only about 9 to 10 million such (key, value) pairs of integers. Theoretically (going by the memory occupied by an int), it should be able to store more.

Can anybody help me with the following?

  1. Why is Multimap using so much memory? I checked my code, and without inserting pairs into the Multimap it uses only 1/2 MB of memory.
  2. Is there another way, or a home-baked solution, to solve this memory issue? That is, is there any way to reduce the object overhead, given that I only want to store int-int pairs? In another language, perhaps? Or any other solution (home-baked preferred) to the issue I faced, such as a database-based approach or something similar.

There's a huge amount of overhead associated with Multimap. At a minimum:

  • Each key and value is an Integer object, which (at a minimum) doubles the storage requirements of each int value.
  • Each unique key value in the HashMultimap is associated with a Collection of values (according to the source, the Collection is a HashSet).
  • Each HashSet is created with default space for 8 values.

So each key/value pair requires (at a minimum) perhaps an order of magnitude more space than you might expect for two int values. (Somewhat less when multiple values are stored under a single key.) I would expect 10 million key/value pairs to take perhaps 400 MB.

Although you have 2.5 GB of heap space, I wouldn't be all that surprised if that's not enough. The above estimate is, I think, on the low side. Plus, it only accounts for how much is needed to store the map once it is built. As the map grows, the table needs to be reallocated and rehashed, which temporarily at least doubles the amount of space used. Finally, all this assumes that int values and object references require 4 bytes. If the JVM is using 64-bit addressing, the byte count probably doubles.
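To make that arithmetic concrete, here is a rough back-of-envelope sketch. The class name and every byte size in it are assumptions (figures often quoted for a 64-bit HotSpot JVM with compressed references), not measurements, and it deliberately ignores the boxed key, the per-key HashSet object, and spare table capacity, so it is only a lower bound.

```java
/** Back-of-envelope heap estimate; every size below is an assumption, not a measurement. */
final class MultimapOverheadEstimate {
    // Assumed sizes for a 64-bit HotSpot JVM with compressed references.
    static final long BOXED_INTEGER_BYTES = 16; // object header + int field, padded
    static final long HASH_ENTRY_BYTES = 32;    // header + hash + key/value/next references, padded
    static final long TABLE_SLOT_BYTES = 4;     // one reference slot in the hash table array

    /** Lower bound per pair: the boxed value, its hash entry, and one table slot. */
    static long bytesPerPair() {
        return BOXED_INTEGER_BYTES + HASH_ENTRY_BYTES + TABLE_SLOT_BYTES;
    }

    /** Estimated heap bytes for the given number of (key, value) pairs. */
    static long estimateBytes(long pairs) {
        return pairs * bytesPerPair();
    }

    /** What the same pairs would cost as two bare ints each. */
    static long rawIntBytes(long pairs) {
        return pairs * 2 * Integer.BYTES;
    }

    public static void main(String[] args) {
        long pairs = 10_000_000L;
        System.out.printf("two raw ints per pair: %d MB%n", rawIntBytes(pairs) / 1_000_000);
        System.out.printf("boxed lower bound:     %d MB%n", estimateBytes(pairs) / 1_000_000);
    }
}
```

Under these assumptions the boxed lower bound is several times the raw int cost and in the same order of magnitude as the rough figure above, before counting keys, per-key sets, or the temporary doubling during rehashing.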

Probably the simplest way to minimize the memory overhead would be to mix Trove's primitive collection implementations (to avoid the memory overhead of boxing) with Guava's Multimap, something like

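// Backing map and per-key sets are Trove primitive collections, exposed
// through TDecorators as ordinary java.util views of boxed Integers.
SetMultimap<Integer, Integer> multimap = Multimaps.newSetMultimap(
  TDecorators.wrap(new TIntObjectHashMap<Collection<Integer>>()),
  new Supplier<Set<Integer>>() {
    public Set<Integer> get() {
      // One primitive int set per key, created on demand.
      return TDecorators.wrap(new TIntHashSet());
    }
  });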

That still has the overhead of boxing and unboxing on queries, but the memory it consumes just sitting there would be significantly reduced.

It sounds like you need a sparse boolean matrix. The question "Sparse matrices / arrays in Java" should provide pointers to library code. Then, instead of putting (i, j) into the multimap, just put a 1 into the matrix at [i][j].
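If a home-baked version is preferred over a library, one possible sketch of such a sparse boolean matrix is to pack each (i, j) into a single long and keep the longs in a sorted primitive array. The class PackedIntPairSet and its methods are hypothetical names made up for this illustration; it is a swapped-in technique, not something Guava or Trove provides, and it assumes lookups happen mostly after bulk insertion.

```java
import java.util.Arrays;

/** Hypothetical home-baked set of (key, value) int pairs stored as sorted packed longs. */
final class PackedIntPairSet {
    private long[] pairs = new long[16];
    private int size = 0;
    private boolean sorted = true;

    /** Key goes in the high 32 bits, value in the low 32 bits. */
    private static long pack(int key, int value) {
        return ((long) key << 32) | (value & 0xFFFFFFFFL);
    }

    /** Appends a pair; sorting and de-duplication are deferred until the next query. */
    void add(int key, int value) {
        if (size == pairs.length) {
            pairs = Arrays.copyOf(pairs, size * 2); // temporarily doubles memory while copying
        }
        pairs[size++] = pack(key, value);
        sorted = false;
    }

    /** Sorts the array and drops duplicate pairs, if anything was added since the last query. */
    private void ensureSorted() {
        if (sorted) return;
        Arrays.sort(pairs, 0, size);
        int unique = 0;
        for (int i = 0; i < size; i++) {
            if (unique == 0 || pairs[i] != pairs[unique - 1]) {
                pairs[unique++] = pairs[i];
            }
        }
        size = unique;
        sorted = true;
    }

    /** Number of distinct pairs stored. */
    int size() {
        ensureSorted();
        return size;
    }

    /** True if the matrix has a 1 at [key][value]. */
    boolean contains(int key, int value) {
        ensureSorted();
        return Arrays.binarySearch(pairs, 0, size, pack(key, value)) >= 0;
    }

    /** All values stored under the key (non-negative values first, in increasing order). */
    int[] get(int key) {
        ensureSorted();
        int idx = Arrays.binarySearch(pairs, 0, size, pack(key, 0));
        int from = idx >= 0 ? idx : -idx - 1; // first position whose packed value is >= (key, 0)
        int to = from;
        while (to < size && (int) (pairs[to] >> 32) == key) to++;
        int[] values = new int[to - from];
        for (int i = from; i < to; i++) values[i - from] = (int) pairs[i];
        return values;
    }

    public static void main(String[] args) {
        PackedIntPairSet index = new PackedIntPairSet();
        index.add(7, 42);
        index.add(7, 5);
        index.add(-3, 9);
        System.out.println(Arrays.toString(index.get(7)));
    }
}
```

Each stored pair costs 8 bytes in the array, so 10 million pairs need roughly 80 MB plus whatever slack the doubling growth leaves. The trade-off is that an add after a query triggers a full re-sort on the next lookup, so this shape suits build-once, query-many workloads.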

You could probably use an ArrayListMultimap, which requires less memory than a HashMultimap, since ArrayLists are smaller than HashSets. Or, you could modify Louis's Trove solution, replacing the Set with a List, to reduce memory usage further.

Some applications depend on the fact that HashMultimap satisfies the SetMultimap interface, but most don't.
