
Primitive multimap in Java with good (insert, iteration) performance characteristics

I'm doing some heavy processing (building inverse indices) using ints/longs in Java.

I've determined that the (un)boxing done by the standard Java collections maps takes a big portion of the total processing time (as compared to a similar implementation using arrays, which I can't use due to memory constraints).

I'm looking for a fast third-party implementation (or any implementation at all, for that matter) that could support the following structure:

Map with these characteristics:

- keys in the map are sparse (approximately 10,000,000 keys in the range [0, 2^64])
- values are always appended to the end of the list
- fast insert (amortized O(1) if possible)
- fast iteration in key order

I've looked at Trove, fastutil, etc., but couldn't find a multimap implementation using primitives (only normal maps).

Any help is appreciated.

Thanks, Geert-Jan

Have you considered implementing the multimap yourself, using a primitive long -> Object map with a primitive int set as the value?
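A minimal sketch of that idea, without any third-party library: an open-addressing hash table with a `long[]` of keys and a growable `int[]` per key. The class name `LongIntMultimap` and all method names are made up for illustration. It uses an append-only list rather than a set as the value, to match the "always appended" requirement, and it gets key-order iteration by sorting the keys on demand, which costs O(n log n) per pass. Keys are compared as signed longs, so keys above 2^63 - 1 would need an unsigned comparison instead.

```java
import java.util.Arrays;

/** Hypothetical sketch: long keys mapped to growable int lists, no boxing. */
final class LongIntMultimap {
    private long[] keys = new long[16];       // table size is always a power of two
    private int[][] values = new int[16][];   // null marks an empty slot
    private int[] sizes = new int[16];        // number of used entries per list
    private int keyCount = 0;

    /** Appends value to the list stored under key (amortized O(1)). */
    void put(long key, int value) {
        if ((keyCount + 1) * 2 > keys.length) resize();   // keep load factor <= 0.5
        int slot = findSlot(keys, values, key);
        if (values[slot] == null) {
            keys[slot] = key;
            values[slot] = new int[2];
            keyCount++;
        } else if (sizes[slot] == values[slot].length) {
            values[slot] = Arrays.copyOf(values[slot], sizes[slot] * 2);
        }
        values[slot][sizes[slot]++] = value;
    }

    /** Returns a copy of the values for key, in insertion order. */
    int[] get(long key) {
        int slot = findSlot(keys, values, key);
        return values[slot] == null ? new int[0] : Arrays.copyOf(values[slot], sizes[slot]);
    }

    /** Returns all keys in ascending (signed) order. */
    long[] sortedKeys() {
        long[] out = new long[keyCount];
        int n = 0;
        for (int i = 0; i < keys.length; i++) {
            if (values[i] != null) out[n++] = keys[i];
        }
        Arrays.sort(out);
        return out;
    }

    /** Linear probing: returns the slot holding key, or the empty slot where it belongs. */
    private static int findSlot(long[] keys, int[][] values, long key) {
        int mask = keys.length - 1;
        int slot = mix(key) & mask;
        while (values[slot] != null && keys[slot] != key) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    /** Spreads the key bits so that clustered keys do not share slots. */
    private static int mix(long key) {
        long h = key * 0x9E3779B97F4A7C15L;
        return (int) (h ^ (h >>> 32));
    }

    /** Doubles the table and reinserts every occupied slot. */
    private void resize() {
        long[] newKeys = new long[keys.length * 2];
        int[][] newValues = new int[newKeys.length][];
        int[] newSizes = new int[newKeys.length];
        for (int i = 0; i < keys.length; i++) {
            if (values[i] == null) continue;
            int slot = findSlot(newKeys, newValues, keys[i]);
            newKeys[slot] = keys[i];
            newValues[slot] = values[i];
            newSizes[slot] = sizes[i];
        }
        keys = newKeys;
        values = newValues;
        sizes = newSizes;
    }
}
```

Iterating in key order is then a loop over `sortedKeys()` with a `get(key)` per key. If iteration happens only once, after all inserts, the one-off sort may be acceptable; otherwise a sorted structure would be needed.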

What about the Google Collections library? http://code.google.com/p/google-collections/

Depending on cardinality, you can use a primitive int/long-to-Object map where the type of the value depends on the number of values stored under a key:

  • if (size == 1) => Long (can be deduplicated if there is a huge number of duplicates);

  • if (size <= 13) => LongSet (backed by an array of 16 elements);

  • if (size > 13) => SparseLongBitSet, using e.g. 16 longs as the payload per block (the array can even be reused).

For int values, consider 26 as the decision point. If performance is very important, benchmark; for example, SparseLongBitSet may only pay off with specific sharding/block sizing. For memory locality, consider reusing the same memory blocks (e.g. arrays of 2M).
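The switch between representations could look roughly like the sketch below. Everything here is an assumption made for illustration: the class name `AdaptiveLongValues`, the use of slot 0 of the 16-element array as a counter, and a `HashMap` of 16-long blocks standing in for a real sparse bit set (it boxes the block index, so it only shows the structure, not the performance).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: the value container changes type as it grows. */
final class AdaptiveLongValues {
    static final int SMALL_LIMIT = 13;       // threshold taken from the answer
    static final int WORDS_PER_BLOCK = 16;   // 16 longs = 1024 bits per block

    /** Adds v and returns the container to store back into the map (it may change type). */
    static Object add(Object container, long v) {
        if (container == null) return v;                  // size 1: a single boxed Long
        if (container instanceof Long) {
            long first = ((Long) container).longValue();
            if (first == v) return container;
            long[] small = new long[16];                  // slot 0 holds the count (assumption)
            small[0] = 2;
            small[1] = first;
            small[2] = v;
            return small;
        }
        if (container instanceof long[]) {
            long[] small = (long[]) container;
            int n = (int) small[0];
            for (int i = 1; i <= n; i++) {
                if (small[i] == v) return small;          // already present
            }
            if (n < SMALL_LIMIT) {
                small[n + 1] = v;
                small[0] = n + 1;
                return small;
            }
            Map<Long, long[]> big = new HashMap<>();      // size > 13: switch to bit blocks
            for (int i = 1; i <= n; i++) setBit(big, small[i]);
            setBit(big, v);
            return big;
        }
        @SuppressWarnings("unchecked")
        Map<Long, long[]> big = (Map<Long, long[]>) container;
        setBit(big, v);
        return big;
    }

    /** Tests membership in whichever representation is currently in use. */
    static boolean contains(Object container, long v) {
        if (container == null) return false;
        if (container instanceof Long) return ((Long) container).longValue() == v;
        if (container instanceof long[]) {
            long[] small = (long[]) container;
            for (int i = 1; i <= (int) small[0]; i++) {
                if (small[i] == v) return true;
            }
            return false;
        }
        @SuppressWarnings("unchecked")
        Map<Long, long[]> big = (Map<Long, long[]>) container;
        long[] block = big.get(v >>> 10);
        return block != null && (block[(int) ((v >>> 6) & 15)] & (1L << v)) != 0;
    }

    /** Sets bit v: the block index is v >>> 10, the word is bits 6..9, the bit is the low 6 bits. */
    private static void setBit(Map<Long, long[]> big, long v) {
        long[] block = big.computeIfAbsent(v >>> 10, k -> new long[WORDS_PER_BLOCK]);
        block[(int) ((v >>> 6) & 15)] |= 1L << v;
    }
}
```

The caller stores the returned object back under the key after every `add`. Note that the bit-block form loses insertion order, so it only fits if set semantics are acceptable.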

Last drop: instead of an Object, consider using an index to the payload (e.g. an off-heap pointer) and static methods (Flyweight-like) to operate on the payload.
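One way to read that suggestion, as a rough sketch: every value list lives inside one shared `int[]`, and a list is identified by an `int` handle instead of an object reference. The class `IntListPool`, its layout (capacity, size, then the values), and the on-heap array standing in for off-heap memory are all assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical sketch: int lists stored in one shared array, addressed by int handles. */
final class IntListPool {
    private static int[] pool = new int[1024];
    private static int used = 0;

    /** Reserves a list with the given capacity and returns its handle. */
    static int create(int capacity) {
        int handle = allocate(capacity + 2);
        pool[handle] = capacity;   // header word 0: capacity
        pool[handle + 1] = 0;      // header word 1: current size
        return handle;
    }

    /** Appends value; returns the handle to keep, which changes if the list had to move. */
    static int add(int handle, int value) {
        int capacity = pool[handle];
        int size = pool[handle + 1];
        if (size == capacity) {
            // Full: copy into a list twice as large at the end of the pool.
            // The old region is simply abandoned in this sketch.
            int bigger = create(Math.max(2, capacity * 2));
            System.arraycopy(pool, handle + 2, pool, bigger + 2, size);
            handle = bigger;
        }
        pool[handle + 2 + size] = value;
        pool[handle + 1] = size + 1;
        return handle;
    }

    /** Number of values in the list behind handle. */
    static int size(int handle) {
        return pool[handle + 1];
    }

    /** The index-th value of the list behind handle (no bounds check in this sketch). */
    static int get(int handle, int index) {
        return pool[handle + 2 + index];
    }

    /** Bump allocator: grows the shared array when it runs out of room. */
    private static int allocate(int words) {
        if (used + words > pool.length) {
            pool = Arrays.copyOf(pool, Math.max(pool.length * 2, used + words));
        }
        int start = used;
        used += words;
        return start;
    }
}
```

The map then becomes a plain primitive long -> int map from key to handle, which Trove or fastutil already provide, and the handle must be written back after each `add`. The price is wasted space from abandoned regions, which a real implementation would have to reclaim or avoid.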

