
Space-optimize a large array with many duplicates

I have an array where the index doubles as the identifier for a collection of items and the content of the array is a group number. The group numbers fall into a finite range 0..N, where N << length_of_the_array, so every entry is duplicated a large number of times. Currently I have to use 2 bytes to represent a group number (which can be > 1000 but < 6500), and because of the duplication this ends up consuming a lot of memory.

Are there ways to space-optimize this array, given that the complete array can reach multiple MB in size? I'd appreciate any pointers toward a relevant optimization algorithm or technique. FYI: the programming language I'm using is C++.

Do you still want efficient random-access to arbitrary elements? Or are you thinking about space-efficient serialization of the index->group map?

If you still want efficient random access, a single array lookup is not bad. It's at worst a single cache miss. (Well, really the worst case is a page fault, or more likely a TLB miss, but that's unlikely if the array is only a couple of MB.)

A sorted and run-length encoded list could be binary-searched (by searching an array of prefix-sums of the repeat-counts), but that only works if you can occasionally sort the list to keep duplicates together.
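As a rough illustration of the run-length idea, here is a minimal sketch. The name `RleGroupMap` and its members are made up for this example, and it assumes the identifiers have already been arranged so that equal group numbers sit next to each other; without that, the run table ends up no smaller than the plain array. The array of run ends plays the role of the prefix sums of the repeat counts, and `std::upper_bound` does the binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical run-length encoded index -> group map (illustration only).
struct RleGroupMap {
    // run_end[k] is one past the last index covered by run k,
    // i.e. the prefix sum of the run lengths up to and including run k.
    std::vector<std::uint32_t> run_end;
    // group[k] is the group number shared by every index in run k.
    std::vector<std::uint16_t> group;

    // Build from the plain 2-byte array. Only saves memory when
    // equal values are adjacent (long runs).
    explicit RleGroupMap(const std::vector<std::uint16_t>& plain) {
        for (std::size_t i = 0; i < plain.size(); ++i) {
            if (!group.empty() && group.back() == plain[i]) {
                ++run_end.back();  // extend the current run by one index
            } else {
                group.push_back(plain[i]);  // start a new run
                run_end.push_back(static_cast<std::uint32_t>(i + 1));
            }
        }
    }

    // Random access in O(log number_of_runs): find the first run
    // whose end is greater than the requested index.
    std::uint16_t at(std::uint32_t index) const {
        assert(!run_end.empty() && index < run_end.back());
        auto it = std::upper_bound(run_end.begin(), run_end.end(), index);
        return group[static_cast<std::size_t>(it - run_end.begin())];
    }
};
```

Each run costs 6 bytes here (4 for the end index, 2 for the group), so this only beats the plain array when the average run is longer than 3 entries. Updating a single entry can split a run, which is why periodic re-sorting or rebuilding would be needed.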

If the duplicates can't be at least somewhat grouped together, there's not much you can do that allows random access.

Packed 12-bit entries are probably not worth the trouble, unless packing is enough to significantly reduce cache misses. (Note that 12 bits only cover values up to 4095; group numbers up to 6500 would need 13 bits.) A couple of multiply instructions to generate the right address, plus a shift and a mask on the 16-bit load containing the desired value, is not much overhead compared to a cache miss. Write access to packed bitfields is slower and isn't atomic, though, which is a serious downside. Getting a compiler to pack bitfields using structs can be compiler-specific, so just using a char array would probably be best.
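For reference, a packed layout over a byte array might look something like the sketch below. The class name `Packed12` and its interface are invented for illustration, and it assumes every value is below 4096. Two entries share three bytes; a read loads the two bytes that contain the entry, then shifts and masks. A write is a read-modify-write on bytes shared with a neighbouring entry, which is why it is not atomic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: 12-bit values packed two per three bytes.
class Packed12 {
public:
    // ceil(count * 1.5) bytes are enough for count entries.
    explicit Packed12(std::size_t count)
        : bytes_((count * 3 + 1) / 2, 0), count_(count) {}

    std::uint16_t get(std::size_t i) const {
        assert(i < count_);
        const std::size_t off = i * 3 / 2;  // first byte holding entry i
        const unsigned shift = static_cast<unsigned>(i & 1) * 4;  // odd entries start at bit 4
        // Assemble the 16-bit little-endian window byte by byte, so the
        // result does not depend on host endianness or alignment.
        const unsigned word =
            bytes_[off] | (static_cast<unsigned>(bytes_[off + 1]) << 8);
        return static_cast<std::uint16_t>((word >> shift) & 0xFFFu);
    }

    // Not atomic: this rewrites a byte that also holds part of a neighbour.
    void set(std::size_t i, std::uint16_t value) {
        assert(i < count_ && value < 4096);
        const std::size_t off = i * 3 / 2;
        const unsigned shift = static_cast<unsigned>(i & 1) * 4;
        unsigned word =
            bytes_[off] | (static_cast<unsigned>(bytes_[off + 1]) << 8);
        // Clear the 12 bits of this entry, keep the neighbour's 4 bits.
        word = (word & ~(0xFFFu << shift)) |
               (static_cast<unsigned>(value) << shift);
        bytes_[off] = static_cast<unsigned char>(word & 0xFFu);
        bytes_[off + 1] = static_cast<unsigned char>(word >> 8);
    }

    std::size_t size_bytes() const { return bytes_.size(); }

private:
    std::vector<unsigned char> bytes_;
    std::size_t count_;
};
```

This saves 25% compared with 2 bytes per entry. A 13-bit variant would follow the same pattern, but an entry could then span three bytes, so the load and store windows would have to be wider.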

