Memory-efficient way to store large List<Map<String,String>> where many map entries are identical

I'm looking for a memory-efficient way to store tabular data typically consisting of about 150000 rows x 200 columns. The cell values are Strings with lengths somewhere in the range of 0-200 characters.

The data rows are initially generated by taking all possible combinations of rows from smaller tables. So while all rows are unique, the columns contain many copies of the same value. The data is not read-only: some of the columns (typically up to 20 of the 200) get updated with values that depend on the values of other columns, and new columns with computed values (also about 20, I'd expect) are going to be added to the table.

The existing legacy code heavily depends on the data being stored in a List of Map<String, String>s that map column name to cell value. But the current implementation, an ArrayList<HashMap<String,String>>, is taking many gigabytes of memory.

I tried calling String.intern() on the keys and values that get inserted into the HashMap. That halved the memory footprint. But it still seems horribly inefficient to keep all those identical Map.Entrys around.
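For reference, the deduplication step described above can also be done with an application-owned pool instead of String.intern(). This is only a minimal sketch: the class name StringPool and its methods are hypothetical, not part of the legacy code, and whether it uses less memory than intern() would need to be measured. Its main difference is that the pool can be discarded once the table has been built.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: returns one shared instance per distinct string value. */
final class StringPool {
    private final Map<String, String> pool = new HashMap<>();

    /** Returns the canonical instance equal to s, registering s if it is new. */
    String canonical(String s) {
        if (s == null) {
            return null;
        }
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Number of distinct strings seen so far. */
    int size() {
        return pool.size();
    }
}
```

Every key and value would be passed through canonical() before being put into a row map, so equal cell values end up pointing at the same String object.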

So I was wondering: Can you suggest a more memory-efficient data structure to somehow share the identical column values but that would allow me to keep the external List<Map<String, String>> interface the same?

We already have guava on the class path so using collections from guava is fine.

I have found GS-Collections (now maintained as Eclipse Collections) to be much better suited to memory-efficient Maps and Sets. Its implementations avoid much of the overhead of storing map entry objects by using some clever tricks with arrays behind the scenes.
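The same array-backed idea can be applied to this particular table using only the JDK. The following is a rough sketch, not GS-Collections code: TableSchema and RowMap are hypothetical names, and the sketch assumes that a null cell can be treated as "column absent". All rows share one column-name-to-index map, and each row holds just a String[] of cell values, so no Map.Entry objects are kept per cell, while callers still see a Map<String, String>.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

/** Hypothetical schema shared by all rows: column name -> array index. */
final class TableSchema {
    private final Map<String, Integer> indexByName = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    int indexOf(Object name) {
        Integer i = indexByName.get(name);
        return i == null ? -1 : i;
    }

    /** Returns the index of the column, appending a new column if needed. */
    int indexOrAdd(String name) {
        Integer i = indexByName.get(name);
        if (i == null) {
            i = names.size();
            names.add(name);
            indexByName.put(name, i);
        }
        return i;
    }

    String nameAt(int i) { return names.get(i); }
    int size() { return names.size(); }
}

/** Hypothetical row: a Map view over one String[]; a null cell means "absent". */
final class RowMap extends AbstractMap<String, String> {
    private final TableSchema schema;
    private String[] cells;

    RowMap(TableSchema schema) {
        this.schema = schema;
        this.cells = new String[schema.size()];
    }

    @Override public String get(Object key) {
        int i = schema.indexOf(key);
        return (i >= 0 && i < cells.length) ? cells[i] : null;
    }

    @Override public boolean containsKey(Object key) { return get(key) != null; }

    @Override public String put(String key, String value) {
        int i = schema.indexOrAdd(key);
        if (i >= cells.length) {
            // Columns added after this row was created: grow lazily.
            cells = Arrays.copyOf(cells, schema.size());
        }
        String old = cells[i];
        cells[i] = value;
        return old;
    }

    @Override public String remove(Object key) {
        int i = schema.indexOf(key);
        if (i < 0 || i >= cells.length) return null;
        String old = cells[i];
        cells[i] = null;
        return old;
    }

    @Override public Set<Map.Entry<String, String>> entrySet() {
        return new AbstractSet<Map.Entry<String, String>>() {
            @Override public int size() {
                int n = 0;
                for (String c : cells) if (c != null) n++;
                return n;
            }

            @Override public Iterator<Map.Entry<String, String>> iterator() {
                return new Iterator<Map.Entry<String, String>>() {
                    private int cursor = advance(0);

                    // Skips absent (null) cells.
                    private int advance(int from) {
                        while (from < cells.length && cells[from] == null) from++;
                        return from;
                    }

                    @Override public boolean hasNext() { return cursor < cells.length; }

                    @Override public Map.Entry<String, String> next() {
                        if (!hasNext()) throw new NoSuchElementException();
                        int i = cursor;
                        cursor = advance(i + 1);
                        // Entries are created on demand and are read-only snapshots.
                        return new AbstractMap.SimpleImmutableEntry<>(schema.nameAt(i), cells[i]);
                    }
                };
            }
        };
    }
}
```

Rows built this way can be stored in an ordinary ArrayList<Map<String, String>>, so the external interface stays the same. Two caveats of this sketch: updates must go through put() or remove() because the entries returned by entrySet() are immutable, and it is not thread-safe. Combining it with interned or pooled cell values keeps identical values shared across rows.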
