What is the most memory-efficient way to store a large number of strings in a map?

Question

I want to store a huge number of Strings in a Map<String, MagicObject>, so that the MagicObjects can be accessed quickly. The map has so many entries that memory is becoming a bottleneck. Assuming the MagicObjects can't be optimized, what is the most memory-efficient type of map I could use in this situation? I am currently using the following:

gnu.trove.map.hash.TCustomHashMap<byte[], MagicObject>

Answer 1

If your keys are long enough and share many sufficiently long common prefixes, you can save memory by using a trie (prefix tree) data structure. Answers to this question point to a couple of Java implementations of a trie.
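To make the idea concrete, here is a minimal sketch of a character trie used as a map. The class name TrieMap and its methods are hypothetical, chosen for illustration, and are not taken from any of the implementations mentioned above. It also assumes null values are never stored. Note that this naive version allocates a HashMap per node, so it only shows how shared prefixes are stored once; real memory savings would need a more compact node layout, such as a radix tree.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative trie-backed map from String keys to values (sketch only). */
final class TrieMap<V> {
    private static final class Node<V> {
        // Child table is created lazily so that leaf nodes stay small.
        Map<Character, Node<V>> children;
        // null means "no key ends at this node"; null values are not supported.
        V value;
    }

    private final Node<V> root = new Node<>();
    private int size;

    /** Associates value with key and returns the previous value, or null if none. */
    V put(String key, V value) {
        Node<V> node = root;
        for (int i = 0; i < key.length(); i++) {
            if (node.children == null) {
                node.children = new HashMap<>(4);
            }
            // Keys sharing a prefix walk the same nodes, so the prefix is stored once.
            node = node.children.computeIfAbsent(key.charAt(i), c -> new Node<>());
        }
        V old = node.value;
        if (old == null) {
            size++;
        }
        node.value = value;
        return old;
    }

    /** Returns the value stored for key, or null if the key is absent. */
    V get(String key) {
        Node<V> node = root;
        for (int i = 0; i < key.length() && node != null; i++) {
            node = (node.children == null) ? null : node.children.get(key.charAt(i));
        }
        return node == null ? null : node.value;
    }

    int size() {
        return size;
    }
}
```

With keys such as "user:alice" and "user:alina", the nodes for "user:al" are shared, which is where the saving would come from when prefixes are long and common.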

Answer 2

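To broaden your options, consider compressing your strings with Huffman coding before putting them into the map. This works as long as your strings are fixed (the number and content of the strings do not change).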
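A rough sketch of what Huffman coding over the key set could look like follows. The class HuffmanCodec is hypothetical, and it builds one code table from the character frequencies of all keys, which is why the key set needs to be known up front. For readability it emits the code as a String of '0' and '1' characters; an actual memory saving would require packing those bits into a byte[] together with the bit length, which is left out here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Illustrative Huffman coder built from a fixed corpus of keys (sketch only). */
final class HuffmanCodec {
    private static final class Node implements Comparable<Node> {
        final char symbol;
        final long weight;
        final Node left, right;

        Node(char symbol, long weight, Node left, Node right) {
            this.symbol = symbol;
            this.weight = weight;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null;
        }

        @Override
        public int compareTo(Node other) {
            return Long.compare(weight, other.weight);
        }
    }

    private final Map<Character, String> codes = new HashMap<>();
    private final Node root;

    HuffmanCodec(Iterable<String> corpus) {
        // Count how often each character occurs across all keys.
        Map<Character, Long> freq = new HashMap<>();
        for (String s : corpus) {
            for (int i = 0; i < s.length(); i++) {
                freq.merge(s.charAt(i), 1L, Long::sum);
            }
        }
        if (freq.isEmpty()) {
            throw new IllegalArgumentException("corpus contains no characters");
        }
        // Repeatedly merge the two lightest subtrees until one tree remains.
        PriorityQueue<Node> queue = new PriorityQueue<>();
        freq.forEach((c, w) -> queue.add(new Node(c, w, null, null)));
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.add(new Node('\0', a.weight + b.weight, a, b));
        }
        root = queue.poll();
        assign(root, "");
    }

    private void assign(Node node, String prefix) {
        if (node.isLeaf()) {
            // A corpus with a single distinct character still needs a one-bit code.
            codes.put(node.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        assign(node.left, prefix + '0');
        assign(node.right, prefix + '1');
    }

    /** Returns the code for s as a string of '0' and '1' characters. */
    String encode(String s) {
        StringBuilder bits = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            String code = codes.get(s.charAt(i));
            if (code == null) {
                throw new IllegalArgumentException("character not in corpus: " + s.charAt(i));
            }
            bits.append(code);
        }
        return bits.toString();
    }

    /** Inverts encode by walking the tree one bit at a time. */
    String decode(String bits) {
        StringBuilder out = new StringBuilder();
        Node node = root;
        for (int i = 0; i < bits.length(); i++) {
            if (!root.isLeaf()) {
                node = bits.charAt(i) == '0' ? node.left : node.right;
            }
            if (node.isLeaf()) {
                out.append(node.symbol);
                node = root;
            }
        }
        return out.toString();
    }
}
```

Frequent characters receive shorter codes, so keys drawn from a skewed alphabet shrink the most. The tree itself has to be kept in memory for decoding, which is a fixed cost shared by all keys.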

Answer 3

I'm a little late to this party, but this question came up in a related search and piqued my interest. I don't usually answer Java questions.

"There are so many entries to this Map that memory is becoming a bottleneck."

I doubt it.

For the storage of strings in memory to become a bottleneck, you need an awfully large number of unique strings[1]. To put things into perspective, I recently worked with a 1.8 million-word dictionary (1.8 million unique English words), and it took up around 1.6MB of RAM at runtime.

If you used every word in that dictionary as a key, you would still use only 1.6MB of RAM[2] to store the keys, so memory cannot be your bottleneck.

What I suspect you are experiencing is the O(n^2) performance of string matching. By this I mean that as more keys are added, performance degrades quadratically[3]. This is unavoidable if you are using strings as keys.

If you want to speed things up a bit, store each key in a hashtable that doesn't store duplicates, and use the hash key as the key to your map.
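One possible reading of this suggestion is to key the map by a hash of each string instead of by the string itself. The sketch below assumes that reading; the class HashedKeyMap is hypothetical, and the choice of a 64-bit FNV-1a hash over the UTF-8 bytes is an assumption made for illustration. Because the original strings are not kept, two keys whose hashes collide would silently share one entry, so this only suits cases where that risk is acceptable.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Illustrative map keyed by a 64-bit hash of the string (sketch only). */
final class HashedKeyMap<V> {
    // Standard 64-bit FNV-1a parameters.
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    private final Map<Long, V> map = new HashMap<>();

    /** Computes the 64-bit FNV-1a hash of the key's UTF-8 bytes. */
    static long hash64(String key) {
        long h = FNV_OFFSET_BASIS;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= FNV_PRIME;
        }
        return h;
    }

    /** Stores value under the hash of key; the key string itself is not retained. */
    V put(String key, V value) {
        return map.put(hash64(key), value);
    }

    /** Looks up by hash only, so a colliding key would return another key's value. */
    V get(String key) {
        return map.get(hash64(key));
    }

    int size() {
        return map.size();
    }
}
```

A primitive-keyed map, such as the long-keyed maps in the Trove library the question already uses, could replace the boxed Map<Long, V> to reduce per-entry overhead further.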

NOTES:

[1] I'm assuming the strings are all unique, or else you would not attempt to use them as keys into a map.

[2] Even if Java uses 2 bytes per character, that still comes to only 3.2MB of memory in total.

[3] It slows down even more if you choose the wrong data structure, such as an unbalanced binary tree, to store your values. I don't know how the map stores values internally, but an unbalanced binary tree can degenerate to O(n) per lookup in the worst case, which is about as bad as a search structure gets.

3 answers

Answer 1: score 4, posted 2016-06-15 15:19:05

Answer 2: score 1, posted 2016-06-15 15:34:47

Answer 3: score -1, posted 2018-04-04 20:54:45
