
What is the most memory efficient method of storing a large number of Strings in a map?

I want to store a huge number of Strings in a Map<String, MagicObject>, so that the MagicObjects can be accessed quickly. There are so many entries in this Map that memory is becoming a bottleneck. Assuming the MagicObjects can't be optimized, what is the most efficient type of map I could use for this situation? I am currently using the following:

gnu.trove.map.hash.TCustomHashMap<byte[], MagicObject>
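For readers wondering why byte[] keys call for a custom map: Java arrays inherit identity-based equals and hashCode, so a plain HashMap<byte[], ...> cannot find a key by content. The sketch below uses only the JDK and a hypothetical ByteKey wrapper (not part of Trove or of the question) to show the content-based comparison that a custom hashing strategy would have to supply. Storing keys as UTF-8 bytes can halve the character data for ASCII-heavy keys on JVMs that keep String contents as two-byte chars.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical wrapper: makes a byte[] usable as a map key by comparing contents. */
final class ByteKey {
    private final byte[] bytes;

    ByteKey(String s) {
        // UTF-8 uses one byte per ASCII character.
        this.bytes = s.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public boolean equals(Object o) {
        // Compare array contents rather than array identity.
        return o instanceof ByteKey && Arrays.equals(bytes, ((ByteKey) o).bytes);
    }

    @Override
    public int hashCode() {
        // Content-based hash, consistent with equals above.
        return Arrays.hashCode(bytes);
    }

    int length() {
        return bytes.length;
    }
}
```

Note that the wrapper itself costs one extra object per key, which is presumably why the question uses a map that hashes raw byte[] keys directly instead.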

If your keys are long enough and share a lot of sufficiently long common prefixes, then you can save memory by using a trie (prefix tree) data structure. Answers to this question point to a couple of Java implementations of a trie.
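To make the prefix-sharing idea concrete, here is a minimal sketch of a trie used as a map. The class name PrefixTrie and its methods are illustrative assumptions, not one of the implementations the linked answers refer to. It is deliberately naive: each node holds a HashMap of children, which is itself memory-hungry, so a real implementation would likely use a compact radix tree or array-based nodes instead.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative prefix tree mapping String keys to values; a sketch, not a tuned implementation. */
class PrefixTrie<V> {
    private static final class Node<V> {
        // One child per distinct next character; keys sharing a prefix share these nodes.
        final Map<Character, Node<V>> children = new HashMap<>();
        V value; // null when no key ends at this node
    }

    private final Node<V> root = new Node<>();
    private int nodeCount = 1; // counts the root

    void put(String key, V value) {
        Node<V> node = root;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            Node<V> next = node.children.get(c);
            if (next == null) {
                // Only characters beyond the shared prefix allocate new nodes.
                next = new Node<>();
                node.children.put(c, next);
                nodeCount++;
            }
            node = next;
        }
        node.value = value;
    }

    V get(String key) {
        Node<V> node = root;
        for (int i = 0; i < key.length() && node != null; i++) {
            node = node.children.get(key.charAt(i));
        }
        return node == null ? null : node.value;
    }

    int nodeCount() {
        return nodeCount;
    }
}
```

Inserting "prefix/alpha" and "prefix/beta" stores the seven characters of "prefix/" once, which is where the saving comes from when many keys share long prefixes.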

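To broaden your options, consider compressing your strings with Huffman coding before putting them in the map, provided your set of strings is fixed (the number and content of the strings do not change).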
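As a rough illustration of the Huffman suggestion, the sketch below builds a per-character code table from the frequencies found in a fixed key set and reports how many bits a key would need. The class and method names are hypothetical, the codes are kept as bit strings for readability only, and packing the bits into byte[] keys (plus decoding) is left out. It assumes, as the answer does, that the key set does not change after the table is built.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Illustrative Huffman code builder over the characters of a fixed key set. */
class HuffmanSketch {
    private static final class Node implements Comparable<Node> {
        final char symbol;
        final long weight;
        final Node left, right;

        Node(char symbol, long weight, Node left, Node right) {
            this.symbol = symbol;
            this.weight = weight;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null;
        }

        @Override
        public int compareTo(Node other) {
            return Long.compare(weight, other.weight);
        }
    }

    /** Builds a table from character to bit string using frequencies across all keys. */
    static Map<Character, String> buildCodes(Iterable<String> keys) {
        Map<Character, Long> freq = new HashMap<>();
        for (String key : keys) {
            for (char c : key.toCharArray()) {
                freq.merge(c, 1L, Long::sum);
            }
        }
        PriorityQueue<Node> queue = new PriorityQueue<>();
        for (Map.Entry<Character, Long> e : freq.entrySet()) {
            queue.add(new Node(e.getKey(), e.getValue(), null, null));
        }
        Map<Character, String> codes = new HashMap<>();
        if (queue.isEmpty()) {
            return codes;
        }
        // Repeatedly merge the two lightest subtrees until one tree remains.
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.add(new Node('\0', a.weight + b.weight, a, b));
        }
        assign(queue.poll(), "", codes);
        return codes;
    }

    private static void assign(Node node, String prefix, Map<Character, String> codes) {
        if (node.isLeaf()) {
            // A single-symbol alphabet still needs a one-bit code.
            codes.put(node.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        assign(node.left, prefix + "0", codes);
        assign(node.right, prefix + "1", codes);
    }

    /** Number of bits needed to encode the key with the given table. */
    static int encodedBits(String key, Map<Character, String> codes) {
        int bits = 0;
        for (char c : key.toCharArray()) {
            bits += codes.get(c).length();
        }
        return bits;
    }
}
```

Frequent characters receive short codes, so keys drawn from a skewed alphabet shrink the most. A key containing a character that was absent when the table was built cannot be encoded, which is the reason for the fixed-set condition.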

I'm a little late to this party, but this question came up in a related search and piqued my interest. I don't usually answer Java questions.

"There are so many entries to this Map that memory is becoming a bottleneck."

I doubt it.

For the storage of strings in memory to become a bottleneck, you need an awfully large number of unique strings[1]. To put things into perspective, I recently worked with a 1.8 million word dictionary (1.8 million unique English words) and they took up around 1.6MB of RAM at runtime.

If you used every word in the dictionary as a key, you would still only use 1.6MB of RAM[2] to store the keys, hence memory cannot be your bottleneck.

What I suspect you are experiencing is the O(n^2) performance of string matching. By this I mean that as more keys are added, performance degrades quadratically[3]. This is unavoidable if you are using strings as keys.

If you want to speed things up a bit, store each key in a hashtable that doesn't store duplicates and use its hash value as the key to your map.
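One possible reading of this suggestion is to key the map on a fixed-width hash of each string instead of the string itself. The sketch below assumes a 64-bit FNV-1a hash over the UTF-8 bytes. The class HashedKeyMap is hypothetical, and the answer does not say which hash function it has in mind. Two different strings can in principle produce the same 64-bit hash, and this sketch would silently treat them as one key, so it only fits cases where that risk is acceptable or checked separately.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Illustrative map keyed by a 64-bit hash of the string; the original strings are not retained. */
class HashedKeyMap<V> {
    private final Map<Long, V> byHash = new HashMap<>();

    /** 64-bit FNV-1a over the UTF-8 bytes of the key. */
    static long fnv1a64(String key) {
        long h = 0xcbf29ce484222325L; // FNV offset basis
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L; // FNV prime
        }
        return h;
    }

    V put(String key, V value) {
        // Colliding strings overwrite each other here; see the caveat in the lead-in.
        return byHash.put(fnv1a64(key), value);
    }

    V get(String key) {
        return byHash.get(fnv1a64(key));
    }

    int size() {
        return byHash.size();
    }
}
```

A HashMap<Long, V> still boxes every key, so the memory benefit would mainly show up with a primitive-keyed map such as Trove's TLongObjectHashMap, where each key costs eight bytes regardless of the string's length.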

NOTES:

[1] I'm assuming the strings are all unique, or else you would not attempt to use them as keys into a map.

[2] Even if Java uses 2 bytes per character, it still only comes to 3.2MB of memory in total.

[3] It slows down even more if you choose the wrong data structure, such as an unbalanced binary tree, to store your values. I don't know how the map stores values internally, but an unbalanced binary tree can degenerate into a linked list with O(n) lookups - no better than a linear scan.
