
Huge Static Array of String

Is it a good idea to store the words of a dictionary with 100,000 words in a static array of strings? I'm working on a spellchecker and I thought that way would be faster.

It is definitely not a good idea to store so many strings in an array, especially if you are using it for spell checking, which means you will have to search for and compare strings. Looking up a string in an unsorted array is inefficient because it is always a linear search.

You should generally prefer a Java Collections Framework class to a native Java array for anything non-trivial. In this particular case, what you have is a Set<String> (since no word should appear more than once in the dictionary).

A HashSet<String> offers constant-time performance for the basic operations add, remove, and contains, and should work very well with String's hashCode formula.
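As a rough illustration of the HashSet approach, here is a minimal sketch. The class name WordSet, the method isKnown, and the sample words are hypothetical, and lower-casing the words is an assumption about how a spellchecker might normalise input, not something the answer specifies.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: wraps a HashSet<String> for dictionary lookups.
class WordSet {
    private final Set<String> words = new HashSet<>();

    WordSet(List<String> dictionary) {
        for (String w : dictionary) {
            // Assumption: lookups should ignore case, so store lower-cased words.
            words.add(w.toLowerCase(Locale.ROOT));
        }
    }

    // Expected constant-time membership test via HashSet.contains.
    boolean isKnown(String word) {
        return words.contains(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        WordSet dict = new WordSet(Arrays.asList("apple", "banana", "cherry"));
        System.out.println(dict.isKnown("Banana"));
        System.out.println(dict.isKnown("bananna"));
    }
}
```

In a real spellchecker the list would come from a word file rather than a literal, but the lookup itself would stay the same.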

For larger dictionaries, you'd want to use a more sophisticated data structure specialized for storing a set of strings (e.g. a trie), but for 100K words, a HashSet should suffice.
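For reference, a trie can be sketched in a few lines. This is only an illustration of the data structure: the class and method names are hypothetical, and the per-node HashMap is chosen for brevity rather than memory efficiency.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal trie storing a set of strings character by character.
class Trie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true if a stored word ends at this node
    }

    private final Node root = new Node();

    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            // Create the child node for this character if it does not exist yet.
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.isWord = true;
    }

    boolean contains(String word) {
        Node node = find(word);
        return node != null && node.isWord;
    }

    // True if at least one stored word starts with the given prefix.
    boolean hasPrefix(String prefix) {
        return find(prefix) != null;
    }

    // Walks the trie along s; returns null if the path does not exist.
    private Node find(String s) {
        Node node = root;
        for (int i = 0; i < s.length(); i++) {
            node = node.children.get(s.charAt(i));
            if (node == null) {
                return null;
            }
        }
        return node;
    }
}
```

Unlike a HashSet, this structure also answers prefix queries, which can help when generating spelling suggestions.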


How about an in-memory database approach (for example, in-memory SQLite)? That lets you run efficient queries without disk overhead.

I think 100,000 is not such a large amount that searching would be inefficient. Of course it depends... It would work fine if you are only checking whether a word exists in the array; that is a linear-complexity algorithm. You can keep the array sorted so that you can use binary search and make it more efficient.
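Keeping the array sorted allows a binary search, which needs about 17 comparisons for 100,000 entries instead of up to 100,000. A minimal sketch using the standard Arrays.sort and Arrays.binarySearch methods follows; the class name SortedWordArray and the sample words are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: a sorted String[] searched with binary search.
class SortedWordArray {
    private final String[] words;

    SortedWordArray(String[] dictionary) {
        words = dictionary.clone(); // copy so the caller's array is not reordered
        Arrays.sort(words);         // binary search requires sorted input
    }

    // Arrays.binarySearch returns a non-negative index only when the key is found.
    boolean contains(String word) {
        return Arrays.binarySearch(words, word) >= 0;
    }

    public static void main(String[] args) {
        SortedWordArray dict =
                new SortedWordArray(new String[] {"pear", "apple", "mango"});
        System.out.println(dict.contains("mango"));
        System.out.println(dict.contains("melon"));
    }
}
```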

On the other hand, if you would like to find the 5 most likely words (using an N-gram method or something similar), you should consider using Lucene or another text database.

Perhaps using an SQLite database would be more efficient? I think that's what Firefox/Thunderbird does for spell checking, but I'm not entirely sure.

You won't be able to store that many strings in a static variable initialized in code. Java has a size limit for static initializers and for method bodies in general. Simply use a flat file and read it upon class instantiation. Java is faster than most people think with those things.

See "Enum exeeding the 65535 bytes limit of static initializer... what's best to do?".
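The flat-file approach suggested in this answer could look roughly like the sketch below. It assumes a UTF-8 text file with one word per line; the class name FileDictionary and the file name words.txt are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical loader: reads one word per line from a flat file into a Set.
class FileDictionary {
    private final Set<String> words = new HashSet<>();

    FileDictionary(Path file) throws IOException {
        // try-with-resources closes the underlying file when the stream is done.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            lines.map(String::trim)
                 .filter(line -> !line.isEmpty()) // skip blank lines
                 .forEach(words::add);
        }
    }

    boolean contains(String word) {
        return words.contains(word);
    }

    int size() {
        return words.size();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: words.txt exists in the working directory.
        FileDictionary dict = new FileDictionary(Paths.get("words.txt"));
        System.out.println(dict.size() + " words loaded");
    }
}
```

Because the words are read at run time, none of them end up in the class's static initializer, so the bytecode size limit does not apply.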
