
Is there a dictionary I can download for Java?

Is there a dictionary I can download for Java? I want to have a program that takes a few random letters and sees if they can be rearranged into a real word by checking them against the dictionary.

"Is there a dictionary I can download for Java?"

Others have already answered this. But maybe you weren't simply asking for a dictionary file, but for a spellchecker?

"I want to have a program that takes a few random letters and sees if they can be rearranged into a real word by checking them against the dictionary."

That is different. How fast do you want this to be? How many words are in the dictionary, and how many words, up to which length, do you want to check?

In case you want a spellchecker (which is not entirely clear from your question), Jazzy is a spellchecker for Java that has links to a lot of dictionaries. It's not bad, but the various implementations are horribly inefficient (fine for small dictionaries, but an amazing waste when you have several hundred thousand words).

Now, if you just want to solve the specific problem you describe, you can:

  • parse the dictionary file and create a map: (letters in sorted order, set of matching words)

  • then, for any number of random letters: sort them and see if you have an entry in the map (if you do, the entry's value contains all the words you can make from these letters).

    abracadabra : (aaaaabbcdrr, (abracadabra))

    carthorse : (acehorrst, (carthorse))

    orchestra : (acehorrst, (carthorse, orchestra))

etc.

Now you take, say, nine random letters and get "hsotrerca". You sort them to get "acehorrst", and using that as a key you get all the (valid) anagrams.

This works because what you described is a special (easy) case: all you need to do is sort your letters and then use an O(1) map lookup.
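The sorted-letters map described above could be sketched roughly as follows. This is only a minimal illustration, not a reference implementation: the class name AnagramIndex and its methods are made up for this example, and it assumes the word list is already in memory, that lookups should be case-insensitive, and that Java 9+ is available (for List.of).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class: the name and API are illustrative only.
class AnagramIndex {
    // Key: a word's letters in sorted order. Value: every word made of exactly those letters.
    private final Map<String, Set<String>> index = new HashMap<>();

    AnagramIndex(Iterable<String> words) {
        for (String word : words) {
            String normalized = word.trim().toLowerCase(Locale.ROOT);
            if (normalized.isEmpty()) {
                continue; // skip blank lines
            }
            index.computeIfAbsent(sortLetters(normalized), k -> new TreeSet<>())
                 .add(normalized);
        }
    }

    // "orchestra" and "carthorse" both become "acehorrst".
    static String sortLetters(String letters) {
        char[] chars = letters.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Sort the query letters, then do a single hash lookup.
    Set<String> anagramsOf(String letters) {
        String key = sortLetters(letters.toLowerCase(Locale.ROOT));
        return index.getOrDefault(key, Collections.emptySet());
    }

    public static void main(String[] args) {
        AnagramIndex idx = new AnagramIndex(
                List.of("abracadabra", "carthorse", "orchestra"));
        System.out.println(idx.anagramsOf("hsotrerca")); // [carthorse, orchestra]
        System.out.println(idx.anagramsOf("xyz"));       // []
    }
}
```

Building the index costs one sort per dictionary word; after that, each query is one sort of the input letters plus one map lookup.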

For more complicated spell checking, where the input may contain errors, you need something that comes up with "candidates" (words that may be correct but misspelled), for example using the Soundex, Metaphone or Double Metaphone algorithms. You then use something like the Levenshtein edit-distance algorithm to check the candidates against known good words (or the much more complicated tree built on Levenshtein edit distance that Google uses for its "find as you type"):

http://en.wikipedia.org/wiki/Levenshtein_distance
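For reference, the plain Levenshtein edit distance can be computed with a small dynamic-programming routine like the one below. It is a sketch of the textbook algorithm only (it does not cover the candidate-generation step or the tree-based variant mentioned above), and the class name EditDistance is made up for this example.

```java
// Hypothetical utility class: the name is illustrative only.
final class EditDistance {
    private EditDistance() {}

    // Minimum number of single-character insertions, deletions and substitutions
    // needed to turn a into b. Uses two rolling rows: O(|a|*|b|) time, O(|b|) space.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));       // 3
        System.out.println(levenshtein("mispelled", "misspelled")); // 1
    }
}
```

A spellchecker would typically accept a candidate only if its distance to the input stays below a small threshold, such as 1 or 2.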

As a funny side note, an optimized dictionary representation can store hundreds of thousands, even millions, of words in less than 10 bits per word (yup, you read that correctly: less than 10 bits per word) and still allow very fast lookup.

Dictionaries are usually programming-language agnostic. If you google without the keyword "java", you may get better results. For example, searching for "free dictionary download" turns up, among others, dicts.info.

OpenOffice dictionaries are easy to parse line by line.
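As a rough sketch of what "line by line" can mean here: the code below assumes the Hunspell-style .dic layout that OpenOffice dictionaries commonly use, where the first line is an entry count and each following line is a word optionally followed by "/" and affix flags. The class name DicLines and the sample lines are made up for illustration, and affix expansion (the .aff file) is ignored entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the name is illustrative only.
class DicLines {
    // Extracts the bare words from the lines of a .dic file (assumed Hunspell-style).
    static List<String> wordsFrom(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue;
            }
            // Assumption: a purely numeric first line is the entry-count header.
            if (i == 0 && line.chars().allMatch(Character::isDigit)) {
                continue;
            }
            // Drop the affix flags after '/', if present.
            int slash = line.indexOf('/');
            words.add(slash >= 0 ? line.substring(0, slash) : line);
        }
        return words;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not taken from a real dictionary file.
        List<String> sample = List.of("3", "carthorse/S", "orchestra/SM", "abracadabra");
        System.out.println(wordsFrom(sample)); // [carthorse, orchestra, abracadabra]
    }
}
```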

You can read it into memory (keep in mind that it takes a lot of memory):

List<String> words = IOUtils.readLines(new FileInputStream("dicfile.txt"), StandardCharsets.UTF_8); (from commons-io; the charset argument here assumes a UTF-8 file)

Thus you get a List of all words. Alternatively, you can use the LineIterator if you run into memory problems.
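If memory is the concern, the same idea of going through the file one line at a time can also be sketched with the plain JDK. Note that this swaps in BufferedReader.lines() as a stand-in for the commons-io LineIterator mentioned above; the class name StreamingLookup and its method are made up for this example.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: the name is illustrative only.
class StreamingLookup {
    // Returns true if any line of the source is an anagram of the given letters.
    // Lines are read lazily, so only one word is held in memory at a time.
    // The caller owns the Reader and is responsible for closing it.
    static boolean hasAnagram(Reader source, String letters) {
        String key = sortLetters(letters.toLowerCase(Locale.ROOT));
        return new BufferedReader(source).lines()
                .map(line -> line.trim().toLowerCase(Locale.ROOT))
                .anyMatch(word -> word.length() == key.length()
                        && sortLetters(word).equals(key));
    }

    private static String sortLetters(String letters) {
        char[] chars = letters.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    public static void main(String[] args) {
        // A StringReader stands in for a real dictionary file here.
        Reader fake = new StringReader("abracadabra\ncarthorse\norchestra\n");
        System.out.println(hasAnagram(fake, "hsotrerca")); // true
    }
}
```

With a real file, the Reader could come from Files.newBufferedReader(Path.of("dicfile.txt")) inside a try-with-resources block. The trade-off is that every query rescans the file, so this only makes sense when the in-memory map is not an option.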

Check out http://sourceforge.net/projects/test-dictionary/ ; it might give you some clues.

I am not sure whether any such libraries are available for download. But you can dig through sourceforge.net to see if there are any, or how people have used dictionaries: http://sourceforge.net/search/?type_of_search=soft&words=java+dictionary

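Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the address of the original post. For any questions, contact: yoyou2525@163.com.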

 