
Is it worth using a HashMap in this instance for speed?

I have a function that calls a distance function. The distance function computes the Levenshtein distance between two input Strings. I'm trying to find the shortest distance between an input word (misspelled) and an English word to return (using this as a spell checker), but I'm not sure whether my HashMap is gaining me any ground in speed. The wordContainer is an array containing n words; does this make my lookup time stuck in O(n)?

My code below:

private static String findClosestMatch(String word) {
    // Maps each candidate's distance to the candidate itself; candidates
    // that tie on distance overwrite one another, so only one of them survives.
    Map<Integer, String> wordAndDistanceMap = new HashMap<>();
    wordContainer.forEach(s -> wordAndDistanceMap.put(distance(s, word), s));
    return wordAndDistanceMap.get(Collections.min(wordAndDistanceMap.keySet()));
}
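(For reference, the distance helper isn't shown in the question. A textbook dynamic-programming Levenshtein implementation, given here only as an assumed stand-in for the original, looks roughly like this:)

private static int distance(String a, String b) {
    // dp[i][j] = edit distance between the first i chars of a and the first j chars of b.
    int[][] dp = new int[a.length() + 1][b.length() + 1];
    for (int i = 0; i <= a.length(); i++) dp[i][0] = i;  // delete all of a's prefix
    for (int j = 0; j <= b.length(); j++) dp[0][j] = j;  // insert all of b's prefix
    for (int i = 1; i <= a.length(); i++) {
        for (int j = 1; j <= b.length(); j++) {
            int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
            dp[i][j] = Math.min(Math.min(dp[i - 1][j] + 1,     // deletion
                                         dp[i][j - 1] + 1),    // insertion
                                dp[i - 1][j - 1] + cost);      // substitution
        }
    }
    return dp[a.length()][b.length()];
}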

While this has a reasonable time complexity, it has a lot of overhead doing work and creating objects you never need. I suggest a simple loop instead.

private static List<String> findClosestMatch(String word) {
    int min = Integer.MAX_VALUE;
    List<String> minWords = new ArrayList<>();
    for (String s : wordContainer) {
        int dist = distance(s, word);
        if (dist < min) {
            // Found a strictly better distance: reset the result list.
            min = dist;
            minWords.clear();
        }
        if (dist == min)
            minWords.add(s);
    }
    return minWords;
}

You have to calculate the Levenshtein distance from word to N other words. Calculating the distance N times is O(N).

The only way you can improve on O(N) is if you can devise a way to avoid having to calculate the distance O(N) times.

The HashMap can't help with that. What you would need to do (and I don't know if this is possible) is devise a way to avoid checking the distance for words that are "a long way away" from word.
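One well-known structure for exactly that kind of pruning is a BK-tree. It is not part of the original answer, so the following is only an illustrative sketch: the dictionary is arranged in a tree keyed by edit distance, and the triangle inequality lets whole subtrees that cannot contain a close match be skipped. The metric is passed in, so the question's existing distance(a, b) helper can be reused.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntBiFunction;

class BKTree {

    private static final class Node {
        final String word;
        final Map<Integer, Node> children = new HashMap<>();
        Node(String word) { this.word = word; }
    }

    private final ToIntBiFunction<String, String> distance;
    private Node root;

    BKTree(ToIntBiFunction<String, String> distance) {
        this.distance = distance;
    }

    void add(String word) {
        if (root == null) { root = new Node(word); return; }
        Node node = root;
        while (true) {
            int d = distance.applyAsInt(word, node.word);
            if (d == 0) return;                     // word already stored
            Node child = node.children.get(d);
            if (child == null) {                    // free edge: attach here
                node.children.put(d, new Node(word));
                return;
            }
            node = child;                           // otherwise descend
        }
    }

    /** Returns every stored word within maxDist edits of the query. */
    List<String> search(String query, int maxDist) {
        List<String> matches = new ArrayList<>();
        if (root != null) collect(root, query, maxDist, matches);
        return matches;
    }

    private void collect(Node node, String query, int maxDist, List<String> matches) {
        int d = distance.applyAsInt(query, node.word);
        if (d <= maxDist) matches.add(node.word);
        // By the triangle inequality, only children whose edge label lies in
        // [d - maxDist, d + maxDist] can hold a match; everything else is pruned.
        for (int edge = Math.max(0, d - maxDist); edge <= d + maxDist; edge++) {
            Node child = node.children.get(edge);
            if (child != null) collect(child, query, maxDist, matches);
        }
    }
}

Building the tree is a one-off cost (pass the existing helper, e.g. new BKTree(SpellChecker::distance) where SpellChecker is whatever class holds it, then wordContainer.forEach(tree::add)); after that, a lookup such as tree.search(word, 2) typically visits only a fraction of the dictionary, although the worst case is still O(N).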

Well, if you need a faster method than this, then you have to use an indexing mechanism.

What I can suggest is Apache Lucene. It is an open-source and widely used framework for indexing data. There are also more developed offerings such as Apache SOLR and Elastic Search built on the Lucene core. You can read more at the provided links.

After indexing your static list, or the values you have calculated over it, you can retrieve them within the very short lookup time you are after.
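For illustration, here is a rough sketch of that route with plain Lucene (class and constructor names are taken from recent Lucene releases and may differ in your version, so treat this as an outline rather than exact working code):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

import java.io.IOException;
import java.util.List;

class FuzzyLookupSketch {

    public static void main(String[] args) throws IOException {
        List<String> dictionary = List.of("hello", "world", "spelled", "checker");

        // Index each dictionary word once, up front.
        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (String w : dictionary) {
                Document doc = new Document();
                doc.add(new StringField("word", w, Field.Store.YES));
                writer.addDocument(doc);
            }
        }

        // FuzzyQuery matches indexed terms within a bounded edit distance (at most 2).
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new FuzzyQuery(new Term("word", "spelleed"), 2), 5);
            for (ScoreDoc hit : hits.scoreDocs) {
                System.out.println(searcher.doc(hit.doc).get("word"));
            }
        }
    }
}

Solr and Elastic Search expose the same capability through their fuzzy query support, so the pruning happens inside the index rather than in your own loop.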

I hope this will help.
