
Is it worth using a HashMap in this instance for speed?

I have a function that calls a distance function, which computes the Levenshtein distance between two input Strings. I'm using this as a spellchecker: I want to find the English word with the shortest distance to an input word (misspelled) and return it. However, I'm not sure whether my HashMap is gaining me anything in speed. The wordContainer is a list containing n words; does this mean my lookup time is stuck at O(n)?

My code:

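private static String findClosestMatch(String word) {
    // Key: distance to `word`; value: the dictionary word at that distance.
    // Words with the same distance overwrite each other, so only the last one survives.
    Map<Integer, String> wordAndDistanceMap = new HashMap<>();
    wordContainer.forEach(s -> wordAndDistanceMap.put(distance(s, word), s));
    // The smallest key is the smallest distance.
    return wordAndDistanceMap.get(Collections.min(wordAndDistanceMap.keySet()));
}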

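While this has a reasonable time complexity, it has a lot of overhead: it builds a map and boxes every distance into an Integer, work you never need. I suggest a simple loop instead. It also returns every word tied for the smallest distance, whereas the map keeps only the last word seen at each distance.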

private static List<String> findClosestMatch(String word) {
    int min = Integer.MAX_VALUE;
    List<String> minWords = new ArrayList<>(); // all words tied at the current minimum
    for (String s : wordContainer) {
        int dist = distance(s, word);
        if (dist < min) {
            // Found a strictly closer word: discard the previous candidates.
            min = dist;
            minWords.clear();
        }
        if (dist == min) {
            minWords.add(s);
        }
    }
    return minWords;
}

You have to calculate the Levenshtein distance from the input word to N other words. Calculating the distance N times is O(N).
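Strictly speaking, each distance call is not constant time. Assuming distance uses the standard dynamic-programming (Wagner–Fischer) algorithm, which the question does not show, a call on the input word $w$ and a dictionary word $s_i$ costs time proportional to the product of their lengths, so the total work is:

$$
T(N) = \sum_{i=1}^{N} O\big(|w| \cdot |s_i|\big) = O\big(N \cdot |w| \cdot L\big), \qquad L = \max_i |s_i|
$$

For dictionary words, $|w|$ and $L$ are small and effectively bounded, which is why the cost is usually quoted simply as O(N).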

The only way you can improve on O(N) is if you can devise a way to avoid having to calculate the distance O(N) times.

The HashMap can't help with that. What you would need to do (and I don't know if this is possible) is devise a way to avoid checking the distance for words that are "a long way away" from the input word.
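One cheap partial step in that direction, assuming distance is plain Levenshtein distance: the edit distance between two strings is never smaller than the difference in their lengths, so a word whose length differs from the input by more than the best distance found so far can be skipped without calling distance at all. The sketch below is a self-contained variant of the loop from the first answer; the class name PrunedClosestMatch and passing the dictionary as a parameter are illustrative choices, and its distance method is a standard two-row DP implementation, not the asker's.

```java
import java.util.ArrayList;
import java.util.List;

final class PrunedClosestMatch {

    // Standard two-row dynamic-programming Levenshtein distance.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Same loop as the first answer, plus a length-based lower-bound check.
    static List<String> findClosestMatch(List<String> wordContainer, String word) {
        int min = Integer.MAX_VALUE;
        List<String> minWords = new ArrayList<>();
        for (String s : wordContainer) {
            // Levenshtein distance >= length difference, so this word cannot beat
            // (or tie) the current best; skip the expensive distance call.
            if (Math.abs(s.length() - word.length()) > min) {
                continue;
            }
            int dist = distance(s, word);
            if (dist < min) {
                min = dist;
                minWords.clear();
            }
            if (dist == min) {
                minWords.add(s);
            }
        }
        return minWords;
    }
}
```

This still loops over all N words, so it is a constant-factor saving rather than an asymptotic one, and it helps most once a close match has been found early. The check uses `>` rather than `>=` so that words tied at the minimum distance are still collected.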

If you need something faster than this, you have to use an indexing mechanism.

I suggest Apache Lucene, an open-source and widely used library for indexing data. Apache Solr and Elasticsearch are also built on top of the Lucene core. You can read more in their documentation.

Once your static word list (or the values you have computed from it) is indexed, you can retrieve matches in the short time you are looking for.
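If Lucene is more than you need, a BK-tree (Burkhard–Keller tree) is a much smaller in-memory index built specifically for metric distances such as Levenshtein. It is a different technique from what Lucene uses and is shown here only to illustrate how an index lets you skip distance calls. The sketch assumes the whole dictionary fits in memory; the class and method names (BKTree, add, search, closest) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal BK-tree over Levenshtein distance; an illustrative sketch only.
final class BKTree {

    private static final class Node {
        final String word;
        // Edge label = distance from this node's word to the child's word.
        final Map<Integer, Node> children = new HashMap<>();
        Node(String word) { this.word = word; }
    }

    private Node root;

    // Standard two-row dynamic-programming Levenshtein distance.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    void add(String word) {
        if (root == null) { root = new Node(word); return; }
        Node node = root;
        while (true) {
            int d = distance(word, node.word);
            if (d == 0) return; // already stored
            Node child = node.children.get(d);
            if (child == null) {
                node.children.put(d, new Node(word));
                return;
            }
            node = child;
        }
    }

    // All stored words within maxDist edits of query.
    List<String> search(String query, int maxDist) {
        List<String> result = new ArrayList<>();
        if (root == null) return result;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            int d = distance(query, node.word);
            if (d <= maxDist) result.add(node.word);
            // Triangle inequality: a subtree under edge k can only hold matches
            // if |d - k| <= maxDist, so every other subtree is skipped entirely.
            for (Map.Entry<Integer, Node> e : node.children.entrySet()) {
                if (Math.abs(d - e.getKey()) <= maxDist) stack.push(e.getValue());
            }
        }
        return result;
    }

    // Grow the radius until something matches; all hits then share the minimum distance.
    List<String> closest(String query, int maxRadius) {
        for (int r = 0; r <= maxRadius; r++) {
            List<String> hits = search(query, r);
            if (!hits.isEmpty()) return hits;
        }
        return new ArrayList<>();
    }
}
```

Build the tree once by calling add for every word in wordContainer, then call something like tree.closest(word, 2) per lookup. For small radii the triangle-inequality check prunes many subtrees, but in the worst case every node is still visited, so this is a practical speed-up rather than a guaranteed one.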

I hope this will help.
