
Why does my O(NLogN) algorithm to find anagrams run faster than my O(N) algorithm?

I have a hashset of words that are all the same length. I want to find all of the anagrams that exist in this hashset and collect them into another hashset called anagrams. Here is the loop that does that:

public HashSet<String> getUniqueAnagramsSlow(HashSet<String> paddedWords, int areAnagramsVersion){
    HashSet<String> anagrams = new HashSet<String>(); 
    Object[] paddedWordsArr = paddedWords.toArray();
    for(int i = 0; i < paddedWordsArr.length-1; i++){
        boolean foundAnagram = false;
        String wordOne = (String) paddedWordsArr[i];
        if(!anagrams.contains(wordOne)) 
            for(int j = i+1; j < paddedWordsArr.length; j++){
                String wordTwo = (String) paddedWordsArr[j];
                if(areAnagrams(wordOne, wordTwo, areAnagramsVersion)){
                    foundAnagram = true;
                    anagrams.add(wordTwo);
                }
            }
        if(foundAnagram){
            anagrams.add(wordOne);
        }
    }
    return anagrams;
}

My goal in writing this code is to see how different areAnagrams() functions affect run time. I wrote two versions of areAnagrams(): one that sorts the two strings and compares them, and another that uses a hashmap to compare character frequencies. Here they are:

public boolean areAnagramsVersionOne(String first, String second){
    char[] arr1 = first.toCharArray();
    Arrays.sort(arr1);
    String fSorted = new String( arr1 );
    char[] arr2 = second.toCharArray();
    Arrays.sort(arr2);
    String sSorted = new String(arr2);
    return fSorted.equals(sSorted);
}
public boolean areAnagramsVersionTwo(String first, String second){
    HashMap<String, Integer> wordOne = new HashMap<String,Integer>();
    for(int i = 0; i < first.length(); i++){
        String letOne = first.substring(i, i+1);
        if(wordOne.containsKey(letOne)){
            int letOneFreq = wordOne.get(letOne);
            wordOne.put(letOne, letOneFreq + 1);
        }else{
            wordOne.put(letOne, 1);
        }
    }
    for(int i = 0; i < second.length(); i++){
        String letTwo = second.substring(i, i+1);
        if(!wordOne.containsKey(letTwo))
            return false;
        int freq = wordOne.get(letTwo);
        if(freq == 0)
            return false;
        wordOne.put(letTwo, freq-1);
    }
    return true;
}

From my understanding, areAnagramsVersionOne() will run in O(NlogN) time and areAnagramsVersionTwo() will run in O(N) time, where N is the length of a word. However, when I test these two versions in my original loop, version two is noticeably slower. Why is this?

Thank you.

This is an example of how I test run time:

long startTime = System.currentTimeMillis();
getUniqueAnagramsSlow(words, 2);
long endTime = System.currentTimeMillis();
System.out.println("exec time: " + (endTime - startTime) );
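A single System.currentTimeMillis() measurement like the one above includes JIT warm-up and has coarse resolution, so it can exaggerate or hide differences. A minimal sketch of a slightly more robust measurement follows. The class TimingSketch and the helper bestOfNanos are hypothetical names, not part of the original code, and a dedicated harness such as JMH would be more reliable still.

```java
public class TimingSketch {
    // Hypothetical helper: runs the task `warmups` times untimed so the JIT
    // has a chance to compile it, then returns the best of `runs` timed
    // executions, in nanoseconds.
    static long bestOfNanos(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed);
        }
        return best;
    }

    public static void main(String[] args) {
        // Stand-in workload; in the question this would be a call such as
        // getUniqueAnagramsSlow(words, 2).
        Runnable task = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
        };
        System.out.println("best time (ns): " + bestOfNanos(task, 5, 10));
    }
}
```

Taking the best of several runs is one possible choice; a median would also be reasonable.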

As far as I know, an O(NlogN) algorithm is guaranteed to cost more than an O(N) algorithm only for sufficiently large values of N, because at small values the coefficients and constants that O() notation leaves out are still relevant. Consider two algorithms with the following costs:

Algorithm 1 cost: 100*N: O(N)

Algorithm 2 cost: 10*NlogN: O(NlogN)

Algorithm 2 costs more than algorithm 1 when 10*NlogN > 100*N => 10*logN > 100 => logN > 10 (taking log to base 2)

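So in this case algorithm 2 will cost more than algorithm 1 only when N > 2^10. For smaller values, algorithm 2 will be less costly, even though it is "less efficient" according to O() notation. In your code N is the length of a single word, which is likely far below any such crossover point.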
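The arithmetic above can be checked with a small sketch. The cost formulas are the made-up ones from this example, not measurements, and the class and method names are only illustrative. N is restricted to powers of two so that log2(N) is an exact integer.

```java
public class CostCrossover {
    // Example cost of algorithm 1: 100*N.
    static long linearCost(long n) {
        return 100L * n;
    }

    // Example cost of algorithm 2: 10*N*log2(N).
    // Assumes n is a power of two, so log2(n) equals the number of trailing zero bits.
    static long linearithmicCost(long n) {
        return 10L * n * Long.numberOfTrailingZeros(n);
    }

    // Reports which of the two example algorithms is cheaper for a given n.
    static String cheaper(long n) {
        long a = linearCost(n);
        long b = linearithmicCost(n);
        if (a == b) {
            return "tie";
        }
        return a < b ? "algorithm 1" : "algorithm 2";
    }

    public static void main(String[] args) {
        for (long n : new long[] {16, 256, 1024, 4096}) {
            System.out.println("N=" + n
                    + " cost1=" + linearCost(n)
                    + " cost2=" + linearithmicCost(n)
                    + " cheaper: " + cheaper(n));
        }
    }
}
```

With these formulas, algorithm 2 is cheaper at N = 16 and N = 256, the two tie at N = 1024, and algorithm 1 is cheaper at N = 4096.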

Read the Wikipedia page on big O notation for more details.
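To illustrate how much the constant factor depends on implementation details rather than on the O() class, here is a sketch of a counting-based check that is still O(N) but avoids the per-character substring() allocations and hash lookups of version two. It assumes the words contain only ASCII characters (code points below 128), which the question does not state. The name areAnagramsCounting is hypothetical.

```java
public class AnagramCount {
    // Assumption: both strings contain only ASCII characters (< 128).
    // A character outside that range is reported as an error instead of
    // silently corrupting the counts.
    static boolean areAnagramsCounting(String first, String second) {
        if (first.length() != second.length()) {
            return false;
        }
        int[] counts = new int[128];
        for (int i = 0; i < first.length(); i++) {
            char c = first.charAt(i);
            if (c >= 128) {
                throw new IllegalArgumentException("non-ASCII character: " + c);
            }
            counts[c]++;
        }
        for (int i = 0; i < second.length(); i++) {
            char c = second.charAt(i);
            if (c >= 128) {
                throw new IllegalArgumentException("non-ASCII character: " + c);
            }
            // A count dropping below zero means `second` has more of this
            // character than `first`, so the words cannot be anagrams.
            if (--counts[c] < 0) {
                return false;
            }
        }
        // Lengths are equal and no count went negative, so every count is zero.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(areAnagramsCounting("listen", "silent"));
        System.out.println(areAnagramsCounting("aab", "abb"));
    }
}
```

Whether this beats the sorting version for short words would still have to be measured; the point is only that two O(N) implementations can differ widely in their constants.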
