
Why does my O(NLogN) algorithm to find anagrams run faster than my O(N) algorithm?

I have a HashSet of words that are all the same length. I want to find all of the anagrams that exist in this set and collect them into another HashSet called anagrams. Here is the loop that does that:

public HashSet<String> getUniqueAnagramsSlow(HashSet<String> paddedWords, int areAnagramsVersion){
    HashSet<String> anagrams = new HashSet<String>(); 
    Object[] paddedWordsArr = paddedWords.toArray();
    for(int i = 0; i < paddedWordsArr.length-1; i++){
        boolean foundAnagram = false;
        String wordOne = (String) paddedWordsArr[i];
        if(!anagrams.contains(wordOne)) 
            for(int j = i+1; j < paddedWordsArr.length; j++){
                String wordTwo = (String) paddedWordsArr[j];
                if(areAnagrams(wordOne, wordTwo, areAnagramsVersion)){
                    foundAnagram = true;
                    anagrams.add(wordTwo);
                }
            }
        if(foundAnagram){
            anagrams.add(wordOne);
        }
    }
    return anagrams;
}

My goal in writing this code is to see how different areAnagrams() implementations affect run time. I wrote two versions of areAnagrams(): one sorts the two strings and compares them, and the other uses a HashMap to compare character frequencies. Here they are:

public boolean areAnagramsVersionOne(String first, String second){
    char[] arr1 = first.toCharArray();
    Arrays.sort(arr1);
    String fSorted = new String( arr1 );
    char[] arr2 = second.toCharArray();
    Arrays.sort(arr2);
    String sSorted = new String(arr2);
    return fSorted.equals(sSorted);
}
public boolean areAnagramsVersionTwo(String first, String second){
    HashMap<String, Integer> wordOne = new HashMap<String,Integer>();
    for(int i = 0; i < first.length(); i++){
        String letOne = first.substring(i, i+1);
        if(wordOne.containsKey(letOne)){
            int letOneFreq = wordOne.get(letOne);
            wordOne.put(letOne, letOneFreq + 1);
        }else{
            wordOne.put(letOne, 1);
        }
    }
    for(int i = 0; i < second.length(); i++){
        String letTwo = second.substring(i, i+1);
        if(!wordOne.containsKey(letTwo))
            return false;
        int freq = wordOne.get(letTwo);
        if(freq == 0)
            return false;
        wordOne.put(letTwo, freq-1);
    }
    return true;
}

From my understanding, areAnagramsVersionOne() runs in O(NlogN) time and areAnagramsVersionTwo() runs in O(N) time. However, when I test the two versions in my original loop, version two is noticeably slower. Why is this?

Thank you.

This is an example of how I test run time:

long startTime = System.currentTimeMillis();
getUniqueAnagramsSlow(words, 2);
long endTime = System.currentTimeMillis();
System.out.println("exec time: " + (endTime - startTime) );
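
A single currentTimeMillis() measurement can be noisy on the JVM, since early runs include JIT compilation. The following is a minimal sketch, not the asker's method, of a slightly more robust timing helper. The class name TimingSketch, the helper medianNanos, and the placeholder workload are all hypothetical names introduced for illustration.

```java
import java.util.Arrays;

class TimingSketch {
    // Runs the task a few times untimed (so the JIT has a chance to compile
    // the hot code), then returns the median of several timed runs in nanoseconds.
    static long medianNanos(Runnable task, int warmupRuns, int timedRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long[] samples = new long[timedRuns];
        for (int i = 0; i < timedRuns; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[timedRuns / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload (an assumption for this sketch); in the question
        // this would be something like () -> getUniqueAnagramsSlow(words, 2).
        Runnable task = () -> {
            char[] letters = "anagram".toCharArray();
            Arrays.sort(letters);
        };
        long nanos = medianNanos(task, 5, 11);
        System.out.println("median exec time (ns): " + nanos);
    }
}
```

Taking the median of several runs makes the comparison between the two versions less sensitive to one-off pauses; for serious measurements a dedicated benchmarking harness would be a better fit.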

As far as I know, O(NlogN) is guaranteed to be greater than O(N) only for sufficiently large values of N, because at small values the coefficients and constants that O() notation leaves out are still relevant. Consider two algorithms with the following costs:

Algorithm 1 cost: 100*N: O(N)

Algorithm 2 cost: 10*NlogN: O(NlogN)

O(NlogN) > O(N) => 10*NlogN > 100*N => 10*logN > 100 => logN > 10
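
The inequality above can be checked numerically. This is a small sketch that evaluates the two made-up cost models from the example, assuming a base-2 logarithm and restricting N to powers of two so that log2(N) is an exact integer. The class and method names are hypothetical.

```java
class CostCrossover {
    // Exact log2 for a power of two (assumption: n is a power of two, n >= 1).
    static long log2(long n) {
        return 63 - Long.numberOfLeadingZeros(n);
    }

    // Hypothetical cost of algorithm 1: 100*N.
    static long linearCost(long n) {
        return 100 * n;
    }

    // Hypothetical cost of algorithm 2: 10*N*log2(N).
    static long linearithmicCost(long n) {
        return 10 * n * log2(n);
    }

    public static void main(String[] args) {
        long[] sizes = {8, 64, 512, 1024, 2048, 65536};
        for (long n : sizes) {
            long a = linearCost(n);
            long b = linearithmicCost(n);
            String cheaper = a < b ? "algorithm 1" : (b < a ? "algorithm 2" : "tie");
            System.out.println("N=" + n + "  100*N=" + a
                    + "  10*N*log2(N)=" + b + "  cheaper: " + cheaper);
        }
    }
}
```

With these particular constants the two costs are equal at N = 1024; algorithm 2 is cheaper below that and algorithm 1 is cheaper above it.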

So in this case algorithm 2 will cost more than algorithm 1 only when N > 2^10 (taking the logarithm base 2). For smaller values, algorithm 2 will be less costly, even though it is "less efficient" according to O() notation.
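
To make the role of the constant concrete for the code in the question: version two creates a one-character String per letter and goes through HashMap hashing and Integer boxing for every lookup, which is likely a large per-character constant compared with sorting a short char array. That is an assumption, not something measured here. The sketch below is a hypothetical O(N) variant (not from the original post) that counts characters in a plain int array instead, and it assumes every character is ASCII.

```java
class AnagramCountSketch {
    // O(N) anagram check with a small constant factor.
    // Assumption of this sketch: all characters are ASCII (code < 128);
    // other characters would cause an ArrayIndexOutOfBoundsException.
    static boolean areAnagramsByCount(String first, String second) {
        if (first.length() != second.length()) {
            return false;
        }
        int[] counts = new int[128];
        // Count every character of the first word.
        for (int i = 0; i < first.length(); i++) {
            counts[first.charAt(i)]++;
        }
        // Consume the counts with the second word; going negative means
        // the second word has a character the first word lacks.
        for (int i = 0; i < second.length(); i++) {
            if (--counts[second.charAt(i)] < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(areAnagramsByCount("listen", "silent"));
        System.out.println(areAnagramsByCount("aab", "abb"));
    }
}
```

Whether this actually beats the sorting version for the asker's word lengths would still need to be measured, since short words keep N well inside the range where constants dominate.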

Read the Wikipedia page on O() notation for more details.
